Michael A. Covington, Ph.D.
Senior Research Scientist
Adjunct Professor of Computer Science
Member of the Linguistics and Engineering Faculties
Associate Director
Institute for Artificial Intelligence
The University of Georgia

Room 111, Boyd Graduate Studies Research Center
Athens, Georgia 30602-7415 U.S.A.


CSCI/LING 8570
Natural Language Processing Techniques


About this course

View syllabus

This course is designed for students in the M.S. program in Artificial Intelligence but is also open to other graduate students. It is usual to take CSCI 6540 (Symbolic Programming) the preceding semester.

In 2010, this course was taught using Python and the NLTK. In 2011, it reverted to being Prolog-based, although new material was introduced. It will remain Prolog-based.

This is a course in the hard parts of natural language processing, such as parsing and semantic modeling. There are plenty of shallow statistical methods that you can learn out of books, and I will acquaint you with them, but they aren't the main focus of the course. The main focus is threefold: to equip you to implement sophisticated algorithms in Prolog (which is uniquely suitable for some of them), to understand parsing (so that you can look at the output of a parser and judge whether it is correct, and build special-purpose parsers for your own needs), and to understand semantic modeling (how to get from language to knowledge representation).

The historical context is that, from about 1997 to 2007, the whole field shifted toward shallow statistical methods. Now, with IBM's "Watson" and other developments, the "hard parts" are in demand again, because everyone has learned that shallow methods can only go so far. That is the rationale for using an older textbook together with a lot of new supplementary material.

Students taking this course should know how to program a computer in Prolog and also in a general-purpose programming language of their own choosing such as Java, C#, or Python. If you do not already know Prolog, you are not prepared — no one has ever successfully "picked up Prolog" while taking this course.

Arrangements will be made for students who took the Python-based version and want to take the Prolog-based version for separate credit as ARTI 8800. Contact me if you want to do this.


Online journals and other literature

Some of this material is accessible only from on campus because it depends on UGA library subscriptions.

ACL publications (Computational Linguistics, ACL Conference Proceedings, etc.)

ACL Computational Linguistics Wiki (reference information and data)

Computational Linguistics

Natural Language Engineering

Literary and Linguistic Computing

Manning, Raghavan, and Schütze, Introduction to Information Retrieval (full text)

Natural Language Toolkit (NLTK) (Python-based)

International Journal of Computer Processing of Oriental Languages

ACM Transactions on Asian Language Information Processing (TALIP)

Index of online journals in UGA libraries

Index of books and printed journals in UGA libraries


Supplemental material for this course

Textbook corrections for Natural Language Processing for Prolog Programmers

Overview:
Some terms and resources
Terminology

Pragmatics

Prolog i/o predicates (to supplement Chapter 2 of Prolog Programming in Depth)

Text statistics

Text classification

Syntax: How do we know which of 2 trees is the right one?

Tagging:
General information
"Cheat sheet" based on Chapter 4

Lemmatization (How to lemmatize English)

The Penn Treebank:
General information
Our local Prolog-adapted version

ProNTo (Prolog Natural Language Tools, mostly student projects from 8570)

Latent Semantic Indexing:
Linear algebra refresher for 8570
Worked example of Latent Semantic Indexing

The R statistical software package:
Download your own copy of R
Using R to Compare Groups
Using R to Detect Changes in Individuals
Using R to Find Correlations
Files and Recordkeeping With R


The remainder of this page will continue like a blog, with material added day by day.


Jan. 6, 2012 — A suggested linguistics book

If you have not had a linguistics course, I suggest reading a general linguistics book for background. You can do this very cheaply by buying an older edition of a textbook such as Fromkin and Rodman. Click here for some useful listings.


Jan. 17, 2012 — Meet the LINGUIST List

At linguistlist.org you can see, and subscribe to, LINGUIST List, which is a mailing list that often announces conferences and job openings that are of interest to us.