CMPT 825 - Fall 2010 - Natural Language Processing

Natural Language Processing (NLP) is the automatic analysis of human language by computer algorithms. This course will focus on text mining and statistical machine translation. These two areas of NLP will be used to motivate and describe various computational and statistical models of language. The course will mainly cover statistical machine learning methods for NLP. This course is in Area 3.

Announcements

Assignments

  1. Homework #1. Sep 15 to Oct 1. (On Sep 26 the deadline was extended to Oct 3; Q3 was updated on Sep 24.)
  2. Homework #2. Oct 8 to Oct 25.
  3. Homework #3. Oct 25 to Nov 12.
  4. Homework #4. Nov 23 to Dec 6.

All data and supporting material for the homework assignments will be available from ~anoop/cmpt825 on any FAS machine (e.g. oak.fas.sfu.ca).

Syllabus

Weekly Schedule and Readings

  1. Introduction
  2. Language Modeling (Sep 10 to Sep 20)
  3. The EM algorithm
  4. Hidden Markov models
  5. Syntax and Parsing
  6. Finite-state transducers
  7. Log-linear models
    • Notes on log-linear models: #1
  8. Statistical machine translation
  9. Non-parametric Bayes

Textbook and References

There is no formal textbook for this course. Most of the readings are research papers, usually available online, and are posted alongside each topic. However, if you would like to brush up on the basics, you should refer to the following books:

Web Links


anoop at cs.sfu.ca