CMPT 825 - Spring 2008 - Natural Language Processing

Natural Language Processing (NLP) is the automatic analysis of human language by computer algorithms. This course will focus on text mining and statistical machine translation. These two aspects of NLP will be used to motivate and describe various computational and statistical models of language. The course will mainly cover statistical machine learning methods for NLP. (This course is in Area 3.)



  1. Homework #1
  2. Homework #2
  3. Homework #3

Note on assignments: All homeworks are optional, so there are no deadlines. However, doing the homeworks will probably help you substantially in your project work and in understanding the course material.

All materials for the homeworks will be available from ~anoop/cmpt825


  1. Scribe #1: Ajeet Grewal
  2. Scribe #2: Anton Venema
  3. Scribe #3: Milan Tofiloski
  4. Scribe #4: Javad Safaei
  5. Scribe #5: Mohsen Jamali
  6. Scribe #6: Steve Fagan
  7. Scribe #7: Winona Wu
  8. Scribe #8: Sankaran Baskaran
  9. Scribe #9: Mohammad Norouzi
  10. Scribe #10: Chris Nell
  11. Scribe #11: Louisa Harutyunyan

Scribes will take the lead in presenting the papers we are reading that week in the Wed/Fri classes. On Mondays, I will present an introductory class on the topic for that week. The discussion can be led using the blackboard, or in some cases (if an example is too long to draw on the board) you can use PowerPoint slides or equivalent. Please let me know if you will need the digital projector for any class.

Scribing instructions: you must use LaTeX to create your scribe document. Use scribe.sty as the LaTeX style file. A sample scribe document scribe_sample.tex is provided as an example. On any of the CS/FAS Unix/Linux machines, use the command pdflatex scribe_sample.tex to produce scribe_sample.pdf
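For reference, a scribe document typically has the following shape. This is only a hypothetical sketch: the actual macros (header command, argument order) are defined by scribe.sty, so consult scribe_sample.tex for the exact commands the style file expects.

```latex
% Hypothetical skeleton -- check scribe_sample.tex for the real macros.
\documentclass{article}
\usepackage{scribe}   % the course-provided style file

\begin{document}

% Scribe styles usually provide a header macro taking the lecture
% number, date, lecturer, and scribe name; the line below is a
% placeholder, not necessarily the command scribe.sty defines:
% \lecture{1}{Week 1}{Anoop Sarkar}{Your Name}

\section{Topic of the week}
Your notes go here.

\end{document}
```

Compile with pdflatex as described above; run it twice if cross-references or the table of contents change between runs.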

Scribe deadline: the scribe notes must be submitted by the Wednesday of the week following the week being scribed. This will allow discussion of the scribed notes in the Friday class.

Syllabus and Readings

We will cover the following topics in this course. The weekly readings are listed below.


  1. Text Mining
  2. Machine Translation

Weekly Schedule and Readings

  1. Automata models of language: Finite-state transducers
  2. Text Mining with Hidden Markov Models
  3. The EM algorithm
  4. Language Modeling
  5. Machine Translation
  6. Discriminative learning for HMMs

Extra Papers

Papers that we did not have time to read in class but that may be useful in your project work. Most papers are available at the ACL Anthology.

Textbook and References

There is no formal textbook for this course. Most of the readings are research papers, posted along with the topics above, and are usually available online. However, if you would like to brush up on some of the basics, you should refer to the following books:

Web Links

anoop at