CMPT-825: Natural Language Processing

Instructor: Dr. Anoop Sarkar
Location and Time: AQ 5005, Mon, Wed, Fri 12:30-1:20 PM

Mailing List: cmpt-825 _at_ sfu.ca (always prefix "cmpt-825: " to all messages sent to this list)
Mailing list archives

Office: TASC1 9427
Office hours: Tue, 10:30 AM - 12:00 PM

Natural Language Processing (NLP) is the automatic analysis of human language by computer algorithms. This course will focus on text mining and statistical machine translation, which will be used to motivate and describe various computational and statistical models of language. The course will mainly cover statistical machine learning methods for NLP. (This course is in Area 3.)

Announcements

Grading for the course:

Assignments: 40%
Survey and project proposal paper: 20%
Class participation: 5%
Final project and paper: 35%

Important Dates:

Wed, Sep 8: First day of class
Fri, Nov 19: Project proposal due
Mon, Dec 6: Last day of class
Fri, Dec 17: Final project paper and implementation due

Assignments

Homework #1. Sep 15 to Oct 1. (Q3 updated on 9/24. On 9/26, the deadline was extended to 10/3.)
Homework #2. Oct 8 to Oct 25.
Homework #3. Oct 25 to Nov 12.
Homework #4. Nov 23 to Dec 6.

Reading for HW4: Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. Michael Collins. EMNLP 2002.

All the data and supporting material for the homeworks will be available from ~anoop/cmpt825 on any FAS machine (e.g. oak.fas.sfu.ca)

Syllabus

Language modeling (1 week)
Parsing and syntax (2 weeks)
The EM algorithm (2 weeks)
Statistical machine translation: alignment, decoding algorithms, syntax (3 weeks)
Bayesian methods: sampling methods, variational methods, non-parametric Bayes (3 weeks)
Global linear models, online learning, randomized/sub-linear algorithms (2 weeks)

Weekly Schedule and Readings

Introduction
- Readings (9/8):
  - Lillian Lee. I'm sorry Dave, I'm afraid I can't do that: Linguistics, Statistics, and Natural Language Processing circa 2001. The National Academies' study on the Fundamentals of Computer Science.
- Extra Readings:
  - Frederick Jelinek. Five speculations (and a divertimento) on the themes of H. Bourlard, H. Hermansky, and N. Morgan. Speech Communication, Volume 18, Issue 3, May 1996, Pages 242-246
  - Steven Abney. Statistical methods. Encyclopedia of Cognitive Science, Nature Publishing Group, Macmillan.
  - Steven Abney. Statistical Methods and Linguistics. In: Judith Klavans and Philip Resnik (eds.), The Balancing Act: Combining Symbolic and Statistical Approaches to Language. The MIT Press, Cambridge, MA. 1996.
Language Modeling (9/10 to 9/20)
- Readings (9/10 to 9/15):
  - Kevin Knight. Sections 1-14 from Statistical machine translation workbook. manuscript.
  - Stanley Chen and Joshua Goodman. An Empirical Study of Smoothing Techniques for Language Modeling. Technical Report TR-10-98, Harvard University, Aug 1998.
- Readings (9/17):
  - Thorsten Brants; Ashok C. Popat; Peng Xu; Franz J. Och; Jeffrey Dean. Large Language Models in Machine Translation. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)
  - David Talbot and Miles Osborne. Smoothed Bloom filter language models: Tera-Scale LMs on the Cheap. EMNLP, Prague, Czech Republic 2007.
- Readings (9/20-9/24):
  - Peter Brown, Peter DeSouza, Robert Mercer, Vincent Della Pietra, and Jenifer C. Lai. Class-based n-gram models of natural language. Computational Linguistics. Volume 18, Number 4, December 1992.
The EM algorithm
- Readings (9/24 to 10/1):
  - Michael Collins. The EM Algorithm. manuscript. 1997.
  - Python code for the three coins problem: three_coins.py
- Readings (10/4):
  - Radford Neal and Geoffrey Hinton. A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. In M. I. Jordan (editor) Learning in Graphical Models, pp. 355-368, Dordrecht: Kluwer Academic Publishers. 1998. (scribe: ziyez)
  - Percy Liang and Dan Klein. Online EM for unsupervised models. North American Chapter of the Association for Computational Linguistics (NAACL), 2009. (scribe: shiyangy)
- Readings (10/6):
  - Kevin Knight and Kenji Yamada. A Computational Approach to Deciphering Unknown Scripts. Proceedings of the ACL Workshop on Unsupervised Learning in Natural Language Processing, 1999. (scribe: aca69)
  - J. Graça, K. Ganchev, and B. Taskar. Expectation Maximization and Posterior Constraints. Neural Information Processing Systems Conference (NIPS), Vancouver, BC, December 2007. (scribe: cwa39)
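The three coins problem from the Collins EM notes is a compact illustration of the E-step/M-step loop. The sketch below is not the course's three_coins.py; it is a minimal Python version assuming one common formulation: a hidden coin 0 (heads probability lam) decides whether coin 1 (heads probability p1) or coin 2 (heads probability p2) is tossed for an entire observed sequence. The function name em_three_coins and its arguments are hypothetical.

```python
import math


def em_three_coins(data, lam, p1, p2, iters=20):
    """EM for a three-coins mixture (hypothetical sketch, assumed formulation).

    Each string in `data` (over 'H'/'T') is assumed to come from tossing a
    hidden coin 0 (heads with probability lam); on heads, coin 1 (heads
    probability p1) is tossed once per character, otherwise coin 2 (p2).
    Returns the final (lam, p1, p2) and the log-likelihood before each update.
    """
    log_liks = []
    for _ in range(iters):
        # E-step: posterior probability that each sequence came from coin 1.
        posteriors = []
        log_lik = 0.0
        for seq in data:
            h = seq.count("H")
            t = len(seq) - h
            a = lam * p1 ** h * (1 - p1) ** t        # P(seq, coin 0 = heads)
            b = (1 - lam) * p2 ** h * (1 - p2) ** t  # P(seq, coin 0 = tails)
            posteriors.append(a / (a + b))
            log_lik += math.log(a + b)
        log_liks.append(log_lik)

        # M-step: re-estimate each parameter from expected counts.
        pairs = list(zip(posteriors, data))
        heads1 = sum(q * s.count("H") for q, s in pairs)
        tosses1 = sum(q * len(s) for q, s in pairs)
        heads2 = sum((1 - q) * s.count("H") for q, s in pairs)
        tosses2 = sum((1 - q) * len(s) for q, s in pairs)
        lam = sum(posteriors) / len(data)
        p1 = heads1 / tosses1
        p2 = heads2 / tosses2
    return (lam, p1, p2), log_liks


if __name__ == "__main__":
    data = ["HHH", "TTT", "HHH", "TTT", "HHH"]
    (lam, p1, p2), lls = em_three_coins(data, lam=0.3, p1=0.3, p2=0.6)
    print(f"lam={lam:.4f} p1={p1:.4f} p2={p2:.4f}")
```

On this toy data the log-likelihood never decreases from one iteration to the next, as EM guarantees, and the estimates drift toward coin 1 explaining the all-tails sequences and coin 2 the all-heads ones. Which coin takes which role depends on the initialization, and the sketch does no guarding against degenerate starting points (e.g. p1 == p2).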
Hidden Markov models
- Readings (10/8 to 10/15):
  - Lawrence Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, vol. 77, no. 2, Feb 1989.
- Readings (10/20):
  - Regina Barzilay and Lillian Lee. Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization. Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004. (scribe: yka47)
  - Daniel M. Bikel, Richard Schwartz and Ralph M. Weischedel. 1999. An Algorithm that Learns What's in a Name. Machine Learning Journal: Special Issue on Natural Language Learning. (scribe: mroth)
  - Trond Grenager, Dan Klein, and Christopher D. Manning. Unsupervised Learning of Field Segmentation Models for Information Extraction. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005). (scribe: smottish)
- Software: Viterbi and Forward-Backward spreadsheets.
Syntax and Parsing
- Readings (10/18 and then 10/25-10/27):
  - Notes on parsing: #1, #2, #3.
  - Anoop Sarkar. Survey article on statistical parsing. Manuscript, 2010.
- Readings (Mon 11/1):
  - Joshua Goodman. Parsing Algorithms and Metrics. 34th Annual Meeting of the Association for Computational Linguistics (ACL '96). (scribe: hsadeghi)
  - Slav Petrov; Leon Barrett; Romain Thibaux; Dan Klein. Learning Accurate, Compact, and Interpretable Tree Annotation. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. COLING-ACL '06. (scribe: mrazavi)
  - Extra: Fernando Pereira; Yves Schabes. Inside-outside reestimation from partially bracketed corpora. 30th Annual Meeting of the Association for Computational Linguistics (ACL '92).
  - Extra: Takuya Matsuzaki; Yusuke Miyao; Jun'ichi Tsujii. Probabilistic CFG with Latent Annotations. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL '05)
Finite-state transducers
- Notes on FSTs: #1, #2.
- Readings (11/5-11/12):
  - Lauri Karttunen. Applications of Finite-State Transducers in Natural Language Processing. In Implementation and Application of Automata, Yu, S. and Paun, A. (eds.). Lecture Notes in Computer Science Volume 2088, pages 34-46, Springer Verlag, Heidelberg, 2001.
  - Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Speech recognition with weighted finite-state transducers. In Larry Rabiner and Fred Juang, editors, Handbook on Speech Processing and Speech Communication, Part E: Speech recognition. Springer-Verlag, Heidelberg, Germany, 2008.
  - OpenFst tutorial at HLT/NAACL 2009
Log-linear models
- Notes on log-linear models: #1
Statistical machine translation
- Readings (11/15-11/19):
  - Kevin Knight. Statistical machine translation workbook. manuscript.
- Readings (11/22-11/26):
  - The Mathematics of Statistical Machine Translation: Parameter Estimation. Peter F. Brown; Stephen A. Della Pietra; Vincent J. Della Pietra; Robert L. Mercer. Computational Linguistics, Volume 19, Number 2, June 1993
  - Statistical Machine Translation. Adam Lopez. In ACM Computing Surveys 40(3): Article 8, pages 1-49, August 2008.
- Readings (11/29):
  - Factored Translation Models. Philipp Koehn and Hieu Hoang, EMNLP 2007. (rahman)
  - A. de Gispert and J.B. Mariño. On the impact of morphology in English to Spanish statistical MT. In Speech Communication, Volume 50, pp. 1034-1046, 2008. (ppatell)
  - Abby Levenberg, Chris Callison-Burch and Miles Osborne. Stream-based Translation Models for Statistical Machine Translation. NAACL, Los Angeles, USA, 2010. (shabnams)
- Readings (12/1):
  - An end-to-end discriminative approach to machine translation. Percy Liang, Alexandre Bouchard-Cote, Dan Klein, Ben Taskar. International Conference on Computational Linguistics and Association for Computational Linguistics (COLING/ACL), 2006. (zvaseqi)
  - A Discriminative Global Training Algorithm for Statistical MT. C Tillmann, T Zhang, Proc. of COLING'06 and ACL'06, 2006. (aasihaer)
  - Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation. George Foster; Cyril Goutte; Roland Kuhn. EMNLP 2010. (aershadi)
Non-parametric Bayes
- Readings (12/3 to 12/6):
  - Bayesian inference with tears. Kevin Knight. Tutorial workbook.
  - Structured Bayesian Nonparametric Models with Variational Inference. Dan Klein and Percy Liang. Tutorial slides, ACL 2007.
  - Bayesian Methods for NLP. Hal Daumé III. Tutorial at HLT/NAACL 2006.
  - Gibbs Sampling for the Uninitiated. Philip Resnik and Eric Hardisty. Univ of Maryland Computer Science Department Technical Report. CS-TR-4956.
  - Parameter estimation for text analysis. Gregor Heinrich. Technical Note. Univ of Leipzig, Germany.
- Readings (for future reference):
  - Thomas L. Griffiths and Alan Yuille (2006). A primer on probabilistic inference. Trends in Cognitive Sciences. Supplement to special issue on Probabilistic Models of Cognition (volume 10, issue 7).
  - Daniel J. Navarro, Thomas L. Griffiths, Mark Steyvers, and Michael D. Lee (2006). Modeling individual differences using Dirichlet processes. Journal of Mathematical Psychology, 50, 101-122.
  - Sharon Goldwater (2006). Nonparametric Bayesian Models of Lexical Acquisition. Unpublished doctoral dissertation, Brown University, 2006. Chapters 2 and 3.

Textbook and References

There is no formal textbook for this course. Most of the readings are research papers, listed with each topic above, and are usually available online. If you would like to brush up on the basics, refer to the following books:

Reference Books:

Statistical Language Learning, Eugene Charniak, MIT Press, 1996
Foundations of Statistical Natural Language Processing, Manning and Schuetze, MIT Press, 1999
Speech and Language Processing, Jurafsky and Martin, Prentice Hall, 2000
Fundamentals of Speech Recognition, Rabiner and Juang, Prentice Hall, 1993
Machine Learning, Tom Mitchell, McGraw Hill, 1997
Lectures on Contemporary Syntactic Theories, Peter Sells, CSLI Lecture Notes No. 3, 1985
The Language Instinct, Steven Pinker, William Morrow, 1994

Web Links

anoop at cs.sfu.ca

CMPT 825 - Fall 2010 - Natural Language Processing
