CMPT 882-3 - Statistical Learning of Natural Language

Anoop Sarkar

Office: ASB 10859
Phone: 291-4933
Email: anoop at

CMPT 882-3: Fall 2002
Monday 4:00p - 5:50p in SCB 8662
Wednesday 3:30p - 4:20p in SCB 8662
Office Hrs by appointment (send me email)

In this course we will study basic algorithms that produce state-of-the-art results on tasks involving natural language text. For each task, we will compare knowledge-rich approaches, which rely on extensive human supervision, with knowledge-poor techniques, which use parameter re-estimation or bootstrapping algorithms. We will also compare generative models (which maximize the likelihood of the training data) with discriminative models (which minimize classification error rate).
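As a minimal sketch of this contrast, the toy word-sense task below (the data and label names are made up for illustration) trains a generative Naive Bayes classifier by relative-frequency (maximum-likelihood) estimation, and a discriminative perceptron that updates its weights only on classification errors:

```python
from collections import Counter, defaultdict

# Toy word-sense data (hypothetical): one context word per example,
# labels are two senses of "bank".
data = [("river", "LAND"), ("water", "LAND"), ("money", "FIN"),
        ("loan", "FIN"), ("money", "FIN"), ("river", "LAND")]
labels = ["LAND", "FIN"]
vocab = {w for w, _ in data}

# Generative: Naive Bayes sets P(y) and P(w|y) to relative
# frequencies, i.e. the maximum-likelihood estimates on the data.
label_counts = Counter(y for _, y in data)
word_counts = defaultdict(Counter)
for w, y in data:
    word_counts[y][w] += 1

def nb_classify(word):
    # argmax_y P(y) * P(word|y), with add-one smoothing
    def score(y):
        prior = label_counts[y] / len(data)
        lik = (word_counts[y][word] + 1) / (label_counts[y] + len(vocab))
        return prior * lik
    return max(labels, key=score)

# Discriminative: a perceptron updates its weights only when it
# misclassifies, directly driving the training error rate down.
weights = defaultdict(float)          # weights[(word, label)]

def perc_classify(word):
    return max(labels, key=lambda y: weights[(word, y)])

for _ in range(10):                   # a few passes suffice here
    for w, y in data:
        pred = perc_classify(w)
        if pred != y:
            weights[(w, y)] += 1.0
            weights[(w, pred)] -= 1.0

print(nb_classify("money"), perc_classify("river"))  # prints: FIN LAND
```

Both classifiers fit this toy data; the difference is in the objective each one optimizes, a theme we will return to throughout the course.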

For more details: Course Description for CMPT 882-3

Important Dates

  1. Sep 4: First class
  2. Oct 14: No Class: Thanksgiving
  3. Nov 11: No Class: Remembrance Day
  4. Dec 4: Final Project Presentations

Final Projects

Reading List

  1. Introduction to Statistical NLP and Supervised Decision List Learning

    Notes: Lecture #01 pdf

  2. Bootstrapping techniques in learning word meanings: word-sense disambiguation

    Unsupervised Word Sense Disambiguation Rivaling Supervised Methods (1995). David Yarowsky. Proceedings of ACL-95. pp. 189-196

    Notes: Lecture #02 pdf

    Additional Readings:



  3. Comparing Decision Lists to Naive Bayes

    Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval (1998). David Lewis. Proceedings of ECML-98 (10th meeting). pp. 4-15.

    Notes: Lecture #03 pdf

    Additional Readings:


  4. Hidden Markov Models and their Application to Sequence Analysis

    Chapters 3 and 4 of Statistical Language Learning. Eugene Charniak. MIT Press. 1993.

    Notes: Lecture #04 pdf

    Other Sources:

    Additional Readings:


  5. Using the Forward-Backward Algorithm

    Does Baum-Welch Re-estimation help taggers? (1994). David Elworthy. Proceedings of 4th ACL Conf on ANLP, Stuttgart. pp. 53-58.

    Notes: Lecture #05 pdf

    Additional Readings:


  6. Hidden Markov Models for Name Finding

    Nymble: a High-Performance Learning Name-finder (1997). Daniel M. Bikel, Scott Miller, Richard Schwartz, Ralph Weischedel. Proceedings of ANLP-97.

    Notes: Lecture #06 pdf

    Additional Readings:

    Other Applications of HMMs:

  7. The EM algorithm for hybrid models

    We will look at the use of the EM algorithm (a generalization of the forward-backward algorithm for HMMs) and apply it to the problem of finding the appropriate interpolation weights between a word-based and a part-of-speech based language model. A simple Perl script that implements this idea is also available.
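    The weight estimation described above can be sketched in a few lines of Python (the course material uses a Perl script; the component distributions and held-out text here are hypothetical stand-ins for the word-based and part-of-speech based models):

```python
import math

# Hypothetical fixed component models over a toy vocabulary:
# p1 stands in for a word-based model, p2 for a class-based one.
p1 = {"the": 0.7, "dog": 0.2, "ran": 0.1}
p2 = {"the": 0.2, "dog": 0.3, "ran": 0.5}
held_out = ["the", "dog", "ran", "the", "the", "ran"]

lam = 0.5                             # initial interpolation weight on p1
for _ in range(50):
    # E-step: posterior probability that each token was generated by p1
    posts = [lam * p1[w] / (lam * p1[w] + (1 - lam) * p2[w])
             for w in held_out]
    # M-step: the new weight is the expected fraction of p1 tokens
    lam = sum(posts) / len(posts)

# Each EM iteration cannot decrease the held-out log-likelihood.
ll = sum(math.log(lam * p1[w] + (1 - lam) * p2[w]) for w in held_out)
print(round(lam, 3), round(ll, 3))
```

    The same E-step/M-step pattern generalizes to interpolating more than two models: keep one posterior per component and normalize.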

    Additional Readings:

  8. Discriminative Methods, Transformation-Based Learning

    Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging (1995). Eric Brill. Computational Linguistics, volume 21, number 4, pp. 543-565.

    Notes: Lecture #07 pdf

    Additional Readings:


  9. Discriminative Methods, Maximum Entropy Models

    A simple introduction to maximum entropy models for natural language processing (1997). Adwait Ratnaparkhi. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania.

    A Maximum Entropy Model for Part-of-Speech Tagging (1996). Adwait Ratnaparkhi. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 133-142.

    Notes: Lecture #08 pdf

    Additional Readings:


  10. Hypothesis Testing: Unsupervised Learning of Lexical Knowledge

    Automatic Extraction of Subcategorization from Corpora (1997). Ted Briscoe and John Carroll. In Proceedings of the 5th Conference on Applied Natural Language Processing (ANLP-97).

    Notes: Lecture #09 pdf

    Additional Readings:


  11. Learning Morphology

    Minimally supervised morphological analysis by multimodal alignment (2000). David Yarowsky and Richard Wicentowski. In Proceedings of ACL-2000, pages 207-216.

    Unsupervised Learning of the Morphology of a Natural Language (2001). John Goldsmith. Computational Linguistics, Volume 27, Number 2.

    Notes: Lecture #10 pdf

    Additional Readings:

  12. HMM Redux: Almost Parsing

    Supertagging: An Approach to Almost Parsing (1999). Srinivas Bangalore and Aravind K. Joshi. Computational Linguistics, volume 25, number 2, pages 237-265.

    Notes: Lecture #11 pdf

    Additional Readings:

  13. Prepositional Phrase Attachment

    Coping with syntactic ambiguity or how to put the block in the box on the table (1982). Kenneth Church and Ramesh Patil. Computational Linguistics 8:139-49.

    Prepositional Phrase Attachment through a Backed-Off Model (1995). Michael Collins and James Brooks. Proceedings of the Third Workshop on Very Large Corpora WVLC-95.

    Notes: Lecture #12 pdf

    Additional Readings:

  14. Unsupervised Prepositional Phrase Attachment

    Structural Ambiguity and Lexical Relations (1993). Donald Hindle and Mats Rooth. Computational Linguistics. Volume 19, Number 1, March 1993, Special Issue on Using Large Corpora: I.

    Statistical Models for Unsupervised Prepositional Phrase Attachment (1998). Adwait Ratnaparkhi. In Proceedings of COLING-ACL 1998.

    Notes: Lecture #13 pdf

    Additional Readings:

  15. Statistical Parsing using a Treebank: Context-Free Grammars and Lexicalized Models

    Head-Driven Statistical Models for Natural Language Parsing. Michael Collins. PhD Dissertation, University of Pennsylvania, 1999. Read chapters 2 and 3, pages 31-102.

    Statistical parsing with an automatically-extracted tree adjoining grammar (2000). David Chiang. In Proceedings of ACL 2000, Hong Kong, October 2000, pages 456-463.

    Notes: Lecture #14 pdf and additional slides (from Michael Collins' thesis presentation)

    Additional Readings:


  16. Parsing Algorithms and the Inside-Outside Algorithm for PCFGs

    Inside-Outside Reestimation from partially bracketed corpora. Fernando Pereira and Yves Schabes. In 30th Annual Meeting of the Association for Computational Linguistics, pages 128-135, Newark, Delaware, 1992.

    Applications of stochastic context-free grammars using the Inside-Outside algorithm. K. Lari and S. J. Young. Computer Speech and Language, 4:35-56, 1990.

    Additional Readings:


  17. Co-training

    Combining Labeled and Unlabeled Data with Co-training. Avrim Blum and Tom Mitchell. In Proc. of the Workshop on Computational Learning Theory (COLT98). 1998.

    Analyzing the Effectiveness and Applicability of Co-training. Kamal Nigam and Rayid Ghani. In Ninth International Conference on Information and Knowledge Management (CIKM-2000), pp. 86-93. 2000.

    Additional Readings:

  18. Active Learning, Sample Selection

    Committee-Based Sample Selection for Probabilistic Classifiers. Shlomo Argamon-Engelson and Ido Dagan. In Journal of Artificial Intelligence Research, 1999.

    On Minimizing Training Corpus for Parser Acquisition. Rebecca Hwa. In Proc. of Workshop on Computational Natural Language Learning. 2001.

    Additional Readings:

  19. Discriminative Models: Boosting

    A short introduction to boosting. Y. Freund and R. Schapire. Journal of the Japanese Society for Artificial Intelligence, 14(5), pages 771-780, 1999.

    Boosting Applied to Tagging and PP Attachment. Steven Abney, Robert E. Schapire, and Yoram Singer. Proceedings of the 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 38-45. 1999.

    Additional Readings:

Misc Presentations

Links to Useful Software and Corpora


Last modified: 2003-09-03