## Announcements

- Grading for the course:

- Scribing and Class Participation: 30%
- Project proposal writing and reviewing: 40%
- Final paper and project: 30%
- Important Dates:
- Jan 9, 2006: First day of class
- Apr 7, 2006: Last day of class
- **First project proposal:** (2 pages) due on Feb 7, 2006
- **Second project proposal:** (4 pages) due on Feb 23, 2006
- **Third project proposal:** (<8 pages) due on Mar 21, 2006
- **Summary of related work:** (<4 pages) due on Mar 28, 2006
- Final project due on Apr 13, 2006

- Style files for final project write-up: LaTeX style file, sample LaTeX file, bibliography style file.
- Thu, Jan 5, 2006: Course web page created
- Fri, Jan 6, 2006: Scribing instructions: see Scribes section below.
- Fri, Jan 27, 2006: Your 2-page first abstract describing your potential project is due in class on **Feb 7, 2006**. A sample project abstract has been provided to help you write your own.

## Assignments

- Homework #1
- Homework #2
- Homework #3
- Homework #4
- Homework #5. Location of files:
`/cs/natlang-a/data/zipf-expt/`

- Homework #6
- Homework #7

**Note on assignments**: All homeworks are optional, so there is no deadline.
However, doing the homeworks will probably help you substantially in your project work
and in understanding the course material.

## Scribes

- Lecture #1 by Gholamreza Haffari
- Lecture #2 by D. Song
- Lectures #3 and #4 by Gholamreza Haffari
- Lecture #5 by Maxim Roy
- Lecture #6 by Maxim Roy
- Lecture #7 by Akshay Gattani
- Lecture #10 by F. Hormozdiari
- Lecture #11 by Mehdi M. Kashani
- Lectures #16 and #17 by Javier Thaine
- Lecture #22 by Javier Thaine

Scribing instructions: you **must** use LaTeX to create your scribe document. Use `scribe.sty` as the LaTeX style file. A sample scribe document, `scribe_sample.tex`, is provided as an example. On any of the CS/FAS Unix/Linux machines, use the command `pdflatex scribe_sample.tex` to produce `scribe_sample.pdf`.

## Syllabus and Readings

- Statistical Machine Translation, Basics and Evaluation Methods
- Kevin Knight MT Tutorial
- BLEU: A Method for Automatic Evaluation of Machine Translation. Kishore Papineni, Salim Roukos, Todd Ward and Wei-Jing Zhu. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. ACL-2002
- Introduction to SMT and the Bleu metric. Kishore Papineni. Presentation Slides. (for description of Bleu, jump to pages 57-75)
- Software:
- Further reading:
- BLANC: Learning Evaluation Metrics for MT. Lucian Lita; Monica Rogati; Alon Lavie. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing: HLT-EMNLP 2005, Vancouver, Canada, 2005.
- A learning approach to improving sentence level MT evaluation. Alex Kulesza and Stuart M. Shieber. In proceedings of the 10th Conference on Theoretical and Methodological issues in Machine Translation, Baltimore, 2004
- A Novel String-to-String Distance Measure With Applications to Machine Translation Evaluation. G. Leusch, N. Ueffing and H. Ney. In Proc. of MT Summit IX, 2003.
- Evaluation of Machine Translation and its Evaluation. Joseph P. Turian, Luke Shen, and I. Dan Melamed. MT Summit IX, New Orleans, LA, 2003
- A Paraphrase-Based Approach to Machine Translation Evaluation. Grazia Russo-Lassner, Jimmy Lin and Philip Resnik. University of Maryland Technical Report, Aug 2005.
- Statistical Significance Tests for Machine Translation Evaluation, Philipp Koehn, EMNLP 2004.
- Read Section 4.2 from: Clause Restructuring for Statistical Machine Translation. Michael Collins, Philipp Koehn, and Ivona Kucerova. In Proceedings of ACL 2005.

- SMT, IBM word-based models
- The Mathematics of Statistical Machine Translation: Parameter Estimation. Peter F. Brown; Vincent J. Della Pietra; Stephen A. Della Pietra; Robert L. Mercer. Computational Linguistics, Volume 19, Number 2, June 1993
- HMM-Based Word Alignment in Statistical Translation. Stephan Vogel; Hermann Ney; Christoph Tillmann. COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics.
- Software:
- Further reading:
- Improving IBM Word Alignment Model 1. Robert Moore. In Proceedings of 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, pp. 519-526. 2004.
- Models of Translational Equivalence among Words. I. Dan Melamed. Computational Linguistics, Volume 26, Number 1, March 2000.
- A Maximum Entropy/Minimum Divergence Translation Model. G. Foster. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, ACL 2000.
- Cognates Can Improve Statistical Translation Models. Greg Kondrak, Daniel Marcu, and Kevin Knight. Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2003), Companion volume, pp. 46-48, Edmonton, May 2003.

- SMT, Phrase-based models
- Lecture notes on log-linear models. Anoop Sarkar.
- So-called "Model 6": A Systematic Comparison of Various Statistical Alignment Models. Franz Josef Och and Hermann Ney. Computational Linguistics, Volume 29, Number 1, March 2003
- The Alignment Template Approach to Statistical Machine Translation. Franz Josef Och; Hermann Ney. Computational Linguistics, Volume 30, Number 4, December 2004.
- Software:
- Further reading:
- Statistical Phrase-Based Translation. Philipp Koehn, Franz Josef Och, and Daniel Marcu. Proceedings of the Human Language Technology and North American Association for Computational Linguistics Conference (HLT/NAACL), May 27-June 1, 2003, Edmonton, Canada.
- PORTAGE: A Phrase-Based Machine Translation System. Fatiha Sadat; Howard Johnson; Akakpo Agbago; George Foster; Roland Kuhn; Joel Martin; Aaron Tikuisis. Proceedings of the ACL Workshop on Building and Using Parallel Texts. ACL 2005.

- SMT, Decoding
- Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation. Christoph Tillmann and Hermann Ney. Computational Linguistics. Vol. 29, Issue 1 - March 2003, pp. 97-133
- Fast Decoding and Optimal Decoding for Machine Translation. Ulrich Germann; Michael Jahr; Kevin Knight; Daniel Marcu; Kenji Yamada. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics. ACL-2001
- Greedy Decoding for Statistical Machine Translation in Almost Linear Time. Ulrich Germann. Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2003
- A Comparative Study on Reordering Constraints in Statistical Machine Translation. Richard Zens and Hermann Ney. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics: ACL-2003
- Software:
- SMT, Word-based Language Modelling and Smoothing
- An Empirical Study of Smoothing Techniques for Language Modeling. Stanley F. Chen and Joshua Goodman. Technical Report TR-10-98. Harvard University. 1998.
- Further reading:
- Relating Turing's Formula and Zipf's Law. Christer Samuelsson. Fourth Workshop on Very Large Corpora, WVLC. 1996.
- Always Good Turing: Asymptotically Optimal Probability Estimation. A. Orlitsky, N.P. Santhanam, and J. Zhang. Proceedings of the 44th Annual Symposium on Foundations of Computer Science (FOCS), October 2003.
- On the Convergence Rate of Good-Turing Estimators. David McAllester and Robert Schapire. Proceedings of the Thirteenth Annual Conference on Computational Learning Theory. (COLT), 2000.
- Interpolating between Types and Tokens by Estimating Power-Law Generators. Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Proceedings of the 19th Conference on Neural Information Processing Systems (NIPS), Vancouver, 2005.
- A Bayesian Interpretation of Interpolated Kneser-Ney. Y.W. Teh. Technical Report TRA2/06. School of Computing, NUS, 2006.
- A. Nadas. On Turing's formula for word probabilities. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-33(6):1414-1416, Dec 1985.
- I. Witten and T. Bell. The zero frequency problem. IEEE Transactions on Information Theory, 37(4):1085-1094, 1991.

- Statistical Parsing, Basics and Evaluation Methods
- Statistical techniques for natural language parsing. Eugene Charniak. AI Magazine. 1997.
- Parsing algorithms and metrics. Joshua Goodman. In Proceedings of the 34th Annual Meeting of the ACL, pages 177-183, Santa Cruz, CA, June 1996.
- Do Homework #7, part (3), which introduces you to EVALB, the program that computes the Parseval scores for comparing parser output with the Treebank trees.
- Software:
- Parsing, Bi-lexical generative models, The EM algorithm
- Three Generative, Lexicalised Models for Statistical Parsing. Michael Collins. 1997. Proceedings of the 35th Annual Meeting of the ACL (jointly with the 8th Conference of the EACL), Madrid.
- Statistical parsing with a context-free grammar and word statistics. Eugene Charniak. Proceedings of the Fourteenth National Conference on Artificial Intelligence AAAI Press/MIT Press, Menlo Park (1997).
- The EM Algorithm. manuscript. Michael Collins.
- Software:
- Charniak parser
- Latest version of Charniak and Johnson parser (caution: link might get stale)
- Dan Bikel's parser
- Original Collins parser

- Further Reading:
- Coping with syntactic ambiguity or how to put the block in the box on the table (1982). Kenneth Church and Ramesh Patil. Computational Linguistics 8:139-49.
- Structural Ambiguity and Lexical Relations (1993). Donald Hindle and Mats Rooth. Computational Linguistics. Volume 19, Number 1, March 1993, Special Issue on Using Large Corpora: I.
- Prepositional Phrase Attachment through a Backed-Off Model (1995). Michael Collins and James Brooks. Proceedings of the Third Workshop on Very Large Corpora WVLC-95.
- Statistical Models for Unsupervised Prepositional Phrase Attachment (1998). Adwait Ratnaparkhi. In Proceedings of COLING-ACL 1998.
- An Unsupervised Approach to Prepositional Phrase Attachment using Contextually Similar Words (2000). P. Pantel and D. Lin. In Proceedings of Association for Computational Linguistics 2000. pp. 101-108. Hong Kong.

- Parsing, Discriminative models (log-linear models, history-based models), Global linear models
- A Maximum Entropy Model for Part-Of-Speech Tagging. Adwait Ratnaparkhi. Conference on Empirical Methods in Natural Language Processing: EMNLP 1996.
- A Linear Observed Time Statistical Parser Based on Maximum Entropy Models. Adwait Ratnaparkhi. Second Conference on Empirical Methods in Natural Language Processing: EMNLP 1997.
- Learning to Resolve Natural Language Ambiguities: A Unified Approach. Dan Roth, AAAI (1998) pp. 806-813
- Maximum Entropy Markov Models for Information Extraction and Segmentation. Andrew McCallum, Dayne Freitag and Fernando Pereira. International Conference on Machine Learning: ICML-2000.
- Joint and Conditional Estimation of Tagging and Parsing Models. Mark Johnson. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics: ACL 2001.
- Lecture notes on log linear models (4 per page). Michael Collins and Regina Barzilay.
- Lecture notes on global linear models (4 per page). Michael Collins and Regina Barzilay.
- Lecture notes on global linear models (cont'd) (4 per page). Michael Collins and Regina Barzilay.
- Software:
- Further Reading:
- Conditional Structure versus Conditional Estimation in NLP Models. Dan Klein and Christopher D. Manning. Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)
- A Maximum Entropy Approach to Natural Language Processing. Adam L. Berger, Vincent J. Della Pietra and Stephen A. Della Pietra. Computational Linguistics, Volume 22, Number 1, March 1996.
- Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. John Lafferty, Andrew McCallum and Fernando Pereira. In Proc. 18th International Conf. on Machine Learning: ICML 2001, pages 282-289.
- A Gaussian prior for smoothing maximum entropy models. S. Chen and R. Rosenfeld. Technical Report CMU-CS-99-108, Carnegie Mellon University. 1999.
- Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. Michael Collins. EMNLP 2002.
- Ranking Algorithms for Named Entity Extraction: Boosting and the Voted Perceptron. Michael Collins. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics: ACL 2002.
- Loss Functions and Optimization Methods for Discriminative Learning of Label Sequences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2003).
- The Use of Classifiers in Sequential Inference. V. Punyakanok and D. Roth, The Conference on Advances in Neural Information Processing Systems (NIPS) (2001) pp. 995-100.

- Parsing, Language Modelling, Semantic parsing and other applications
- SMT, Syntax-based models
- Introduction to Synchronous CFGs. David Chiang. unpublished course notes. (please do not redistribute without permission)
- A Hierarchical Phrase-Based Model for Statistical Machine Translation. David Chiang. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-2005).
- A Syntax-based Statistical Translation Model. Kenji Yamada and Kevin Knight. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics. 2001.
- Syntax-based Language Models for Machine Translation. E. Charniak, K. Knight, and K. Yamada, Proc. MT Summit IX, 2003.
- Loosely Tree-Based Alignment for Machine Translation. Daniel Gildea. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. 2003.
- Software:
- Further reading:
- Learning Dependency Translation Models as Collections of Finite-State Head Transducers. Hiyan Alshawi; Shona Douglas; Srinivas Bangalore. Computational Linguistics, Volume 26, Number 1, March 2000.
- Bracketing and aligning words and constituents in parallel text using Stochastic Inversion Transduction Grammars. Dekai Wu. In Jean Veronis (ed.), Parallel Text Processing: Alignment and Use of Translation Corpora. Dordrecht: Kluwer. ISBN 0-7923-6546-1. Aug 2000.
- CLSP Final Report of the Johns Hopkins Summer Workshop 2003: Syntax for Statistical Machine Translation. Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, Dragomir Radev.
- Final Report of the 2005 Language Engineering Workshop on Statistical Machine Translation by Parsing. A. Burbank, M. Carpuat, S. Clark, M. Dreyer, P. Fox, D. Groves, K. Hall, M. Hearne, I. D. Melamed, Y. Shen, A. Way, B. Wellington, and D. Wu. 2005.

- Text segmentation
- Text coherence and Co-reference
- Text summarization
- Natural Languages, Formal Languages and Complexity: from regular to context-sensitive
- Finite-state transducers: computational phonology and text-to-speech
- Tree automata, Tree transducers: parsing and SMT

## Textbook and References

There is no formal textbook for this course. Most of the reading is posted along with the topics in the Syllabus and Readings section and consists of research papers, which are usually available online. However, if you would like to brush up on some of the basics, you should refer to the following books:

- Reference Books:

- Statistical Language Learning, Eugene Charniak, MIT Press, 1996
- Foundations of Statistical Natural Language Processing, Manning and Schuetze, MIT Press, 1999
- Speech and Language Processing, Jurafsky and Martin, Prentice Hall, 2000
- Fundamentals of Speech Recognition, Rabiner and Juang, Prentice Hall, 1993
- Machine Learning, Tom Mitchell, McGraw Hill, 1997
- Lectures on Contemporary Syntactic Theories, Peter Sells, CSLI Lecture Notes No. 3, 1985
- The Language Instinct, Steven Pinker, William Morrow, 1994