Prospective student? Please also read this page for prospective students.
My research group at the SFU Natural Language Lab has been exploring statistical machine translation, with a particular interest in morphologically complex languages and in multi-domain and multi-source translation settings. My group participated jointly with the NRC Interactive Language Technologies Group in the NIST OpenMT 2012 evaluation, producing a system based on the NRC Portage system and tailored for Arabic-English machine translation. My group also participated in the shared translation task of the annual Workshop on Statistical Machine Translation (WMT) 2012, using Kriya, an in-house implementation of hierarchical phrase-based translation, for the French-English and English-Czech translation tasks.
My group is also looking at how best to combine training data for machine translation when the data originate from different genres, along with multi-source translation and consensus decoding. The group is also looking into ensemble methods for machine learning in various tasks, including machine translation and natural language parsing.
Two graduate students from the SFU NLP lab, Ann Clifton and Majid Razmara, were selected to participate in the annual NSF-funded Johns Hopkins summer research workshop, to work on a project in domain adaptation for machine translation. My group recently received an NSERC Collaborative Research and Development (CRD) Grant based on industry contributions from Boeing Inc. and its Canadian subsidiary AeroInfo, Inc. This project is about visualization methods enabled by natural language processing.
My research focuses on machine learning algorithms applied to natural language processing, in the areas of natural language parsing and statistical machine translation. I am interested in unsupervised and weakly supervised learning algorithms and in structured outputs such as tag sequences or parse trees.
In the past, I have worked with stochastic tree-adjoining grammars, a generalization of context-free grammars that allows for a computationally constrained and linguistically sophisticated analysis of natural language. Where possible, I like to work on statistical models and algorithms based on formal grammars and on both finite-state and tree transducers.
Important things to consider when deciding to spend time doing research:
Contemporary machine translation relies on unsupervised learning of an alignment between the source and target languages. I am interested in extensions of this approach to syntax-aware statistical machine translation, which can be used to capture the content of the source sentence while also producing a more fluent target sentence. My current interests in this area include the challenge of learning more complex syntax-based transfer rules from large amounts of data, machine translation for resource-poor languages, domain adaptation in translation, and translation from English into other languages. I also work on the formal foundations of transducers and synchronous grammars.
I expect that most of my students will work on machine translation as the target application of their research.
How can we learn from very few examples? How can a machine continue learning forever? What if the learner has not observed all the features needed for future predictions? I am interested in the study of bootstrapping algorithms known to perform well on various natural language processing tasks, and in methods for the unsupervised or semi-supervised learning of parsers and taggers.
For more information about my recent research, read some of my recent research papers.
XTAG project, a wide-coverage grammar and parser for English. He received his BS in Computer Science from the University of Poona, India in 1991. From 1991 to 1993, he worked as a research associate at the Centre for Development of Advanced Computing in Poona, India.