May 17, 2004
Parsing 'A Verbless Post'
This is getting a bit ridiculous, but here goes:
This is a follow-up to a previous post, Part-of-speech Tagging 'A Verbless Post', in which Geoff Pullum's post to Language Log was analyzed for parts of speech. This post uses Eugene Charniak's statistical parser (parser03) to produce a syntactic analysis of the same text, in the Penn Treebank notation.
The first thing to notice in the parser output is that the recall for humorous points scored is substantially reduced, because no verb 'to Thaler' is produced:
(S1 (S (CC And) (PP (IN in) (NP (DT that) (NN case))) (, ,) (NP (NP (DT a) (NN word)) (PP (IN of) (NP (NP (NN gratitude)) (PP (TO to) (NP (NNP Thaler)))))) (VP (PRN (-LRB- -LCB-) (ADVP (RB otherwise)) (NP (DT an) (JJ unimportant) (NN screwball)) (-RRB- -RCB-))) (. .)))
Overall, however, the poor parser is strained by the lack of verbs more than the tagger seemed to be, mainly because it has the added burden of producing legitimate syntactic structures. Because verb phrases occur so frequently in the training data, the parser produces spurious VPs in some unfamiliar contexts:
(S1 (S (NP (IN Except)) (VP (VBZ ..)) (. .)))
and:
(VP (VBZ nouns) (NP (, ,) (NNS pronouns) ... ))
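A quick way to spot such hallucinated verbs is to list every word the parser tagged as a verb. The Python sketch below is only an illustration and not part of parser03: it assumes each tree is a single bracketed string in which every word sits in a `(TAG word)` pair, and the helper names `tagged_words` and `verb_tokens` are made up for this example.

```python
import re

# Matches a preterminal such as "(VBZ nouns)": a tag, a space, one word,
# and no nested brackets inside. (Pattern is an assumption of this sketch.)
PRETERMINAL = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")

def tagged_words(tree):
    """Return the (tag, word) pairs of a bracketed tree, left to right."""
    return PRETERMINAL.findall(tree)

def verb_tokens(tree):
    """Return the words whose tag starts with VB (VB, VBD, VBG, VBN, VBP, VBZ)."""
    return [word for tag, word in tagged_words(tree) if tag.startswith("VB")]

tree = "(S1 (S (NP (IN Except)) (VP (VBZ ..)) (. .)))"
print(tagged_words(tree))  # [('IN', 'Except'), ('VBZ', '..'), ('.', '.')]
print(verb_tokens(tree))   # ['..']
```

Here the only "verb" found is the ellipsis, which is exactly the kind of spurious VP head described above.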
We ran into similar hallucinated verb phrases when trying to parse the output of a statistical machine translation system on the NIST 02/03 data for Chinese to English translation: the parser invented VPs for some of the ungrammatical English sentences output by the system. This behaviour is documented in this paper (from HLT-NAACL, 2004).
Understanding the notation of these parse trees is likely to be more challenging for the layperson (I would hope). For the intrepid reader, a good start would be the Penn Treebank manuals.
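For readers who find the one-line bracketed form hard to follow, it can help to print each constituent on its own indented line. The following is a minimal sketch rather than any official Treebank tool: `read_tree` and `show` are hypothetical names, and the code assumes words never contain raw parentheses (the parser output here writes brackets as tokens like -LRB- and -RRB-).

```python
def read_tree(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "a constituent must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())        # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return [label] + children

    return node()

def show(tree, depth=0):
    """Return indented lines, one constituent (or tagged word) per line."""
    label, children = tree[0], tree[1:]
    indent = "  " * depth
    if len(children) == 1 and isinstance(children[0], str):
        return [f"{indent}{label} {children[0]}"]   # e.g. "VBZ .."
    lines = [indent + label]
    for child in children:
        if isinstance(child, str):
            lines.append("  " * (depth + 1) + child)
        else:
            lines.extend(show(child, depth + 1))
    return lines

tree = read_tree("(S1 (S (NP (IN Except)) (VP (VBZ ..)) (. .)))")
print("\n".join(show(tree)))
```

Each level of indentation corresponds to one level of bracketing, so the label on a line dominates everything indented beneath it.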
If you examine the full output of the Charniak parser on Geoff Pullum's post (shown below), you will find some strange punctuation errors, and the usual prepositional phrase (PP) and coordination (CC) attachment errors. But, overall, the performance is very good, especially for some useful constituents like noun phrases (NPs) or parentheticals (PRN).
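One rough way to get a feel for which labels dominate the output is to count the bracket labels across all trees. This says nothing about accuracy; it is only a tally. The sketch below is illustrative, `label_counts` is a made-up name, and the two sample trees are taken from the parser output for this post.

```python
import re
from collections import Counter

# A label is whatever follows an opening bracket. Note that this counts
# phrase labels (NP, VP, PRN) and part-of-speech tags (DT, NN) alike.
LABEL = re.compile(r"\(([^\s()]+)")

def label_counts(trees):
    """Count every bracket label over a list of bracketed tree strings."""
    counts = Counter()
    for tree in trees:
        counts.update(LABEL.findall(tree))
    return counts

trees = [
    "(S1 (NP (DT A) (JJ verbless) (NN novel) (. ?)))",
    "(S1 (S (NP (IN Except)) (VP (VBZ ..)) (. .)))",
]
counts = label_counts(trees)
print(counts["NP"], counts["VP"], counts["PRN"])  # 2 1 0
```

Run over the full output, a tally like this would show how often the parser reached for a VP in a text that contains no verbs.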
Posted by anoop at May 17, 2004 03:42 AM
(S1 (NP (DT A) (JJ verbless) (NN novel) (. ?)))
(S1 (FRAG (WRB Why) (. ?) (. ?)))
(S1 (NP (NP (WP What) (NN reason)) (PP (IN for) (NP (NP (DT the) (NN accomplishment)) (PP (IN by) (NP (NP (DT this) (JJ showy) (NN fool)) (PP (IN in) (NP (NP (NNP France)) (, ,) (NP (NNP Michel))))))))))
(S1 (FRAG (NP (NNP Thaler)) (, ,) (NP (NP (PRP$ his) (NN effort)) (PP (IN at) (NP (NP (DT an) (JJ entire) (NN novel)) (PP (IN with) (NP (DT no) (NNS verbs))))) (PRN (-LRB- -LCB-) (NP (RB perhaps) (RB not) (NP (DT a) (ADJP (JJ wise) (CC or) (JJ lucrative)) (NN publication) (NN venture)) (, ,) (VP (VBN given) (NP (NP (DT the) (RB not) (JJ total) (NN incorrectness)) (PP (IN of) (NP (PRP$ my) (NNS speculations)))))) (-RRB- -RCB-)) (ADJP (RB recently) (JJ evident))) (PP (IN amongst) (NP (NP (DT the) (JJ vast) (FW efflux)) (PP (IN of) (NP (NP (JJ absurd) (JJ literary) (NN pretense)) (PP (IN in) (NP (DT the) (JJ French) (NN language))))))) (. ?)))
(S1 (FRAG (INTJ (UH Well)) (, ,) (SBAR (WHNP (WDT whatever)) (S (NP (PRP$ his) (NNS reasons)) (, ,) (PP (IN in) (NP (NN response))) (, ,) (NP (PRP$ my) (JJ own) (NN contribution)) (: :) (NP (NP (DT a) (JJ verbless) (NN post)) (-LRB- -LCB-) (NP (NP (DT the) (JJ first)) (PP (IN on) (NP (NN Language) (NN Log)))) (-RRB- -RCB-)))) (. .)))
(S1 (S (NP (NP (DT No) (NNS verbs)) (PP (IN at) (NP (NP (DT all)) (PP (IN in) (NP (NP (DT this) (NN book)) (PP (IN of) (NP (NP (NNP Thaler) (POS 's)) (, ,) (ADVP (RB just))))))))) (VP (VBZ nouns) (NP (, ,) (NNS pronouns) (, ,) (NNS adjectives) (, ,) (NNS adverbs) (, ,) (NNS prepositions) (, ,) (NNS subordinators) (, ,) (NNS coordinators) (, ,) (CC and) (PRN (: --) (INTJ (UH oh) (. !)) (: --)) (NNS interjections))) (. .)))
(S1 (S (NP (PDT All) (DT those)) (PP (IN among) (NP (DT the) (JJ permissible) (PRN (-LRB- -LCB-) (CC and) (PP (IN for) (NP (PRP him))) (, ,)) (NN past))) (VP (VBZ participles) (ADVP (RB too)) (, ,) (PP (IN though) (NP (NP (DT no) (JJ participial) (NNS intrusions)) (PP (IN in) (NP (DT this) (NN post))))) (, ,) (NP (NP (NP (PDT such) (DT the) (JJ extreme) (NN character)) (PP (IN of) (NP (PRP$ my) (ADJP (JJ cruel) (CC and) (JJ unreasonable)) (JJ self-applicable) (NNS strictures) (-RRB- -RCB-)))) (, ,) (CC but) (RB never) (NP (CD one) (JJ single) (JJ solitary) (NN verb)))) (. .)))
(S1 (S (CC And) (, ,) (ADVP (RB fantastically)) (, ,) (NP (PDT all) (DT this)) (VP (NP (NP (NP (DT a) (NN vision)) (PP (IN of) (NP (NP (DT some) (NN liberation)) (PP (IN for) (NP (NNS authors)))))) (, ,) (RB not) (NP (NP (DT an) (JJ absurd) (JJ literary) (NN straitjacket)) (PP (IN with) (NP (DT the) (NN writer))))) (PRN (-LRB- -LCB-) (PP (IN albeit) (NP (RB willingly))) (-RRB- -RCB-)) (VP (VBN imprisoned) (PP (IN within) (NP (PRP it))))) (. .)))
(S1 (NP (NP (DT Some) (NN freedom)) (, ,) (NP (DT this)) (. .)))
(S1 (FRAG (NP (NNP Thaler)) (: :) (S (NP (NNS nuts) (, ,) (NNS bonkers) (, ,)) (VP (VBP round) (DT the) (VP (VB bend)))) (. .)))
(S1 (NP (NP (JJ Mad)) (PP (IN as) (NP (DT a) (NNP March) (NN hare))) (. .)))
(S1 (S (NP (DT The) (NNP Liberman) (NN conjecture)) (PRN (-LRB- -LCB-) (PP (IN about) (NP (NP (NN survival)) (PP (IN of) (NP (NP (JJ high) (NN school) (JJ literary) (NN experimentation)) (PP (IN into) (NP (NP (NN adulthood)) (PP (IN because) (IN of) (NP (DT a) (ADJP (JJ dysfunctional) (JJ authoritarian)) (JJ French) (JJ educational) (NN system))))))))) (-RRB- -RCB-)) (: :) (S (ADVP (RB probably)) (ADJP (JJ true))) (. .)))
(S1 (NP (NP (PRP$ My) (NN attitude)) (: :) (NP (NP (NN contempt)) (, ,) (ADVP (RB really))) (. .)))
(S1 (S (NP (IN Except)) (VP (VBZ ..)) (. .)))
(S1 (FRAG (PP (IN Unless) (NP (CD ..))) (. .)))
(S1 (S (ADVP (RB Just) (RB possibly)) (, ,) (NP (NP (DT an) (NN exercise)) (, ,) (PP (IN for) (NP (NP (DT the) (NNS undergraduates)) (PP (IN in) (NP (NP (PRP$ my) (NN course)) (PP (IN on) (NP (NNP English)))))))) (VP (NN grammar) (NP (DT this) (NN fall) (NN quarter))) (. .)))
(S1 (NP (NP (DT An) (NN effort)) (PP (IN at) (NP (NP (NN construction)) (PP (IN of) (NP (NP (JJ fifty) (NNS words)) (PP (IN of) (NP (NP (JJ coherent) (NN prose)) (PP (IN with) (NP (NP (ADVP (RB never)) (DT a) (NN verb)) (, ,) (PP (IN with) (NP (NP (RB only) (DT those)) (PP (IN in) (NP (NP (NN possession)) (PP (IN of) (NP (NP (JJ enough) (JJ grammatical) (NN knowledge)) (PP (IN for) (NP (NP (JJ verb) (NN identification)) (ADJP (JJ capable) (PP (IN of) (NP (NN success)))))))))))))))))))) (. .)))
(S1 (FRAG (ADJP (JJ Worth) (S (NP (DT a) (NN try)))) (, ,) (ADVP (RB perhaps)) (. .)))
(S1 (S (CC And) (PP (IN in) (NP (DT that) (NN case))) (, ,) (NP (NP (DT a) (NN word)) (PP (IN of) (NP (NP (NN gratitude)) (PP (TO to) (NP (NNP Thaler)))))) (VP (PRN (-LRB- -LCB-) (ADVP (RB otherwise)) (NP (DT an) (JJ unimportant) (NN screwball)) (-RRB- -RCB-))) (. .)))
(S1 (FRAG (NP (RB Always) (DT that) (JJ extra) (NN possibility)) (: :) (S (NP (DT the) (NN idea)) (VP (VBP justifiable) (PP (RB not) (PP (IN because) (IN of) (NP (PRP$ its) (NN implementation))) (, ,) (CC but) (PP (IN in) (NP (NP (NN virtue)) (PP (IN of) (NP (NP (DT a) (ADJP (JJ complementary) (CC or) (JJ counterposed)) (NN idea) (NN emergent)) (PP (IN in) (NP (NP (DT the) (NN mind)) (PP (IN of) (NP (NP (NN someone) (RB else)) (: --) (NP (NP (JJ serendipitous) (JJ bastard) (NN offspring)) (PP (IN of) (NP (DT a) (JJ deranged) (JJ cognitive) (NN parent))))))))))))))) (. .)))
(S1 (FRAG (RB So) (NP (NP (PRP$ my) (NN gratitude)) (PP (TO to) (NP (PRP you)))) (, ,) (NP (NNP Thaler)) (, ,) (NP (PRP you) (JJ pusillanimous) (NN poseur)) (, ,) (NP (PRP you) (JJ literary) (NN clown)) (. .)))
(S1 (NP (DT A) (JJ new) (NN idea) (. !)))
(S1 (FRAG (NP (PRP$ My) (NN idea)) (, ,) (NP (NP (DT all) (NN mine)) (PRN (-LRB- -LCB-) (NP (NP (ADJP (JJ accessible) (PP (ADVP (RB here)) (IN on) (NP (NN Language)))) (NN Log)) (PP (TO to) (NP (QP (RB just) (DT a) (JJ few) (CD thousand)) (JJ close) (NNS friends)))) (-RRB- -RCB-))) (. .)))
(S1 (FRAG (NP (NNP Ooh)) (, ,) (NP (CD one) (JJ other) (NN thought)) (, ,) (PP (IN for) (NP (JJ computational) (NNS linguists))) (: :) (SBAR (WHNP (WP What)) (S (VP (NNS bets) (PP (IN on) (NP (NP (DT the) (NN performance)) (PP (IN of) (NP (NP (JJ part-of-speech) (VBG tagging) (NNS algorithms)) (PP (IN on) (NP (NN prose))) (PP (JJ such) (IN as) (NP (DT this)))))))))) (. ?)))