

Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification


Aug 26, 1998
Claire Cardie, David Pierce




Finding simple, non-recursive, base noun phrases is an important subtask for many natural language processing applications. While previous empirical methods for base NP identification have been rather complex, this paper instead proposes a very simple algorithm that is tailored to the relative simplicity of the task. In particular, we present a corpus-based approach for finding base NPs by matching part-of-speech tag sequences. The training phase of the algorithm is based on two successful techniques: first the base NP grammar is read from a "treebank" corpus; then the grammar is improved by selecting rules with high "benefit" scores. Using this simple algorithm with a naive heuristic for matching rules, we achieve surprising accuracy in an evaluation on the Penn Treebank Wall Street Journal.
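The two training steps and the matching step described in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions, since the abstract does not spell them out: the "benefit" of a rule is taken to be its correct matches minus its incorrect matches on a held-out corpus, the "naive heuristic" is taken to be left-to-right longest match, and pruning is done in a single pass. The function names (`extract_grammar`, `match`, `prune`) and the toy data are hypothetical.

```python
from collections import Counter

# A corpus item is (tags, spans): a list of POS tags for one sentence and
# the gold base NP spans as (start, end) pairs, with end exclusive.

def extract_grammar(corpus):
    """Read the grammar: every POS-tag sequence seen as a base NP is a rule."""
    rules = set()
    for tags, spans in corpus:
        for start, end in spans:
            rules.add(tuple(tags[start:end]))
    return rules

def match(tags, rules):
    """Left-to-right longest match (an assumed heuristic).

    Returns a list of (start, end, rule) for each bracketed span.
    """
    max_len = max((len(r) for r in rules), default=0)
    found, i = [], 0
    while i < len(tags):
        for n in range(min(max_len, len(tags) - i), 0, -1):
            candidate = tuple(tags[i:i + n])
            if candidate in rules:
                found.append((i, i + n, candidate))
                i += n
                break
        else:
            i += 1  # no rule starts at this position
    return found

def prune(rules, held_out, threshold=0):
    """Keep rules whose benefit (assumed: correct minus incorrect) exceeds threshold."""
    benefit = Counter()
    for tags, spans in held_out:
        gold = set(spans)
        for start, end, rule in match(tags, rules):
            benefit[rule] += 1 if (start, end) in gold else -1
    return {r for r in rules if benefit[r] > threshold}

# Toy data (invented for illustration, not from the paper).
train = [
    (["DT", "NN", "VBZ", "VBG", "NNS"], [(0, 2), (3, 5)]),
    (["NNS", "VBP", "JJ"], [(0, 1)]),
]
held_out = [
    (["DT", "NN", "VBZ", "VBG", "NNS"], [(0, 2), (4, 5)]),
    (["NNS", "VBP", "VBG", "NNS"], [(0, 1), (3, 4)]),
]

grammar = extract_grammar(train)
pruned = prune(grammar, held_out)
spans = [(s, e) for s, e, _ in match(held_out[0][0], pruned)]
```

In this toy run the rule `("VBG", "NNS")` is read from the training data but makes only incorrect matches on the held-out sentences, so it is dropped. Note that with `threshold=0` a rule that never fires on the held-out data is also dropped, which is a design choice of this sketch rather than something the abstract states.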

* Proceedings of COLING-ACL'98, pages 218-224. 
* 7 pages; 2 eps figures; uses epsf, colacl 




