CURUPIRA
From 2002 to 2004
Goals
CURUPIRA aims at providing the set of all possible syntactic analyses for any
sentence written in Brazilian Portuguese.
Project's Features
CURUPIRA is a general-purpose robust parser for Brazilian Portuguese. It parses
sentences top-down and left to right, using a context-free
constrained-relaxed functional grammar for standard written Brazilian
Portuguese and a broad-coverage lexicon for Brazilian Portuguese. The
lexicon is a set of 1.5 million free forms (including inflected and derived
forms) carrying morpho-syntactic information (such as part of speech, number,
person, gender, tense, aspect and transitivity). The grammar is hand-made
and can be defined by the 5-tuple <S, V, T, P, W>, where 'S'
stands for the initial symbol (i.e., any sequence of words between two sentence
boundaries, mainly punctuation marks); 'V' stands for the non-terminal vocabulary
(a tag set of syntactic functions as close as possible to NGB, the official
Brazilian grammar terminology); 'T' stands for the terminal vocabulary (a tag set
of morpho-syntactic information ascribed to entries in the lexicon); 'P' stands
for the set of production rules, written in a special formalism; and 'W' stands
for the priority of application of the production rules.
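The 5-tuple above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tag names, the seven rules, the five-word lexicon and the convention that a lower priority number means "tried first" are not taken from CURUPIRA's actual grammar, lexicon or rule formalism. It shows only the general idea of a top-down, left-to-right parser that tries higher-priority rules first and returns every complete analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: str        # non-terminal symbol (from V)
    rhs: tuple      # sequence of symbols from V or T
    priority: int   # W: assumed convention, lower number = tried first


# Toy grammar; all tags and rules are hypothetical.
S = "SENT"                          # initial symbol
T = {"art", "noun", "verb"}         # terminal vocabulary (morpho-syntactic tags)
P = [
    Rule("SENT", ("SUBJ", "PRED"), 1),
    Rule("SUBJ", ("art", "noun"), 1),
    Rule("SUBJ", ("noun",), 2),
    Rule("PRED", ("verb", "OBJ"), 1),
    Rule("PRED", ("verb",), 2),
    Rule("OBJ", ("art", "noun"), 1),
    Rule("OBJ", ("noun",), 2),
]
V = {rule.lhs for rule in P}        # non-terminal vocabulary (syntactic functions)

# Toy lexicon: each form maps to every tag it may carry.
LEXICON = {
    "o": {"art"},
    "a": {"art"},
    "menino": {"noun"},
    "casa": {"noun", "verb"},       # ambiguous form, resolved by the parser
    "viu": {"verb"},
}


def expand(symbol, words, pos):
    """Yield (tree, next_pos) for each way `symbol` can start at `pos`."""
    if symbol in T:
        if pos < len(words) and symbol in LEXICON.get(words[pos], set()):
            yield (symbol, words[pos]), pos + 1
        return
    candidates = sorted((r for r in P if r.lhs == symbol), key=lambda r: r.priority)
    for rule in candidates:
        for children, end in expand_seq(rule.rhs, words, pos):
            yield (symbol, *children), end


def expand_seq(symbols, words, pos):
    """Yield (subtrees, next_pos) for a sequence of symbols, left to right."""
    if not symbols:
        yield (), pos
        return
    for tree, mid in expand(symbols[0], words, pos):
        for rest, end in expand_seq(symbols[1:], words, mid):
            yield (tree, *rest), end


def parse(sentence):
    """Return all analyses that cover the whole sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in expand(S, words, 0) if end == len(words)]


if __name__ == "__main__":
    for tree in parse("o menino viu a casa"):
        print(tree)
```

This naive recursive expansion would loop forever on left-recursive rules, so the toy grammar avoids them; how the real parser deals with such rules is not described here.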
CURUPIRA was formerly part of ReGra, the grammar and style checker developed by NILC, and was therefore designed primarily to parse strings of words irrespective of their grammaticality. Government, agreement and other dependency relations are not checked. Except for function words (such as articles and prepositions), no lexical disambiguation is carried out either. The parser itself decides on the best part-of-speech candidate, as it tries to satisfy the highest-ranked syntactic structures first.
The input to CURUPIRA can be either an isolated sentence or a text (which is split into sentences) and should follow standard written Brazilian Portuguese syntax for better results. Topicalizations, clefts and syntactic inversions cannot be handled by the tool in this first version. The output of CURUPIRA follows a special notation developed by NILC.
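As a rough illustration of how a text might be split into sentences before parsing, here is a minimal Python sketch. The function name `split_sentences` and the choice of `.`, `!` and `?` as the only boundary marks are assumptions; the tool's actual boundary rules are not described here.

```python
import re

# Assumed boundaries: whitespace that follows '.', '!' or '?'.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split a text into sentences, keeping the final punctuation mark."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


if __name__ == "__main__":
    print(split_sentences("O menino viu a casa. Ela é grande! Gostou?"))
```

A splitter this simple would also break after abbreviations such as "Dr.", so a real one would need an exception list or a similar safeguard.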
Expected Results
CURUPIRA is not committed to generating the right syntactic tree for a given
sentence, but the most common surface combinations for any sequence of
morpho-syntactic classes. No semantic interpretation or syntactic
disambiguation is carried out, and parse results are expected to be ranked
solely according to the priority of application of the rules (which is supposed to
provide the most appropriate tree for checking purposes).
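One way to picture ranking by rule priority alone is the Python sketch below. It is an assumption-laden illustration, not CURUPIRA's actual ranking procedure: the `Analysis` record, the convention that 1 is the highest priority, and the lexicographic comparison of priority sequences are all hypothetical.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    label: str               # human-readable description of the parse
    rule_priorities: tuple   # priorities of the rules applied, in order (1 = highest)


def rank(analyses):
    """Order analyses so that those built from higher-priority rules come first."""
    # Tuples compare lexicographically, so the earliest rule choice dominates.
    return sorted(analyses, key=lambda a: a.rule_priorities)


if __name__ == "__main__":
    candidates = [
        Analysis("'casa' read as verb", (1, 2, 2)),
        Analysis("'casa' read as noun", (1, 1, 2)),
    ]
    for analysis in rank(candidates):
        print(analysis.label, analysis.rule_priorities)
```

No semantic or statistical score enters the comparison, which mirrors the statement above that ranking depends solely on rule priority.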
Team (2004)
Maria das Graças Volpe Nunes (coordinator)
Ricardo Hasegawa
Ronaldo Teixeira Martins
Contact
Ricardo Hasegawa: rh@icmc.usp.br
Ronaldo Martins: rtmartin@.uol.com.br
Related Publications
Martins, R. T.; Hasegawa, R.; Nunes, M.G.V. Curupira: um parser funcional para
o português. NILC-TR-02-26, December 2002.
Martins, R. T.; Hasegawa, R.; Nunes, M.G.V. Curupira: a functional parser for Brazilian Portuguese. In Nuno J. Mamede, Jorge Baptista, Isabel Trancoso, Maria das Graças Volpe Nunes (Eds.): Computational Processing of the Portuguese Language, 6th International Workshop, PROPOR 2003, Faro, Portugal, June 26-27, 2003. Proceedings. Lecture Notes in Computer Science 2721 Springer 2003, ISBN 3-540-40436-8