The Propbank-Br corpus is one of the results of a post-doctoral project undertaken by Magali Sanches Duran under the supervision of Sandra Maria Aluísio. The project was developed at the Center of Computational Linguistics (NILC) at Universidade de São Paulo (USP). The research received financial support from FAPESP (São Paulo Research Foundation) under process number 2009/07394-9.
Propbank-Br: Semantic role labeling annotation in a corpus of Brazilian Portuguese
Abstract. The Propbank-Br project aims to add a layer of semantic role labels (SRL) to a treebank of Brazilian Portuguese. The first phase of the annotation produced a training corpus that is currently being used to develop SRL classifiers. To benefit from existing SRL experience with English, we decided to follow the Propbank guidelines (Palmer et al., 2005). We annotated the syntactic trees generated by the parser Palavras (Bick, 2000) for the Brazilian portion of the Bosque corpus, a manually revised subcorpus of Floresta Sintá(c)tica (Afonso et al., 2002). Wherever possible, annotation decisions were based on the Annotation Guidelines written by Olga Babko-Malaya[i]. Additionally, we consulted the Propbank lexical resource (Verb Index[ii]) to observe the argument structure of the English equivalents of the Portuguese verbs being annotated. The experience revealed language-specific issues that required specific decisions (Duran and Aluísio, 2011).
[i] http://verbs.colorado.edu/~mpalmer/projects/ace/PBguidelines.pdf
[ii] http://verbs.colorado.edu/verb-index/index.php
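To make the idea of a semantic role layer concrete, the following is a minimal Python sketch of how one Propbank-style proposition could be represented over a tokenized sentence. It is an illustration only and not the storage format used by Propbank-Br: the class names, the example sentence, and the roleset identifier "quebrar.01" are all hypothetical. Only the label conventions (numbered arguments such as Arg0 and Arg1, and modifiers such as ArgM-TMP) follow Propbank.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    label: str  # Propbank-style label, e.g. "Arg0", "Arg1", "ArgM-TMP"
    start: int  # index of the first token in the span (inclusive)
    end: int    # index of the last token in the span (inclusive)


@dataclass
class Proposition:
    tokens: List[str]
    predicate_index: int  # position of the annotated verb
    roleset: str          # hypothetical sense identifier for the verb
    arguments: List[Argument] = field(default_factory=list)

    def as_brackets(self) -> str:
        """Render the sentence with each labeled span in square brackets."""
        spans = [(a.start, a.end, a.label) for a in self.arguments]
        spans.append((self.predicate_index, self.predicate_index, "rel"))
        spans.sort()
        pieces = []
        cursor = 0
        for start, end, label in spans:
            # Tokens not covered by any label are emitted unchanged.
            pieces.extend(self.tokens[cursor:start])
            span_text = " ".join(self.tokens[start:end + 1])
            pieces.append("[" + label + " " + span_text + "]")
            cursor = end + 1
        pieces.extend(self.tokens[cursor:])
        return " ".join(pieces)


# Invented example sentence: "O menino quebrou a janela ontem"
# ("The boy broke the window yesterday").
prop = Proposition(
    tokens=["O", "menino", "quebrou", "a", "janela", "ontem"],
    predicate_index=2,
    roleset="quebrar.01",  # hypothetical identifier, for illustration only
    arguments=[
        Argument("Arg0", 0, 1),      # the breaker
        Argument("Arg1", 3, 4),      # the thing broken
        Argument("ArgM-TMP", 5, 5),  # temporal modifier
    ],
)

print(prop.as_brackets())
```

In the corpus itself the labels are attached to constituents of the syntactic trees rather than to flat token spans; the flat spans here only keep the sketch short.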
References:
Afonso, S.; Bick, E.; Haber, E.; Santos, D. (2002) Floresta sintá(c)tica: a treebank for Portuguese. In: Proceedings of LREC-2002.
Bick, E. (2000) The Parsing System "Palavras": Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus, Denmark: Aarhus University Press.
Duran, M. S.; Aluísio, S. M. (2011) Propbank-Br: a Brazilian Portuguese corpus annotated with semantic role labels. In: Proceedings of the 8th Symposium in Information and Human Language Technology, October 24-26, Cuiabá/MT, Brazil.
Palmer, M.; Gildea, D.; Kingsbury, P. (2005) The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1), pp. 71-105.
Publications:
Duran, M. S.; Aluísio, S. M. (2011) Propbank-Br: a Brazilian Portuguese corpus annotated with semantic role labels. In: Proceedings of the 8th Symposium in Information and Human Language Technology, October 24-26, Cuiabá/MT, Brazil.
Duran, M. S.; Aluísio, S. M. (2012) Propbank-Br: a Brazilian Treebank annotated with semantic role labels. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), Istanbul, Turkey.
Links:
PropBank: http://verbs.colorado.edu/~mpalmer/projects/ace.html
Unified Verb Index: http://verbs.colorado.edu/verb-index/index.php
Contact: Magali Sanches Duran, PhD - email: magali.duran at uol.com.br