PropBank.Br

Supervisor

...

Sandra Maria Aluísio

Postdoctoral Student

...

Magali Sanches Duran

Propbank-Br aims to provide corpora annotated with semantic role labels, to serve as training data for semantic role labeling (SRL) classifiers. The annotation scheme closely follows that of Propbank (Palmer et al., 2005) and is designed to facilitate machine learning. The SRL annotation is applied over syntactic trees produced by the PALAVRAS parser (Bick, 2000). Propbank-Br currently comprises three annotated corpora. The first is the Brazilian portion of the Bosque corpus (Afonso et al., 2002), whose parse trees were fully revised by linguists. The second is a set of sentences extracted from the PLN-Br corpus (Bruckschen et al., 2008), annotated over non-revised parse trees. The third is a sample of the Buscapé corpus (Hartmann et al., 2014), annotated under the same conditions as the second.
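To make the idea of a Propbank-style annotated instance concrete, the following is a minimal sketch in Python. It is only an illustration: the `Argument` and `Instance` classes, the `render` helper, and the example sentence are hypothetical and do not reflect the actual file format or data of Propbank-Br. The sketch annotates token spans directly, whereas Propbank-Br annotates constituents of syntactic trees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    label: str  # Propbank-style role label, e.g. "Arg0", "Arg1", "ArgM-TMP"
    start: int  # index of the first token in the span (inclusive)
    end: int    # index of the last token in the span (inclusive)


@dataclass(frozen=True)
class Instance:
    tokens: tuple     # the tokens of the sentence
    predicate: int    # index of the annotated verb
    arguments: tuple  # non-overlapping Argument spans


def render(instance):
    """Return the sentence with the predicate and each argument bracketed."""
    spans = {a.start: a for a in instance.arguments}
    out = []
    i = 0
    while i < len(instance.tokens):
        if i in spans:
            a = spans[i]
            text = " ".join(instance.tokens[a.start:a.end + 1])
            out.append("[" + a.label + " " + text + "]")
            i = a.end + 1
        elif i == instance.predicate:
            out.append("[rel " + instance.tokens[i] + "]")
            i += 1
        else:
            out.append(instance.tokens[i])
            i += 1
    return " ".join(out)


# Invented example sentence ("The boy bought a book yesterday"),
# not taken from any of the Propbank-Br corpora.
example = Instance(
    tokens=("O", "menino", "comprou", "um", "livro", "ontem"),
    predicate=2,
    arguments=(
        Argument("Arg0", 0, 1),      # the buyer
        Argument("Arg1", 3, 4),      # the thing bought
        Argument("ArgM-TMP", 5, 5),  # temporal modifier
    ),
)

print(render(example))
```

Each instance pairs one predicate with its labeled arguments, so a sentence with several verbs would yield several instances under this representation.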

This version contains 8,350 instances annotated with semantic role labels. The instances were extracted from the journalistic corpus PLN-Br (Bruckschen et al., 2008) and parsed by the PALAVRAS parser (Bick, 2000). The syntactic trees of this version were not revised by humans, unlike those of Propbank-Br v. 1 (Duran & Aluísio, 2012), which was annotated on the Bosque corpus (Afonso et al., 2002).

This sample was annotated to enable the evaluation of semantic role labeling classifiers. It contains 840 instances annotated with semantic role labels over syntactic trees generated by the PALAVRAS parser (Bick, 2000). The instances were extracted from the Buscapé corpus (Hartmann et al., 2014), a corpus of product reviews. The syntactic trees of this sample were not revised.

Afonso, S., Bick, E., Haber, E., Santos, D. (2002). Floresta sintá(c)tica: a treebank for Portuguese. In: Proceedings of LREC 2002.

Bick, E. (2000). The parsing system Palavras: Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus: Aarhus University Press.

Bruckschen, M., Muniz, F., Souza, J. G. C., Fuchs, J. T., Infante, K., Muniz, M., ... & Aluísio, S. M. (2008). Anotação linguística em XML do corpus PLN-BR. Série de relatórios do NILC. NILC-ICMC-USP.

Duran, M. S., Aluísio, S. M. (2012). Propbank-Br: a Brazilian Treebank annotated with semantic role labels. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey.

Hartmann, N. S., Avanço, L., Balage, P. P., Duran, M. S., Nunes, M. G. V., Pardo, T., Aluísio, S. (2014). A Large Opinion Corpus in Portuguese - Tackling Out-Of-Vocabulary Words. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014).

Palmer, M., Gildea, D., Kingsbury, P. (2005). The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1), 71–106.