Núcleo Interinstitucional de Lingüística Computacional
An Interinstitutional Center for Research and Development in Computational Linguistics

Stemmer

Porter Stemmer for Brazilian Portuguese

Description

This stemmer was developed at LABIC and follows Porter's algorithm. It works for Brazilian Portuguese, identifying the stem of each word by incrementally removing its suffixes (endings).
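
To illustrate what "incrementally removing suffixes" means, here is a minimal sketch in Python. It is not the rule set used by this stemmer: the suffix lists, the step order, the minimum stem length and the name toy_stem are all assumptions chosen only for the example.

```python
# Illustrative only: a toy suffix stripper, NOT the rules of the LABIC stemmer.
# Each step holds hypothetical suffixes; at most one suffix is removed per step.
STEPS = [
    ["s"],             # step 1: plural ending
    ["mente"],         # step 2: adverb ending
    ["inho", "inha"],  # step 3: diminutive endings
]
MIN_STEM = 3  # assumed lower bound on the length of what remains


def toy_stem(word: str) -> str:
    stem = word.lower()
    for suffixes in STEPS:
        # Within a step, try the longest suffix first.
        for suffix in sorted(suffixes, key=len, reverse=True):
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
                stem = stem[: -len(suffix)]
                break  # move on to the next step
    return stem


print(toy_stem("casinhas"))     # cas
print(toy_stem("rapidamente"))  # rapida
```

In this sketch "casinhas" first loses the plural "s" and then the diminutive "inha", which is the step-by-step behaviour the description refers to. A real Porter-style stemmer uses many more rules and conditions.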

Publications

Caldas Junior, J.; Imamura, C.Y.M.; Rezende, S.O. (2001). Avaliação de um Algoritmo de Stemming para a Língua Portuguesa. In Proceedings of the 2nd Congress of Logic Applied to Technology, Vol. 2, pp. 267–274.

Downloads

Stemmer for words

The stemmer is run from the command line and takes a single word as input. For instance:

stemmer.exe word

The stem of the word is printed to the screen. To save it to a file instead, redirect the output:

stemmer.exe word > myfile.txt

Stemmer for files

The stemmer is run from the command line and takes a file name as input. For instance:

stemmer.exe myfile.txt

The stems of the words in the file are stored in a new file with the same name plus the suffix '.stemmed' (for instance, myfile.txt.stemmed).
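
To call the file stemmer from a script, a small wrapper such as the following may help. It is a sketch under several assumptions: that stemmer.exe can be found from the current directory or the PATH, that it exits with code zero on success, and that the output file is written next to the input file. The function names stemmed_path and stem_file are hypothetical and not part of the tool.

```python
import subprocess
from pathlib import Path


def stemmed_path(input_path):
    """Return the expected output path: same name plus '.stemmed'."""
    p = Path(input_path)
    return p.with_name(p.name + ".stemmed")


def stem_file(input_path, command=("stemmer.exe",)):
    """Run the stemmer on input_path and return the path of the output file.

    `command` is the program to run (assumed to be stemmer.exe); the input
    file name is appended as its only argument, as in the usage above.
    """
    # check=True raises CalledProcessError on a non-zero exit code
    # (assumption: the stemmer signals failure this way).
    subprocess.run([*command, str(input_path)], check=True)
    out = stemmed_path(input_path)
    if not out.exists():
        raise FileNotFoundError(f"expected output file not found: {out}")
    return out
```

For example, stem_file("myfile.txt") would run "stemmer.exe myfile.txt" and return the path myfile.txt.stemmed. The format of the contents of that file is not described here, so the wrapper leaves reading it to the caller.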