Trained Models

This page lists trained models ready to be used with nlpnet. Model files can be decompressed anywhere. When using nlpnet, supply the path to the decompressed directory, either with the --data argument of the nlpnet-tag script or with the nlpnet.set_data_dir function.
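As a minimal sketch of the second option, the snippet below points nlpnet at a decompressed model directory and tags a sentence. nlpnet.set_data_dir is the function named above; the helper name tag_text and the example path are hypothetical, and the use of nlpnet.POSTagger with no arguments is an assumption about the installed version (some versions may also need a language argument). The import is done inside the function so the snippet can be loaded even where nlpnet is not installed.

```python
def tag_text(data_dir, text):
    """Tag `text` with the POS model stored in `data_dir`.

    `tag_text` is a hypothetical helper written for this example;
    it is not part of nlpnet itself.
    """
    import nlpnet  # imported lazily; requires nlpnet to be installed

    # Tell nlpnet where the decompressed model files are.
    nlpnet.set_data_dir(data_dir)

    # Assumption: the tagger picks up the directory set above.
    tagger = nlpnet.POSTagger()
    return tagger.tag(text)
```

A call would look like tag_text('/path/to/pos-pt/', 'O rato roeu a roupa do rei de Roma.'), where the path is a placeholder for wherever the archive was decompressed.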

If you have trained nlpnet to perform any task in another language, please get in touch and we will add a link to your models.

Word Embeddings (Portuguese)

Word embeddings


This archive is only useful for training new models. If you want to use a pre-trained POS or SRL model, you don’t need the embeddings.

These word embeddings can be used to train new nlpnet models (check the Training section for details on how to use them). The archive contains a vocabulary file and an embeddings file. The latter is a NumPy matrix whose i-th row is the vector representation of the i-th word in the vocabulary. The embeddings were obtained by applying word2embeddings to a corpus of around 240 million tokens, composed of the Portuguese Wikipedia and news articles.
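The row-to-word correspondence described above can be sketched as follows. This is an illustration only: it assumes the vocabulary file holds one word per line in UTF-8 and that the matrix was saved in NumPy's .npy format, neither of which is stated here. The function name load_embeddings and the file names in the usage note are hypothetical.

```python
import numpy as np


def load_embeddings(vocab_path, matrix_path):
    """Return (word_to_row, matrix) for a vocabulary/embeddings pair.

    Assumes one word per line in the vocabulary file and a 2-D
    matrix stored in .npy format (both are assumptions).
    """
    with open(vocab_path, encoding="utf-8") as f:
        words = [line.rstrip("\n") for line in f]

    matrix = np.load(matrix_path)

    # The i-th row must belong to the i-th word, so the sizes must agree.
    if len(words) != matrix.shape[0]:
        raise ValueError(
            "vocabulary has %d words but matrix has %d rows"
            % (len(words), matrix.shape[0])
        )

    # Map each word to the index of its row in the matrix.
    word_to_row = {word: i for i, word in enumerate(words)}
    return word_to_row, matrix
```

With hypothetical file names, word_to_row, matrix = load_embeddings('vocabulary.txt', 'embeddings.npy') followed by matrix[word_to_row['casa']] would return the vector for "casa", provided that word is in the vocabulary.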

POS (Portuguese)

State-of-the-art POS tagger

Performance: 97.33% token accuracy, 93.66% out-of-vocabulary token accuracy (evaluated on the revised Mac-Morpho test section)

SRL (Portuguese)

Semantic Role Labeling model

This SRL model doesn’t use any features besides word vectors, so you can use it without a parser or a chunker. However, due to the small size of PropBank-Br, its performance is lower than what SENNA obtains for English.

Performance: 66.19% precision, 59.78% recall, 62.82% F-1 (evaluated on the PropBank-Br test section)
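For reference, the F-1 figure is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values. The function name f1_score is chosen for this example and is not part of nlpnet.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported above, in percentage points.
print(round(f1_score(66.19, 59.78), 2))  # 62.82
```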

Dependency and POS (English)

Dependency Parser model

This dependency parser includes a POS tagger. Its performance is unfortunately still below the state of the art.

Performance: 91.5% unlabeled attachment score (UAS), 89.1% labeled attachment score (LAS) (evaluated on the Penn Treebank)
