nlpnet includes standalone scripts that may be called from a command line. They are copied to the scripts subdirectory of your Python installation, which can be included in the system PATH variable. There are three such scripts:
Each of them is explained below.
This is the simplest nlpnet script: it runs the system on a given text input. Call it with the following syntax:
$ nlpnet-tag.py TASK
where TASK is either pos or srl. The script also accepts the following command line options:
-v | Verbose mode. |
-t | Disables built-in tokenizer. Tokens are assumed to be separated by whitespace and one sentence per line. |
--lang | Sets the tokenizer language (ignored if -t is used). Currently, it only accepts pt and en. |
--no-repeat | Forces the classification step to avoid repeated argument labels (SRL only). |
--data | The directory with the trained models (defaults to the current one). |
For example:
$ nlpnet-tag.py pos --data /path/to/nlpnet-data/ --lang pt
O rato roeu a roupa do rei de Roma.
O_ART rato_N roeu_V a_ART roupa_N do_PREP+ART rei_N de_PREP Roma_NPROP ._PU
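The tagged line above can be post-processed in Python. The following is a minimal sketch, assuming the output keeps the token_TAG layout shown in the example; the function name parse_tagged_line is hypothetical and not part of nlpnet.

```python
def parse_tagged_line(line):
    """Split a line such as 'O_ART rato_N' into (token, tag) pairs.

    Hypothetical helper: it only assumes the token_TAG layout shown above.
    """
    pairs = []
    for item in line.split():
        # Split on the last underscore so tokens containing '_' stay intact.
        token, sep, tag = item.rpartition("_")
        if not sep:
            # No underscore found: keep the item as a token without a tag.
            token, tag = item, None
        pairs.append((token, tag))
    return pairs


tagged = "O_ART rato_N roeu_V a_ART roupa_N do_PREP+ART rei_N de_PREP Roma_NPROP ._PU"
for token, tag in parse_tagged_line(tagged):
    print(token, tag)
```

Splitting on the last underscore rather than the first is a design choice that keeps tokens with embedded underscores whole.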
Or with semantic role labeling:
$ nlpnet-tag.py srl --data /path/to/nlpnet-data/ --lang pt
O rato roeu a roupa do rei de Roma.
O rato roeu a roupa do rei de Roma .
roeu
A1: a roupa do rei de Roma
A0: O rato
V: roeu
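In the SRL example, the first line after the command was typed by the user, and the second one is the result of tokenization. The remaining lines show the predicate followed by its labeled arguments.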
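The SRL output can be turned into a small data structure for further processing. This is only a sketch under stated assumptions: it expects the output for a single sentence, with the tokenized sentence first, each predicate on its own line, and each argument as a "LABEL: text" line, as in the example. The name parse_srl_output is hypothetical and not part of nlpnet.

```python
def parse_srl_output(lines):
    """Parse the SRL output of one sentence into (sentence, frames).

    Assumed layout (taken from the example, not from a specification):
    tokenized sentence, then a predicate line, then 'LABEL: text' lines.
    """
    lines = [ln.strip() for ln in lines if ln.strip()]
    sentence = lines[0]
    frames = []
    for ln in lines[1:]:
        label, sep, text = ln.partition(": ")
        if sep and frames:
            # Argument line: attach it to the most recent predicate.
            frames[-1]["arguments"].append((label, text))
        else:
            # Any other line is treated as the start of a new predicate.
            frames.append({"predicate": ln, "arguments": []})
    return sentence, frames


output = [
    "O rato roeu a roupa do rei de Roma .",
    "roeu",
    "A1: a roupa do rei de Roma",
    "A0: O rato",
    "V: roeu",
]
sentence, frames = parse_srl_output(output)
print(sentence)
print(frames)
```

Arguments are stored as a list of pairs rather than a dictionary, so repeated labels (possible when --no-repeat is not used) are not overwritten.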
This script is much simpler. It evaluates the system performance against a gold standard.
The arguments below are valid for both tasks.
--task TASK | Task for which the network should be used. Either pos or srl. |
-v | Verbose mode. |
--gold FILE | File with gold standard data. |
--data DIRECTORY | Directory with trained models. |
--oov FILE | Analyze performance on the words described in the given file. |
The --oov option requires a UTF-8 file containing one word per line. Despite its name, the option is not limited to OOV (out-of-vocabulary) words: it accepts any word list you want to evaluate.
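A word list in the required format (UTF-8, one word per line) can be produced with a few lines of Python. This is a minimal sketch; the function name write_word_list, the file name oov-words.txt, and the sample words are illustrative assumptions.

```python
import os
import tempfile


def write_word_list(words, path):
    """Write one word per line to a UTF-8 file, skipping empty entries."""
    with open(path, "w", encoding="utf-8") as handle:
        for word in words:
            word = word.strip()
            if word:
                handle.write(word + "\n")


# Illustrative usage: the file name and words are placeholders.
example_path = os.path.join(tempfile.mkdtemp(), "oov-words.txt")
write_word_list(["roeu", "coração", "Roma"], example_path)
print(example_path)
```

The resulting path can then be passed as the FILE argument of --oov.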
SRL evaluation is performed in different ways, depending on whether it is aimed at argument identification, classification, predicate detection or all of them. In the future, there may be a more standardized version for this test.
--id | Evaluate only argument identification (SRL only). The script will output the score. |
--class | Evaluate only argument classification (SRL only). The script will output the score. |
--preds | Evaluate only predicate identification (SRL only). The script will output the score. |
--2steps | Execute SRL with two separate steps. The script will output the results in CoNLL format. |
--no-repeat | Forces the classification step to avoid repeated argument labels (two-step SRL only). |
--auto-pred | Determines SRL predicates automatically. Only used when evaluating the full process (identification + classification). |
The CoNLL output can be evaluated against a gold file using the official SRL eval script (see http://www.lsi.upc.edu/~srlconll/soft.html).