.. _scripts:

==================
Standalone Scripts
==================

:mod:`nlpnet` includes standalone scripts that can be called from the command line. They are
copied to the ``scripts`` subdirectory of your Python installation, which can be included
in the system PATH variable. There are four such scripts:

**nlpnet-train**
  Script to train a new model or further train an existing one. See :ref:`training` for detailed information on how to use it.

**nlpnet-load-embeddings**
  Script to load word embeddings trained externally. It accepts different formats. See :ref:`embeddings` for detailed information on how to use it.
  
**nlpnet-test**
  Script to measure the performance of a model against a gold data set.

**nlpnet-tag**
  Script to call a model and tag some given text.

``nlpnet-tag`` and ``nlpnet-test`` are explained below; the other two are covered in the sections referenced above.

.. contents::  
  :local:  
  :depth: 1  


nlpnet-tag
==========

This is the simplest :mod:`nlpnet` script. It runs the system on text given as input.
It should be called with the following syntax:

.. code-block:: bash

    $ nlpnet-tag.py TASK

Here, ``TASK`` is either ``pos`` or ``srl``. The script also accepts the following command line options:

-v  Verbose mode.
-t  Disables the built-in tokenizer. Tokens are assumed to be separated by whitespace, with one sentence per line.
--lang  Sets the tokenizer language (ignored if ``-t`` is used). Currently, it only accepts ``pt`` and ``en``.
--no-repeat  Forces the classification step to avoid repeated argument labels (SRL only).
--data  The directory with the trained models (defaults to the current one).

For example:

.. code-block:: bash

    $ nlpnet-tag.py pos --data /path/to/nlpnet-data/ --lang pt
    O rato roeu a roupa do rei de Roma.
    O_ART rato_N roeu_V a_ART roupa_N do_PREP+ART rei_N de_PREP Roma_NPROP ._PU

Or with semantic role labeling:

.. code-block:: bash

    $ nlpnet-tag.py srl --data /path/to/nlpnet-data/ --lang pt
    O rato roeu a roupa do rei de Roma.
    O rato roeu a roupa do rei de Roma .
    roeu
        A1: a roupa do rei de Roma
        A0: O rato
        V: roeu

The first line was typed by the user, and the second one is the result of tokenization.
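
The POS output shown above puts each token and its tag together as ``token_TAG``. If you need the result as separate fields, the following is a minimal sketch of how such a line could be split. The function name ``parse_tagged_line`` is hypothetical and not part of :mod:`nlpnet`; the sketch only assumes the output format shown in the example.

```python
def parse_tagged_line(line):
    """Split a line of ``token_TAG`` items into (token, tag) pairs.

    Hypothetical helper, not part of nlpnet. It assumes the output
    format shown in the POS example: items separated by whitespace,
    each one joined to its tag by an underscore.
    """
    pairs = []
    for item in line.split():
        # Split on the last underscore so that tokens which themselves
        # contain "_" are kept intact.
        token, tag = item.rsplit("_", 1)
        pairs.append((token, tag))
    return pairs


line = "O_ART rato_N roeu_V a_ART roupa_N do_PREP+ART rei_N de_PREP Roma_NPROP ._PU"
for token, tag in parse_tagged_line(line):
    print(token, tag)
```

An item without any underscore raises ``ValueError`` in this sketch, which is a deliberate choice to surface unexpected lines instead of silently skipping them.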


nlpnet-test
===========

This script evaluates the system performance against a gold standard data set.

General options
---------------

The arguments below are valid for both tasks.

--task TASK  Task for which the network should be used. Either ``pos`` or ``srl``.
-v  Verbose mode.
--gold FILE  File with gold standard data.
--data DIRECTORY  Directory with trained models.
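
As an illustration of how the general options combine, here is a minimal sketch that assembles an argument list from them. The helper ``build_test_command`` is hypothetical, the script name ``nlpnet-test.py`` is assumed by analogy with ``nlpnet-tag.py``, and ``gold.txt`` is a placeholder file name. The sketch only builds the command; it does not run it.

```python
import shlex


def build_test_command(task, gold, data, verbose=False, script="nlpnet-test.py"):
    """Assemble an argument list using the general options listed above.

    Hypothetical helper; the default script name is an assumption made
    by analogy with ``nlpnet-tag.py``.
    """
    if task not in ("pos", "srl"):
        raise ValueError("task must be 'pos' or 'srl'")
    cmd = [script, "--task", task, "--gold", gold, "--data", data]
    if verbose:
        cmd.append("-v")
    return cmd


# Placeholder paths, for illustration only.
cmd = build_test_command("pos", "gold.txt", "/path/to/nlpnet-data/")
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list could be passed to ``subprocess.run`` as is, which avoids shell quoting problems with paths that contain spaces.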

POS
---

--oov FILE  Analyze performance on the words described in the given file.

The ``--oov`` option requires a UTF-8 file containing one word per line. Despite its name, this option
is not restricted to OOV (out-of-vocabulary) words; it accepts any word list you
want to evaluate.
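
A word list in this format can be produced with a few lines of Python. The following is a minimal sketch; the function ``write_word_list`` and the file name ``oov.txt`` are hypothetical and only illustrate the UTF-8, one-word-per-line layout described above.

```python
import os
import tempfile


def write_word_list(words, path):
    """Write words to ``path`` as UTF-8, one word per line.

    Hypothetical helper, not part of nlpnet. Surrounding whitespace is
    stripped, and empty entries and duplicates are skipped.
    """
    seen = set()
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            word = word.strip()
            if not word or word in seen:
                continue
            seen.add(word)
            f.write(word + "\n")


# Write to a temporary directory so the example leaves no stray files.
path = os.path.join(tempfile.mkdtemp(), "oov.txt")
write_word_list(["coração", "roupa", "coração", " rei "], path)
```

Passing ``encoding="utf-8"`` explicitly matters here, since the platform default encoding may differ.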

SRL
---

SRL evaluation is performed in different ways, depending on whether it is aimed at
argument identification, classification, predicate detection or all of them.
In the future, there may be a more standardized version for this test.

--id  Evaluate only argument identification (SRL only). The script will output the score.
--class  Evaluate only argument classification (SRL only). The script will output the score.
--preds  Evaluate only predicate identification (SRL only). The script will output the score.
--2steps  Execute SRL with two separate steps. The script will output the results in CoNLL format.
--no-repeat  Forces the classification step to avoid repeated argument labels (two-step SRL only).
--auto-pred  Determines SRL predicates automatically. Only used when evaluating the full process (identification + classification).

The CoNLL output can be evaluated against a gold file using the official SRL eval script (see http://www.lsi.upc.edu/~srlconll/soft.html).