ASSIN (Avaliação de Similaridade Semântica e INferência textual) is a dataset annotated with semantic similarity scores and entailment labels. It was used in a shared task at the PROPOR 2016 conference.
The full dataset has 10,000 sentence pairs, half of which are in Brazilian Portuguese and half in European Portuguese. It can be downloaded here, with the same train/test splits used in the shared task. You can also see the list of annotators who took part in the creation of the dataset.
Evaluation Script and Baselines
You can find the official ASSIN evaluation script and baseline implementations in the GitHub repository. They are written in Python and require NumPy, SciPy and scikit-learn (sklearn). One of the baselines also requires NLTK.
The evaluation script reports accuracy and macro F1 (the mean of the per-class F1 scores) for textual entailment recognition, and Pearson correlation and mean squared error for semantic similarity. It can be run as follows:
python assin-eval.py gold-file.xml system-file.xml
Or see its usage instructions with:
python assin-eval.py -h
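To make the four metrics concrete, here is a minimal pure-Python sketch of how they can be computed. It is not the official assin-eval.py code: the function names (accuracy, macro_f1, pearson, mean_squared_error) are chosen for this illustration, it works on plain lists rather than the XML files, and it assumes macro F1 averages over every label that appears in either the gold or the system output.

```python
from math import sqrt


def accuracy(gold, pred):
    """Fraction of pairs whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def pearson(x, y):
    """Pearson correlation between gold and predicted similarity scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_squared_error(x, y):
    """Average squared difference between gold and predicted scores."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    # Toy data for illustration only, not taken from the dataset
    gold_labels = ["Entailment", "None", "None", "Paraphrase"]
    pred_labels = ["Entailment", "None", "Entailment", "Paraphrase"]
    gold_scores = [1.0, 2.0, 3.0, 4.0]
    pred_scores = [1.5, 2.5, 3.5, 4.5]

    print("Accuracy:", accuracy(gold_labels, pred_labels))
    print("Macro F1:", macro_f1(gold_labels, pred_labels))
    print("Pearson:", pearson(gold_scores, pred_scores))
    print("MSE:", mean_squared_error(gold_scores, pred_scores))
```

Note that in the toy example the predicted similarity scores are all shifted by a constant, so the Pearson correlation is perfect while the mean squared error is not zero. This is why the two similarity metrics are worth reading together.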