ASSIN (Avaliação de Similaridade Semântica e INferência textual) is a dataset annotated with semantic similarity scores and textual entailment labels. It was used in a shared task at the PROPOR 2016 conference.

The full dataset has 10,000 sentence pairs, half of which are in Brazilian Portuguese and half in European Portuguese, and can be downloaded here. Each language variant has 2,500 pairs for training, 500 for validation and 2,000 for testing. This differs from the split used in the shared task, in which the training set had 3,000 pairs and there was no validation set. The shared task training set can be reconstructed by merging the training and validation sets of a variant (2,500 + 500 = 3,000 pairs).
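As a rough illustration of that merge, the sketch below combines two sets of pairs with Python's standard xml.etree.ElementTree module. It assumes that each file is an XML document whose root element holds one pair element per sentence pair; the root tag name, the id attribute and the helper name merge_pair_files are illustrative assumptions, so check them against the downloaded files before relying on this.

```python
import xml.etree.ElementTree as ET


def merge_pair_files(train_root, validation_root):
    """Return a new root element holding the pairs of both inputs, in order."""
    # Reuse the tag and attributes of the training root for the merged root.
    merged = ET.Element(train_root.tag, train_root.attrib)
    for source in (train_root, validation_root):
        # Assumption: each sentence pair is a <pair> child of the root.
        merged.extend(source.findall("pair"))
    return merged


# Tiny in-memory stand-ins for the real files (hypothetical content).
train = ET.fromstring(
    '<entailment-corpus><pair id="1"/><pair id="2"/></entailment-corpus>'
)
validation = ET.fromstring('<entailment-corpus><pair id="3"/></entailment-corpus>')

merged = merge_pair_files(train, validation)
print(len(merged.findall("pair")))
```

With real files, the two roots would come from ET.parse(path).getroot(), and the result could be saved with ET.ElementTree(merged).write(path, encoding="utf-8").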

You can also see the list of annotators who took part in the creation of the dataset.

Evaluation Script and Baselines

You can find the official ASSIN evaluation script and baseline implementations in the GitHub repository. They are written in Python and require NumPy, SciPy and scikit-learn. One of the baselines also requires NLTK.

The evaluation script reports accuracy and macro F1 (the mean of the F1 scores of all classes) for textual entailment recognition, and Pearson correlation and mean squared error for semantic similarity. It can be run as follows:

python assin-eval.py gold-file.xml system-file.xml

Or see its usage instructions with:

python assin-eval.py -h
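For readers who want to see what these four metrics compute, here is a minimal sketch using SciPy and scikit-learn. It is not the official script: the function names, the label strings and the toy values are assumptions made for illustration, and reading the XML files is left out.

```python
from scipy.stats import pearsonr
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error


def entailment_metrics(gold_labels, system_labels):
    """Accuracy and macro F1 (unweighted mean of the per-class F1 scores)."""
    accuracy = accuracy_score(gold_labels, system_labels)
    macro_f1 = f1_score(gold_labels, system_labels, average="macro")
    return accuracy, macro_f1


def similarity_metrics(gold_scores, system_scores):
    """Pearson correlation and mean squared error between two score lists."""
    pearson = pearsonr(gold_scores, system_scores)[0]
    mse = mean_squared_error(gold_scores, system_scores)
    return pearson, mse


# Toy data; the label names below are illustrative assumptions.
gold_labels = ["None", "Entailment", "Paraphrase", "None"]
system_labels = ["None", "Entailment", "Paraphrase", "Entailment"]
accuracy, macro_f1 = entailment_metrics(gold_labels, system_labels)

gold_scores = [1.0, 2.0, 3.0]
system_scores = [2.0, 3.0, 4.0]
pearson, mse = similarity_metrics(gold_scores, system_scores)

print(accuracy, macro_f1, pearson, mse)
```

The similarity example shows why both measures are reported: a system whose scores are all off by one point still has a perfect Pearson correlation, and only the mean squared error exposes the offset.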

Published Results on ASSIN

This is a list of the published results on ASSIN that we are aware of, in addition to the results of the shared task participants. Task indicates whether the paper addresses Textual Entailment (TE), Semantic Similarity (SS) or both.

Title | Task | Year
Avaliando a similaridade semântica entre frases curtas através de uma abordagem híbrida | SS | 2017
Análise de Medidas de Similaridade Semântica na Tarefa de Reconhecimento de Implicação Textual | TE | 2017
Statistical and Semantic Features to Measure Sentence Similarity in Portuguese | SS | 2017
Gradually Improving the Computation of Semantic Textual Similarity in Portuguese | SS | 2017
Recognizing Textual Entailment and Paraphrases in Portuguese | TE | 2017
Recognizing textual entailment: Challenges in the Portuguese language | TE | 2018
ASAPP 2.0: Advancing the state-of-the-art of semantic textual similarity for Portuguese | SS | 2018
Syntactic Knowledge for Natural Language Inference in Portuguese | TE | 2018