CSTNews: a collection of texts annotated according to the CST (Cross-document Structure Theory) model

The CSTNews corpus contains 50 collections of Brazilian Portuguese texts. Each collection contains approximately three documents on the same subject, taken from different sources, and is accompanied by a human-written summary. Four computational linguists annotated the corpus, achieving satisfactory inter-annotator agreement.
