Dataset. We will not employ microtask workers to annotate our training articles, both for labour-rights reasons and out of personal thrift. A small subset of the articles has been annotated by volunteers.

The data are compiled dynamically and will update with new annotations as we receive them.

download … annotated only (json) annotated & unannotated (json)

File format: the sentences that readers judged to be controversial are recorded in the highlights array of each article. Each entry in that array is one person's reading of the article. Within each reading, there is another array that lists the indices of the sentences of the article that were judged controversial. Please use our tokeniser to separate the full article into sentences so that these indices are meaningful (we do some unusual things).
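As a rough illustration of how the highlights array might be consumed, here is a minimal Python sketch that counts how many readers flagged each sentence. Only the highlights key comes from the description above. The function name, the sample record, the assumption that each reading is directly a list of indices, and the assumption that indices are zero-based are all ours and may not match the actual files. The sentence list is a stand-in: in practice it should come from our tokeniser, not from this example.

```python
from collections import Counter


def controversy_counts(article, sentences):
    """Return (index, sentence, number_of_readers) for each flagged sentence.

    Assumes article["highlights"] is a list of readings and that each
    reading is a list of zero-based sentence indices (an assumption).
    """
    counts = Counter()
    for reading in article["highlights"]:
        # set() guards against a reader listing the same sentence twice
        counts.update(set(reading))
    return [(i, sentences[i], counts[i]) for i in sorted(counts)]


# Hypothetical record: three readers, the third flagged nothing.
article = {"highlights": [[0, 2], [2], []]}
# Stand-in for the output of the project's tokeniser.
sentences = ["First sentence.", "Second sentence.", "Third sentence."]

for index, sentence, readers in controversy_counts(article, sentences):
    print(index, readers, sentence)
```

A real file would first be loaded with json.load; the per-article structure around highlights (for example, where the article text lives) is not described here, so the sketch leaves it out.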

Articles are © The New York Times Company. Assume all rights reserved or, for content produced by our research team, CC BY-NC-SA 4.0 (see here for details).