Best-worst scaling more reliable than rating scales: a case study on sentiment intensity annotation

Proceedings title: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Volume 2: Short Papers
Conference: 55th Annual Meeting of the Association for Computational Linguistics, 30 July - 4 August, 2017, Vancouver, BC, Canada
Abstract: Rating scales are a widely used method for data annotation; however, they present several challenges, such as difficulty in maintaining inter- and intra-annotator consistency. Best–worst scaling (BWS) is an alternative method of annotation that is claimed to produce high-quality annotations while keeping the required number of annotations similar to that of rating scales. However, the veracity of this claim has never been systematically established. Here for the first time, we set up an experiment that directly compares the rating scale method with BWS. We show that with the same total number of annotations, BWS produces significantly more reliable results than the rating scale.
Publisher: Association for Computational Linguistics
Affiliation: Information and Communication Technologies; National Research Council Canada
Peer reviewed: Yes
NPARC number: 23002278
Record identifier: b132b0af-2ae0-4964-ac3a-493e7292a37a
Record created: 2017-09-28
Record modified: 2017-12-15