Best-worst scaling more reliable than rating scales: a case study on sentiment intensity annotation

DOI: http://doi.org/10.18653/v1/P17-2074
Type: Article
Proceedings title: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. Volume 2: Short Papers
Conference: 55th Annual Meeting of the Association for Computational Linguistics, 30 July - 4 August, 2017, Vancouver, BC, Canada
Pages: 465-470
Abstract: Rating scales are a widely used method for data annotation; however, they present several challenges, such as difficulty in maintaining inter- and intra-annotator consistency. Best–worst scaling (BWS) is an alternative method of annotation that is claimed to produce high-quality annotations while keeping the required number of annotations similar to that of rating scales. However, the veracity of this claim has never been systematically established. Here for the first time, we set up an experiment that directly compares the rating scale method with BWS. We show that with the same total number of annotations, BWS produces significantly more reliable results than the rating scale.
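
The abstract does not say how BWS judgments become per-item scores. One common approach is the counting procedure: an annotator sees a small tuple of items (often four) and picks the best and the worst, and each item's score is the fraction of its appearances in which it was chosen best minus the fraction in which it was chosen worst, giving a value in [-1, 1]. The sketch below assumes that procedure and a hypothetical input format of (tuple, best, worst) records; the function name `bws_counting_scores` and the example words are illustrative only, not taken from the paper.

```python
from collections import defaultdict


def bws_counting_scores(annotations):
    """Score items from best-worst annotations by simple counting.

    Assumed (hypothetical) input: an iterable of (items, best, worst) records,
    where `items` is the tuple shown to the annotator and `best`/`worst` are
    the two items they picked. Returns {item: %best - %worst}.
    """
    appearances = defaultdict(int)
    best_counts = defaultdict(int)
    worst_counts = defaultdict(int)
    for items, best, worst in annotations:
        # Best and worst must be two different members of the shown tuple.
        if best not in items or worst not in items or best == worst:
            raise ValueError("best and worst must be distinct items from the tuple")
        for item in items:
            appearances[item] += 1
        best_counts[best] += 1
        worst_counts[worst] += 1
    # Normalise by how often each item was shown, so scores lie in [-1, 1].
    return {
        item: (best_counts[item] - worst_counts[item]) / n
        for item, n in appearances.items()
    }


# Illustrative sentiment-intensity example (made-up words and judgments).
example = [
    (("great", "good", "okay", "awful"), "great", "awful"),
    (("good", "okay", "bad", "awful"), "good", "awful"),
    (("great", "okay", "bad", "awful"), "great", "awful"),
]
ranking = sorted(bws_counting_scores(example).items(), key=lambda kv: -kv[1])
print(ranking)
# [('great', 1.0), ('good', 0.5), ('okay', 0.0), ('bad', 0.0), ('awful', -1.0)]
```

Because each judgment compares items directly rather than placing each one on an absolute scale, annotators never have to agree on what a given rating value means, which is the intuition behind the consistency advantage the abstract reports.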
Publisher: Association for Computational Linguistics
Language: English
Affiliation: Information and Communication Technologies; National Research Council Canada
Peer reviewed: Yes
NPARC number: 23002278
Record identifier: b132b0af-2ae0-4964-ac3a-493e7292a37a
Record created: 2017-09-28
Record modified: 2017-12-15