
Towards Standard Criteria For Human Evaluation Of Chatbots: A Survey

Hongru Liang, Huaqing Li. arXiv 2021 – 93 citations

Content Enrichment, Evaluation, Survey Paper

Human evaluation is becoming a necessity for testing the performance of chatbots. However, off-the-shelf evaluation settings suffer from severe reliability and replication issues, partly because of the extremely high diversity of criteria. It is high time to establish standard criteria with exact definitions. To this end, we conduct a thorough investigation of 105 papers involving human evaluation of chatbots. Based on this investigation, we propose five standard criteria along with precise definitions.

Similar Work