USR: An Unsupervised And Reference Free Evaluation Metric For Dialog Generation

Shikib Mehri, Maxine Eskenazi. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020. 135 citations.

The lack of meaningful automatic evaluation metrics for dialog has impeded open-domain dialog research. Standard language generation metrics have been shown to be ineffective for evaluating dialog models. To this end, this paper presents USR, an UnSupervised and Reference-free evaluation metric for dialog. USR requires no reference responses: it trains unsupervised models to measure several desirable qualities of dialog. USR is shown to correlate strongly with human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and PersonaChat (turn-level: 0.48, system-level: 1.0). USR additionally produces interpretable measures for several desirable properties of dialog.
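The distinction between turn-level and system-level correlation in the figures above can be illustrated with a minimal sketch. It assumes Spearman rank correlation (the abstract does not say which coefficient the reported numbers use), and every score, system name, and function name below is invented for illustration; none of it comes from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: (system, metric score, human score) per response.
turns = [
    ("A", 0.2, 2), ("A", 0.6, 1), ("A", 0.3, 3),
    ("B", 0.5, 3), ("B", 0.4, 4), ("B", 0.9, 4),
    ("C", 0.8, 5), ("C", 0.7, 3), ("C", 0.9, 5),
]


def turn_level(rows):
    """Correlate metric and human scores over individual responses."""
    return spearman([m for _, m, _ in rows], [h for _, _, h in rows])


def system_level(rows):
    """Average both scores per system, then correlate the averages."""
    systems = sorted({s for s, _, _ in rows})
    metric_means = [mean(m for s, m, _ in rows if s == name) for name in systems]
    human_means = [mean(h for s, _, h in rows if s == name) for name in systems]
    return spearman(metric_means, human_means)


if __name__ == "__main__":
    print("turn-level:", round(turn_level(turns), 3))
    print("system-level:", round(system_level(turns), 3))
```

In this toy data the metric orders the three systems exactly as the human averages do, so the system-level correlation is perfect even though individual responses are ranked imperfectly. Averaging over responses removes per-turn noise, and with only a handful of systems a perfect rank agreement is easy to reach, which is one reason system-level figures tend to be much higher than turn-level ones.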
