Iterative Policy Learning In End-to-end Trainable Task-oriented Neural Dialog Models

Bing Liu, Ian Lane. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2017 – 98 citations

In this paper, we present a deep reinforcement learning (RL) framework for iterative dialog policy optimization in end-to-end task-oriented dialog systems. Popular approaches to learning a dialog policy with RL include letting a dialog agent learn against a user simulator. Building a reliable user simulator, however, is not trivial and is often as difficult as building a good dialog agent. We address this challenge by jointly optimizing the dialog agent and the user simulator with deep RL, simulating dialogs between the two agents. We first bootstrap a basic dialog agent and a basic user simulator by learning directly from dialog corpora with supervised training. We then improve them further by letting the two agents conduct task-oriented dialogs and iteratively optimizing their policies with deep RL. Both the dialog agent and the user simulator are designed as neural network models that can be trained end-to-end. Our experimental results show that the proposed method leads to promising improvements in task success rate and total task reward compared to supervised training and single-agent RL training baseline models.
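
The two-stage recipe described in the abstract (bootstrap both agents, then let them converse and update their policies iteratively) can be illustrated with a toy sketch. This is not the paper's model: the neural networks are replaced by small tabular softmax policies, the supervised bootstrap is imitated by a mild bias in the initial logits, the updates use plain REINFORCE with a moving-average baseline, and strictly alternating between the agent and the user simulator is an assumption made here for simplicity. All names (`bootstrap_policy`, `run_dialog`, `train`, the reward definition, and the hyperparameters) are hypothetical.

```python
import math
import random

K = 3  # toy number of user goals, user signals, and agent actions (assumption)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Draw an index from a categorical distribution."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def bootstrap_policy(bias=0.5):
    """Stand-in for supervised pretraining: logits mildly favour row i -> column i."""
    return [[bias if i == j else 0.0 for j in range(K)] for i in range(K)]


def run_dialog(user, agent, rng):
    """One simulated one-turn 'dialog': the user signals its goal, the agent acts."""
    goal = rng.randrange(K)
    signal = sample(softmax(user[goal]), rng)
    action = sample(softmax(agent[signal]), rng)
    reward = 1.0 if action == goal else 0.0  # toy task-success reward (assumption)
    return goal, signal, action, reward


def reinforce_step(logits_row, chosen, advantage, lr):
    """REINFORCE update for a softmax policy: grad of log pi = onehot - probs."""
    probs = softmax(logits_row)
    for j in range(K):
        grad = (1.0 if j == chosen else 0.0) - probs[j]
        logits_row[j] += lr * advantage * grad


def success_rate(user, agent, rng, n=2000):
    """Estimate task success by sampling n simulated dialogs."""
    return sum(run_dialog(user, agent, rng)[3] for _ in range(n)) / n


def train(iterations=20, dialogs_per_phase=200, lr=0.5, seed=0):
    """Alternate RL updates between the agent and the user simulator."""
    rng = random.Random(seed)
    user, agent = bootstrap_policy(), bootstrap_policy()
    before = success_rate(user, agent, rng)
    baseline = before  # moving-average reward baseline to reduce variance
    for it in range(iterations):
        update_agent = it % 2 == 0  # even phases train the agent, odd the user
        for _ in range(dialogs_per_phase):
            goal, signal, action, reward = run_dialog(user, agent, rng)
            advantage = reward - baseline
            if update_agent:
                reinforce_step(agent[signal], action, advantage, lr)
            else:
                reinforce_step(user[goal], signal, advantage, lr)
            baseline = 0.95 * baseline + 0.05 * reward
    after = success_rate(user, agent, rng)
    return before, after


if __name__ == "__main__":
    before, after = train()
    print(f"success rate before RL: {before:.3f}, after RL: {after:.3f}")
```

Both policies share a single task-success reward in this sketch, so each side improves only to the extent that the other can interpret its behavior, which is the intuition behind optimizing the two jointly rather than training the agent against a fixed simulator.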
