PIPPA: A Partially Synthetic Conversational Dataset | Awesome LLM Papers

PIPPA: A Partially Synthetic Conversational Dataset

Tear Gosling, Alpin Dale, Yinhe Zheng. No Venue 2023

Tags: Applications, Compositional Generalization, Datasets, Interdisciplinary Approaches, Multimodal Semantic Representation, Productivity Enhancement, Question Answering

With the emergence of increasingly powerful large language models, there is a burgeoning interest in leveraging these models for casual conversation and role-play applications. However, existing conversational and role-playing datasets often fail to capture the diverse and nuanced interactions typically exhibited by real-world role-play participants. To address this limitation and contribute to this rapidly growing field, we introduce a partially synthetic dataset named PIPPA (Personal Interaction Pairs between People and AI). PIPPA is the result of a community-driven crowdsourcing effort involving a group of role-play enthusiasts. The dataset comprises over 1 million utterances distributed across 26,000 conversation sessions and provides a rich resource for researchers and AI developers to explore and refine conversational AI systems in the context of role-play scenarios.
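To make the scale and shape of such a corpus concrete, here is a minimal sketch, assuming a simplified in-memory model in which each session is an ordered list of utterances tagged as human or AI. The class names, the fields `is_human`, `text`, and `character_name`, and the helper functions are hypothetical illustrations, not PIPPA's actual schema or loader. The "pair" definition (a human turn immediately followed by an AI turn) is likewise an assumption suggested by the dataset's name. The final line is a back-of-envelope check: with the abstract's rounded figures, sessions average roughly 38 utterances each.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    # Hypothetical fields: whether a person (rather than the AI) spoke, and what was said.
    is_human: bool
    text: str


@dataclass
class Session:
    # Hypothetical: one role-play conversation between a person and an AI character.
    character_name: str
    utterances: list[Utterance] = field(default_factory=list)


def mean_utterances_per_session(sessions: list[Session]) -> float:
    """Average number of utterances per session; 0.0 for an empty corpus."""
    if not sessions:
        return 0.0
    return sum(len(s.utterances) for s in sessions) / len(sessions)


def human_ai_pairs(session: Session) -> list[tuple[str, str]]:
    """Pair each human utterance with the AI utterance that immediately follows it."""
    pairs = []
    for prev, nxt in zip(session.utterances, session.utterances[1:]):
        if prev.is_human and not nxt.is_human:
            pairs.append((prev.text, nxt.text))
    return pairs


# Back-of-envelope check using the rounded figures from the abstract.
approx_mean = 1_000_000 / 26_000
print(f"{approx_mean:.1f}")  # 38.5
```

For example, a four-turn session that alternates human and AI turns yields two human–AI pairs and contributes 4 to the utterance count. Because both reported figures are rounded ("over 1 million", "26,000"), the computed average is only an approximation.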
