Text Embeddings Reveal (almost) As Much As Text | Awesome LLM Papers

Text Embeddings Reveal (almost) As Much As Text

John X. Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander M. Rush. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) – 40 citations

[Code] [Paper]
Datasets EMNLP Has Code Interdisciplinary Approaches

How much private information do text embeddings reveal about the original text? We investigate the problem of embedding inversion: reconstructing the full text represented in dense text embeddings. We frame the problem as controlled generation: generating text that, when re-embedded, is close to a fixed point in latent space. We find that although a naïve model conditioned on the embedding performs poorly, a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly. We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes. Our code is available on GitHub: github.com/jxmorris12/vec2text.
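
A minimal sketch of the iterative correct-and-re-embed idea described in the abstract. The `embed` and `propose_correction` callables below are hypothetical stand-ins (not the vec2text API): `embed` maps text to a dense vector, and `propose_correction` is a conditional generator that rewrites the current hypothesis given the target embedding and the embedding of that hypothesis.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity in embedding space between the hypothesis and the target.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def invert_embedding(target_emb, embed, propose_correction, num_steps: int = 20) -> str:
    """Iteratively refine a text hypothesis so its embedding approaches target_emb.

    `embed` and `propose_correction` are hypothetical callables supplied by the user;
    this is a sketch of the multi-step inversion loop, not the paper's exact method.
    """
    # Initial hypothesis conditioned only on the target embedding (the "naive" step).
    hypothesis = propose_correction(target_emb, current_text=None, current_emb=None)
    best_text = hypothesis
    best_sim = cosine_similarity(embed(hypothesis), target_emb)

    for _ in range(num_steps):
        current_emb = embed(hypothesis)                      # re-embed the current guess
        hypothesis = propose_correction(target_emb,          # correct it toward the target
                                        current_text=hypothesis,
                                        current_emb=current_emb)
        sim = cosine_similarity(embed(hypothesis), target_emb)
        if sim > best_sim:                                   # keep the closest hypothesis so far
            best_text, best_sim = hypothesis, sim

    return best_text
```

In this framing, each step is controlled generation: the corrector sees how far the current text's embedding is from the fixed target point and proposes a revised text, repeating until the re-embedded hypothesis is close enough.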

Similar Work