Generalizing Word Embeddings Using Bag Of Subwords | Awesome LLM Papers

Generalizing Word Embeddings Using Bag Of Subwords

Jinman Zhao, Sidharth Mudgal, Yingyu Liang. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018) – 51 citations

Tags: EMNLP, Interdisciplinary Approaches

We approach the problem of generalizing pre-trained word embeddings beyond fixed-size vocabularies without using additional contextual information. We propose a subword-level word vector generation model that views words as bags of character n-grams. The model is simple, fast to train, and provides good vectors for rare or unseen words. Experiments show that our model achieves state-of-the-art performance on an English word similarity task and on joint prediction of part-of-speech tags and morphosyntactic attributes in 23 languages, suggesting the model's ability to capture the relationship between words' textual representations and their embeddings.
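The bag-of-subwords idea can be sketched as follows: a word vector is composed from the embeddings of its character n-grams, so any string, including an out-of-vocabulary word, gets a vector. This is an illustrative sketch only; the `ngram_emb` table, the n-gram length range, the boundary markers, and the use of averaging here are assumptions for demonstration (in the paper, the n-gram embeddings are learned by fitting the pre-trained word vectors).

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Extract character n-grams from a word padded with boundary markers."""
    padded = "<" + word + ">"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def bos_vector(word, ngram_emb, dim=50):
    """Compose a word vector from its character n-gram embeddings.

    Averaging is one simple choice of composition; unseen n-grams are skipped.
    """
    vecs = [ngram_emb[g] for g in char_ngrams(word) if g in ngram_emb]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

# Toy n-gram embedding table (random values, for illustration only).
rng = np.random.default_rng(0)
ngram_emb = {g: rng.normal(size=50) for g in char_ngrams("subword")}

v = bos_vector("subword", ngram_emb)  # vector for a possibly unseen word
```

Because "subwords" shares most of its n-grams with "subword", its generated vector lands close to it, which is what lets the model extrapolate to rare and unseen words.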

Similar Work