
An Analysis Of Neural Language Modeling At Multiple Scales

Stephen Merity, Nitish Shirish Keskar, Richard Socher. arXiv 2018 – 162 citations

[Paper] · Search on Google Scholar · Search on Semantic Scholar
Tags: Compositional Generalization · Datasets · Interdisciplinary Approaches · Multimodal Semantic Representation

Many of the leading approaches in language modeling introduce novel, complex, and specialized architectures. We take existing state-of-the-art word-level language models based on LSTMs and QRNNs and extend them to both larger vocabularies and character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.
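
To make the architectural contrast concrete, below is a minimal sketch of a QRNN layer with fo-pooling, the building block that lets QRNNs replace most of an LSTM's sequential recurrence with parallel convolutions. This is an illustrative PyTorch sketch only, not the authors' implementation: the module name `QRNNLayer`, the layer sizes, and the plain Python pooling loop are all assumptions for clarity, whereas the tuned models in the paper use optimized kernels and carefully chosen hyperparameters.

```python
import torch
import torch.nn as nn

class QRNNLayer(nn.Module):
    """Illustrative QRNN layer with fo-pooling (sketch, not the paper's code).

    A causal 1-D convolution computes candidate (z), forget (f), and
    output (o) pre-activations in parallel across the sequence; only the
    cheap element-wise pooling recurrence runs sequentially.
    """

    def __init__(self, input_size: int, hidden_size: int, kernel_size: int = 2):
        super().__init__()
        self.hidden_size = hidden_size
        self.kernel_size = kernel_size
        # One convolution produces all three gate pre-activations at once.
        self.conv = nn.Conv1d(input_size, 3 * hidden_size, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_size)
        x = x.transpose(1, 2)                                # (batch, input, seq)
        x = nn.functional.pad(x, (self.kernel_size - 1, 0))  # causal left-padding
        z, f, o = self.conv(x).chunk(3, dim=1)
        z, f, o = torch.tanh(z), torch.sigmoid(f), torch.sigmoid(o)
        # fo-pooling: c_t = f_t * c_{t-1} + (1 - f_t) * z_t; h_t = o_t * c_t
        c = torch.zeros(x.size(0), self.hidden_size, device=x.device)
        hs = []
        for t in range(z.size(2)):
            c = f[:, :, t] * c + (1 - f[:, :, t]) * z[:, :, t]
            hs.append(o[:, :, t] * c)
        return torch.stack(hs, dim=1)                        # (batch, seq, hidden)

# Toy usage: one layer over a batch of embedded character sequences.
layer = QRNNLayer(input_size=64, hidden_size=128)
h = layer(torch.randn(8, 50, 64))                            # -> (8, 50, 128)
```

Because the convolution and gate nonlinearities are computed for all timesteps at once, the only sequential work is the element-wise pooling loop, which is why QRNN-based models can train markedly faster than LSTMs on long sequences such as the character-level enwik8 corpus.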

Similar Work