
iNLTK: Natural Language Toolkit for Indic Languages

Gaurav Arora. Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS), 2020 – 65 citations


We present iNLTK, an open-source NLP library consisting of pre-trained language models and out-of-the-box support for Data Augmentation, Textual Similarity, Sentence Embeddings, Word Embeddings, Tokenization and Text Generation in 13 Indic languages. Using iNLTK's pre-trained models for text classification on publicly available datasets, we significantly outperform previously reported results. On these datasets, we also show that combining iNLTK's pre-trained models with its data augmentation achieves more than 95% of the previous best performance while using less than 10% of the training data. iNLTK is already widely used by the community, with more than 40,000 downloads and, on GitHub, more than 600 stars and 100 forks. The library is available at https://github.com/goru001/inltk.
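The abstract lists Textual Similarity and Sentence Embeddings among iNLTK's features but does not say how the two are combined. A common approach, assumed here rather than taken from the paper, is to encode each sentence as a fixed-size vector and compare vectors by cosine similarity. The sketch below is not iNLTK's implementation: it uses only the Python standard library, and the helper names `cosine_similarity` and `most_similar` as well as the three-dimensional toy vectors are hypothetical stand-ins for real model encodings.

```python
import math
from typing import Mapping, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Avoid division by zero: cosine similarity is undefined here.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query: Sequence[float], candidates: Mapping[str, Sequence[float]]) -> str:
    """Return the key of the candidate encoding closest to the query encoding."""
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))


# Hypothetical toy encodings standing in for sentence embeddings that a
# pre-trained language model would produce (real ones are much larger).
query = [0.9, 0.1, 0.3]
candidates = {
    "paraphrase": [0.8, 0.2, 0.35],
    "unrelated": [-0.1, 0.9, -0.4],
}
print(most_similar(query, candidates))  # prints paraphrase
```

With a real encoder, the same ranking logic would apply unchanged to higher-dimensional vectors; only the source of the encodings differs.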

Similar Work