
Spanish Pre-trained BERT Model and Evaluation Data

José Cañete, Gabriel Chaperon, Rodrigo Fuentes, Jou-Hui Ho, Hojin Kang, Jorge Pérez. arXiv 2023 – 245 citations

Tags: Compositional Generalization, Evaluation, Fine Tuning, Has Code, Interdisciplinary Approaches, Model Architecture, Multimodal, Semantic Representation, Training Techniques

The Spanish language is one of the five most widely spoken languages in the world. Nevertheless, finding resources to train or evaluate Spanish language models is not an easy task. In this paper we help bridge this gap by presenting a BERT-based language model pre-trained exclusively on Spanish data. As a second contribution, we also compiled several tasks specifically for the Spanish language into a single repository, much in the spirit of the GLUE benchmark. By fine-tuning our pre-trained Spanish model, we obtain better results than other BERT-based models pre-trained on multilingual corpora on most of the tasks, even achieving a new state of the art on some of them. We have publicly released our model, the pre-training data, and the compilation of Spanish benchmarks.
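To make the fine-tuning contribution concrete, below is a minimal sketch of adapting the released Spanish BERT to a downstream classification task with the Hugging Face `transformers` library. The checkpoint identifier `dccuchile/bert-base-spanish-wwm-cased` is the one commonly associated with this model, and the toy binary sentiment task is illustrative; both are assumptions rather than details taken from the abstract.

```python
# Minimal sketch: fine-tuning the released Spanish BERT for text classification.
# Assumes the checkpoint is available on the Hugging Face Hub as
# "dccuchile/bert-base-spanish-wwm-cased" (commonly cited identifier for this
# model; an assumption here) and that `transformers` and `torch` are installed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "dccuchile/bert-base-spanish-wwm-cased"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,  # e.g. a binary Spanish sentiment task (illustrative)
)

# Toy batch; a real run would iterate over one of the compiled Spanish benchmarks.
texts = ["Una película excelente.", "Un servicio pésimo."]
labels = torch.tensor([1, 0])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**inputs, labels=labels)

# One gradient step of standard fine-tuning.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In practice, the same loop would run over a full Spanish task dataset for a few epochs, which is the standard BERT fine-tuning recipe the abstract refers to.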
