Improving BERT Fine-tuning Via Self-ensemble And Self-distillation

Yige Xu, Xipeng Qiu, Ligao Zhou, Xuanjing Huang. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020 – 61 citations

[Paper]   Search on Google Scholar   Search on Semantic Scholar
Compositional Generalization EMNLP Efficiency Fine Tuning Interdisciplinary Approaches Model Architecture Multimodal Semantic Representation

Fine-tuning pre-trained language models like BERT has become an effective practice in NLP and yields state-of-the-art results on many downstream tasks. Recent studies on adapting BERT to new tasks mainly focus on modifying the model structure, re-designing the pre-training tasks, and leveraging external data and knowledge; the fine-tuning strategy itself has yet to be fully explored. In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation. Experiments on text classification and natural language inference tasks show that the proposed methods can significantly improve the adaptation of BERT without any external data or knowledge.
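A rough reading of the two mechanisms: self-ensemble builds a teacher by averaging the parameters of the student's recent checkpoints, and self-distillation regularizes the student toward that teacher's predictions during fine-tuning. The sketch below is a minimal illustration of this idea, assuming a standard PyTorch training loop; a small linear classifier stands in for BERT, and the window size `K`, the distillation weight, and the learning rate are placeholder values rather than the paper's settings.

```python
# Minimal sketch of self-ensemble + self-distillation fine-tuning (assumed setup):
# the teacher is the parameter average of the student's K most recent checkpoints,
# and the student is trained on cross-entropy plus an MSE term matching its logits
# to the teacher's. A tiny linear classifier stands in for BERT for brevity.
import copy
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

K = 3                 # number of recent checkpoints averaged into the teacher (illustrative)
LAMBDA_DISTILL = 1.0  # weight of the self-distillation term (illustrative)

student = nn.Linear(16, 4)                  # stand-in for a BERT classifier
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
checkpoints = deque(maxlen=K)               # rolling window of recent parameter snapshots


def averaged_teacher(model, snapshots):
    """Build a teacher whose parameters are the mean of the recent snapshots."""
    teacher = copy.deepcopy(model)
    avg_state = {
        name: torch.stack([snap[name] for snap in snapshots]).mean(dim=0)
        for name in snapshots[0]
    }
    teacher.load_state_dict(avg_state)
    teacher.eval()
    return teacher


for step in range(100):
    # Toy batch; in practice this would be tokenized text fed through BERT.
    x = torch.randn(8, 16)
    y = torch.randint(0, 4, (8,))

    logits = student(x)
    loss = F.cross_entropy(logits, y)

    if len(checkpoints) == K:
        # Self-ensemble teacher from the last K checkpoints; self-distillation
        # pulls the student's logits toward the teacher's.
        teacher = averaged_teacher(student, list(checkpoints))
        with torch.no_grad():
            teacher_logits = teacher(x)
        loss = loss + LAMBDA_DISTILL * F.mse_loss(logits, teacher_logits)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Snapshot the updated parameters for the rolling self-ensemble.
    checkpoints.append({k: v.detach().clone() for k, v in student.state_dict().items()})
```

Because the teacher is just a rolling parameter average of the student's own checkpoints, this adds no external model or data; the only extra cost is one additional forward pass per batch once the checkpoint window is full.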

Similar Work