BAM! Born-again Multi-task Networks For Natural Language Understanding

Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019 – 222 citations

[Paper]
Tags: ACL · Compositional Generalization · Efficiency · Evaluation · Fine Tuning · Interdisciplinary Approaches · Model Architecture · Neural Machine Translation · Training Techniques · Variational Autoencoders

It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training with teacher annealing, a novel method that gradually transitions the model from distillation to supervised learning, helping the multi-task model surpass its single-task teachers. We evaluate our approach by multi-task fine-tuning BERT on the GLUE benchmark. Our method consistently improves over standard single-task and multi-task training.
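The key ingredient is teacher annealing: the distillation target is a mixture λ·y_gold + (1 − λ)·p_teacher, with λ increased from 0 to 1 over training, so the multi-task student imitates its single-task teachers early on and trains on the gold labels by the end. The sketch below illustrates this mixing for a single classification task; it is a minimal illustration, not the authors' released code, and the function name, PyTorch framing, and strictly linear schedule are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def teacher_annealed_loss(student_logits, teacher_probs, gold_labels, step, total_steps):
    """Distillation loss with teacher annealing.

    Target = lam * one_hot(gold) + (1 - lam) * teacher_probs, where lam rises
    from 0 to 1 over training, moving the student from pure distillation
    toward standard supervised learning.
    """
    lam = min(step / total_steps, 1.0)             # illustrative linear schedule
    num_classes = student_logits.size(-1)
    gold_one_hot = F.one_hot(gold_labels, num_classes=num_classes).float()
    target = lam * gold_one_hot + (1.0 - lam) * teacher_probs
    log_probs = F.log_softmax(student_logits, dim=-1)
    # Cross-entropy against the annealed soft target.
    return -(target * log_probs).sum(dim=-1).mean()

# Hypothetical usage: a batch of 4 examples with 3 classes.
logits = torch.randn(4, 3)
teacher = torch.softmax(torch.randn(4, 3), dim=-1)
gold = torch.tensor([0, 2, 1, 0])
loss = teacher_annealed_loss(logits, teacher, gold, step=500, total_steps=10000)
```

In the full method each GLUE task has its own single-task teacher, and the multi-task BERT student is fine-tuned on batches drawn from all tasks using this annealed target for each task's loss.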

Similar Work