LRC-BERT: Latent-representation Contrastive Knowledge Distillation For Natural Language Understanding

Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li. Proceedings of the AAAI Conference on Artificial Intelligence, 2021. 46 citations.


Pre-trained models such as BERT have achieved strong results on a wide range of natural language processing problems. However, their large number of parameters demands significant memory and inference time, which makes them difficult to deploy on edge devices. In this work, we propose LRC-BERT, a knowledge distillation method based on contrastive learning that fits the output of the intermediate layers from the angular-distance perspective, an aspect not considered by existing distillation methods. Furthermore, we introduce a gradient perturbation-based training architecture in the training phase to increase the robustness of LRC-BERT, which is the first such attempt in knowledge distillation. Additionally, to better capture the distribution characteristics of the intermediate layers, we design a two-stage training method for the total distillation loss. Finally, by evaluating on 8 datasets from the General Language Understanding Evaluation (GLUE) benchmark, the proposed LRC-BERT exceeds existing state-of-the-art methods, which demonstrates the effectiveness of our approach.
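
To illustrate the angular-distance idea described in the abstract, the sketch below computes a contrastive distillation loss over intermediate-layer representations using cosine (angular) similarity, treating the aligned teacher representation as the positive and the other teacher representations in the batch as negatives. This is a minimal sketch under stated assumptions, not the authors' implementation; all names (`student_hidden`, `teacher_hidden`, `temperature`) are illustrative.

```python
# Minimal sketch (assumption): contrastive distillation on intermediate-layer
# outputs using angular (cosine) similarity, in the spirit of LRC-BERT.
# Positive pair: student/teacher hidden states of the same input;
# negatives: teacher hidden states of the other inputs in the batch.
import torch
import torch.nn.functional as F


def angular_contrastive_loss(student_hidden: torch.Tensor,
                             teacher_hidden: torch.Tensor,
                             temperature: float = 0.1) -> torch.Tensor:
    """student_hidden, teacher_hidden: (batch, hidden) pooled layer outputs."""
    s = F.normalize(student_hidden, dim=-1)  # unit vectors, so dot product = cosine similarity
    t = F.normalize(teacher_hidden, dim=-1)
    logits = s @ t.T / temperature           # (batch, batch) pairwise angular similarities
    targets = torch.arange(s.size(0), device=s.device)  # diagonal entries are the positives
    return F.cross_entropy(logits, targets)


# Usage sketch with random tensors standing in for layer outputs.
student = torch.randn(8, 312, requires_grad=True)  # e.g. a small student hidden size
teacher = torch.randn(8, 312)                      # teacher layer output (no gradient needed)
loss = angular_contrastive_loss(student, F.pad(teacher, (0, 0))[:, :312])
loss.backward()
```

In practice the student and teacher hidden sizes usually differ, so a learned linear projection would map the student representation into the teacher's space before normalization; the temperature and batch-negative scheme here are common contrastive-learning defaults, not values taken from the paper.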
