
A Comparative Study On End-to-end Speech To Text Translation

Parnia Bahar, Tobias Bieschke, Hermann Ney. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019. 80 citations.

Tags: ASRU, Compositional Generalization, Content Enrichment, Interdisciplinary Approaches, Neural Machine Translation, Training Techniques, Variational Autoencoders, Visual Question Answering

Recent advances in deep learning show that end-to-end speech-to-text translation is a promising direction for the speech translation field. In this work, we provide an overview of different end-to-end architectures and of the use of an auxiliary connectionist temporal classification (CTC) loss for better convergence. We also investigate pre-training variants, such as initializing different components of a model from pre-trained models, and their impact on final performance; these variants yield gains of up to 4% in BLEU and 5% in TER. Our experiments are performed on the 270-hour IWSLT TED-talks En→De task and the 100-hour LibriSpeech Audiobooks En→Fr task. We also show improvements over the current end-to-end state-of-the-art systems on both tasks.
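To make the auxiliary CTC loss concrete, here is a minimal plain-Python sketch. It is not the authors' implementation. It computes the CTC negative log-likelihood of a label sequence with the standard forward algorithm over per-frame log-probabilities, then interpolates it with the decoder's cross-entropy loss. The function names `ctc_nll` and `combined_loss`, the blank index 0, and the interpolation weight `ctc_weight` (default 0.5) are all hypothetical. The abstract states neither how the two losses are combined nor which label sequence the CTC branch is trained on.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nll(log_probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """CTC negative log-likelihood of `target` via the forward algorithm.

    log_probs[t][k] is the log-probability of symbol k at encoder frame t.
    Returns math.inf if no alignment can produce `target` in len(log_probs) frames.
    """
    # Extended label sequence: blank, y1, blank, y2, ..., yU, blank
    ext = [blank]
    for y in target:
        ext += [y, blank]
    num_states, num_frames = len(ext), len(log_probs)

    # alpha[s]: log-probability of all prefixes ending in state s at the current frame
    alpha = [NEG_INF] * num_states
    alpha[0] = log_probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        new_alpha = [NEG_INF] * num_states
        for s in range(num_states):
            candidates = [alpha[s]]              # stay in the same state
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one state
            # Skip a blank only between two different non-blank labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = logsumexp(*candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end in the last label or in the trailing blank
    total = logsumexp(alpha[-1], alpha[-2]) if num_states > 1 else alpha[-1]
    return -total


def combined_loss(ce_loss: float, ctc_loss: float, ctc_weight: float = 0.5) -> float:
    """Hypothetical interpolation of decoder cross-entropy and auxiliary CTC loss."""
    return (1.0 - ctc_weight) * ce_loss + ctc_weight * ctc_loss


# Usage: 2 frames, vocabulary {0: blank, 1: "a"}, uniform probabilities.
# Alignments "aa", "a_", "_a" all collapse to "a", so p = 0.75.
uniform = [[math.log(0.5), math.log(0.5)]] * 2
print(ctc_nll(uniform, [1]))  # equals -log(0.75)
```

In a real end-to-end model, `log_probs` would come from a softmax layer on top of the encoder, and the loss would be computed in an autodiff framework (for example, PyTorch's `torch.nn.CTCLoss`) so that its gradient regularizes the encoder alongside the attention decoder.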

Similar Work