Awesome Contextualization of E2E ASR
A curated list of awesome papers on contextualization of end-to-end (E2E) automatic speech recognition (ASR).
The purpose of contextualizing ASR output is to bias the results toward tokens, typically proper nouns, rare words, or jargon, that are likely to occur given the context of an audio signal. Transcribing these tokens correctly can have an outsized impact on the value of the output, and without biasing they are likely to be transcribed incorrectly.
To add items to this page, open a pull request following our contributing guide.
Contents
Deep Contextualization
End-to-end approaches with integrated neural modules
Contextual LAS (CLAS)
- Deep context: end-to-end contextual speech recognition
- Contextual Speech Recognition with Difficult Negative Training Examples
- Phoebe: Pronunciation-aware Contextualization for End-to-end Speech Recognition
- Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models
- Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR
Contextual Transducer (“RNNTs”)
- Contextual RNN-T For Open Domain ASR
- Multistate Encoding with End-To-End Speech RNN Transducer Network
- Deep Shallow Fusion for RNN-T Personalization
- Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
- Context-Aware Transformer Transducer for Speech Recognition
- Contextual Adapters for Personalized Speech Recognition in Neural Transducers
- Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer
2021
- Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition
- Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition
2022
- Improving End-to-End Contextual Speech Recognition with Fine-grained Contextual Knowledge Selection
- End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system
- Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems
External Contextualization
External modules, such as language models, error-correction models, and weighted finite-state transducers (WFSTs), applied to the hypotheses of E2E ASR systems
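As a rough illustration of what "applied to hypotheses" can mean, the sketch below re-ranks an n-best list by adding a fixed bonus for every context phrase a hypothesis contains. It is a toy example and not the method of any paper listed here: the function name `rescore_nbest`, the `bonus` value, and the sample hypotheses and scores are all hypothetical, and real systems typically bias during beam search rather than after it.

```python
from typing import List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    context_phrases: List[str],
    bonus: float = 2.0,  # hypothetical log-score bonus per matched phrase
) -> List[Tuple[str, float]]:
    """Re-rank (text, log_score) hypotheses, rewarding matched context phrases."""
    rescored = []
    for text, score in nbest:
        # Pad with spaces so phrases only match on whole-word boundaries.
        padded = f" {text.lower()} "
        matches = sum(
            1 for phrase in context_phrases if f" {phrase.lower()} " in padded
        )
        rescored.append((text, score + bonus * matches))
    # Highest biased score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Made-up hypotheses and scores, as if produced by an E2E ASR beam search.
nbest = [("call john smith", -3.5), ("call jon smyth", -4.0)]
contacts = ["Jon Smyth"]

reranked = rescore_nbest(nbest, contacts)
print(reranked[0][0])
```

With the contact list supplied, the hypothesis containing the contact name overtakes the one the recognizer originally preferred. With an empty contact list the original order is preserved.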
2012
2015
- Composition-based on-the-fly rescoring for salient n-gram biasing
- Improved recognition of contact names in voice commands
2016
2017
2018
- Contextual speech recognition in end-to-end neural network systems using beam search
- Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
- End-to-end contextual speech recognition using class language models and a token passing decoder
2019
- Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition
- Shallow-Fusion End-to-End Contextual Biasing
- Personalization of End-to-End Speech Recognition on Mobile Devices for Named Entities
2020
- Joint Contextual Modeling for ASR Correction and Language Understanding
- Bangla Voice Command Recognition in end-to-end System Using Topic Modeling based Contextual Rescoring
- Fast and Robust Unsupervised Contextual Biasing for Speech Recognition
- Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model
- Incorporating Written Domain Numeric Grammars into End-To-End Contextual Speech Recognition Systems for Improved Recognition of Numeric Sequences
- Class LM and word mapping for contextual biasing in End-to-End ASR
- Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator
- Hierarchical Multi-Stage Word-to-Grapheme Named Entity Corrector for Automatic Speech Recognition
- Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion
2021
- Domain-Aware Neural Language Models for Speech Recognition
- Personalization Strategies for End-to-End Speech Recognition Systems
- A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems
- Spell my name: keyword boosted speech recognition