Awesome Contextualization of E2E ASR

A curated list of awesome papers on the contextualization of end-to-end (E2E) ASR.

The purpose of contextualizing ASR outputs is to bias the results towards tokens (generally proper nouns, rare words, or jargon) that are likely to occur given the context of an audio signal. Transcribing these tokens correctly can have an outsized impact on the value of the output, and without biasing they are especially likely to be transcribed incorrectly.

To add items to this page, open a pull request according to our contributing guide.

Deep Contextualization
External Contextualization
- 2012
- 2015
- 2016
- 2017
- 2018
- 2019
- 2020
- 2021
- 2022

Deep Contextualization

End-to-end approaches in which contextualization is handled by neural modules integrated into the ASR model

Contextual LAS (CLAS)

Contextual Transducer (“RNNTs”)

2021

2022

External Contextualization

External modules, such as language models, error correction models, and weighted FSTs, applied to the hypotheses of E2E ASR systems
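
As a rough illustration of what "applied to hypotheses" can mean, the sketch below re-ranks an n-best list by adding a fixed bonus for each context phrase a hypothesis contains. It is not taken from any paper listed here: the function name `rescore_nbest`, the `bonus` value, the substring matching, and the example scores are all hypothetical, and real systems typically apply biasing during beam search with a language model or WFST rather than as a post-hoc string match.

```python
from typing import List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    bias_phrases: List[str],
    bonus: float = 2.0,  # hypothetical per-match bonus, in log-score units
) -> List[Tuple[str, float]]:
    """Re-rank hypotheses by ASR score plus a bonus per matched bias phrase."""
    rescored = []
    for text, score in nbest:
        lowered = text.lower()
        # Count how many context phrases appear verbatim in the hypothesis.
        matches = sum(1 for phrase in bias_phrases if phrase.lower() in lowered)
        rescored.append((text, score + bonus * matches))
    # Higher score is better, so sort in descending order.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Made-up n-best list of (hypothesis, log-score) pairs from an ASR system.
nbest = [
    ("call jon smith", -4.1),
    ("call joan smyth", -4.5),
    ("cole joan smyth", -6.0),
]
# Context: the user's contact list contains a rare spelling.
contacts = ["Joan Smyth"]

best_text, best_score = rescore_nbest(nbest, contacts)[0]
print(best_text, best_score)
```

With these made-up scores, the hypothesis containing the contact name overtakes the acoustically preferred one, which is the intended effect of biasing towards context tokens.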

2012

A Specialized WFST Approach for Class Models and Dynamic Vocabulary

2015

2016

Personalized Speech recognition on mobile devices

2017

Keyword spotting for Google Assistant using contextual speech recognition

2018

2019

2020

2021

2022

Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model
