Does Speech enhancement of publicly available data help build robust Speech Recognition Systems?

Ghai, Bhavya; Ramanan, Buvana; Mueller, Klaus

Electrical Engineering and Systems Science > Audio and Speech Processing

arXiv:1910.13488 (eess)

[Submitted on 29 Oct 2019 (v1), last revised 20 Nov 2019 (this version, v2)]

Abstract: Automatic speech recognition (ASR) systems play a key role in many commercial products, including voice assistants. Typically, they require large amounts of clean speech data for training, which gives an undue advantage to large organizations that hold large amounts of private data. In this paper, we first curate a fairly large dataset from publicly available data sources. We then investigate whether publicly available noisy data can be used to train robust ASR systems. We use speech enhancement to clean the noisy data and then train ASR systems on the noisy data together with its cleaned version. We find that using speech enhancement gives a 9.5% better word error rate than training on just noisy data and 9% better than training on just clean data. Its performance is also comparable to the ideal-case scenario of training on the noisy data together with its clean version.
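The abstract reports its results as word error rate (WER) but does not define the metric or say whether the 9.5% and 9% gains are absolute or relative. As a minimal sketch, assuming the standard definition of WER (word-level edit distance divided by the number of reference words), the metric and a relative-improvement helper could be computed as follows. The function names and sample sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (assumed relative)."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")
```

Under this definition, lower WER is better, so a "better" WER in the abstract means fewer word-level errors per reference word.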

Comments:	Accepted to AAAI conference of Artificial Intelligence 2020 (abstract)
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:1910.13488 [eess.AS]
	(or arXiv:1910.13488v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1910.13488

Submission history

From: Bhavya Ghai
[v1] Tue, 29 Oct 2019 19:23:16 UTC (499 KB)
[v2] Wed, 20 Nov 2019 05:53:39 UTC (41 KB)
