Contribution of RMS-Level-Based Speech Segments to Target Speech Decoding Under Noisy Conditions

L Wang, EX Wu, F Chen - Interspeech, 2020 - isca-archive.org
Abstract
Human listeners can recognize target speech streams in complex auditory scenes. Cortical activity can robustly track the amplitude fluctuations of target speech, with auditory attentional modulation, under a range of signal-to-masker ratios (SMRs). The root-mean-square (RMS) level of the speech signal is a crucial acoustic cue for target speech perception. However, most studies have analyzed neural-tracking activity using intact speech temporal envelopes, ignoring the characteristic decoding features of different RMS-level-specific speech segments. This study aimed to explore the contributions of high- and middle-RMS-level segments to target speech decoding in noisy conditions based on electroencephalogram (EEG) signals. The target stimulus was mixed with a competing speaker at five SMRs (i.e., 6, 3, 0, -3, and -6 dB), and the temporal response function (TRF) was then used to analyze the relationship between neural responses and high-/middle-RMS-level segments. Experimental results showed that target and ignored speech streams had significantly different TRF responses under conditions with the high- or middle-RMS-level segments. In addition, the high- and middle-RMS-level segments elicited TRF responses with different morphological distributions. These results suggest that distinct models could be used for different RMS-level-specific speech segments to better decode target speech from the corresponding EEG signals.
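The stimulus preparation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frame length, and the segment thresholds (frames at or above the overall RMS counted as "high", frames within 10 dB below it as "middle") are assumptions made for illustration, since the abstract does not state how the segments were defined.

```python
import numpy as np


def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def mix_at_smr(target, masker, smr_db):
    """Scale the masker so the target-to-masker RMS ratio equals smr_db (in dB),
    then add it to the target. Returns the mixture and the masker gain."""
    gain = rms(target) / (rms(masker) * 10 ** (smr_db / 20))
    return target + gain * masker, gain


def label_rms_segments(signal, frame_len, mid_floor_db=-10.0):
    """Label each non-overlapping frame relative to the overall RMS level.

    ASSUMPTION (not from the abstract): 'high' means frame RMS >= overall RMS,
    'middle' means frame RMS within [mid_floor_db, 0) dB of overall RMS,
    and everything else is 'low'.
    """
    n_frames = len(signal) // frame_len
    trimmed = np.asarray(signal[: n_frames * frame_len], dtype=float)
    frames = trimmed.reshape(n_frames, frame_len)
    frame_rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Guard against log of zero for silent frames.
    rel_db = 20 * np.log10(np.maximum(frame_rms, 1e-12) / rms(trimmed))
    return np.where(rel_db >= 0, "high",
                    np.where(rel_db >= mid_floor_db, "middle", "low"))
```

With such labels, a separate TRF could in principle be fitted to the envelope of each segment class, which is the kind of segment-specific modelling the abstract argues for.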