ZCU-NTIS speaker diarization system for the DIHARD 2018 challenge
Proc. Interspeech, 2018
Abstract
In this paper, we present the system developed by the team from the New Technologies for the Information Society (NTIS) research center of the University of West Bohemia, for the First DIHARD Speech Diarization Challenge. The base of our system follows the currently-standard approach of segmentation, i-vector extraction, clustering, and resegmentation. Here, we describe the modifications to the system which allowed us to apply it to data from a range of different domains. The main contribution to our achievement is a Neural Network (NN) based domain classifier, which categorizes each conversation into one of the ten domains present in the development set. This classification determines the specific system configuration, such as the expected number of speakers and the stopping criterion for the hierarchical clustering. At the time of writing of this abstract, our best submission achieves a DER of 26.90% and an MI of 8.34 bits on the evaluation set (gold speech/nonspeech segmentation).
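To make the domain-dependent configuration concrete, the following is a minimal sketch of how a predicted domain label could select the clustering setup (expected number of speakers vs. a stopping threshold for agglomerative hierarchical clustering of per-segment i-vectors). The DOMAIN_CONFIG table, the domain names, the threshold values, and the cluster_ivectors helper are illustrative assumptions, not details taken from the paper.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    # Hypothetical per-domain configuration: either a fixed expected speaker
    # count or an AHC stopping threshold (values are placeholders).
    DOMAIN_CONFIG = {
        "broadcast_interview": {"n_speakers": 2,    "stop_threshold": None},
        "meeting":             {"n_speakers": None, "stop_threshold": 0.7},
        # ... one entry per domain present in the development set
    }

    def cluster_ivectors(ivectors, domain):
        """Agglomerative clustering of per-segment i-vectors, configured by domain."""
        cfg = DOMAIN_CONFIG[domain]
        # Cosine distances between i-vectors; average-linkage AHC.
        dists = pdist(ivectors, metric="cosine")
        tree = linkage(dists, method="average")
        if cfg["n_speakers"] is not None:
            # Cut the dendrogram at the expected number of speakers.
            return fcluster(tree, t=cfg["n_speakers"], criterion="maxclust")
        # Otherwise stop merging once the linkage distance exceeds the threshold.
        return fcluster(tree, t=cfg["stop_threshold"], criterion="distance")

    # Example usage (segment_ivectors and predicted_domain are hypothetical inputs):
    # labels = cluster_ivectors(np.vstack(segment_ivectors), predicted_domain)

In this sketch, the domain classifier only switches between two cutting criteria; the actual system may condition further stages (e.g., resegmentation) on the domain as well.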