Music Source Separation in the Waveform Domain

Défossez, Alexandre; Usunier, Nicolas; Bottou, Léon; Bach, Francis


arXiv:1911.13254 (cs)

[Submitted on 27 Nov 2019 (v1), last revised 28 Apr 2021 (this version, v2)]


Authors: Alexandre Défossez (FAIR, SIERRA, PSL), Nicolas Usunier (FAIR), Léon Bottou (FAIR), Francis Bach (DI-ENS, PSL, SIERRA)


Abstract: Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song. Such components include voice, bass, drums and any other accompaniments. Contrary to many audio synthesis tasks where the best performances are achieved by models that directly generate the waveform, the state-of-the-art in source separation for music is to compute masks on the magnitude spectrum. In this paper, we compare two waveform domain architectures. We first adapt Conv-Tasnet, initially developed for speech source separation, to the task of music source separation. While Conv-Tasnet beats many existing spectrogram-domain methods, it suffers from significant artifacts, as shown by human evaluations. We propose instead Demucs, a novel waveform-to-waveform model, with a U-Net structure and bidirectional LSTM. Experiments on the MusDB dataset show that, with proper data augmentation, Demucs beats all existing state-of-the-art architectures, including Conv-Tasnet, with 6.3 SDR on average (and up to 6.8 with 150 extra training songs, even surpassing the IRM oracle for the bass source). Using recent developments in model quantization, Demucs can be compressed down to 120MB without any loss of accuracy. We also provide human evaluations, showing that Demucs benefits from a large advantage in terms of the naturalness of the audio. However, it suffers from some bleeding, especially between the vocals and other source.
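
The abstract reports results as SDR (signal-to-distortion ratio, in dB). As a rough illustration of what such a number measures, the sketch below computes a simplified energy-ratio SDR between a reference stem and an estimate. This is an assumption made for illustration only: the function name `simple_sdr`, the `eps` stabilizer, and the toy signals are hypothetical, and the evaluation protocol used for the MusDB figures above may define SDR differently (for example by allowing certain distortions), so values from this sketch are not comparable to the 6.3 and 6.8 reported.

```python
import math


def simple_sdr(reference, estimate, eps=1e-8):
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Illustrative only; not the benchmark's official metric.
    `eps` (an assumption) avoids division by zero for a perfect estimate.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy example: a 440 Hz tone as the "true" stem, and an estimate that
# recovers it at 90% amplitude (so the error is 10% of the reference).
sample_rate = 8000
reference = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
estimate = [0.9 * r for r in reference]

print(f"{simple_sdr(reference, estimate):.1f} dB")  # prints "20.0 dB"
```

With an error one tenth the amplitude of the reference, the energy ratio is 100, which gives 20 dB; higher values mean the estimate is closer to the reference stem.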

Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
Cite as:	arXiv:1911.13254 [cs.SD]
	(or arXiv:1911.13254v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.1911.13254

Submission history

From: Alexandre Defossez [via CCSD proxy]
[v1] Wed, 27 Nov 2019 13:50:45 UTC (247 KB)
[v2] Wed, 28 Apr 2021 14:37:48 UTC (113 KB)
