UNETR: Transformers for 3D Medical Image Segmentation

Ali Hatamizadeh, Yucheng Tang, Vishwesh Nath, Dong Yang, Andriy Myronenko, Bennett Landman, Holger Roth, Daguang Xu

Electrical Engineering and Systems Science > Image and Video Processing

arXiv:2103.10504 (eess)

[Submitted on 18 Mar 2021 (v1), last revised 9 Oct 2021 (this version, v3)]

Abstract: Fully Convolutional Neural Networks (FCNNs) with contracting and expanding paths have been prominent in the majority of medical image segmentation applications over the past decade. In FCNNs, the encoder plays an integral role by learning both global and local features and contextual representations, which the decoder can use for semantic output prediction. Despite their success, the locality of convolutional layers in FCNNs limits their capability to learn long-range spatial dependencies. Inspired by the recent success of transformers for Natural Language Processing (NLP) in long-range sequence learning, we reformulate the task of volumetric (3D) medical image segmentation as a sequence-to-sequence prediction problem. We introduce a novel architecture, dubbed UNEt TRansformers (UNETR), that utilizes a transformer as the encoder to learn sequence representations of the input volume and effectively capture global multi-scale information, while also following the successful "U-shaped" network design for the encoder and decoder. The transformer encoder is directly connected to a decoder via skip connections at different resolutions to compute the final semantic segmentation output. We have validated the performance of our method on the Multi Atlas Labeling Beyond The Cranial Vault (BTCV) dataset for multi-organ segmentation and the Medical Segmentation Decathlon (MSD) dataset for brain tumor and spleen segmentation tasks. Our benchmarks demonstrate new state-of-the-art performance on the BTCV leaderboard. Code: this https URL
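
The central reformulation in the abstract, treating a 3D volume as a sequence of patch tokens for a transformer encoder, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the helper names token_count, volume_to_sequence and sequence_to_grid are hypothetical, the volume is assumed to be single-channel, and the learned linear projection, position embeddings, transformer layers and decoder are all omitted. The 96^3 input and 16^3 patch size used below are illustrative assumptions, not values stated in the abstract.

```python
from itertools import product
from typing import List, Optional, Tuple

# Hypothetical helpers for illustration only; not the UNETR reference code.
# A single-channel volume indexed as volume[x][y][z].
Volume = List[List[List[float]]]


def token_count(shape: Tuple[int, int, int], patch: int) -> int:
    """Number of non-overlapping patch^3 tokens for a volume of this shape."""
    h, w, d = shape
    if h % patch or w % patch or d % patch:
        raise ValueError("every spatial dimension must be divisible by the patch size")
    return (h // patch) * (w // patch) * (d // patch)


def volume_to_sequence(volume: Volume, patch: int) -> List[List[float]]:
    """Flatten each non-overlapping patch^3 block into one token, in raster order."""
    shape = (len(volume), len(volume[0]), len(volume[0][0]))
    token_count(shape, patch)  # raises ValueError if the shape is not divisible
    gx, gy, gz = (s // patch for s in shape)
    tokens = []
    for px, py, pz in product(range(gx), range(gy), range(gz)):
        # Collect the voxels of one block, iterating x, then y, then z offsets.
        tokens.append([
            volume[px * patch + i][py * patch + j][pz * patch + k]
            for i, j, k in product(range(patch), repeat=3)
        ])
    return tokens


def sequence_to_grid(tokens: List[List[float]],
                     grid: Tuple[int, int, int]) -> List[List[List[Optional[List[float]]]]]:
    """Place tokens back on their (x, y, z) patch grid: the coarse 3D layout a
    decoder would need before upsampling a skip-connection feature map."""
    gx, gy, gz = grid
    if len(tokens) != gx * gy * gz:
        raise ValueError("token count does not match grid size")
    out = [[[None] * gz for _ in range(gy)] for _ in range(gx)]
    for n, tok in enumerate(tokens):
        # Invert the raster order used in volume_to_sequence.
        out[n // (gy * gz)][(n // gz) % gy][n % gz] = tok
    return out


if __name__ == "__main__":
    # Toy 4x4x4 volume whose voxel value encodes its position: 16x + 4y + z.
    vol = [[[16 * x + 4 * y + z for z in range(4)] for y in range(4)] for x in range(4)]
    seq = volume_to_sequence(vol, patch=2)
    print(len(seq), len(seq[0]))          # 8 tokens of 2*2*2 = 8 voxels each
    print(seq[0])                          # the corner block at the origin
    print(token_count((96, 96, 96), 16))  # tokens for the assumed 96^3 / 16^3 setting
```

Under these assumptions a 96^3 single-channel input yields (96/16)^3 = 216 tokens, each holding 16^3 = 4096 voxel values before projection. Because the raster order is invertible, the encoder's output sequence can be reshaped back into a coarse 3D grid, which is what allows skip connections from the transformer encoder to feed a convolutional decoder at different resolutions.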

Comments:	Accepted to IEEE Winter Conference on Applications of Computer Vision (WACV) 2022
Subjects:	Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2103.10504 [eess.IV]
	(or arXiv:2103.10504v3 [eess.IV] for this version)
	https://doi.org/10.48550/arXiv.2103.10504

Submission history

From: Ali Hatamizadeh
[v1] Thu, 18 Mar 2021 20:17:15 UTC (1,167 KB)
[v2] Wed, 29 Sep 2021 15:37:43 UTC (1,952 KB)
[v3] Sat, 9 Oct 2021 17:25:43 UTC (1,953 KB)
