Purpose: Positron emission tomography (PET) is widely used in clinical practice. PET is a type of emission computed tomography that images the radiation produced by positron annihilation. With magnetic resonance imaging (MRI) providing anatomical information, joint PET/MRI reduces patients' radiation exposure risk. Improved hardware and imaging algorithms have been proposed to further decrease the radiotracer dose or the scan (bed) duration, but few methods focus on denoising low-count PET with MRI as an additional input. Existing methods rely on fixed conventional convolutions and local attention, which do not sufficiently extract and fuse the contextual and complementary information in multimodal input, leaving considerable room for improvement. We therefore propose a novel deep learning method for low-count PET/MRI denoising, the spatial-adaptive and transformer fusion network (STFNet), which consists of a Siamese encoder with a spatial-adaptive block (SA-block) and a transformer fusion encoder (TFE).
Methods: Our proposed STFNet consists of a Siamese encoder with an SA-block, a TFE, and a two-branch decoder. First, in the encoder, we adopt the SA-block in the Siamese encoder. The SA-block comprises deformable convolution with fusion modulation (DCFM) and two convolutional operations, which helps the network extract more relevant, long-range contextual features. Second, the pixel-to-pixel TFE helps the network establish local and global relationships between the high-level feature maps of PET and MRI. In the decoder, we design two branches, one for PET denoising and one for MRI translation, and the final prediction is obtained by a trainable weighted summation of their outputs. The proposed algorithm is applied to predict synthetic standard-dose neck PET images from low-count neck PET images and MRI, and is compared with the existing U-Net and residual U-Net methods, each with and without MRI input.
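For illustration, the following is a minimal PyTorch sketch of how an SA-block along these lines might be structured. The abstract only states that the SA-block combines DCFM with two convolutions; predicting the sampling offsets and modulation mask from concatenated two-modality features, the channel widths, and all layer names are our assumptions, not the paper's implementation.

```python
# Hypothetical SA-block sketch: modulated deformable convolution whose
# offsets and modulation mask are driven by fused PET/MRI features.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class SABlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        n = kernel_size * kernel_size
        # Predict 2 sampling offsets and 1 modulation weight per kernel
        # location from the concatenated two-modality feature maps.
        self.offset = nn.Conv2d(2 * channels, 2 * n, kernel_size, padding=pad)
        self.mask = nn.Conv2d(2 * channels, n, kernel_size, padding=pad)
        self.dcfm = DeformConv2d(channels, channels, kernel_size, padding=pad)
        # The two plain convolutional operations mentioned in the abstract.
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, feat: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([feat, other], dim=1)      # fuse the two branches
        offset = self.offset(fused)                  # spatial-adaptive sampling
        mask = torch.sigmoid(self.mask(fused))       # modulation in [0, 1]
        return self.refine(self.dcfm(feat, offset, mask))
```

In this reading, one SA-block sits in each arm of the Siamese encoder, with `feat` from its own modality and `other` from the sibling branch; the deformable sampling is what lets the receptive field adapt beyond a fixed convolution grid.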
Results: To demonstrate the advantages of our method, we present configuration studies on the TFE, ablation studies, and comparative studies. Quantitative analyses are based on root mean square error (RMSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and the Pearson correlation coefficient (PCC). Qualitative results compare our proposed method with existing methods. All experimental results and visualizations show that our method achieves state-of-the-art performance in both quantitative and qualitative evaluations.
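For reference, the four reported metrics can be computed on a predicted versus reference PET slice as in the sketch below (NumPy/SciPy/scikit-image); the function and variable names are ours, not from the paper.

```python
# Standard image-quality metrics for a denoised PET prediction.
import numpy as np
from scipy.stats import pearsonr
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred: np.ndarray, ref: np.ndarray) -> dict:
    data_range = ref.max() - ref.min()
    rmse = float(np.sqrt(np.mean((pred - ref) ** 2)))
    psnr = peak_signal_noise_ratio(ref, pred, data_range=data_range)
    ssim = structural_similarity(ref, pred, data_range=data_range)
    pcc, _ = pearsonr(pred.ravel(), ref.ravel())
    return {"RMSE": rmse, "PSNR": psnr, "SSIM": ssim, "PCC": pcc}
```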
Conclusions: In our experiments, STFNet outperforms existing methods both quantitatively and visually. However, our proposed method may still be suboptimal, because we train with only the L1 loss and the data set includes PET corrupted at different low count levels. In the future, we may incorporate a generative adversarial network (GAN)-based paradigm into STFNet to further improve visual quality.
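An illustrative training step under the stated L1-only objective is sketched below; the optimizer handling, the model's two-input signature, and the adversarial extension noted in the comment are assumptions for illustration only.

```python
# One optimization step with the L1 objective described in the abstract.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, low_count_pet, mri, standard_pet):
    optimizer.zero_grad()
    pred = model(low_count_pet, mri)            # dual-modality forward pass
    loss = F.l1_loss(pred, standard_pet)        # sole loss term in the paper
    # A GAN-based variant would add an adversarial term here, e.g.
    # loss = loss + lambda_adv * adversarial_loss(discriminator(pred))
    loss.backward()
    optimizer.step()
    return loss.item()
```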
Keywords: STFNet; U-Net; deep learning; low-count PET; magnetic resonance imaging (MRI); positron emission tomography (PET); residual U-Net.
© 2021 American Association of Physicists in Medicine.