Enforcing perceptual consistency on Generative Adversarial Networks by using the Normalised Laplacian Pyramid Distance
DOI: https://doi.org/10.7557/18.5124

Keywords: Perception, Generative Adversarial Network

Abstract
In recent years there has been a growing interest in image generation through deep learning. While the evaluation of generated images usually involves visual inspection, human perception is rarely included as a factor in the training process itself. In this paper we propose an alternative perceptual regulariser for image-to-image translation using conditional generative adversarial networks (cGANs). To do so automatically (avoiding visual inspection), we use the Normalised Laplacian Pyramid Distance (NLPD) to measure the perceptual similarity between the generated image and the original image. The NLPD is based on the principle of normalising the value of coefficients with respect to a local estimate of mean energy at different scales, and has already been validated in experiments involving human perception. We compare this regulariser with the originally proposed L1 distance and find that images generated with NLPD have more realistic local and global contrast.
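The principle behind the NLPD can be sketched in a few lines: decompose each image into a Laplacian pyramid, divisively normalise each band by a local estimate of its mean energy, and average a root-mean-square distance over scales. The sketch below follows that recipe; the Gaussian filter widths, the constant `eps`, and the number of levels are illustrative choices, not the parameters fitted by Laparra et al.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom


def laplacian_pyramid(img, n_levels=4):
    """Laplacian pyramid (Burt & Adelson): band-pass residuals plus a low-pass top."""
    levels = []
    current = img.astype(np.float64)
    for _ in range(n_levels - 1):
        blurred = gaussian_filter(current, sigma=1.0)
        down = blurred[::2, ::2]
        # Upsample back and subtract to get the band-pass residual.
        up = zoom(down, 2, order=1)[: current.shape[0], : current.shape[1]]
        levels.append(current - up)
        current = down
    levels.append(current)  # low-pass residual
    return levels


def nlpd(img_a, img_b, n_levels=4, eps=0.17):
    """Normalised Laplacian pyramid distance between two greyscale images.

    Each band is divided by (eps + local mean absolute activity), mimicking
    divisive normalisation; per-scale RMS errors are then averaged.
    sigma and eps are illustrative, not the values from the original paper.
    """
    dists = []
    pyr_a = laplacian_pyramid(img_a, n_levels)
    pyr_b = laplacian_pyramid(img_b, n_levels)
    for a, b in zip(pyr_a, pyr_b):
        norm_a = a / (eps + gaussian_filter(np.abs(a), sigma=1.0))
        norm_b = b / (eps + gaussian_filter(np.abs(b), sigma=1.0))
        dists.append(np.sqrt(np.mean((norm_a - norm_b) ** 2)))
    return float(np.mean(dists))
```

Used as a training regulariser, `nlpd(generated, target)` would simply replace the L1 term in the cGAN objective (in a framework such as PyTorch the same operations would be written with differentiable tensor ops so gradients can flow to the generator).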
References
H. B. Barlow. Possible principles underlying the transformation of sensory messages. Sensory Communication, pages 217–234, 1961.
P. Burt and E. Adelson. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, 31(4):532–540, 1983.
M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The cityscapes dataset for semantic urban scene understanding. In IEEE CVPR, pages 3213–3223, 2016.
A. Dosovitskiy and T. Brox. Generating images with perceptual similarity metrics based on deep networks. In NIPS, pages 658–666, 2016.
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, pages 2672–2680, 2014.
A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin. Image analogies. In ACMCGIT, pages 327–340. ACM, 2001.
P. Isola, J. Zhu, T. Zhou, and A. Efros. Image-to-image translation with conditional adversarial networks. In IEEE CVPR, 2017.
V. Laparra, J. Ballé, A. Berardino, and E. P. Simoncelli. Perceptual image quality assessment using a normalized laplacian pyramid. Electronic Imaging, 2016(16):1–6, 2016.
V. Laparra, A. Berardino, J. Ballé, and E. P. Simoncelli. Perceptually optimized image rendering. JOSA, 2017.
J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In IEEE CVPR, pages 3431–3440, 2015.
A. Mittal, A. K. Moorthy, and A. C. Bovik. No-reference image quality assessment in the spatial domain. IEEE TIP, 21(12):4695–4708, 2012.
A. Mittal, R. Soundararajan, and A. C. Bovik. Making a "completely blind" image quality analyzer. IEEE Signal Process. Lett., 20(3):209–212, 2013.
A. Olmos and F. A. A. Kingdom. A biologically inspired algorithm for the recovery of shading and reflectance images. Perception, 33(12):1463–1473, 2004.
A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2016.
L. Theis, A. van den Oord, and M. Bethge. A note on the evaluation of generative models. ICLR, 2016.
R. Tyleček and R. Šára. Spatial pattern templates for recognition of objects with regular structure. In GCPR, pages 364–374. Springer, 2013.
T. Wang, M. Liu, J. Zhu, A. Tao, J. Kautz, and B. Catanzaro. High-resolution image synthesis and semantic manipulation with conditional gans. In IEEE CVPR, pages 8798–8807, 2018.
Z. Wang, E. P. Simoncelli, and A. C. Bovik. Multiscale structural similarity for image quality assessment. In ACSSC, volume 2, pages 1398–1402. IEEE, 2003.
R. Zhang, P. Isola, A. Efros, E. Shechtman, and O. Wang. The unreasonable effectiveness of deep features as a perceptual metric. In IEEE CVPR, pages 586–595, 2018.