1. Introduction
Machine vision algorithms for high-level visual recognition tasks in real-world systems often prove unsuitable in adverse weather conditions owing to the absorption and scattering of incoming light. For example, a turbid atmosphere significantly reduces the visibility of captured scenes, posing severe problems for surveillance cameras or autonomous vehicles and possibly causing undesired consequences. Because revisiting every deployed algorithm to account for the detrimental effects of the elements is seemingly impractical, haze removal algorithms are used instead: image dehazing methods preprocess an input image or video stream to restore the scene radiance for subsequent algorithms. Nevertheless, because haze occurs only occasionally, the unconditional removal of haze may become unfavorable when the input image is clear. Consequently, haze density estimation has attracted considerable interest from researchers over the past decade.
One of the first efforts to predict a hazy image's visibility exploited multiple images captured under different weather conditions [1] or different polarization degrees [2]. However, these early models face practical difficulties in obtaining sufficient images and configuring the experimental equipment. Therefore, Hautiere et al. [3] proposed an automatic method for detecting the presence of haze and estimating the visibility distance using side geographical information obtained from an onboard camera. Although this method eliminates the requirement for multiple images, it remains difficult to deploy in practice, mainly because of the tradeoff between accuracy and algorithmic complexity. Creating an accurate three-dimensional model is a non-trivial task that is inappropriate for visibility estimation, which is supposed to be computationally efficient and compact. Conversely, using an approximated model similar to that proposed by Hautiere et al. [3] significantly degrades the accuracy. Furthermore, this method is inapplicable to general hazy scenes because it relies on specific assumptions, for example, the presence of moving vehicles. Subsequently, Kopf et al. [4] presented a deep photography system to enhance the visibility of hazy images; nevertheless, their method requires existing georeferenced digital terrain and urban models to function correctly.
A more appealing approach is to exploit only a single hazy image; this appears challenging, but it is highly promising for real-world applications. In this context, most dehazing algorithms utilize prior information about the scene radiance to compensate for the lack of external knowledge. Tan [5] assumed that the scene radiance has higher local contrast than the observed intensity. This assumption is suitable for estimating the scene albedo by maximizing the local contrast under a smooth airlight; however, the recovered scene radiance tends to be overly saturated, which results in halo artifacts. He et al. [6] presented a pioneering study on the dark channel prior, which states that local patches of outdoor non-sky images contain extremely dark pixels in at least one color channel (a computational sketch is given below). Consequently, the dark channel prior can effectively estimate the raw transmission map, which inversely quantifies the haze density. He et al. [6] initially refined the raw transmission map using soft matting and later sped up the refinement using a guided filter [7]. In contrast, Tarel and Hautiere [8] proposed a fast solution using an edge-preserving median of the median along lines filter. Although its algorithmic complexity is only linear in the number of pixels, halo artifacts also affect the results. Kim et al. [9] developed a more sophisticated filtering technique, known as the modified hybrid median filter, to reduce the halo artifacts. Recently, Berman et al. [10] introduced the non-local haze-line prior, which postulates that a few tight clusters in the Red-Green-Blue (RGB) color space closely approximate the true colors of a haze-free image. However, a tradeoff between restoration quality and run-time hinders the broad application of this prior.
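To make the dark channel prior concrete, the following is a minimal Python/NumPy sketch; the patch size of 15 and the function name are illustrative choices, not the exact implementation of He et al. [6].

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image: np.ndarray, patch_size: int = 15) -> np.ndarray:
    """Dark channel of an H x W x 3 RGB image with values in [0, 1]."""
    # Per-pixel minimum over the R, G, B channels.
    min_channel = image.min(axis=2)
    # Local minimum over patch_size x patch_size neighborhoods.
    return minimum_filter(min_channel, size=patch_size)
```

A bright, haze-free region yields a dark channel close to zero, whereas dense haze raises it toward the airlight intensity, which is why the dark channel inversely tracks the transmission.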
Raikwar and Tapaswi [11] rearranged the atmospheric scattering model (recalled below) to estimate the transmission map from the difference of minimum color channels in order to further improve visibility restoration. They modeled this difference with a bounding function and estimated the bounding function via regression. Jiang et al. [12] proposed predicting the optical depth as a polynomial combination of haze-relevant features, in which sensitivity and error analyses were applied to reduce the model complexity. These two methods rely on synthetic datasets for estimation; hence, the domain shift problem may affect them when they are applied to real-world images. Wu et al. [13] formulated visibility restoration as a variational model that jointly achieves noise reduction and accuracy improvement. However, this method is computationally expensive and may be affected by heterogeneous lighting conditions; more efficient denoising methods [14,15] can be considered for reducing the computational complexity. Tang et al. [16] utilized another machine learning technique, random forest regression, to estimate the transmission map from a set of haze-relevant features. Similarly, Ngo et al. [17] estimated the transmission map by optimizing an objective function quantifying four haze-relevant features: contrast energy, image entropy, local standard deviation, and normalized dispersion. Even though the restored visibility is impressive, the high computational cost precludes the broad application of these methods. Schechner and Averbuch [18] adopted adaptive regularization to develop a filtering approach for visibility restoration, but background noise affected the results in distant regions. Recently, Wu et al. [19] investigated the side effects of noise on visibility estimation and subsequently proposed an interleaved cascade of shrinkage fields for noise reduction in the joint recovery of the scene radiance and transmission map. However, this method is also computationally expensive.
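For reference, these transmission-based methods build on the standard atmospheric scattering model from the dehazing literature; the formulation below uses the conventional notation rather than reproducing any particular paper's equations.

```latex
I(x) = J(x)\, t(x) + A\bigl(1 - t(x)\bigr), \qquad t(x) = e^{-\beta d(x)}
```

Here, $I$ is the observed intensity, $J$ the scene radiance, $A$ the global atmospheric light, $t$ the transmission, $\beta$ the scattering coefficient, and $d$ the scene depth. Estimating $t$, or equivalently the optical depth $\beta d$, is the quantity that the methods above approach with bounding functions, polynomial regression, or random forests.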
Furthermore, deep neural networks can be exploited to predict the haze density and scene radiance. Cai et al. [20] presented the first attempt to estimate the transmission map from a single image using a three-layer convolutional neural network known as DehazeNet (sketched below). The first layer extracts haze-relevant features, the second layer processes these features at different scales to achieve spatial invariance, and the last layer combines the results nonlinearly to estimate the transmission map. However, DehazeNet does not demonstrate impressive performance because of its shallow architecture and simple learning strategy. Inspired by DehazeNet, Li et al. [21] developed a lightweight all-in-one dehazing network (AOD-Net) that estimates the transmission map and atmospheric light in a unified manner. This joint estimation allows the two latent variables to refine each other, consequently reducing the reconstruction error. Zhang and Tao [22] leveraged the compact architecture of the AOD-Net and multiscale image fusion to design FAMED-Net, a more sophisticated network that clearly outperforms the AOD-Net in the visibility restoration task. It is also noteworthy that the AOD-Net and FAMED-Net attain real-time processing on graphics processing unit platforms, which opens up a promising direction toward deploying deep neural networks on edge devices. Huang et al. [23] devised a dual architecture comprising restoration and detection networks for the joint learning of three tasks: visibility enhancement, object classification, and object localization. However, this dual network is costly in terms of computational resources. Recent studies have leveraged efficient encoder–decoder frameworks and more sophisticated loss functions to improve the estimation accuracy. Li et al. [24] exploited the encoder–decoder framework to develop a task-oriented network for haze removal, a refinement network for haze residual compensation, and a fusion network for fusing the results of the previous two networks. They also employed a loss function consisting of mean absolute error, total variation, and dual composition losses.
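As a rough illustration of the three-stage design described above, the following PyTorch sketch estimates a transmission map from an RGB image. The layer widths, kernel sizes, and parallel multiscale branches are illustrative assumptions and do not reproduce the published DehazeNet architecture.

```python
import torch
import torch.nn as nn

class TinyTransmissionNet(nn.Module):
    """Three-stage transmission estimator: feature extraction,
    multiscale processing, and nonlinear regression to a map in (0, 1)."""

    def __init__(self) -> None:
        super().__init__()
        self.features = nn.Conv2d(3, 16, kernel_size=5, padding=2)
        # Parallel convolutions with different receptive fields
        # approximate the multiscale processing stage.
        self.scale3 = nn.Conv2d(16, 16, kernel_size=3, padding=1)
        self.scale5 = nn.Conv2d(16, 16, kernel_size=5, padding=2)
        self.scale7 = nn.Conv2d(16, 16, kernel_size=7, padding=3)
        self.regress = nn.Conv2d(48, 1, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = torch.relu(self.features(x))
        m = torch.relu(torch.cat(
            [self.scale3(f), self.scale5(f), self.scale7(f)], dim=1))
        return torch.sigmoid(self.regress(m))  # transmission map in (0, 1)

# Example: t = TinyTransmissionNet()(torch.rand(1, 3, 64, 64))  # -> (1, 1, 64, 64)
```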
The generative adversarial network (GAN), one of the most interesting recent technologies in computer science, can also be used to predict the scene radiance in hazy weather. Li et al. [25] presented a conditional GAN to mitigate the unstable learning process of GANs, while Pan et al. [26] developed a physics-based GAN to solve various ill-posed image restoration problems. Nevertheless, all deep-learning-based models share a common lack of complete and reliable training datasets for two main reasons: the sheer impracticality of capturing the same scene under different weather conditions and the unreliable performance of current depth cameras. Consequently, researchers have hitherto utilized synthetic datasets, in which hazy images or depth maps are synthesized from collected haze-free images or random distributions, respectively. This deficiency gives rise to the domain shift problem. Ignatov et al. [27] pioneered an effort to address this problem by loosening the strict requirement of supervised learning for paired datasets. In this context, they utilized two GANs corresponding to forward and inverse mappings. The results generated by the forward GAN are converted back to the input domain by the inverse GAN, and a content consistency loss ensures that the re-generated results exhibit characteristics similar to those of the input images (illustrated below). Additionally, the forward GAN's results are discriminated from the true data distribution on the basis of color and texture information. This innovative work enables network training on unpaired datasets.
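The forward/inverse consistency idea can be summarized in a few lines. This is a minimal sketch assuming two generator callables, g_forward and g_inverse, and an L1 penalty, which is one common choice rather than the exact loss of [27].

```python
import torch.nn.functional as F

def content_consistency_loss(x, g_forward, g_inverse):
    """Map the input to the target domain and back, then penalize the
    discrepancy between the reconstruction and the original input."""
    y = g_forward(x)       # forward mapping, e.g., hazy -> haze-free
    x_rec = g_inverse(y)   # inverse mapping back to the input domain
    return F.l1_loss(x_rec, x)
```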
Image fusion is another viable alternative for restoring scene visibility in poor weather. This scheme yields a single image from several images, which can be generated from a sole input or captured by different cameras. Image dehazing in this manner offers considerable advantages, for example, few patch-based artifacts and a fast processing time; these benefits are attributable to the pixel-wise operations and the elimination of transmission map estimation. Ancuti et al. [28] exploited multiscale fusion for day and night-time single-image dehazing. The airlight is estimated in a patch-based manner using two different patch sizes because of the difference in lighting conditions between day and night-time scenes. Subsequently, the two corresponding dehazed results, coupled with the discrete Laplacian of the original image, are fused to obtain the final result. The corresponding weight maps are derived from three essential features: image contrast, saturation, and saliency (a sketch of this weighted fusion follows this paragraph). Despite the satisfactory dehazing performance, the up- and down-sampling operations in the multiscale fusion hinder its broad application. Ngo et al. [29] recently demonstrated an insignificant performance gap between single-scale and multiscale fusion, which favors hardware implementation for real-time processing. It is also worth noting that Choi et al. [30] proposed an efficient method for haze density estimation known as the fog aware density evaluator (FADE). The FADE predicts the haze density by exploiting measurable deviations from statistical regularities observed in real hazy and haze-free images. However, this metric is not confined to a normalized range, which makes it difficult to interpret the haze density in general. Based on the comprehensive investigation conducted when developing the FADE, Choi et al. [30] also devised a multiscale dehazing method, but it is computationally expensive.
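The pixel-wise fusion step itself is simple. Below is a minimal NumPy sketch assuming the per-input weight maps (e.g., from contrast, saturation, and saliency) have already been computed; it omits the multiscale pyramid blending of [28].

```python
import numpy as np

def fuse(inputs, weight_maps, eps=1e-6):
    """Pixel-wise weighted fusion of derived images.

    inputs      -- list of H x W x 3 arrays (e.g., differently dehazed versions)
    weight_maps -- list of H x W arrays scoring each input at every pixel
    """
    weights = np.stack(weight_maps).astype(np.float64)  # K x H x W
    weights /= weights.sum(axis=0) + eps                # normalize across inputs
    return (weights[..., None] * np.stack(inputs)).sum(axis=0)
```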
None of the aforementioned methods appears capable of removing haze judiciously; dehazing algorithms invariably attempt to remove haze from the input image, regardless of whether it is hazy or haze-free. Although researchers widely use the term "haze-free" to refer to clean images, these images are not completely free of haze. In practice, the atmosphere contains microscopic aerosols even in clear weather, giving rise to the inevitable existence of distant haze. This phenomenon, however, is important for the human visual system to perceive depth. Therefore, the absolute removal of haze may result in unnatural images and cause observers to lose the sense of depth. This issue demands a visibility assessment tool that quantifies an image's haze density, which helps to classify hazy and haze-free images and perform image dehazing accordingly. In general, human subjective assessment is the most accurate method, despite being burdensome and non-repeatable. Accordingly, objective image quality assessment (IQA) algorithms are a possible alternative. Nevertheless, most existing IQA metrics require ground-truth references to assess visibility distortions; hence, they are inappropriate for the demanded task. In contrast, the FADE and the optical depth prediction proposed by Jiang et al. [12] have been applied to visibility assessment from a single image; thus, they are used as benchmark methods in this study.
This study proposes a knowledge-driven approach for predicting haze density from a single image. It first explores several haze-relevant features and then selects three computationally efficient features based on a correlation and computation analysis. With these features, this study formulates an objective function for maximizing the scene radiance’s saturation, brightness, and sharpness while minimizing the dark channel. Afterwards, this study exploits analytical optimization to derive a closed-form expression of the proposed haziness degree evaluator (HDE). Additionally, it discusses three applications of HDE in hazy/haze-free image classification, dehazing performance assessment, and single image dehazing. Notably, the experimental results on hazy/haze-free image classification demonstrate that the proposed HDE is superior to the two aforementioned benchmark methods. The three main contributions of this study are as follows:
This study presents a simple correlation and computation analysis to select image features that are haze-relevant and computationally efficient.
With the selected features, this study formulates an analytically solvable objective function that simultaneously maximizes the scene radiance’s saturation, brightness, and sharpness, and minimizes the dark channel, which yields a closed-form formula for quantifying haze density from a single image.
This study demonstrates that applying the proposed HDE to the particular task of hazy/haze-free image classification results in an accuracy of approximately 96%, which surpasses those of the two benchmark metrics and human observers.
5. Discussion
The proposed HDE is a knowledge-driven approach; that is, it does not require any training on collected data prior to deployment. By contrast, the FADE and DF are data-driven approaches, for which data collection for the pre-calculation of their model parameters is essential. Specifically, because the FADE estimates the haze density based on the Mahalanobis distance in a haze-relevant feature space (as sketched below), an offline calculation of the mean vectors and covariance matrices of the corresponding hazy and haze-free image corpora is indispensable. Meanwhile, the DF estimates the haze density based on the optical depth, which is the output of a regression model whose parameters are derived from least-squares estimation on a synthetic training dataset.
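To illustrate the data-driven nature of the FADE, the following sketch computes a Mahalanobis-distance-based density score from precomputed corpus statistics. The feature vector f, the statistics (mu, sigma), and the final combination rule are illustrative assumptions rather than FADE's exact formulation.

```python
import numpy as np

def mahalanobis(f, mu, sigma):
    """Mahalanobis distance of feature vector f from a corpus with
    mean vector mu and covariance matrix sigma."""
    diff = f - mu
    return float(np.sqrt(diff @ np.linalg.inv(sigma) @ diff))

def fade_like_density(f, free_stats, hazy_stats):
    """Illustrative density score: far from the haze-free corpus and
    close to the hazy corpus implies a hazier image."""
    d_free = mahalanobis(f, *free_stats)   # free_stats = (mu, sigma)
    d_hazy = mahalanobis(f, *hazy_stats)   # hazy_stats = (mu, sigma)
    return d_free / (d_hazy + 1.0)
```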
Figure 13 depicts the block diagrams of the two benchmark evaluators and highlights the offline calculation in pink. Conversely, the proposed HDE does not require any offline calculation; instead, it estimates the haze density directly from a single input image, which is more computationally efficient and convenient.
Table 8 compares the run-times of the three haze density evaluators. The results tabulated therein were measured in the MATLAB R2019a environment, running on a computer with an Intel Core i7-9700 (3.0 GHz) CPU and 32 GB of RAM. For the FADE and DF, the aforementioned offline calculation does not affect the run-time because it is performed in advance. Accordingly, the FADE and DF exhibit relatively fast processing speeds; however, they are still slower than the proposed HDE. The FADE's time-consuming parts are the haze-relevant feature extraction and the Mahalanobis distance calculation: the former extracts as many as twelve features, some of which correlate with each other, while the latter is slow owing to matrix manipulation. Although the DF has reduced the number of features through sensitivity and error analyses, it is still not as fast as the HDE because of the mutual combinations of features in its regression model. The HDE is the fastest of the three evaluators; this high speed is attributable to the closed-form formula supporting haze density prediction from a single image.
Nevertheless, the three evaluators share some common drawbacks, such as FNs and FPs, as illustrated in Figure 14a and Figure 14b, respectively. In Figure 14a, the FADE, DF, and HDE have incorrectly classified thin-haze and night-time images as haze-free images. For the thin-haze image, the HDE value is close to the decision value; because the classification of images whose HDE value is close to the decision value is ambiguous, this failure of the HDE is explicable. However, the same interpretation does not hold for the FADE and DF. Regarding the night-time image, incorrect classification is a typical shortcoming of all three evaluators. One possible reason is that the atmospheric light estimate utilized in the HDE's calculation does not reflect the heterogeneous illumination of night-time scenes. Therefore, utilizing a local estimate of the atmospheric light may be a viable solution. In this context, the local estimate can be obtained using the novel maximum reflectance prior proposed by Zhang et al. [56,57] for night-time image dehazing. However, because a more comprehensive investigation is required to discover the exact reason, this failure in night-time scenes is left for future studies.
Similarly, the FP cases presented in Figure 14b demonstrate that all three evaluators have incorrectly classified haze-free images as hazy images. This failure occurs owing to the large sky region and smooth background, which are haze-like regions that are challenging to discriminate from actual hazy regions. In that case, a thorough investigation of the image's cumulative distribution function may provide useful insights. Moreover, leveraging semantic information may also be a viable alternative worthy of further investigation. These valuable pieces of information can be used to guide the final average pooling to produce a robust estimate. However, this issue also requires a more detailed investigation in future studies, similar to the FN case of the night-time image.
Finally, Figure 14c illustrates some cases where the proposed HDE is superior to the FADE and DF. The two images depicted in Figure 14c are clearly obscured by a considerable amount of haze. However, the FADE and DF have incorrectly classified them as haze-free images with a substantial degree of confidence, as represented by relatively large distances to the decision values. Conversely, the proposed HDE has yielded TPs and is hence superior to the FADE and DF.
6. Conclusions
This paper presented the HDE for haze density estimation from a single image. The proposed approach is knowledge-driven, as opposed to data-driven evaluators such as the FADE and DF. First, a simple correlation and computation analysis was presented to select image features that are highly pertinent to haze and computationally efficient. An analytically solvable objective function, whose optimization is analogous to maximizing the image's saturation, brightness, and sharpness while minimizing the dark channel, was then formulated from these features. Optimizing this objective function yielded the HDE's closed-form formula. This paper also demonstrated three HDE-based applications: hazy/haze-free image classification, dehazing performance assessment, and single-image dehazing. In the classification application, the experimental results showed that the proposed HDE achieved an impressive accuracy of 96%, outperforming the benchmark evaluators as well as human observers. Equipped with this superiority, the proposed evaluator can accurately quantify an image's haze density; consequently, it can benefit the quantitative assessment of dehazing algorithms. Additionally, the proposed evaluator and its byproduct (that is, the optimal transmission map) can be exploited to improve the performance of dehazing algorithms in both hazy and clear weather conditions.
Nevertheless, a challenging problem arises when predicting the haze density of images under specific circumstances, for example, hazy night-time images or haze-free images containing a smooth background or a broad sky. This is attributable to the heterogeneous illumination of night-time scenes and to the low-frequency constituents of a smooth background or a broad sky. For the former problem, leveraging the novel maximum reflectance prior to obtain a spatially adaptive estimate of the atmospheric light might be a feasible solution. Meanwhile, a comprehensive investigation of the image's cumulative distribution function and semantic information may provide helpful insights into the latter problem. However, algorithmic complexity is strictly constrained because haze density prediction and visibility restoration are widely considered preprocessing steps for high-level applications. Therefore, we will seek efficient and straightforward techniques to surmount these challenging problems in future studies.