This section details the interference detection and segmentation experiments on the collected Sentinel-1 measured data, verifying the efficiency of IQDN and the accuracy of SISNet. Qualitative and quantitative metrics are used to evaluate the performance of diverse intelligent algorithms on interference detection and segmentation. Real SAR images from the Sentinel-1 satellite are organized into the MID and PAIS datasets, which are used in the following experiments.
5.2. Results of the Interference Segmentation Experiment
In this section, measured Sentinel-1 quick-look images of the PAIS dataset are used for validation and comparison, with the RFI artifacts annotated as ground truth. With the designs described above, the semantic segmentation capacity of SISNet is improved, and the proposed SISNet is then trained. To test its semantic segmentation performance, we select several existing semantic segmentation networks, namely FCN, Mask RCNN, UNet++, and U2Net, and train them with the same parameters; the results obtained serve as comparative test results. The training is carried out on an HP workstation in the school laboratory, with a GeForce RTX 3060 12 GB GPU, an Intel i7-10700 CPU, and 64 GB of RAM. Due to hardware limitations, the full U2Net has too many parameters to train, so we instead run U2NetP, the lightweight version of U2Net, on this machine. The SISNet trained on the workstation is therefore also built on U2NetP with the improvements mentioned above. Data augmentation automatically expands the training set by generating modified versions of the original data.
To estimate the quality of the segmentation, we use the five-fold cross-validation technique (i.e., training and validating five times) and report the averaged results; thus, the dataset is randomly divided into training and validation subsets at a ratio of 4:1 in each training stage. To quantitatively establish the superiority of SISNet, we conduct hypothesis testing on the differences among the indicators of the test samples. All experiments follow the same pattern. For a fair comparison, all state-of-the-art models are set with their default, official parameters, such as the number of neurons and the number of network layers. In addition, the initial learning rate is 0.05 and descends in a polynomial form, and each model is trained for 50 epochs.
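The training and cross-validation protocol can be sketched as follows. This is a minimal illustration rather than the actual training code: the polynomial decay power of 0.9 is an assumption, since the text states only that the initial rate of 0.05 descends polynomially over 50 epochs.

```python
import random

def poly_lr(epoch, total_epochs=50, base_lr=0.05, power=0.9):
    """Polynomial decay of the learning rate, as described in the text.
    The power of 0.9 is an assumption; the paper specifies only the
    initial rate (0.05), the polynomial form, and the 50-epoch budget."""
    return base_lr * (1 - epoch / total_epochs) ** power

def five_fold_splits(n_samples, seed=0):
    """Randomly partition sample indices into 5 folds, yielding the
    4:1 train/validation split used in each of the five training stages."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    for k in range(5):
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        yield train, folds[k]

lrs = [poly_lr(e) for e in range(50)]
print(round(lrs[0], 4), round(lrs[-1], 4))

for train, val in five_fold_splits(100):
    print(len(train), len(val))  # 80 20, printed five times
```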
During training, validation is performed once after every epoch, and the final reported result comprises the mIoU, accuracy, precision, and F1 score of the epoch at which the mIoU is highest.
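The reported indicators can be derived from pixel-level counts. The sketch below shows one standard way to compute them for the binary RFI-versus-background case; the paper's exact evaluation code is not given, and the counts here are toy values for illustration only.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """IoU, accuracy, precision, and F1 from pixel counts for the RFI
    class (binary segmentation: RFI vs. background)."""
    iou = tp / (tp + fp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # mIoU for the two-class case averages the RFI IoU with the background IoU.
    background_iou = tn / (tn + fp + fn)
    miou = (iou + background_iou) / 2
    return {"mIoU": miou, "IoU": iou, "accuracy": accuracy,
            "precision": precision, "F1": f1}

# Toy pixel counts, for illustration only.
m = segmentation_metrics(tp=800, fp=100, fn=100, tn=9000)
print({k: round(v, 3) for k, v in m.items()})
```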
The average mIoU of the proposed network and of the networks used in the comparison test, calculated after the five-fold cross-validation training, is shown in Figure 17b, and the change in the average loss is shown in Figure 17a. The mIoU curve of SISNet fluctuates less, indicating that the model fits the training data well and has a stronger learning ability on unbalanced datasets. In addition, the mIoU curve of SISNet completely exceeds those of UNet and U2Net after the 30th epoch, showing that it achieves better segmentation performance than both. Finally, after the 40th epoch, the error bars of SISNet's mIoU and F1 curves do not overlap with those of the other models, indicating that our method significantly improves segmentation performance.
In addition, when training SISNet, the five-fold cross-validation experiments are performed; the mIoU results of the five subtests are presented together with the average mIoU in Figure 18a, and the corresponding training losses of the subtests and the average loss are shown in Figure 18b. The combined average mIoU results of SISNet and the compared networks are plotted with the standard deviation shown as vertical error bars at individual epochs (see Figure 19). A common rule of thumb is that when the difference between two sample means exceeds two standard deviations, the change is significant. It can be further concluded from the figure that SISNet significantly improves RFI segmentation performance compared to the other semantic segmentation networks.
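The two-standard-deviation rule of thumb can be expressed as a simple check. The numbers below are illustrative only, not taken from the paper's figures.

```python
def significant_by_two_sigma(mean_a, mean_b, std_a, std_b):
    """Rule of thumb used in the text: treat the gap between two sample
    means as significant when it exceeds two standard deviations (here,
    twice the larger of the two standard deviations, an assumption)."""
    return abs(mean_a - mean_b) > 2 * max(std_a, std_b)

# Illustrative mIoU means/standard deviations, not values from the paper.
print(significant_by_two_sigma(0.85, 0.80, 0.01, 0.012))  # gap 0.05 > 0.024
print(significant_by_two_sigma(0.85, 0.84, 0.01, 0.012))  # gap 0.01 < 0.024
```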
After cross-validation experiments, the results of mIoU, accuracy, precision, and F1 of each network are shown in
Table 5. The standard deviation is calculated from each subtest of the five-fold cross-validation method. In addition, the results of all indicators are taken from the epoch with the best average mIoU score.
The results in
Table 5 indicate that FCN exhibits poor performance on the RFI artifact segmentation task. The comparative results of UNet++ and UNet show that deepening the network does not necessarily improve segmentation performance. At the same time, the results of UNet and U2Net show that the lightweight U2Net achieves similar performance while using fewer parameters, demonstrating the suitability of the U2Net architecture for the RFI artifact segmentation task. Finally, comparing the results of SISNet and U2Net, we conclude that the improvements proposed in this paper further enhance the performance of U2Net.
To further demonstrate the effectiveness of SISNet on interference segmentation, we select six typical scenarios that were not used in training. These scenarios contain various land covers, such as mountains, lakes, seasides, and islands, as well as different types of RFI artifacts. The data acquisition orbit numbers of these scenarios are shown in
Table 6.
During the experiment, we input the resized quick-look images of these scenarios into SISNet and other networks for comparative experiments. The output segmentation results obtained by each network are shown in
Figure 20.
Comparing the test results, we find that the segmentation mask output by SISNet is closest to the label image; moreover, regardless of how large an area the RFI artifact occupies in the test image, the predicted mask performs better in terms of mask integrity and the numerical accuracy of RFI instances.
To illustrate the significant improvement of the proposed model over the comparison models, we conduct a statistical analysis based on hypothesis testing of the tested indicator results. The samples in the test set are independent random samples, whose sampling distribution approaches a normal distribution as the sample size approaches 100; indicators such as IoU are continuous variables, while model identity is a categorical variable, so a two-sample hypothesis test is appropriate. We conduct test experiments on 20 data samples randomly selected from the test set of the PAIS dataset and obtain the sample mean, sample standard deviation, and related statistics of the IoU and other indicators under the different models. Take the IoU of SISNet and U2Net as an example. With small samples, the
t distribution is used to establish the critical region. The two-sample t statistic is

t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}},

where \bar{X}_1 stands for the sample mean of SISNet's test IoU results and s_1 for the corresponding sample standard deviation, \bar{X}_2 is the sample mean of U2Net's test IoU results with standard deviation s_2, and n_1 = n_2 = 20 are the sample sizes. We set the confidence level at 95%, and the corresponding critical region is |t| > t_{\alpha/2} in a two-tailed test. The null hypothesis H_0 can be established as \mu_1 = \mu_2, and the alternative hypothesis H_1 is \mu_1 \neq \mu_2. If the obtained t score falls beyond the critical region, the null hypothesis is rejected, which means that H_1 is accepted and the mean IoU values of the two groups are significantly different. The specific values of the sample mean and sample standard deviation of the IoU results of the above two tests are shown in Table 7.
Based on these statistics, the obtained t score falls outside the critical region, so the null hypothesis H_0 is rejected. As a result, there is a significant difference between the IoU sample means of the two groups of test results. Given the direction of the difference, the average IoU result of SISNet outperforms that of U2Net.
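The two-sample t score can be computed directly from the summary statistics. A minimal sketch follows; the paper does not state whether a pooled or unpooled (Welch) variance is used, so the unpooled form is assumed here, and the IoU statistics below are illustrative, not the values from Table 7.

```python
import math

def two_sample_t(mean1, std1, n1, mean2, std2, n2):
    """Two-sample t score from summary statistics (unpooled/Welch form;
    an assumption, since the paper does not specify the variance form)."""
    return (mean1 - mean2) / math.sqrt(std1**2 / n1 + std2**2 / n2)

# Illustrative IoU statistics for two groups of 20 test samples each.
t = two_sample_t(mean1=0.86, std1=0.02, n1=20, mean2=0.83, std2=0.03, n2=20)
critical = 2.024  # two-tailed 95% critical value for ~38 degrees of freedom
print(round(t, 3), abs(t) > critical)
```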
Similarly, we can calculate the t score of each indicator of SISNet against the other models one by one. The confidence level is again set to 95%, and the corresponding critical region |t| > t_{\alpha/2} is obtained from the t distribution table. The calculated t scores are shown in Table 8, wherein the numbers with superscripts indicate the significance of SISNet's test indicators relative to the comparison models.
From the above results, since most t scores fall outside the critical region, the proposed SISNet significantly outperforms the other models on most indicators of segmentation performance. Specifically, SISNet surpasses U2Net in IoU, accuracy, and F1 score, while it is slightly inferior to U2Net in precision. Moreover, SISNet performs better than FCN8s, UNet, and UNet++ on all indicators, although its accuracy does not improve significantly over UNet++.
While achieving excellent segmentation performance, the proposed model has a smaller number of parameters and less computation than FCN, UNet, and UNet++, as shown in
Table 9. Compared to the original U2Net, the proposed SISNet achieves a large improvement in image segmentation indicators such as mIoU and F1 score without significantly increasing the parameter count or computational load, which demonstrates the superiority of SISNet.
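Parameter counts of the kind compared in Table 9 are obtained by summing per-layer counts. The sketch below shows the arithmetic for plain 2-D convolutions; the layer shapes are hypothetical and do not describe the actual SISNet architecture.

```python
def conv_params(in_ch, out_ch, k, bias=True):
    """Parameter count of a 2-D convolution layer:
    k*k*in_ch*out_ch weights plus out_ch bias terms."""
    return k * k * in_ch * out_ch + (out_ch if bias else 0)

# Hypothetical three-layer network, for illustration of the counting only.
layers = [conv_params(3, 64, 3), conv_params(64, 64, 3), conv_params(64, 2, 1)]
total = sum(layers)
print(layers, total)
```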
It is worth mentioning that, owing to limited computing resources, the lightweight version of the U2Net network is used in the experiments, yet it still achieves better segmentation performance. If the improvements proposed in this article were applied to the full version of U2Net, better results should, in theory, be obtained.
An ablation experiment removes essential modules from a neural network to reveal the effect of these modules on the whole network. Therefore, we perform ablation experiments by discarding the RFB-d modules and the attention modules, again using five-fold cross-validation in the training stage.
Table 10 summarizes the average scores of mIoU, accuracy, precision, and F1 scores of diverse networks, which validates the usefulness of these modules.