Search Results (348)

Search Parameters:
Keywords = semantic augmentation

21 pages, 7007 KiB  
Article
LEM-Detector: An Efficient Detector for Photovoltaic Panel Defect Detection
by Xinwen Zhou, Xiang Li, Wenfu Huang and Ran Wei
Appl. Sci. 2024, 14(22), 10290; https://doi.org/10.3390/app142210290 - 8 Nov 2024
Viewed by 386
Abstract
Photovoltaic panel defect detection presents significant challenges due to the wide range of defect scales, diverse defect types, and severe background interference, often leading to a high rate of false positives and missed detections. To address these challenges, this paper proposes the LEM-Detector, an efficient end-to-end photovoltaic panel defect detector based on the transformer architecture. To address the low detection accuracy for Crack and Star crack defects and the imbalanced dataset, a novel data augmentation method, the Linear Feature Augmentation (LFA) module, specifically designed for linear features, is introduced. LFA effectively improves model training performance and robustness. Furthermore, the Efficient Feature Enhancement Module (EFEM) is presented to enhance the receptive field, suppress redundant information, and emphasize meaningful features. To handle defects of varying scales, complementary semantic information from different feature layers is leveraged for enhanced feature fusion. A Multi-Scale Multi-Feature Pyramid Network (MMFPN) is employed to selectively aggregate boundary and category information, thereby improving the accuracy of multi-scale target recognition. Experimental results on a large-scale photovoltaic panel dataset demonstrate that the LEM-Detector achieves a detection accuracy of 94.7% for multi-scale defects, outperforming several state-of-the-art methods. This approach effectively addresses the challenges of photovoltaic panel defect detection, paving the way for more reliable and accurate defect identification systems. This research will contribute to the automatic detection of surface defects in industrial production, ultimately enhancing production efficiency. Full article

22 pages, 46624 KiB  
Article
Autonomous Extraction Technology for Aquaculture Ponds in Complex Geological Environments Based on Multispectral Feature Fusion of Medium-Resolution Remote Sensing Imagery
by Zunxun Liang, Fangxiong Wang, Jianfeng Zhu, Peng Li, Fuding Xie and Yifei Zhao
Remote Sens. 2024, 16(22), 4130; https://doi.org/10.3390/rs16224130 - 5 Nov 2024
Viewed by 481
Abstract
Coastal aquaculture plays a crucial role in global food security and the economic development of coastal regions, but it also causes environmental degradation in coastal ecosystems. Therefore, the automation, accurate extraction, and monitoring of coastal aquaculture areas are crucial for the scientific management of coastal ecological zones. This study proposes a novel deep learning- and attention-based median adaptive fusion U-Net (MAFU-Net) procedure aimed at precisely extracting individually separable aquaculture ponds (ISAPs) from medium-resolution remote sensing imagery. Initially, this study analyzes the spectral differences between aquaculture ponds and interfering objects such as saltwater fields in four typical aquaculture areas along the coast of Liaoning Province, China. It innovatively introduces a difference index for saltwater field aquaculture zones (DIAS) and integrates this index as a new band into remote sensing imagery to increase the expressiveness of features. A median augmented adaptive fusion module (MEA-FM), which adaptively selects channel receptive fields at various scales, integrates the information between channels, and captures multiscale spatial information to achieve improved extraction accuracy, is subsequently designed. Experimental and comparative results reveal that the proposed MAFU-Net method achieves an F1 score of 90.67% and an intersection over union (IoU) of 83.93% on the CHN-LN4-ISAPS-9 dataset, outperforming advanced methods such as U-Net, DeepLabV3+, SegNet, PSPNet, SKNet, UPS-Net, and SegFormer. This study’s results provide accurate data support for the scientific management of aquaculture areas, and the proposed MAFU-Net method provides an effective method for semantic segmentation tasks based on medium-resolution remote sensing images. Full article
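The abstract above describes injecting a hand-crafted spectral index into the imagery as an extra band to make saltwater fields separable. The exact DIAS formula is not given in the abstract, so the sketch below assumes a generic normalized-difference index between two bands; `idx_a` and `idx_b` are placeholder band positions, not the paper's actual choices.

```python
import numpy as np

def normalized_difference(band_a, band_b, eps=1e-9):
    """Generic normalized-difference index between two spectral bands."""
    return (band_a - band_b) / (band_a + band_b + eps)

def stack_index_band(image, idx_a, idx_b):
    """Append a difference-index band to an (H, W, C) multispectral image,
    mirroring the paper's idea of adding DIAS as a new band."""
    index = normalized_difference(image[..., idx_a], image[..., idx_b])
    return np.concatenate([image, index[..., None]], axis=-1)

# Example: a 2x2 image with 4 bands gains a 5th index band.
img = np.random.rand(2, 2, 4)
aug = stack_index_band(img, idx_a=3, idx_b=2)
```

The segmentation network then consumes the augmented (H, W, C+1) stack instead of the raw bands.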

22 pages, 13099 KiB  
Article
Efficient Small Object Detection You Only Look Once: A Small Object Detection Algorithm for Aerial Images
by Jie Luo, Zhicheng Liu, Yibo Wang, Ao Tang, Huahong Zuo and Ping Han
Sensors 2024, 24(21), 7067; https://doi.org/10.3390/s24217067 - 2 Nov 2024
Viewed by 706
Abstract
Aerial images have distinct characteristics, such as varying target scales, complex backgrounds, severe occlusion, small targets, and dense distribution. As a result, object detection in aerial images faces challenges like difficulty in extracting small target information and poor integration of spatial and semantic data. Moreover, existing object detection algorithms have a large number of parameters, posing a challenge for deployment on drones with limited hardware resources. We propose an efficient small-object YOLO detection model (ESOD-YOLO) based on YOLOv8n for Unmanned Aerial Vehicle (UAV) object detection. Firstly, a Reparameterized Multi-scale Inverted Blocks (RepNIBMS) module is introduced to replace the C2f module of the YOLOv8n backbone extraction network, enhancing the information extraction capability for small objects. Secondly, a cross-level multi-scale feature fusion structure, the wave feature pyramid network (WFPN), is designed to enhance the model’s capacity to integrate spatial and semantic information. Meanwhile, a small-object detection head is incorporated to augment the model’s ability to identify small objects. Finally, a tri-focal loss function is proposed to address the issue of imbalanced samples in aerial images in a straightforward and effective manner. On the VisDrone2019 test set, with a uniform input size of 640 × 640 pixels, ESOD-YOLO has 4.46 M parameters and reaches a mean average precision of 29.3%, which is 3.6% higher than the baseline method YOLOv8n. Compared with other detection methods, it also achieves higher detection accuracy with fewer parameters. Full article
(This article belongs to the Special Issue Smart Image Recognition and Detection Sensors)
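The tri-focal loss itself is not specified in the abstract. As a hedged illustration, here is the standard binary focal loss that sample-imbalance losses of this kind typically build on; the `gamma` and `alpha` defaults are the conventional ones, not values from the paper.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy examples so training focuses
    on hard, often small or rare, objects.
    p: predicted probabilities; y: binary labels (same shape)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    w = np.where(y == 1, alpha, 1 - alpha)   # class-balance weight
    return -np.mean(w * (1 - pt) ** gamma * np.log(pt))

# A confident correct prediction contributes far less than a hard one.
easy = focal_loss(np.array([0.95]), np.array([1]))
hard = focal_loss(np.array([0.55]), np.array([1]))
```

The `(1 - pt) ** gamma` factor is what suppresses the loss of well-classified samples, which is why such losses help with the dense, imbalanced targets the abstract describes.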

28 pages, 2887 KiB  
Article
Leveraging Large Language Models for Enhancing Literature-Based Discovery
by Ikbal Taleb, Alramzana Nujum Navaz and Mohamed Adel Serhani
Big Data Cogn. Comput. 2024, 8(11), 146; https://doi.org/10.3390/bdcc8110146 - 25 Oct 2024
Viewed by 1009
Abstract
The exponential growth of biomedical literature necessitates advanced methods for Literature-Based Discovery (LBD) to uncover hidden, meaningful relationships and generate novel hypotheses. This research integrates Large Language Models (LLMs), particularly transformer-based models, to enhance LBD processes. Leveraging LLMs’ capabilities in natural language understanding, information extraction, and hypothesis generation, we propose a framework that improves the scalability and precision of traditional LBD methods. Our approach integrates LLMs with semantic enhancement tools, continuous learning, domain-specific fine-tuning, and robust data cleansing processes, enabling automated analysis of vast text and identification of subtle patterns. Empirical validations, including scenarios on the effects of garlic on blood pressure and nutritional supplements on health outcomes, demonstrate the effectiveness of our LLM-based LBD framework in generating testable hypotheses. This research advances LBD methodologies, fosters interdisciplinary research, and accelerates discovery in the biomedical domain. Additionally, we discuss the potential of LLMs in drug discovery, highlighting their ability to extract and present key information from the literature. Detailed comparisons with traditional methods, including Swanson’s ABC model, highlight our approach’s advantages. This comprehensive approach opens new avenues for knowledge discovery and has the potential to revolutionize research practices. Future work will refine LLM techniques, explore Retrieval-Augmented Generation (RAG), and expand the framework to other domains, with a focus on dehallucination. Full article
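Swanson's ABC model, which the abstract compares against, can be sketched as a transitive join over co-occurrence links: if literature connects A to B and B to C but no paper connects A to C directly, the A-C pair is a candidate hypothesis. The dictionaries below are toy data echoing Swanson's fish-oil example, not the paper's corpus.

```python
def abc_hypotheses(ab_links, bc_links):
    """Swanson's ABC model: hypothesize hidden A-C links via shared B terms.
    ab_links / bc_links: dicts mapping a term to the set of terms it
    co-occurs with in each literature."""
    hypotheses = set()
    for a, bs in ab_links.items():
        for b in bs:
            for c in bc_links.get(b, set()):
                hypotheses.add((a, b, c))
    return hypotheses

# Toy version of Swanson's classic discovery chain.
ab = {"fish oil": {"blood viscosity"}}
bc = {"blood viscosity": {"Raynaud's disease"}}
hyp = abc_hypotheses(ab, bc)
```

An LLM-based framework like the one described replaces these exact-match link tables with extracted relations and semantic similarity, but the underlying transitive inference is the same.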

13 pages, 10871 KiB  
Communication
Spatial Resolution Enhancement Framework Using Convolutional Attention-Based Token Mixer
by Mingyuan Peng, Canhai Li, Guoyuan Li and Xiaoqing Zhou
Sensors 2024, 24(20), 6754; https://doi.org/10.3390/s24206754 - 21 Oct 2024
Viewed by 512
Abstract
Spatial resolution enhancement in remote sensing data aims to augment the level of detail and accuracy in images captured by satellite sensors. We proposed a novel spatial resolution enhancement framework using the convolutional attention-based token mixer method. This approach leveraged spatial context and semantic information to improve the spatial resolution of images. This method used the multi-head convolutional attention block and sub-pixel convolution to extract spatial and spectral information and fused them using the same technique. The multi-head convolutional attention block can effectively utilize the local information of spatial and spectral dimensions. The method was tested on two kinds of data types, which were the visual-thermal dataset and the visual-hyperspectral dataset. Our method was also compared with the state-of-the-art methods, including traditional methods and deep learning methods. The experiment results showed that the method was effective and outperformed state-of-the-art methods in overall, spatial, and spectral accuracies. Full article
(This article belongs to the Collection Remote Sensing Image Processing)
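Sub-pixel convolution, which the abstract uses to extract and fuse information, upsamples by rearranging an (H, W, C·r²) tensor into (H·r, W·r, C), trading channel depth for spatial resolution. A minimal NumPy sketch of that rearrangement step:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel convolution's rearrangement: (H, W, C*r^2) -> (H*r, W*r, C)."""
    h, w, crr = x.shape
    c = crr // (r * r)
    x = x.reshape(h, w, r, r, c)
    x = x.transpose(0, 2, 1, 3, 4)   # interleave the r x r sub-pixels
    return x.reshape(h * r, w * r, c)

# A 4x4 feature map with 4 channels becomes an 8x8 single-channel image.
lowres = np.random.rand(4, 4, 4)
highres = pixel_shuffle(lowres, r=2)
```

In the full method, a convolution first expands the channel count to C·r² so that this reshuffle produces a learned, rather than interpolated, upsampling.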

23 pages, 4654 KiB  
Article
Effective Acoustic Model-Based Beamforming Training for Static and Dynamic HRI Applications
by Alejandro Luzanto, Nicolás Bohmer, Rodrigo Mahu, Eduardo Alvarado, Richard M. Stern and Néstor Becerra Yoma
Sensors 2024, 24(20), 6644; https://doi.org/10.3390/s24206644 - 15 Oct 2024
Viewed by 624
Abstract
Human–robot collaboration will play an important role in the fourth industrial revolution in applications related to hostile environments, mining, industry, forestry, education, natural disaster and defense. Effective collaboration requires robots to understand human intentions and tasks, which involves advanced user profiling. Voice-based communication, rich in complex information, is key to this. Beamforming, a technology that enhances speech signals, can help robots extract semantic, emotional, or health-related information from speech. This paper describes the implementation of a system that provides substantially improved signal-to-noise ratio (SNR) and speech recognition accuracy to a moving robotic platform for use in human–robot interaction (HRI) applications in static and dynamic contexts. This study focuses on training deep learning-based beamformers using acoustic model-based multi-style training with measured room impulse responses (RIRs). The results show that this approach outperforms training with simulated RIRs or matched measured RIRs, especially in dynamic conditions involving robot motion. The findings suggest that training with a broad range of measured RIRs is sufficient for effective HRI in various environments, making additional data recording or augmentation unnecessary. This research demonstrates that deep learning-based beamforming can significantly improve HRI performance, particularly in challenging acoustic environments, surpassing traditional beamforming methods. Full article
(This article belongs to the Special Issue Advanced Sensors and AI Integration for Human–Robot Teaming)

16 pages, 2272 KiB  
Article
Augmenting Multimodal Content Representation with Transformers for Misinformation Detection
by Jenq-Haur Wang, Mehdi Norouzi and Shu Ming Tsai
Big Data Cogn. Comput. 2024, 8(10), 134; https://doi.org/10.3390/bdcc8100134 - 11 Oct 2024
Viewed by 697
Abstract
Information sharing on social media has become a common practice for people around the world. Since it is difficult to check user-generated content on social media, huge amounts of rumors and misinformation are being spread with authentic information. On the one hand, most of the social platforms identify rumors through manual fact-checking, which is very inefficient. On the other hand, with an emerging form of misinformation that contains inconsistent image–text pairs, it would be beneficial if we could compare the meaning of multimodal content within the same post for detecting image–text inconsistency. In this paper, we propose a novel approach to misinformation detection by multimodal feature fusion with transformers and credibility assessment with self-attention-based Bi-RNN networks. Firstly, captions are derived from images using an image captioning module to obtain their semantic descriptions. These are compared with surrounding text by fine-tuning transformers for consistency check in semantics. Then, to further aggregate sentiment features into text representation, we fine-tune a separate transformer for text sentiment classification, where the output is concatenated to augment text embeddings. Finally, Multi-Cell Bi-GRUs with self-attention are used to train the credibility assessment model for misinformation detection. From the experimental results on tweets, the best performance with an accuracy of 0.904 and an F1-score of 0.921 can be obtained when applying feature fusion of augmented embeddings with sentiment classification results. This shows the potential of the innovative way of applying transformers in our proposed approach to misinformation detection. Further investigation is needed to validate the performance on various types of multimodal discrepancies. Full article

31 pages, 16306 KiB  
Article
The Identification and Quantification of Hidden Hazards in Small Scale Reservoir Engineering Based on Deep Learning: Intelligent Perception for Safety of Small Reservoir Projects in Jiangxi Province
by Zhiwei Zhou, Shibiao Fang, Weihua Fang, Yaozong Xu, Bin Zhu, Lei Li, Haixiang Ji and Wenrong Tu
Water 2024, 16(20), 2880; https://doi.org/10.3390/w16202880 - 10 Oct 2024
Viewed by 534
Abstract
This study aims to enhance the detection and assessment of safety hazards in small-scale reservoir engineering using advanced image processing and deep learning techniques. Given the critical importance of small reservoirs in flood management, water supply, and ecological balance, the effective monitoring of their structural integrity is crucial. This paper developed a fully convolutional semantic segmentation method for hidden danger images of small reservoirs using an encoding–decoding structure, utilizing a deep learning framework of convolutional neural networks (CNNs) to process and analyze high-resolution images captured by unmanned aerial vehicles (UAVs). The method incorporated data augmentation and adaptive learning techniques to improve model accuracy under diverse environmental conditions. Finally, the quantification data of hidden dangers (length, width, area, etc.) were obtained by converting the image pixels to the actual size. Results demonstrate significant improvements in detecting structural deficiencies, such as cracks and seepage areas, with increased precision and recall rates compared to conventional methods, and the HHSN-25 network (Hidden Hazard Segmentation Network with 25 layers) proposed in this paper outperforms other methods. The main evaluation indicator, mIoU of HHSN-25, is higher than other methods, reaching 87.00%, and the Unet is 85.50%, and the Unet++ is 85.55%. The proposed model achieves reliable real-time performance, allowing for early warning and effective management of potential risks. This study contributes to the development of more efficient monitoring systems for small-scale reservoirs, enhancing their safety and operational sustainability. Full article
(This article belongs to the Section Urban Water Management)
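The final step the abstract mentions, converting pixel measurements of a segmented hazard into actual size, is simple arithmetic once the imagery's ground sampling distance (metres per pixel) is known. The GSD value and crack dimensions below are hypothetical.

```python
def quantify_hazard(pixel_length, pixel_width, gsd_m):
    """Convert a segmented hazard's pixel measurements to real-world size.
    gsd_m: ground sampling distance of the UAV imagery, in metres/pixel."""
    length_m = pixel_length * gsd_m
    width_m = pixel_width * gsd_m
    return {"length_m": length_m,
            "width_m": width_m,
            "area_m2": length_m * width_m}

# A 120 x 8 pixel crack in imagery captured at 2 cm/pixel.
crack = quantify_hazard(120, 8, gsd_m=0.02)
```

In practice the GSD comes from flight altitude and camera intrinsics, so the same mask yields different physical sizes across surveys.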

15 pages, 1038 KiB  
Article
Military Equipment Entity Extraction Based on Large Language Model
by Xuhong Liu, Zhipeng Yu, Xiulei Liu, Lin Miao and Tao Yang
Appl. Sci. 2024, 14(19), 9063; https://doi.org/10.3390/app14199063 - 8 Oct 2024
Viewed by 691
Abstract
The technology of military equipment entity extraction, a crucial component in constructing military knowledge bases, holds significant research value and theoretical importance for guiding the development and improvement of equipment support forces. In the military domain, equipment entities exhibit a phenomenon of nesting, where one entity is contained within another, and abbreviations or codes are frequently used to represent these entities. To address this complexity, this paper proposes a method named CoTNER for extracting entities. Initially, a large-scale language model is used to perform data augmentation with chain-of-thought on the original dataset, providing additional semantic and contextual information. Subsequently, the augmented dataset is fine-tuned on a small-scale language model to adapt it to the task of military equipment entity extraction and to enhance its ability to learn complex rules specific to the domain of military equipment. Additionally, a high-quality data filtering strategy based on instruction-following difficulty scoring is proposed to address the catastrophic forgetting issue that may occur during the fine-tuning of large language models. The experimental results demonstrate that the proposed military equipment entity extraction method outperforms mainstream traditional deep learning methods, validating the effectiveness of CoTNER. Full article
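Chain-of-thought data augmentation of the kind the abstract describes might be driven by a prompt such as the following. The template wording, reasoning steps, and example entities are illustrative assumptions, not the paper's actual templates or data.

```python
def cot_augmentation_prompt(sentence, entity_types):
    """Build a chain-of-thought prompt asking an LLM to reason step by
    step about nested entities and abbreviations before labelling them.
    (Hypothetical template; CoTNER's real prompts are not public.)"""
    steps = (
        "1. List candidate entity mentions, including nested ones.\n"
        "2. For each, reason about abbreviations or codes it may stand for.\n"
        "3. Output the final (span, type) pairs."
    )
    return (f"Sentence: {sentence}\n"
            f"Entity types: {', '.join(entity_types)}\n"
            f"Think step by step:\n{steps}")

prompt = cot_augmentation_prompt("The F-16C carries the AN/APG-68 radar.",
                                 ["aircraft", "radar"])
```

The LLM's step-by-step answers become extra training text for the smaller model, which is the "additional semantic and contextual information" the abstract refers to.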

22 pages, 783 KiB  
Article
Learned Query Optimization by Constraint-Based Query Plan Augmentation
by Chen Ye, Haoyang Duan, Hua Zhang, Yifan Wu and Guojun Dai
Mathematics 2024, 12(19), 3102; https://doi.org/10.3390/math12193102 - 3 Oct 2024
Viewed by 684
Abstract
Over the last decades, various cost-based optimizers have been proposed to generate optimal plans for SQL queries. These optimizers are key to achieving good performance in database systems and can speed up query execution. Still, they may need enormous expert efforts and perform poorly on complicated queries. Learning-based optimizers have been shown to achieve high-quality plans by learning from past experiences. However, these solutions treat each query separately and neglect the semantic equivalence among different queries. Intuitively, a high-quality plan may be obtained for a complicated query by discovering a simple equivalent query. Motivated by this, in this paper, we present Celo, a novel constraint-enhanced learned optimizer to directly integrate the equivalent information of queries into the learning-based model. We apply denial constraints to identify equivalent queries by replacing equivalent predicates. Given a query, we augment the query plans generated by the learning-based model with the high-quality plans of its equivalent queries. Then, a more potentially well-performed plan will be predicted among the augmented query plans. Extensive experiments using real-world datasets demonstrated that Celo outperforms the previous state-of-the-art (SOTA) results even with few constraints. Full article
(This article belongs to the Section Mathematics and Computer Science)
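The core idea of augmenting a query's plan candidates with those of equivalent queries can be sketched as predicate rewriting. The denial constraint below is a hypothetical predicate equivalence, and a real optimizer like Celo would operate on parsed plans rather than raw SQL strings.

```python
def rewrite_equivalent(query, equivalences):
    """Generate semantically equivalent query texts by swapping predicates
    that a denial constraint makes interchangeable (illustrative sketch)."""
    variants = {query}
    for lhs, rhs in equivalences:
        variants |= {q.replace(lhs, rhs) for q in list(variants)}
        variants |= {q.replace(rhs, lhs) for q in list(variants)}
    return variants

# Hypothetical constraint: rows satisfying tax = 0.1 * price are exactly
# those with bracket = 'low', so the two predicates are interchangeable.
dcs = [("tax = 0.1 * price", "bracket = 'low'")]
q = "SELECT * FROM orders WHERE tax = 0.1 * price"
eq = rewrite_equivalent(q, dcs)
```

The learned model then scores plans from the whole equivalence set, so a simple variant can supply a better plan for a complicated original query.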

13 pages, 2999 KiB  
Article
Generative AI-Driven Data Augmentation for Crack Detection in Physical Structures
by Jinwook Kim, Joonho Seon, Soohyun Kim, Youngghyu Sun, Seongwoo Lee, Jeongho Kim, Byungsun Hwang and Jinyoung Kim
Electronics 2024, 13(19), 3905; https://doi.org/10.3390/electronics13193905 - 2 Oct 2024
Viewed by 775
Abstract
The accurate segmentation of cracks in structural materials is crucial for assessing the safety and durability of infrastructure. Although conventional segmentation models based on deep learning techniques have shown impressive detection capabilities in these tasks, their performance can be restricted by small amounts of training data. Data augmentation techniques have been proposed to mitigate the data availability issue; however, these systems often have limitations in texture diversity, scalability over multiple physical structures, and the need for manual annotation. In this paper, a novel generative artificial intelligence (GAI)-driven data augmentation framework is proposed to overcome these limitations by integrating a projected generative adversarial network (ProjectedGAN) and a multi-crack texture transfer generative adversarial network (MCT2GAN). Additionally, a novel metric is proposed to evaluate the quality of the generated data. The proposed method is evaluated using three datasets: the bridge crack library (BCL), DeepCrack, and Volker. From the simulation results, it is confirmed that the segmentation performance can be improved by the proposed method in terms of intersection over union (IoU) and Dice scores across three datasets. Full article
(This article belongs to the Special Issue Generative AI and Its Transformative Potential)

66 pages, 1555 KiB  
Article
Extracting Sentence Embeddings from Pretrained Transformer Models
by Lukas Stankevičius and Mantas Lukoševičius
Appl. Sci. 2024, 14(19), 8887; https://doi.org/10.3390/app14198887 - 2 Oct 2024
Viewed by 859
Abstract
Pre-trained transformer models shine in many natural language processing tasks and therefore are expected to bear the representation of the input sentence or text meaning. These sentence-level embeddings are also important in retrieval-augmented generation. But do commonly used plain averaging or prompt templates sufficiently capture and represent the underlying meaning? After providing a comprehensive review of existing sentence embedding extraction and refinement methods, we thoroughly test different combinations and our original extensions of the most promising ones on pretrained models. Namely, given 110 M parameters, BERT’s hidden representations from multiple layers, and many tokens, we try diverse ways to extract optimal sentence embeddings. We test various token aggregation and representation post-processing techniques. We also test multiple ways of using a general Wikitext dataset to complement BERT’s sentence embeddings. All methods are tested on eight Semantic Textual Similarity (STS), six short text clustering, and twelve classification tasks. We also evaluate our representation-shaping techniques on other static models, including random token representations. Proposed representation extraction methods improve the performance on STS and clustering tasks for all models considered. Very high improvements for static token-based models, especially random embeddings for STS tasks, almost reach the performance of BERT-derived representations. Our work shows that the representation-shaping techniques significantly improve sentence embeddings extracted from BERT-based and simple baseline models. Full article
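The "plain averaging" baseline the abstract questions is mean pooling over token vectors with padding positions masked out. A minimal NumPy sketch of that aggregation step:

```python
import numpy as np

def masked_mean_pool(token_embeddings, attention_mask):
    """Sentence embedding as the mean of token vectors, ignoring padding.
    token_embeddings: (T, D); attention_mask: (T,) of 0/1."""
    mask = attention_mask[:, None].astype(float)
    return (token_embeddings * mask).sum(axis=0) / mask.sum()

# Two real tokens and one padding token in a 2-D embedding space.
toks = np.array([[1.0, 2.0], [3.0, 4.0], [99.0, 99.0]])
mask = np.array([1, 1, 0])
sent = masked_mean_pool(toks, mask)
```

The paper's token-aggregation and post-processing variants start from exactly this kind of pooled vector and reshape it further.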

16 pages, 635 KiB  
Article
TAWC: Text Augmentation with Word Contributions for Imbalance Aspect-Based Sentiment Classification
by Noviyanti Santoso, Israel Mendonça and Masayoshi Aritsugi
Appl. Sci. 2024, 14(19), 8738; https://doi.org/10.3390/app14198738 - 27 Sep 2024
Viewed by 558
Abstract
Text augmentation plays an important role in enhancing the generalizability of language models. However, traditional methods often overlook the unique roles that individual words play in conveying meaning in text and imbalance class distribution, thereby risking suboptimal performance and compromising the model’s generalizability. This limitation motivated us to develop a novel technique called Text Augmentation with Word Contributions (TAWC). Our approach tackles this problem in two core steps: Firstly, it employs analytical correlation and semantic similarity metrics to discern the relationships between words and their associated aspect polarities. Secondly, it tailors distinct augmentation strategies to individual words based on their identified functional contributions in the text. Extensive experiments on two aspect-based sentiment analysis datasets demonstrate that the proposed TAWC model significantly improves the classification performances of popular language models, achieving gains of up to 4% compared with the case of data without augmentation, thereby setting a new standard in the field of text augmentation. Full article
(This article belongs to the Special Issue Natural Language Processing: Novel Methods and Applications)
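A hedged sketch of contribution-aware augmentation in the spirit of TAWC: words scoring high on their tie to the aspect polarity are kept verbatim, while low-contribution words are eligible for synonym replacement. The contribution scores and synonym table are assumed inputs; the paper's actual metrics (analytical correlation, semantic similarity) are not reproduced here.

```python
import random

def contribution_aware_augment(tokens, contribution, threshold=0.5,
                               synonyms=None, seed=0):
    """Keep high-contribution words; optionally swap low-contribution
    words for synonyms (illustrative sketch, not the paper's algorithm)."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if contribution.get(tok, 0.0) >= threshold or not synonyms:
            out.append(tok)                               # preserve meaning-bearing word
        else:
            out.append(rng.choice(synonyms.get(tok, [tok])))  # safe to vary
    return out

tokens = ["the", "battery", "is", "excellent"]
scores = {"battery": 0.9, "excellent": 0.95}
syns = {"the": ["a"], "is": ["was"]}
aug = contribution_aware_augment(tokens, scores, synonyms=syns)
```

Because the aspect term and its polarity word survive every augmentation, the generated samples keep their labels valid, which is the property naive augmentation lacks.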

21 pages, 713 KiB  
Article
Enhanced Prototypical Network with Customized Region-Aware Convolution for Few-Shot SAR ATR
by Xuelian Yu, Hailong Yu, Yi Liu and Haohao Ren
Remote Sens. 2024, 16(19), 3563; https://doi.org/10.3390/rs16193563 - 25 Sep 2024
Viewed by 515
Abstract
With the prosperous development and successful application of deep learning technologies in the field of remote sensing, numerous deep-learning-based methods have emerged for synthetic aperture radar (SAR) automatic target recognition (ATR) tasks over the past few years. Generally, most deep-learning-based methods can achieve outstanding recognition performance provided that abundant labeled samples are available to train the model. However, in real application scenarios, it is difficult and costly to acquire and annotate abundant SAR images due to the imaging mechanism of SAR, which poses a big challenge to existing SAR ATR methods. Therefore, SAR target recognition in the few-shot situation, where only a few labeled samples are available, is a fundamental problem that needs to be solved. In this paper, a new method named enhanced prototypical network with customized region-aware convolution (CRCEPN) is put forward to specifically tackle few-shot SAR ATR tasks. To be specific, a feature-extraction network based on a customized and region-aware convolution is first developed. This network can adaptively adjust convolutional kernels and their receptive fields according to each SAR image’s own characteristics as well as the semantic similarity among spatial regions, thus augmenting its capability to extract more informative and discriminative features. To achieve accurate and robust target identity prediction under the few-shot condition, an enhanced prototypical network is proposed. This network can improve the representation ability of the class prototype by properly making use of training and test samples together, thus effectively raising the classification accuracy. Meanwhile, a new hybrid loss is designed to learn a feature space with both inter-class separability and intra-class tightness as much as possible, which can further upgrade the recognition performance of the proposed method. Experiments performed on the moving and stationary target acquisition and recognition (MSTAR) dataset, the OpenSARShip dataset, and the SAMPLE+ dataset demonstrate that the proposed method is competitive with some state-of-the-art methods for few-shot SAR ATR tasks. Full article
(This article belongs to the Section Remote Sensing Image Processing)
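The prototypical-network core referenced in the abstract classifies a query by its distance to class prototypes, each the mean embedding of that class's few support samples. A minimal NumPy sketch of the standard formulation (not the paper's enhanced variant, which also exploits test samples):

```python
import numpy as np

def prototypes(support_x, support_y, n_classes):
    """Class prototype = mean embedding of that class's support samples."""
    return np.stack([support_x[support_y == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_x, protos):
    """Assign each query embedding to the nearest prototype (Euclidean)."""
    d = np.linalg.norm(query_x[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1)

# Two classes in a toy 2-D embedding space; one query near class 1.
sx = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
sy = np.array([0, 0, 1, 1])
pred = classify(np.array([[4.9, 5.1]]), prototypes(sx, sy, 2))
```

Because only class means must be estimated, this classifier needs far fewer labeled samples than training per-class decision boundaries, which is why it suits few-shot SAR ATR.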

25 pages, 3047 KiB  
Article
Hierarchical Dynamic Spatio-Temporal Graph Convolutional Networks with Self-Supervised Learning for Traffic Flow Forecasting
by Siwei Wei, Yanan Song, Donghua Liu, Sichen Shen, Rong Gao and Chunzhi Wang
Inventions 2024, 9(5), 102; https://doi.org/10.3390/inventions9050102 - 20 Sep 2024
Viewed by 865
Abstract
It is crucial for both traffic management organisations and individual commuters to be able to forecast traffic flows accurately. Graph neural networks have made great strides in this field owing to their exceptional capacity to capture spatial correlations. However, existing approaches predominantly focus on local geographic correlations, ignoring cross-region interdependencies in a global context, which is insufficient to extract comprehensive semantic relationships, thereby limiting prediction accuracy. Additionally, most GCN-based models rely on pre-defined graphs and unchanging adjacency matrices to reflect the spatial relationships among node features, neglecting the dynamics of spatio-temporal features and leading to challenges in capturing the complexity and dynamic spatial dependencies in traffic data. To tackle these issues, this paper puts forward a fresh approach: a new self-supervised dynamic spatio-temporal graph convolutional network (SDSC) for traffic flow forecasting. The proposed SDSC model is a hierarchically structured graph–neural architecture that is intended to augment the representation of dynamic traffic patterns through a self-supervised learning paradigm. Specifically, a dynamic graph is created using a combination of temporal, spatial, and traffic data; then, a regional graph is constructed based on geographic correlation using clustering to capture cross-regional interdependencies. In the feature learning module, spatio-temporal correlations in traffic data are subjected to recursive extraction using dynamic graph convolution facilitated by Recurrent Neural Networks (RNNs). Furthermore, self-supervised learning is embedded within the network training process as an auxiliary task, with the objective of enhancing the prediction task by optimising the mutual information of the learned features across the two graph networks. The superior performance of the proposed SDSC model in comparison with SOTA approaches was confirmed by comprehensive experiments conducted on real road datasets, PeMSD4 and PeMSD8. These findings validate the efficacy of dynamic graph modelling and self-supervision tasks in improving the precision of traffic flow prediction. Full article
