Search Results (1,144)

Search Parameters:
Keywords = RGB-D

19 pages, 8945 KiB  
Article
Multimodal Data Fusion for Precise Lettuce Phenotype Estimation Using Deep Learning Algorithms
by Lixin Hou, Yuxia Zhu, Mengke Wang, Ning Wei, Jiachi Dong, Yaodong Tao, Jing Zhou and Jian Zhang
Plants 2024, 13(22), 3217; https://doi.org/10.3390/plants13223217 - 15 Nov 2024
Abstract
Effective lettuce cultivation requires precise monitoring of growth characteristics, quality assessment, and optimal harvest timing. In a recent study, a deep learning model based on multimodal data fusion was developed to estimate lettuce phenotypic traits accurately. A dual-modal network combining RGB and depth images was designed using an open lettuce dataset. The network incorporated both a feature correction module and a feature fusion module, significantly enhancing the performance in object detection, segmentation, and trait estimation. The model demonstrated high accuracy in estimating key traits, including fresh weight (fw), dry weight (dw), plant height (h), canopy diameter (d), and leaf area (la), achieving an R2 of 0.9732 for fresh weight. Robustness and accuracy were further validated through 5-fold cross-validation, offering a promising approach for future crop phenotyping. Full article
(This article belongs to the Special Issue Advances in Artificial Intelligence for Plant Research)
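As a rough illustration of the dual-branch RGB + depth fusion described in this abstract, the following PyTorch sketch pairs a small RGB encoder with a depth encoder and regresses the five traits; the channel counts, concatenation-based fusion, and trait head are illustrative assumptions, not the authors' exact network.

```python
# Minimal sketch of a dual-branch RGB + depth fusion regressor (PyTorch).
# Channel counts and concatenation fusion are assumptions, not the paper's model.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class DualModalRegressor(nn.Module):
    def __init__(self, n_traits=5):  # fw, dw, h, d, la
        super().__init__()
        self.rgb_enc = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.depth_enc = nn.Sequential(conv_block(1, 32), conv_block(32, 64))
        # "feature fusion module": concatenate branch features, then mix with a 1x1 conv
        self.fuse = nn.Sequential(nn.Conv2d(128, 128, 1), nn.ReLU(inplace=True))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(128, n_traits))

    def forward(self, rgb, depth):
        f = torch.cat([self.rgb_enc(rgb), self.depth_enc(depth)], dim=1)
        return self.head(self.fuse(f))

model = DualModalRegressor()
traits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 1, 224, 224))
print(traits.shape)  # torch.Size([2, 5])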

32 pages, 11087 KiB  
Article
Path Planning and Motion Control of Robot Dog Through Rough Terrain Based on Vision Navigation
by Tianxiang Chen, Yipeng Huangfu, Sutthiphong Srigrarom and Boo Cheong Khoo
Sensors 2024, 24(22), 7306; https://doi.org/10.3390/s24227306 - 15 Nov 2024
Abstract
This article delineates the enhancement of an autonomous navigation and obstacle avoidance system for a quadruped robot dog. Part one of this paper presents the integration of a sophisticated multi-level dynamic control framework, utilizing Model Predictive Control (MPC) and Whole-Body Control (WBC) from MIT Cheetah. The system employs an Intel RealSense D435i depth camera for depth vision-based navigation, which enables high-fidelity 3D environmental mapping and real-time path planning. A significant innovation is the customization of the EGO-Planner to optimize trajectory planning in dynamically changing terrains, coupled with the implementation of a multi-body dynamics model that significantly improves the robot’s stability and maneuverability across various surfaces. The experimental results show that the RGB-D system exhibits superior velocity stability and trajectory accuracy compared to the SLAM system, with a 20% reduction in the cumulative velocity error and a 10% improvement in path tracking precision. The experimental results also show that the RGB-D system achieves smoother navigation, requiring 15% fewer iterations for path planning, and a 30% faster success rate recovery in challenging environments. The successful application of these technologies in simulated urban disaster scenarios suggests promising future applications in emergency response and complex urban environments. Part two of this paper presents the development of a robust path planning algorithm for a robot dog on rough terrain based on attached binocular vision navigation. We use a commercial off-the-shelf (COTS) robot dog. An optical CCD binocular vision dynamic tracking system is used to provide environment information. Likewise, the pose and posture of the robot dog are obtained from the robot’s own sensors, and a kinematics model is established. Then, a binocular vision tracking method is developed to determine the optimal path, provide a proposal (commands to actuators) of the position and posture of the bionic robot, and achieve stable motion on tough terrains. The terrain is assumed to be gently uneven to begin with and subsequently proceeds to a rougher surface. This work consists of four steps: (1) pose and position data are acquired from the robot dog’s own inertial sensors, (2) terrain and environment information is input from onboard cameras, (3) information is fused (integrated), and (4) path planning and motion control proposals are made. Ultimately, this work provides a robust framework for future developments in the vision-based navigation and control of quadruped robots, offering potential solutions for navigating complex and dynamic terrains. Full article
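For readers unfamiliar with the path-planning step mentioned above, the sketch below runs a standard A* search over a 2D occupancy grid. It is a generic stand-in under assumed unit step costs and a 4-connected grid, not the EGO-Planner or the authors' planner.

```python
# Minimal A* on an occupancy grid (0 = free, 1 = blocked); Manhattan heuristic.
import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), start)]
    came_from = {start: None}
    g_cost = {start: 0}
    while open_set:
        _, cur = heapq.heappop(open_set)
        if cur == goal:                       # reconstruct the path back to start
            path = [cur]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g_cost[cur] + 1
                if new_g < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = new_g
                    came_from[nxt] = cur
                    heapq.heappush(open_set, (new_g + h(nxt), nxt))
    return None  # no path found

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))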

20 pages, 6095 KiB  
Article
MSANet: LiDAR-Camera Online Calibration with Multi-Scale Fusion and Attention Mechanisms
by Fengguang Xiong, Zhiqiang Zhang, Yu Kong, Chaofan Shen, Mingyue Hu, Liqun Kuang and Xie Han
Remote Sens. 2024, 16(22), 4233; https://doi.org/10.3390/rs16224233 - 14 Nov 2024
Abstract
Sensor data fusion is increasingly crucial in the field of autonomous driving. In sensor fusion research, LiDAR and camera have become prevalent topics. However, accurate data calibration between different modalities is essential for effective fusion. Current calibration methods often depend on specific targets or manual intervention, which are time-consuming and have limited generalization capabilities. To address these issues, we introduce MSANet: LiDAR-Camera Online Calibration with Multi-Scale Fusion and Attention Mechanisms, an end-to-end deep learning-based online calibration network for inferring 6-degree-of-freedom (DOF) rigid body transformations between 2D images and 3D point clouds. By fusing multi-scale features, we obtain feature representations that contain fine detail and rich semantic information. The attention module is used to carry out feature correlation among the different modalities to complete feature matching. Rather than acquiring the precise parameters directly, MSANet corrects deviations online, aligning the initial calibration with the ground truth. We conducted extensive experiments on the KITTI datasets, demonstrating that our method performs well across various scenarios; in particular, the average translation prediction error improves by 2.03 cm compared with the best result among the compared methods. Full article
(This article belongs to the Section Remote Sensing Image Processing)
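The "correct the deviation" idea can be pictured as composing a predicted 6-DOF correction with an initial extrinsic guess. The sketch below does exactly that with SciPy; the intrinsic of the correction (a rotation vector plus translation) and all numeric values are placeholders standing in for network output, not MSANet itself.

```python
# Compose a predicted 6-DOF correction with an initial LiDAR-to-camera extrinsic.
import numpy as np
from scipy.spatial.transform import Rotation as R

def to_matrix(rotvec, trans):
    """Build a 4x4 homogeneous transform from an axis-angle vector and translation."""
    T = np.eye(4)
    T[:3, :3] = R.from_rotvec(rotvec).as_matrix()
    T[:3, 3] = trans
    return T

T_init = to_matrix([0.0, 0.0, np.pi / 2], [0.1, -0.05, 0.2])   # rough initial calibration
pred_rotvec = np.array([0.01, -0.02, 0.005])                    # placeholder predicted correction
pred_trans = np.array([0.003, 0.001, -0.002])
T_corrected = to_matrix(pred_rotvec, pred_trans) @ T_init       # left-compose the correction

points_lidar = np.random.rand(5, 3)                             # dummy LiDAR points
points_h = np.hstack([points_lidar, np.ones((5, 1))])
points_cam = (T_corrected @ points_h.T).T[:, :3]                # points in the camera frame
print(points_cam.shape)  # (5, 3)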

15 pages, 7931 KiB  
Article
Color Models in the Process of 3D Digitization of an Artwork for Presentation in a VR Environment of an Art Gallery
by Irena Drofova and Milan Adamek
Electronics 2024, 13(22), 4431; https://doi.org/10.3390/electronics13224431 - 12 Nov 2024
Abstract
This study deals with the color reproduction of a work of art to digitize it into a 3D realistic model. The experiment aims to digitize a work of art for application in a virtual reality environment concerning faithful color reproduction. Photogrammetry and scanning with a LiDAR sensor are used to compare the methods and work with colors during the reconstruction of the 3D model. An innovative tablet with a camera and LiDAR sensor is used for both methods. At the same time, current findings from the field of color vision and colorimetry are applied to 3D reconstruction. The experiment focuses on working with the RGB and L*a*b* color models and, simultaneously, on the sRGB, CIE XYZ, and Rec.2020(HDR) color spaces for transforming colors into a virtual environment. For this purpose, the color is defined in the Hex Color Value format. This experiment is a starting point for further research on color reproduction in the digital environment. This study represents a partial contribution to the much-discussed area of forgeries of works of art in current trends in forensics and forgery. Full article
(This article belongs to the Section Electronic Multimedia)
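As a small worked example of the colour handling this abstract mentions, the snippet below parses a Hex Color Value, removes the sRGB gamma, and converts to CIE XYZ with the standard D65 sRGB-to-XYZ matrix; the Rec.2020 and L*a*b* steps are omitted, and the example colour is arbitrary.

```python
# Hex -> linear sRGB -> CIE XYZ (D65), using the standard sRGB transform.
import numpy as np

SRGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                        [0.2126729, 0.7151522, 0.0721750],
                        [0.0193339, 0.1191920, 0.9503041]])

def hex_to_xyz(hex_value):
    rgb = np.array([int(hex_value.lstrip("#")[i:i + 2], 16) for i in (0, 2, 4)]) / 255.0
    # sRGB inverse companding (gamma removal)
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    return SRGB_TO_XYZ @ linear

print(hex_to_xyz("#7FB069"))  # approximate XYZ of a muted green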

20 pages, 3018 KiB  
Article
Global Semantic Localization from Abstract Ellipse-Ellipsoid Model and Object-Level Instance Topology
by Heng Wu, Yanjie Liu, Chao Wang and Yanlong Wei
Remote Sens. 2024, 16(22), 4187; https://doi.org/10.3390/rs16224187 - 10 Nov 2024
Abstract
Robust and highly accurate localization using a camera is a challenging task when appearance varies significantly. In indoor environments, changes in illumination and object occlusion can have a significant impact on visual localization. In this paper, we propose a visual localization method based on an ellipse-ellipsoid model, combined with object-level instance topology and alignment. First, we develop a CNN-based (Convolutional Neural Network) ellipse prediction network, DEllipse-Net, which integrates depth information with RGB data to estimate the projection of ellipsoids onto images. Second, we model environments using 3D (Three-dimensional) ellipsoids, instance topology, and ellipsoid descriptors. Finally, the detected ellipses are aligned with the ellipsoids in the environment through semantic object association, and 6-DoF (Degree of Freedom) pose estimation is performed using the ellipse-ellipsoid model. In the bounding box noise experiment, DEllipse-Net demonstrates higher robustness compared to other methods, achieving the highest prediction accuracy for 11 out of 23 objects in ellipse prediction. In the localization test with 15 pixels of noise, we achieve ATE (Absolute Translation Error) and ARE (Absolute Rotation Error) of 0.077 m and 2.70 in the fr2_desk sequence. Additionally, DEllipse-Net is lightweight and highly portable, with a model size of only 18.6 MB, and a single model can handle all objects. In the object-level instance topology and alignment experiment, our topology and alignment methods significantly enhance the global localization accuracy of the ellipse-ellipsoid model. In experiments involving lighting changes and occlusions, our method achieves more robust global localization compared to the classical bag-of-words based localization method and other ellipse-ellipsoid localization methods. Full article
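The geometric core of the ellipse-ellipsoid model is that a 3D ellipsoid projects to a 2D ellipse. The sketch below illustrates that projection with the standard dual-quadric relation C* = P Q* P^T; the pinhole intrinsics, camera pose, and ellipsoid parameters are assumed values, and this is not DEllipse-Net.

```python
# Project a dual ellipsoid Q* through a pinhole camera to get the image ellipse.
import numpy as np

def dual_ellipsoid(center, semi_axes):
    """Dual quadric Q* of an axis-aligned ellipsoid centered at `center`."""
    T = np.eye(4)
    T[:3, 3] = center
    return T @ np.diag([semi_axes[0]**2, semi_axes[1]**2, semi_axes[2]**2, -1.0]) @ T.T

K = np.array([[525.0, 0, 319.5],
              [0, 525.0, 239.5],
              [0, 0, 1.0]])                                   # assumed intrinsics
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])              # camera at the origin

Q_star = dual_ellipsoid(center=[0.2, 0.0, 3.0], semi_axes=[0.3, 0.2, 0.25])
C_star = P @ Q_star @ P.T          # projected dual conic
C = np.linalg.inv(C_star)          # point conic of the image ellipse (up to scale)
C /= C[2, 2]
print(np.round(C, 4))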

23 pages, 9555 KiB  
Article
Multi-View Fusion-Based Automated Full-Posture Cattle Body Size Measurement
by Zhihua Wu, Jikai Zhang, Jie Li and Wentao Zhao
Animals 2024, 14(22), 3190; https://doi.org/10.3390/ani14223190 - 7 Nov 2024
Abstract
Cattle farming is an important part of the global livestock industry, and cattle body size is the key indicator of livestock growth. However, traditional manual methods for measuring body sizes are not only time-consuming and labor-intensive but also incur significant costs. Meanwhile, automatic measurement techniques are prone to being affected by environmental conditions and the standing postures of livestock. To overcome these challenges, this study proposes a multi-view fusion-driven automatic measurement system for full-attitude cattle body measurements. Outdoors in natural light, three Zed2 cameras were installed covering different views of the channel. Multiple images, including RGB images, depth images, and point clouds, were automatically acquired from multiple views using the YOLOv8n algorithm. The point clouds from different views undergo multiple denoising to become local point clouds of the cattle body. The local point clouds are coarsely and finely aligned to become a complete point cloud of the cattle body. After detecting the 2D key points on the RGB image created by the YOLOv8x-pose algorithm, the 2D key points are mapped onto the 3D cattle body by combining the internal parameters of the camera and the depth values of the corresponding pixels of the depth map. Based on the mapped 3D key points, the body sizes of cows in different poses are automatically measured, including height, length, abdominal circumference, and chest circumference. In addition, support vector machines and Bézier curves are employed to rectify the missing and deformed circumference body sizes caused by environmental effects. The automatic body measurement system measured the height, length, abdominal circumference, and chest circumference of 47 Huaxi Beef Cattle, a breed native to China, and compared the results with manual measurements. The average relative errors were 2.32%, 2.27%, 3.67%, and 5.22%, respectively, when compared with manual measurements, demonstrating the feasibility and accuracy of the system. Full article
(This article belongs to the Section Cattle)
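The key-point lifting step described above (mapping 2D key points onto the 3D body using the camera intrinsics and the depth at the corresponding pixel) is standard pinhole back-projection. A minimal sketch follows; the intrinsics, pixel coordinates, and depths are placeholder values, not the ZED 2 calibration or real measurements.

```python
# Back-project 2D key points with depth into 3D and measure a body dimension.
import numpy as np

def keypoint_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with metric depth into camera coordinates."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

fx, fy, cx, cy = 700.0, 700.0, 640.0, 360.0        # illustrative intrinsics
withers = keypoint_to_3d(812, 295, 2.45, fx, fy, cx, cy)
tail_root = keypoint_to_3d(405, 310, 2.51, fx, fy, cx, cy)
body_length = np.linalg.norm(withers - tail_root)  # Euclidean distance between key points
print(round(body_length, 3), "m")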

26 pages, 33294 KiB  
Article
RGB-D Camera and Fractal-Geometry-Based Maximum Diameter Estimation Method of Apples for Robot Intelligent Selective Graded Harvesting
by Bin Yan and Xiameng Li
Fractal Fract. 2024, 8(11), 649; https://doi.org/10.3390/fractalfract8110649 - 7 Nov 2024
Abstract
Realizing the integration of intelligent fruit picking and grading for apple harvesting robots is an inevitable requirement for the future development of smart agriculture and precision agriculture. Therefore, an apple maximum diameter estimation model based on RGB-D camera fusion depth information was proposed in the study. Firstly, the maximum diameter parameters of Red Fuji apples were collected, and the results were statistically analyzed. Then, based on the Intel RealSense D435 RGB-D depth camera and LabelImg software, the depth information of apples and the two-dimensional size information of fruit images were obtained. Furthermore, the relationship between fruit depth information, two-dimensional size information of fruit images, and the maximum diameter of apples was explored. Based on Origin software, multiple regression analysis and nonlinear surface fitting were used to analyze the correlation between fruit depth, diagonal length of fruit bounding rectangle, and maximum diameter. A model for estimating the maximum diameter of apples was constructed. Finally, the constructed maximum diameter estimation model was experimentally validated and evaluated for imitation apples in the laboratory and fruits on the Red Fuji fruit trees in modern apple orchards. The experimental results showed that the average maximum relative error of the constructed model in the laboratory imitation apple validation set was ±4.1%, the correlation coefficient (R2) of the estimated model was 0.98613, and the root mean square error (RMSE) was 3.21 mm. The average maximum diameter estimation relative error on the modern orchard Red Fuji apple validation set was ±3.77%, the correlation coefficient (R2) of the estimation model was 0.84, and the root mean square error (RMSE) was 3.95 mm. The proposed model can provide theoretical basis and technical support for the selective apple-picking operation of intelligent robots based on apple size grading. Full article
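To make the surface-fitting step concrete, the sketch below fits a diameter model D = f(depth, bounding-box diagonal) with SciPy on synthetic data; the functional form and all numbers are assumptions, not the authors' fitted model or their measurements.

```python
# Fit a simple depth/diagonal -> maximum-diameter surface on synthetic data.
import numpy as np
from scipy.optimize import curve_fit

def diameter_model(X, a, b, c):
    depth, diag = X
    return a * depth * diag + b * diag + c      # simple bilinear surface as a stand-in

rng = np.random.default_rng(0)
depth = rng.uniform(0.4, 1.2, 50)               # camera-to-fruit distance (m)
true_diam = rng.uniform(65, 95, 50)             # ground-truth maximum diameter (mm)
diag = true_diam / (depth * 1000) * 800         # synthetic bounding-box diagonal (px)
noisy_diam = true_diam + rng.normal(0, 1.5, 50)

params, _ = curve_fit(diameter_model, (depth, diag), noisy_diam)
pred = diameter_model((depth, diag), *params)
rmse = np.sqrt(np.mean((pred - noisy_diam) ** 2))
print(params, round(rmse, 2), "mm")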

12 pages, 3150 KiB  
Article
Continuous Growth Monitoring and Prediction with 1D Convolutional Neural Network Using Generated Data with Vision Transformer
by Woo-Joo Choi, Se-Hun Jang, Taewon Moon, Kyeong-Su Seo, Da-Seul Choi and Myung-Min Oh
Plants 2024, 13(21), 3110; https://doi.org/10.3390/plants13213110 - 4 Nov 2024
Abstract
Crop growth information is collected through destructive investigation, which inevitably causes discontinuity of the target. Real-time monitoring and estimation of the same target crops can lead to dynamic feedback control, considering immediate crop growth. Images are high-dimensional data containing crop growth and developmental stages and image collection is non-destructive. We propose a non-destructive growth prediction method that uses low-cost RGB images and computer vision. In this study, two methodologies were selected and verified: an image-to-growth model with crop images and a growth simulation model with estimated crop growth. The best models for each case were the vision transformer (ViT) and one-dimensional convolutional neural network (1D ConvNet). For shoot fresh weight, shoot dry weight, and leaf area of lettuce, ViT showed R2 values of 0.89, 0.93, and 0.78, respectively, whereas 1D ConvNet showed 0.96, 0.94, and 0.95, respectively. These accuracies indicated that RGB images and deep neural networks can non-destructively interpret the interaction between crops and the environment. Ultimately, growers can enhance resource use efficiency by adapting real-time monitoring and prediction to feedback environmental controls to yield high-quality crops. Full article
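The growth-simulation side of this pipeline can be sketched as a small 1D ConvNet that maps a window of past trait estimates to the next values; layer sizes, the 14-day window, and the three-channel input (shoot fresh weight, shoot dry weight, leaf area) are illustrative assumptions, not the authors' configuration.

```python
# Minimal 1D ConvNet for next-step growth prediction (PyTorch).
import torch
import torch.nn as nn

class Growth1DConvNet(nn.Module):
    def __init__(self, n_channels=3, window=14):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, n_channels),   # predict next-day fw, dw, leaf area
        )

    def forward(self, x):                # x: (batch, channels, window)
        return self.net(x)

model = Growth1DConvNet()
history = torch.randn(8, 3, 14)          # 8 samples, 3 traits, 14 days of history
print(model(history).shape)              # torch.Size([8, 3])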

12 pages, 1275 KiB  
Article
A Simple and Green Analytical Alternative for Chloride Determination in High-Salt-Content Crude Oil: Combining Miniaturized Extraction with Portable Colorimetric Analysis
by Alice P. Holkem, Giuliano Agostini, Adilson B. Costa, Juliano S. Barin and Paola A. Mello
Processes 2024, 12(11), 2425; https://doi.org/10.3390/pr12112425 - 3 Nov 2024
Abstract
A simple and miniaturized protocol was developed for chloride extraction from Brazilian pre-salt crude oil for further salt determination by colorimetry. In this protocol, the colorimetric analysis of chloride using digital images was carried out in an aqueous phase obtained after a simple and miniaturized extraction carefully developed for this purpose. A portable device composed of a homemade 3D-printed chamber with a USB camera was used. The PhotoMetrix app converted the images into RGB histograms, and a partial least squares (PLS) model was obtained from chemometric treatment. The sample preparation was performed by extraction after defining the best conditions for the main parameters (e.g., extraction time, temperature, type and volume of solvent, and sample mass). The PLS model was evaluated considering the coefficient of determination (R2) and the root mean square errors (RMSEs)—calibration (RMSEC), cross-validation (RMSECV), and prediction (RMSEP). Under the optimized conditions, an extraction efficiency higher than 84% was achieved, and the limit of quantification was 1.6 mg g−1. The chloride content obtained in the pre-salt crude oils ranged from 3 to 15 mg g−1, and no differences (ANOVA, 95%) were observed between the results and the reference values by direct solid sampling elemental analysis (DSS-EA) or the ASTM D 6470 standard method. The easy-to-use colorimetric analysis combined with the extraction method’s simplicity offered a high-throughput, low-cost, and environmentally friendly method, with the possibility of portability. Additionally, the decrease in energy consumption and waste generation, increasing the sample throughput and operators’ safety, makes the proposed method a greener approach. Furthermore, the cost savings make this a suitable option for routine quality control, which can be attractive in the crude oil industry. Full article
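The chemometric step (RGB histograms regressed against chloride content with PLS and cross-validation) can be sketched with scikit-learn; the histogram size, component count, and synthetic data below are placeholders, not the paper's calibration set.

```python
# PLS regression from RGB histogram features to chloride content (synthetic data).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
X = rng.random((40, 48))                      # 40 samples x 48 histogram bins (16 per channel)
true_w = rng.normal(size=48)
y = X @ true_w + rng.normal(0, 0.1, 40)       # synthetic chloride content (mg/g)

pls = PLSRegression(n_components=5)
y_cv = cross_val_predict(pls, X, y, cv=5).ravel()
rmsecv = np.sqrt(np.mean((y_cv - y) ** 2))    # RMSECV over the 5 folds
print(round(rmsecv, 3))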

15 pages, 2177 KiB  
Article
DDNet: Depth Dominant Network for Semantic Segmentation of RGB-D Images
by Peizhi Rong
Sensors 2024, 24(21), 6914; https://doi.org/10.3390/s24216914 - 28 Oct 2024
Abstract
Convolutional neural networks (CNNs) have been widely applied to parse indoor scenes and segment objects represented by color images. Nonetheless, the lack of geometric and context information is a problem for most RGB-based methods, with which depth features are only used as an auxiliary module in RGB-D semantic segmentation. In this study, a novel depth dominant network (DDNet) is proposed to fully utilize the rich context information in the depth map. The critical insight is that obvious geometric information from the depth image is more conducive to segmentation than RGB data. Compared with other methods, DDNet is a depth-based network with two branches of CNNs to extract color and depth features. As the core of the encoder network, the depth branch is given a larger fusion weight to extract geometric information, while semantic information and complementary geometric information are provided by the color branch for the depth feature maps. The effectiveness of our proposed depth-based architecture has been demonstrated by comprehensive experimental evaluations and ablation studies on challenging RGB-D semantic segmentation benchmarks, including NYUv2 and a subset of ScanNetv2. Full article
(This article belongs to the Special Issue Applied Robotics in Mechatronics and Automation)
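The "larger fusion weight for the depth branch" idea reduces, in its simplest form, to a weighted sum of the two branches' feature maps. The toy snippet below illustrates only that weighting; the weights and tensor shapes are assumptions, not DDNet's learned fusion.

```python
# Toy depth-dominant fusion: weight the depth features more heavily than RGB.
import torch

depth_feat = torch.randn(1, 64, 60, 80)   # features from the depth branch
color_feat = torch.randn(1, 64, 60, 80)   # features from the RGB branch
w_depth, w_color = 0.7, 0.3               # depth branch dominates the fused encoding
fused = w_depth * depth_feat + w_color * color_feat
print(fused.shape)                        # torch.Size([1, 64, 60, 80])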

21 pages, 1071 KiB  
Article
YOLO-I3D: Optimizing Inflated 3D Models for Real-Time Human Activity Recognition
by Ruikang Luo, Aman Anand, Farhana Zulkernine and Francois Rivest
J. Imaging 2024, 10(11), 269; https://doi.org/10.3390/jimaging10110269 - 24 Oct 2024
Abstract
Human Activity Recognition (HAR) plays a critical role in applications such as security surveillance and healthcare. However, existing methods, particularly two-stream models like Inflated 3D (I3D), face significant challenges in real-time applications due to their high computational demand, especially from the optical flow branch. In this work, we address these limitations by proposing two major improvements. First, we introduce a lightweight motion information branch that replaces the computationally expensive optical flow component with a lower-resolution RGB input, significantly reducing computation time. Second, we incorporate YOLOv5, an efficient object detector, to further optimize the RGB branch for faster real-time performance. Experimental results on the Kinetics-400 dataset demonstrate that our proposed two-stream I3D Light model improves the original I3D model’s accuracy by 4.13% while reducing computational cost. Additionally, the integration of YOLOv5 into the I3D model enhances accuracy by 1.42%, providing a more efficient solution for real-time HAR tasks. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)

25 pages, 8051 KiB  
Article
Dexterous Manipulation Based on Object Recognition and Accurate Pose Estimation Using RGB-D Data
by Udaka A. Manawadu and Naruse Keitaro
Sensors 2024, 24(21), 6823; https://doi.org/10.3390/s24216823 - 24 Oct 2024
Abstract
This study presents an integrated system for object recognition, six-degrees-of-freedom pose estimation, and dexterous manipulation using a JACO robotic arm with an Intel RealSense D435 camera. This system is designed to automate the manipulation of industrial valves by capturing point clouds (PCs) from multiple perspectives to improve the accuracy of pose estimation. The object recognition module includes scene segmentation, geometric primitives recognition, model recognition, and a color-based clustering and integration approach enhanced by a dynamic cluster merging algorithm. Pose estimation is achieved using the random sample consensus algorithm, which predicts position and orientation. The system was tested within a 60° field of view, which extended in all directions in front of the object. The experimental results show that the system performs reliably within acceptable error thresholds for both position and orientation when the objects are within a ±15° range of the camera’s direct view. However, errors increased with more extreme object orientations and distances, particularly when estimating the orientation of ball valves. A zone-based dexterous manipulation strategy was developed to overcome these challenges, where the system adjusts the camera position for optimal conditions. This approach mitigates larger errors in difficult scenarios, enhancing overall system reliability. The key contributions of this research include a novel method for improving object recognition and pose estimation, a technique for increasing the accuracy of pose estimation, and the development of a robot motion model for dexterous manipulation in industrial settings. Full article
(This article belongs to the Section Intelligent Sensors)
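Random sample consensus of the kind used for segmentation and pose fitting above can be sketched with a plain NumPy plane-fitting loop; this is a generic RANSAC illustration on synthetic points, not the authors' full 6-DoF valve-pose pipeline.

```python
# Minimal RANSAC plane fit over a point cloud.
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.01, seed=2):
    """Return (normal, d, inlier mask) for the plane n.x + d = 0 with most inliers."""
    rng = np.random.default_rng(seed)
    best_inliers, best_model = None, None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue                           # degenerate (collinear) sample
        normal /= norm
        d = -normal @ sample[0]
        inliers = np.abs(points @ normal + d) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (normal, d)
    return best_model[0], best_model[1], best_inliers

rng = np.random.default_rng(3)
plane_pts = np.c_[rng.uniform(-1, 1, (500, 2)), np.zeros(500)]   # points on the z = 0 plane
noise_pts = rng.uniform(-1, 1, (100, 3))                          # outliers
normal, d, inliers = ransac_plane(np.vstack([plane_pts, noise_pts]))
print(np.round(normal, 2), int(inliers.sum()))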

16 pages, 8982 KiB  
Article
A Two-Stream Method for Human Action Recognition Using Facial Action Cues
by Zhimao Lai, Yan Zhang and Xiubo Liang
Sensors 2024, 24(21), 6817; https://doi.org/10.3390/s24216817 - 23 Oct 2024
Abstract
Human action recognition (HAR) is a critical area in computer vision with wide-ranging applications, including video surveillance, healthcare monitoring, and abnormal behavior detection. Current HAR methods predominantly rely on full-body data, which can limit their effectiveness in real-world scenarios where occlusion is common. In such situations, the face often remains visible, providing valuable cues for action recognition. This paper introduces Face in Action (FIA), a novel two-stream method that leverages facial action cues for robust action recognition under conditions of significant occlusion. FIA consists of an RGB stream and a landmark stream. The RGB stream processes facial image sequences using a fine-spatio-multitemporal (FSM) 3D convolution module, which employs smaller spatial receptive fields to capture detailed local facial movements and larger temporal receptive fields to model broader temporal dynamics. The landmark stream processes facial landmark sequences using a normalized temporal attention (NTA) module within an NTA-GCN block, enhancing the detection of key facial frames and improving overall recognition accuracy. We validate the effectiveness of FIA using the NTU RGB+D and NTU RGB+D 120 datasets, focusing on action categories related to medical conditions. Our experiments demonstrate that FIA significantly outperforms existing methods in scenarios with extensive occlusion, highlighting its potential for practical applications in surveillance and healthcare settings. Full article

17 pages, 4394 KiB  
Article
Real-Time Semantic Segmentation of 3D LiDAR Point Clouds for Aircraft Engine Detection in Autonomous Jetbridge Operations
by Ihnsik Weon, Soongeul Lee and Juhan Yoo
Appl. Sci. 2024, 14(21), 9685; https://doi.org/10.3390/app14219685 - 23 Oct 2024
Abstract
This paper presents a study on aircraft engine identification using real-time 3D LiDAR point cloud segmentation technology, a key element for the development of automated docking systems in airport boarding facilities, known as jetbridges. To achieve this, 3D LiDAR sensors utilizing a spinning method were employed to gather surrounding environmental 3D point cloud data. The raw 3D environmental data were then filtered using the 3D RANSAC technique, excluding ground data and irrelevant apron areas. Segmentation was subsequently conducted based on the filtered data, focusing on aircraft sections. For the segmented aircraft engine parts, the centroid of the grouped data was computed to determine the 3D position of the aircraft engine. Additionally, PointNet was applied to identify aircraft engines from the segmented data. Dynamic tests were conducted in various weather and environmental conditions, evaluating the detection performance across different jetbridge movement speeds and object-to-object distances. The study achieved a mean intersection over union (mIoU) of 81.25% in detecting aircraft engines, despite experiencing challenging conditions such as low-frequency vibrations and changes in the field of view during jetbridge maneuvers. This research provides a strong foundation for enhancing the robustness of jetbridge autonomous docking systems by reducing the sensor noise and distortion in real-time applications. Our future research will focus on optimizing sensor configurations, especially in environments where sea fog, snow, and rain are frequent, by combining RGB image data with 3D LiDAR information. The ultimate goal is to further improve the system’s reliability and efficiency, not only in jetbridge operations but also in broader autonomous vehicle and robotics applications, where precision and reliability are critical. The methodologies and findings of this study hold the potential to significantly advance the development of autonomous technologies across various industrial sectors. Full article
(This article belongs to the Section Mechanical Engineering)

18 pages, 7770 KiB  
Article
Vision-Based Localization Method for Picking Points in Tea-Harvesting Robots
by Jingwen Yang, Xin Li, Xin Wang, Leiyang Fu and Shaowen Li
Sensors 2024, 24(21), 6777; https://doi.org/10.3390/s24216777 - 22 Oct 2024
Abstract
To address the issue of accurately recognizing and locating picking points for tea-picking robots in unstructured environments, a visual positioning method based on RGB-D information fusion is proposed. First, an improved T-YOLOv8n model is proposed, which improves detection and segmentation performance across multi-scale scenes through network architecture and loss function optimizations. In the far-view test set, the detection accuracy of tea buds reached 80.8%; for the near-view test set, the mAP0.5 values for tea stem detection in bounding boxes and masks reached 93.6% and 93.7%, respectively, showing improvements of 9.1% and 14.1% over the baseline model. Secondly, a layered visual servoing strategy for near and far views was designed, integrating the RealSense depth sensor with robotic arm cooperation. This strategy identifies the region of interest (ROI) of the tea bud in the far view and fuses the stem mask information with depth data to calculate the three-dimensional coordinates of the picking point. The experiments show that this method achieved a picking point localization success rate of 86.4%, with a mean depth measurement error of 1.43 mm. The proposed method improves the accuracy of picking point recognition and reduces depth information fluctuations, providing technical support for the intelligent and rapid picking of premium tea. Full article
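The final fusion step (combining the stem mask with depth data to compute a 3D picking point) can be sketched as taking the median depth inside the mask and back-projecting the mask centroid; the intrinsics, mask, and depth map below are synthetic placeholders rather than real RealSense output.

```python
# Fuse a stem mask with a depth map to obtain a 3D picking point.
import numpy as np

fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0        # assumed RealSense-like intrinsics
depth = np.full((480, 640), 0.35)                   # depth map in metres (synthetic)
mask = np.zeros((480, 640), dtype=bool)
mask[230:260, 300:310] = True                       # pretend this is the predicted stem mask

vs, us = np.nonzero(mask)
z = np.median(depth[mask])                          # median depth suppresses outliers
u, v = us.mean(), vs.mean()                         # mask centroid in pixel coordinates
picking_point = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
print(np.round(picking_point, 4))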
