Applied Sciences, Vol. 12, Issue 17 (2022)

An Unsupervised Depth-Estimation Model for Monocular Images Based on Perceptual Image Error Assessment

Hyeseung Park and Seungchul Park    

Abstract

In this paper, we propose a novel unsupervised learning-based model for estimating depth from monocular images by combining a simple ResNet-based auto-encoder with specially designed loss functions. The model is trained only on stereo image pairs captured by binocular cameras, without any ground-truth depth data. It outputs a disparity map, which is used to warp an input image into the image seen from a different viewpoint. When the input image is warped with the predicted disparity map, distortions of various patterns inevitably appear in the reconstructed image. During training, the frequency and size of these distortions gradually decrease while the similarity between the reconstructed and target images increases, which indicates that the predicted disparity maps are also becoming more accurate. An important factor in this type of training is therefore an efficient loss function that accurately measures the difference in quality between the reconstructed and target images and guides training to close that gap properly and quickly. In recent related studies, the photometric difference has been computed with simple measures such as the L1 or L2 loss, or by combining one of these with a traditional, hand-coded computer-vision image-quality assessment algorithm such as SSIM. However, these methods are limited in their ability to model the variety of distortion patterns at the level of the human visual system. The proposed model therefore uses, as its image-reconstruction loss, a pre-trained perceptual image-quality assessment model that effectively mimics human perception mechanisms to measure the quality of distorted images. To highlight the contribution of the proposed loss functions, our model adopts a simple ResNet50-based network. We trained the model on stereo images from the KITTI 2015 driving dataset to estimate pixel-level depth for 768 × 384 images. Despite the simplicity of the network structure, and thanks to the effectiveness of the proposed image-reconstruction loss, our model outperformed other state-of-the-art unsupervised methods on a variety of evaluation metrics.
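
The warp-and-compare idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: it uses the plain L1 photometric loss that the abstract names as the simple baseline (not the proposed perceptual image-quality model), works on tiny grayscale images stored as nested lists, and assumes a sampling convention in which the target pixel at column x is read from the source at column x - d. The function names `warp_row`, `warp_image`, and `l1_loss` are hypothetical.

```python
import math


def warp_row(src_row, disp_row):
    """Reconstruct one row by sampling src_row at x - d(x) with linear interpolation.

    The sign convention (x - d) is an assumption made for this sketch.
    """
    w = len(src_row)
    out = []
    for x, d in enumerate(disp_row):
        pos = min(max(x - d, 0.0), w - 1.0)  # clamp the sampling position to the row
        x0 = int(math.floor(pos))
        x1 = min(x0 + 1, w - 1)
        frac = pos - x0
        out.append((1.0 - frac) * src_row[x0] + frac * src_row[x1])
    return out


def warp_image(src, disparity):
    """Warp every row of a grayscale image with a per-pixel horizontal disparity map."""
    return [warp_row(row, disp) for row, disp in zip(src, disparity)]


def l1_loss(reconstructed, target):
    """Mean absolute pixel difference (the simple L1 photometric baseline)."""
    total = sum(abs(p - q)
                for row_r, row_t in zip(reconstructed, target)
                for p, q in zip(row_r, row_t))
    count = sum(len(row) for row in reconstructed)
    return total / count


# Toy stereo pair: the "left" view is the "right" view shifted by 2 pixels.
height, width, shift = 4, 12, 2
right = [[float((3 * x + 5 * y) % 7) for x in range(width)] for y in range(height)]
left = [[row[max(x - shift, 0)] for x in range(width)] for row in right]

# A correct disparity map reconstructs the target; an all-zero map does not.
good = warp_image(right, [[float(shift)] * width for _ in range(height)])
bad = warp_image(right, [[0.0] * width for _ in range(height)])

loss_good = l1_loss(good, left)
loss_bad = l1_loss(bad, left)
print(loss_good, loss_bad)
```

In this toy setting the reconstruction loss is lower for the correct disparity map than for the all-zero map, which is the signal an unsupervised model of this kind is trained on. A perceptual image-quality model, as proposed in the paper, would take the place of `l1_loss` in such a pipeline.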
