Inicio  /  Applied Sciences  /  Vol: 12 Par: 17 (2022)  /  Artículo
ARTÍCULO
TITULO

An Unsupervised Depth-Estimation Model for Monocular Images Based on Perceptual Image Error Assessment

Hyeseung Park and Seungchul Park    

Resumen

In this paper, we propose a novel unsupervised learning-based model for estimating the depth of monocular images by integrating a simple ResNet-based auto-encoder and some special loss functions. We use only stereo images obtained from binocular cameras as training data without using depth ground-truth data. Our model basically outputs a disparity map that is necessary to warp an input image to an image corresponding to a different viewpoint. When the input image is warped using the output-disparity map, distortions of various patterns inevitably occur in the reconstructed image. During the training process, the occurrence frequency and size of these distortions gradually decrease, while the similarity between the reconstructed and target images increases, which proves that the accuracy of the predicted disparity maps also increases. Therefore, one of the important factors in this type of training is an efficient loss function that accurately measures how much the difference in quality between the reconstructed and target images is and guides the gap to be properly and quickly closed as the training progresses. In recent related studies, the photometric difference was calculated through simple methods such as L1 and L2 loss or by combining one of these with a traditional computer vision-based hand-coded image-quality assessment algorithm such as SSIM. However, these methods have limitations in modeling various patterns at the level of the human visual system. Therefore, the proposed model uses a pre-trained perceptual image-quality assessment model that effectively mimics human-perception mechanisms to measure the quality of distorted images as image-reconstruction loss. In order to highlight the performance of the proposed loss functions, a simple ResNet50-based network is adopted in our model. We trained our model using stereo images of the KITTI 2015 driving dataset to measure the pixel-level depth for 768 × 384 images. Despite the simplicity of the network structure, thanks to the effectiveness of the proposed image-reconstruction loss, our model outperformed other state-of-the-art studies that have been trained in unsupervised methods on a variety of evaluation indicators.

 Artículos similares

       
 
Xuanyuan Xie and Jieyu Zhao    
The diffusion model has made progress in the field of image synthesis, especially in the area of conditional image synthesis. However, this improvement is highly dependent on large annotated datasets. To tackle this challenge, we present the Guided Diffu... ver más
Revista: Algorithms

 
Sarfaraz Natha, Umme Laila, Ibrahim Ahmed Gashim, Khalid Mahboob, Muhammad Noman Saeed and Khaled Mohammed Noaman    
Brain tumors (BT) represent a severe and potentially life-threatening cancer. Failing to promptly diagnose these tumors can significantly shorten a person?s life. Therefore, early and accurate detection of brain tumors is essential, allowing for appropri... ver más
Revista: Applied Sciences

 
Rong Wang, Xinyang Zhou, Yi Liu, Dongqi Liu, Yu Lu and Miao Su    
To ensure the safety and durability of concrete structures, timely detection and classification of concrete cracks using a low-cost and high-efficiency method is necessary. In this study, a concrete surface crack damage detection method based on the ResN... ver más
Revista: Applied Sciences

 
Yongbo Liu, Peng He, Yan Cao, Conghua Zhu and Shitao Ding    
A critical precondition for realizing mechanized transplantation in rice cultivation is the implementation of seedling tray techniques. To augment the efficacy of seeding, a precise evaluation of the quality of rice seedling cultivation in these trays is... ver más
Revista: Applied Sciences

 
Qiyan Li, Zhi Weng, Zhiqiang Zheng and Lixin Wang    
The decrease in lake area has garnered significant attention within the global ecological community, prompting extensive research in remote sensing and computer vision to accurately segment lake areas from satellite images. However, existing image segmen... ver más
Revista: Applied Sciences