An Unsupervised Depth-Estimation Model for Monocular Images Based on Perceptual Image Error Assessment

Hyeseung Park and Seungchul Park

Abstract

In this paper, we propose a novel unsupervised learning-based model for estimating depth from monocular images, combining a simple ResNet-based auto-encoder with specially designed loss functions. Only stereo images obtained from binocular cameras are used as training data; no depth ground truth is required. The model outputs a disparity map, which is what is needed to warp an input image into the image seen from a different viewpoint. When the input image is warped with the predicted disparity map, distortions of various patterns inevitably appear in the reconstructed image. As training proceeds, these distortions become less frequent and smaller, and the reconstructed image becomes more similar to the target image, which indicates that the predicted disparity maps are becoming more accurate. An important factor in this type of training is therefore an efficient loss function that accurately measures the difference in quality between the reconstructed and target images and guides that gap to close properly and quickly. In recent related studies, the photometric difference has been calculated with simple measures such as the L1 or L2 loss, or by combining one of these with a traditional, hand-coded, computer vision-based image-quality assessment algorithm such as SSIM. However, these methods are limited in their ability to model the variety of distortion patterns at the level of the human visual system. The proposed model therefore uses, as its image-reconstruction loss, a pre-trained perceptual image-quality assessment model that effectively mimics human perception mechanisms when measuring the quality of distorted images. To highlight the contribution of the proposed loss functions, a simple ResNet50-based network is adopted. We trained the model on stereo images from the KITTI 2015 driving dataset to estimate pixel-level depth for 768 × 384 images. Despite the simplicity of the network structure, and thanks to the effectiveness of the proposed image-reconstruction loss, our model outperformed other state-of-the-art unsupervised methods on a variety of evaluation metrics.
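The warping-and-comparison step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `warp_horizontal` and `l1_reconstruction_loss` are hypothetical, the sign convention (reconstructing one view by sampling the other view at `x - disparity`) is an assumption about a rectified stereo pair, and the plain L1 loss stands in for the simple baseline the abstract mentions rather than for the perceptual loss the paper proposes.

```python
import numpy as np


def warp_horizontal(image, disparity):
    """Resample `image` along x at (x - disparity) with linear interpolation.

    image:     (H, W) or (H, W, C) float array, the source view.
    disparity: (H, W) float array, horizontal shift in pixels (assumed convention).
    """
    h, w = disparity.shape
    # Source x coordinate for every output pixel, clamped to the image border.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)      # left neighbour
    x1 = np.minimum(x0 + 1, w - 1)     # right neighbour
    frac = xs - x0                     # interpolation weight of the right neighbour
    rows = np.arange(h)[:, None]
    left = image[rows, x0]
    right = image[rows, x1]
    if image.ndim == 3:                # broadcast the weights over channels
        frac = frac[..., None]
    return (1.0 - frac) * left + frac * right


def l1_reconstruction_loss(reconstructed, target):
    """Mean absolute photometric difference between two images."""
    return float(np.mean(np.abs(reconstructed - target)))


# Toy example: the source view is a horizontal ramp, and the target view is
# the same scene shifted by 2 pixels (a constant true disparity of 2).
source = np.tile(np.arange(8, dtype=np.float64), (4, 1))
target = np.clip(source - 2.0, 0.0, None)

good = warp_horizontal(source, np.full((4, 8), 2.0))  # correct disparity
bad = warp_horizontal(source, np.zeros((4, 8)))       # wrong disparity

print("loss with correct disparity:", l1_reconstruction_loss(good, target))
print("loss with zero disparity:   ", l1_reconstruction_loss(bad, target))
```

In this toy setting the loss is lower when the disparity is correct, which is the signal an unsupervised model of this kind trains on. A perceptual quality model would replace `l1_reconstruction_loss` in the approach the abstract describes.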

Keywords

monocular depth estimation - perceptual image-quality assessment - PieAPP - KITTI

ISSUE

Volume: 12, Issue: 17 (2022)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/app12178829
