Deep Neural Networks Training by Stochastic Quasi-Newton Trust-Region Methods

Mahsa Yousefi and Ángeles Martínez

Abstract

While first-order methods are popular for solving optimization problems arising in deep learning, they come with some acute deficiencies. To overcome these shortcomings, there has been recent interest in introducing second-order information through quasi-Newton methods that are able to construct Hessian approximations using only gradient information. In this work, we study the performance of stochastic quasi-Newton algorithms for training deep neural networks. We consider two well-known quasi-Newton updates, the limited-memory Broyden–Fletcher–Goldfarb–Shanno (BFGS) and the symmetric rank one (SR1). This study fills a gap concerning the real performance of both updates in the minibatch setting and analyzes whether more efficient training can be obtained when using the more robust BFGS update or the cheaper SR1 formula, which, by allowing for indefinite Hessian approximations, can potentially help to better navigate the pathological saddle points present in the non-convex loss functions found in deep learning. We present and discuss the results of an extensive experimental study that includes many aspects affecting performance, like batch normalization, the network architecture, the limited memory parameter or the batch size. Our results show that stochastic quasi-Newton algorithms are efficient and, in some instances, able to outperform the well-known first-order Adam optimizer, run with the optimal combination of its numerous hyperparameters, and the stochastic second-order trust-region STORM algorithm.
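To make the contrast between the two updates concrete, the following is a minimal sketch, not the authors' implementation. It uses the textbook dense BFGS and SR1 formulas rather than the limited-memory, stochastic, trust-region variants studied in the paper. The function names, the skip tolerance `eps`, and the toy quadratic with Hessian `A` are all assumptions made for illustration. Both updates build a Hessian approximation `B` from a step `s` and a gradient difference `y` only.

```python
import numpy as np

def bfgs_update(B, s, y, eps=1e-8):
    """Dense BFGS update; skipped unless the curvature condition s^T y > 0 holds."""
    sy = s @ y
    if sy <= eps * np.linalg.norm(s) * np.linalg.norm(y):
        return B  # skipping keeps B positive definite
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy

def sr1_update(B, s, y, eps=1e-8):
    """Dense SR1 update; skipped only when the denominator is close to zero."""
    r = y - B @ s
    denom = r @ s
    if abs(denom) <= eps * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom

# Toy saddle: f(w) = 0.5 * w^T A w with an indefinite Hessian A (illustrative only).
A = np.diag([2.0, -1.0])
steps = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

B_bfgs = np.eye(2)
B_sr1 = np.eye(2)
for s in steps:
    y = A @ s  # gradient difference along the step s for this quadratic
    B_bfgs = bfgs_update(B_bfgs, s, y)
    B_sr1 = sr1_update(B_sr1, s, y)

print("BFGS eigenvalues:", np.linalg.eigvalsh(B_bfgs))
print("SR1 eigenvalues: ", np.linalg.eigvalsh(B_sr1))
```

In this toy setting the BFGS approximation stays positive definite because the pair with negative curvature is skipped, while the SR1 approximation picks up the negative eigenvalue. A trust-region framework is what makes such indefinite approximations usable, since the step is bounded even when the model is not convex.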

Keywords

stochastic optimization - quasi-Newton methods - trust-region methods - BFGS - SR1 - deep neural networks training

MSC: 90C30 - 90C06 - 90C53 - 90C90 - 65K05

ISSUE

Volume: 16 Issue: 10 (2023)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/a16100490
