Applied Sciences, Vol. 9, No. 11 (2019)

Deep Convolutional Neural Network with Structured Prediction for Weakly Supervised Audio Event Detection

Inkyu Choi, Soo Hyun Bae and Nam Soo Kim

Abstract

Audio event detection (AED) is the task of recognizing the types of audio events in an audio stream and estimating their temporal positions. AED is typically based on fully supervised approaches, which require strong labels that specify both the presence and the temporal position of each audio event. However, fully supervised datasets are not easily available because of the heavy cost of human annotation. Recently, weakly supervised approaches for AED have been proposed that use large-scale datasets with weak labels, which indicate only the occurrence of events in recordings. In this work, we introduce a deep convolutional neural network (CNN) model called DSNet, based on densely connected convolutional networks (DenseNets) and squeeze-and-excitation networks (SENets), for weakly supervised training of AED. DSNet alleviates the vanishing-gradient problem, strengthens feature propagation, and models interdependencies between channels. We also propose a structured prediction method for weakly supervised AED. We apply a recurrent neural network (RNN) based framework and a prediction smoothness cost function to consider long-term contextual information with reduced error propagation. In post-processing, conditional random fields (CRFs) are applied to take into account the dependency between segments and delineate the borders of audio events precisely. We evaluated our proposed models on the DCASE 2017 task 4 dataset and obtained state-of-the-art results on both audio tagging and event detection tasks.
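
The channel-interdependency modeling mentioned in the abstract comes from squeeze-and-excitation networks. As a rough illustration only, the following sketch shows the generic squeeze-and-excitation reweighting step in plain Python. It is not DSNet itself: the function name `squeeze_excite`, the weight matrices `w1` and `w2`, and the toy feature values are all hypothetical, and the abstract does not state the layer sizes or where the authors place this step.

```python
import math


def squeeze_excite(feature_maps, w1, w2):
    """Generic squeeze-and-excitation reweighting (illustrative, not DSNet).

    feature_maps: list of C channels, each a list of T activations.
    w1: R x C bottleneck weights; w2: C x R expansion weights (hypothetical).
    Returns the rescaled feature maps and the per-channel gates.
    """
    # Squeeze: global average pooling yields one scalar per channel.
    squeezed = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expansion linear layer followed by a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every activation in a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, feature_maps)]
    return scaled, gates


# Toy input: 3 channels, 2 time steps, bottleneck size R = 1 (assumed values).
features = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]
w1 = [[1.0, 0.0, 0.0]]
w2 = [[1.0], [0.0], [-1.0]]
scaled, gates = squeeze_excite(features, w1, w2)
```

With these toy weights, the first channel is emphasized and the third is suppressed relative to the neutral gate of 0.5, which is the kind of learned channel weighting the abstract refers to.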
