Applied Sciences, Vol. 10, Issue 23 (2020)
ARTICLE
TITLE

Comparison of Deep Learning Models and Various Text Pre-Processing Techniques for the Toxic Comments Classification

Viera Maslej-Krešňáková    
Martin Sarnovský    
Peter Butka and Kristína Machová    

Abstract

The emergence of anti-social behaviour in online environments presents a serious issue in today's society. Automatic detection and identification of such behaviour are becoming increasingly important. Modern machine learning and natural language processing methods can provide effective tools to detect different types of anti-social behaviour from the pieces of text. In this work, we present a comparison of various deep learning models used to identify the toxic comments in the Internet discussions. Our main goal was to explore the effect of the data preparation on the model performance. As we worked with the assumption that the use of traditional pre-processing methods may lead to the loss of characteristic traits, specific for toxic content, we compared several popular deep learning and transformer language models. We aimed to analyze the influence of different pre-processing techniques and text representations including standard TF-IDF, pre-trained word embeddings and also explored currently popular transformer models. Experiments were performed on the dataset from the Kaggle Toxic Comment Classification competition, and the best performing model was compared with the similar approaches using standard metrics used in data analysis.
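The assumption that traditional pre-processing may remove traits characteristic of toxic content can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `standard_preprocess` and `surface_traits`, the choice of cleaning steps (lowercasing and punctuation stripping), and the sample comment are all hypothetical and chosen only for illustration.

```python
import re
import string


def standard_preprocess(text: str) -> str:
    """Hypothetical baseline cleaning: lowercase, drop punctuation, collapse spaces."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def surface_traits(text: str) -> dict:
    """Count a few surface cues that aggressive cleaning may erase (illustrative only)."""
    tokens = text.split()
    return {
        # Tokens written entirely in capitals, e.g. shouting
        "all_caps_tokens": sum(1 for t in tokens if len(t) > 1 and t.isupper()),
        # Exclamation marks anywhere in the text
        "exclamation_marks": text.count("!"),
        # Characters often used to mask offensive words
        "masked_chars": sum(text.count(c) for c in "*@#$"),
    }


# Invented sample comment, used only to demonstrate the effect
raw = "You are SO wrong!!! What a st*pid idea"
cleaned = standard_preprocess(raw)

print(cleaned)
print(surface_traits(raw))
print(surface_traits(cleaned))
```

In this toy case the cleaned text keeps the words but loses the capitalisation, the repeated exclamation marks and the masking character, which is the kind of information loss the comparison of pre-processing techniques is meant to examine.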
