JOURNAL
Applied Sciences

ARTICLE

TITLE

Comparison of Deep Learning Models and Various Text Pre-Processing Techniques for the Toxic Comments Classification

Viera Maslej-Krešňáková

Martin Sarnovský

Peter Butka and Kristína Machová

Abstract

The emergence of anti-social behaviour in online environments presents a serious issue in today's society. Automatic detection and identification of such behaviour are becoming increasingly important. Modern machine learning and natural language processing methods can provide effective tools for detecting different types of anti-social behaviour in text. In this work, we present a comparison of various deep learning models used to identify toxic comments in Internet discussions. Our main goal was to explore the effect of data preparation on model performance. Because we worked under the assumption that traditional pre-processing methods may lead to the loss of characteristic traits specific to toxic content, we compared several popular deep learning and transformer language models. We aimed to analyze the influence of different pre-processing techniques and text representations, including standard TF-IDF and pre-trained word embeddings, and also explored currently popular transformer models. Experiments were performed on the dataset from the Kaggle Toxic Comment Classification competition, and the best-performing model was compared with similar approaches using standard evaluation metrics.
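The abstract's working assumption is that traditional pre-processing can erase traits typical of toxic comments before a representation such as TF-IDF is even built. The sketch below illustrates that idea under stated assumptions: `standard_preprocess` is a hypothetical "traditional" pipeline (lowercase, strip non-alphanumerics), `minimal_preprocess` is a hypothetical lighter alternative that keeps case and punctuation runs, and `tfidf` uses the plain weighting tf · log(N / df). None of these names or the example comments come from the paper, whose actual pipelines may differ.

```python
import math
import re
from collections import Counter


def standard_preprocess(text: str) -> list[str]:
    """Hypothetical 'traditional' pipeline: lowercase, replace non-alphanumerics with spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def minimal_preprocess(text: str) -> list[str]:
    """Hypothetical light pipeline: keep case, split punctuation runs into their own tokens."""
    return re.findall(r"\w+|[^\w\s]+", text)


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Plain TF-IDF: raw term count times log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


if __name__ == "__main__":
    # Invented example comments, not taken from the Kaggle dataset.
    comments = [
        "YOU ARE AN IDIOT!!!",
        "you are an idiot",
        "Thanks, that was helpful.",
    ]
    std = tfidf([standard_preprocess(c) for c in comments])
    raw = tfidf([minimal_preprocess(c) for c in comments])
    print(std[0] == std[1])  # True: shouting and "!!!" are erased
    print(raw[0] == raw[1])  # False: case and punctuation survive as features
```

After standard pre-processing the shouted comment and the plain one map to identical TF-IDF vectors, whereas the lighter pipeline keeps "YOU" and "!!!" as distinct features a classifier could use.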

Keywords

natural language processing - toxic comments - classification - deep learning - neural networks

ISSUE

Volume: 10, Issue: 23 (2020)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Water
Information
Applied Sciences

DOI

https://doi.org/10.3390/app10238631

Similar articles

Generative vs. Non-Generative Models in Engineering Shape Optimization

Zahid Masood, Muhammad Usama, Shahroz Khan, Konstantinos Kostas and Panagiotis D. Kaklis

Generative models offer design diversity but tend to be computationally expensive, while non-generative models are computationally cost-effective but produce less diverse and often invalid designs. However, the limitations of non-generative models can be...

Journal: Journal of Marine Science and Engineering

Downscaling Daily Reference Evapotranspiration Using a Super-Resolution Convolutional Transposed Network

Yong Liu, Xiaohui Yan, Wenying Du, Tianqi Zhang, Xiaopeng Bai and Ruichuan Nan

The current work proposes a novel super-resolution convolutional transposed network (SRCTN) deep learning architecture for downscaling daily climatic variables. The algorithm was established based on a super-resolution convolutional neural network with t...

Journal: Water

Advances in Facial Expression Recognition: A Survey of Methods, Benchmarks, Models, and Datasets

Thomas Kopalidis, Vassilios Solachidis, Nicholas Vretos and Petros Daras

Recent technological developments have enabled computers to identify and categorize facial expressions to determine a person's emotional state in an image or a video. This process, called "Facial Expression Recognition (FER)", has become one of the most ...

Journal: Information

Comparative Analysis of NLP-Based Models for Company Classification

Maryan Rizinski, Andrej Jankov, Vignesh Sankaradas, Eugene Pinsky, Igor Mishkovski and Dimitar Trajanov

The task of company classification is traditionally performed using established standards, such as the Global Industry Classification Standard (GICS). However, these approaches heavily rely on laborious manual efforts by domain experts, resulting in slow...

Journal: Information

A CNN-GRU Hybrid Model for Predicting Airport Departure Taxiing Time

Ligang Yuan, Jing Liu, Haiyan Chen, Daoming Fang and Wenlu Chen

Scene taxiing time is an important indicator for assessing the operational efficiency of airports as well as green airports, and it is also a fundamental parameter in flight regularity statistics. The accurate prediction of taxiing time can help decision...

Journal: Aerospace

Featured journals

Infrastructures

Informed Infrastructure

BiT

Revista de la Construcción
