Applied Sciences, Vol. 13, Issue 19 (2023)

The Impact of Data Pre-Processing on Hate Speech Detection in a Mix of English and Hindi-English (Code-Mixed) Tweets

Khalil Al-Hussaeni, Mohamed Sameer and Ioannis Karamitsos

Abstract

Due to the increasing reliance on social network platforms in recent years, hate speech has risen significantly among online users. Governments and social media platforms face the challenging responsibility of detecting, controlling, and removing a massively growing volume of hateful content as early as possible to prevent future criminal acts, such as cyberviolence and real-life hate crimes. Twitter is used globally by people from various backgrounds and nationalities; it contains tweets posted in different languages, including code-mixed language, such as Hindi-English. Because tweets are informal and vary in spelling and grammar, hate speech detection is especially challenging in code-mixed text. In this paper, we tackle the critical issue of hate speech detection on social media, with a focus on a mix of English and Hindi-English (code-mixed) text messages on Twitter. More specifically, we aim to evaluate the impact of data pre-processing on hate speech detection. Our method first performs 10-step data cleansing; then, it builds a detection method based on two architectures, namely a convolutional neural network (CNN) and a combination of CNN and long short-term memory (LSTM) algorithms. We tune the hyperparameters of the proposed model architectures and conduct extensive experimental analysis on real-life tweets to evaluate the performance of the models in terms of accuracy, efficiency, and scalability. Moreover, we compare our method with a closely related hate speech detection method from the literature. The experimental results suggest that our method yields improved accuracy and a significantly improved runtime. Among our best-performing models, CNN-LSTM improved accuracy by nearly 2% and decreased the runtime by almost half.
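To make the idea of tweet cleansing concrete, the following is a minimal Python sketch of a few generic cleaning steps. The abstract does not list the ten steps, so every step here, along with the function name `clean_tweet` and the regular expressions, is an illustrative assumption and not the authors' pipeline. The sketch also assumes Latin-script (romanized) text.

```python
import re

# Hypothetical patterns for illustration only; the paper's actual
# 10-step cleansing procedure is not described in the abstract.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
REPEAT_RE = re.compile(r"(.)\1{2,}")   # a character repeated 3+ times
NON_WORD_RE = re.compile(r"[^\w\s]")   # punctuation and symbols
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Apply a few common tweet-cleaning steps (assumed, not from the paper)."""
    text = text.lower()                   # normalize case
    text = URL_RE.sub(" ", text)          # drop links
    text = MENTION_RE.sub(" ", text)      # drop @user mentions
    text = HASHTAG_RE.sub(r"\1", text)    # keep the hashtag word, drop '#'
    text = REPEAT_RE.sub(r"\1\1", text)   # squeeze elongations: "aaaa" -> "aa"
    # Assumption: Latin-script input. This pattern would also strip
    # Devanagari combining marks, so native-script text needs another rule.
    text = NON_WORD_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()  # collapse whitespace


if __name__ == "__main__":
    print(clean_tweet("@user Yeh movie bahut BURAAAA hai!!! https://t.co/xyz #Bollywood"))
```

Each step is a separate substitution, so individual steps can be switched off to measure their effect on a downstream classifier, which is the kind of comparison the abstract describes.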
