Detecting Deception Using Natural Language Processing and Machine Learning in Datasets on COVID-19 and Climate Change

Barbara Brzic, Ivica Boticki and Marina Bagic Babac

Abstract

Deception in computer-mediated communication poses a threat, and there is a growing need for efficient methods of detecting it. Through natural language processing, machine learning models have proven highly successful at detecting lexical patterns associated with deception. In this study, four selected machine learning models were trained and tested on data collected through a crowdsourcing platform on the topics of COVID-19 and climate change. Model performance was evaluated using n-gram analysis (from unigrams to trigrams) and psycho-linguistic analysis. Important features were then selected, and the models were further tested on different subsets of these features. The study concludes that the subjectivity of the collected data strongly affects the detection of hidden linguistic features of deception. Psycho-linguistic analysis, alone and in combination with n-grams, achieves better classification results than n-gram analysis both when the models are tested on their own data and when their ability to generalize is examined, especially with trigrams, where the combined approach achieves notably higher accuracy (up to 16%). When the mutual applicability of the models was tested, n-gram analysis proved to be the more robust method, whereas psycho-linguistic analysis remained the least flexible.
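The abstract contrasts n-gram features (unigrams to trigrams), psycho-linguistic features, and their combination. The sketch below illustrates only how such a combined feature vector might be built for one text; it is not the authors' pipeline. The abstract does not specify the tokenizer, the psycho-linguistic tool, or the four classifiers, so the tokenizer, the tiny `CATEGORIES` lexicon, and all function names here are hypothetical stand-ins (real psycho-linguistic dictionaries are far larger and curated).

```python
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for a psycho-linguistic dictionary;
# real tools use much larger, curated word categories.
CATEGORIES = {
    "first_person": {"i", "me", "my", "mine"},
    "negation": {"no", "not", "never"},
    "certainty": {"always", "definitely", "certainly"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(tokens, max_n=3):
    """Count every n-gram from unigrams up to max_n (trigrams by default)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update("ng:" + g for g in ngrams(tokens, n))
    return counts


def category_features(tokens):
    """Share of tokens that fall into each lexicon category."""
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {
        "cat:" + name: sum(t in words for t in tokens) / total
        for name, words in CATEGORIES.items()
    }


def combined_features(text, max_n=3):
    """Merge n-gram counts and category shares into one sparse feature dict."""
    tokens = tokenize(text)
    features = dict(ngram_features(tokens, max_n))
    features.update(category_features(tokens))
    return features


if __name__ == "__main__":
    feats = combined_features("I never said the vaccine is not safe")
    print(feats["ng:is not safe"], feats["cat:negation"])  # prints: 1 0.25
```

Prefixing keys with `ng:` and `cat:` keeps the two feature families separable, so the same dictionary can be filtered to n-grams only, category features only, or both, mirroring the three configurations the abstract compares. Each dictionary would then be vectorized and passed to a classifier of choice.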

Keywords

deception detection - machine learning - natural language processing

ISSUE

Volume 16, Issue 5 (2023)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI