Evaluating Richer Features and Varied Machine Learning Models for Subjectivity Classification of Book Review Sentences in Portuguese

Luana Balador Belisário

Luiz Gabriel Ferreira and Thiago Alexandre Salgueiro Pardo

Abstract

Texts published on social media have been a valuable source of information for companies and users, as analyzing this data helps them improve and select products and services of interest. Given the huge amount of data, techniques for automatically analyzing user opinions are necessary. The research field that investigates these techniques is called sentiment analysis. This paper focuses specifically on the task of subjectivity classification, which aims to predict whether a text passage conveys an opinion. We report a study and comparison of machine learning methods from different paradigms for subjectivity classification of book review sentences in Portuguese, which has proven to be a challenging domain in the area. Specifically, we explore richer features for the task, using several lexical, centrality-based, and discourse features. We show the contributions of the different feature sets and provide evidence that the combination of lexical, centrality-based, and discourse features produces better results than any of the feature sets individually. Additionally, by analyzing the results achieved and the knowledge acquired by some symbolic machine learning methods, we show that some discourse relations may clearly signal subjectivity. Our corpus annotation also reveals some distinctive discourse structuring patterns for sentence subjectivity.

Keywords

subjectivity classification - feature sets - discourse structure - Portuguese language

Access

ISSUE

Volume: 11 Issue: 9 (2020)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/info11090437
