JOURNAL
Algorithms

ARTICLE

TITLE

Short Text Classification with Tolerance-Based Soft Computing Method

Vrushang Patel, Sheela Ramanna, Ketan Kotecha and Rahee Walambe

Abstract

Text classification aims to assign labels to textual units such as documents, sentences and paragraphs. Some applications of text classification include sentiment classification and news categorization. In this paper, we present a soft computing technique-based algorithm (TSC) to classify sentiment polarities of tweets as well as news categories from text. The TSC algorithm is a supervised learning method based on tolerance near sets. Near sets theory is a more recent soft computing methodology inspired by rough sets where instead of set approximation operators used by rough sets to induce tolerance classes, the tolerance classes are directly induced from the feature vectors using a tolerance level parameter and a distance function. The proposed TSC algorithm takes advantage of the recent advances in efficient feature extraction and vector generation from pre-trained bidirectional transformer encoders for creating tolerance classes. Experiments were performed on ten well-researched datasets which include both short and long text. Both pre-trained SBERT and TF-IDF vectors were used in the experimental analysis. Results from transformer-based vectors demonstrate that TSC outperforms five well-known machine learning algorithms on four datasets, and it is comparable with all other datasets based on the weighted F1, Precision and Recall scores. The highest AUC-ROC (Area under the Receiver Operating Characteristics) score was obtained in two datasets and comparable in six other datasets. The highest ROC-PRC (Area under the Precision-Recall Curve) score was obtained in one dataset and comparable in four other datasets. Additionally, significant differences were observed in most comparisons when examining the statistical difference between the weighted F1-score of TSC and other classifiers using a Wilcoxon signed-ranks test.
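The abstract says that tolerance classes are induced directly from feature vectors using a tolerance level parameter and a distance function. The sketch below illustrates that idea only and is not the authors' TSC algorithm. It makes several assumptions that do not come from the abstract: cosine distance as the distance function, a simplified "tolerance neighborhood" (all training vectors within the tolerance level eps of a query) in place of true tolerance classes, a majority vote for the label, and a nearest-vector fallback when the neighborhood is empty. All function names and the toy two-dimensional vectors are hypothetical; in practice the vectors would be SBERT or TF-IDF embeddings.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity); assumed here, not taken from the paper."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat a zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def tolerance_neighborhood(x, vectors, eps, dist=cosine_distance):
    """Indices of all vectors whose distance to x is at most the tolerance level eps."""
    return [i for i, v in enumerate(vectors) if dist(x, v) <= eps]


def predict(x, train_vectors, train_labels, eps, dist=cosine_distance):
    """Label x by majority vote over its tolerance neighborhood (hypothetical rule)."""
    idx = tolerance_neighborhood(x, train_vectors, eps, dist)
    if not idx:
        # Assumed fallback: use the single nearest training vector.
        idx = [min(range(len(train_vectors)),
                   key=lambda i: dist(x, train_vectors[i]))]
    votes = Counter(train_labels[i] for i in idx)
    return votes.most_common(1)[0][0]


# Toy, made-up vectors standing in for sentence embeddings.
train_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["pos", "pos", "neg", "neg"]

print(predict([0.8, 0.2], train_vectors, train_labels, eps=0.1))
```

A larger eps admits more neighbors and smooths the decision, while a smaller eps makes the neighborhood stricter and triggers the fallback more often, so the tolerance level would normally be tuned on held-out data.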

Keywords

sentiment classification - machine learning - tolerance near sets - transformer - news classification - Natural Language Processing

ISSUE

Volume: 15 Issue: 8 (2022)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Information
Informatics
Journal of Marine Science and Engineering

DOI

https://doi.org/10.3390/a15080267

Similar articles

Machine Learning-Based Text Classification Comparison: Turkish Language Context

Yehia Ibrahim Alzoubi, Ahmet E. Topcu and Ahmed Enis Erkaya

The growth in textual data associated with the increased usage of online services and the simplicity of having access to these data has resulted in a rise in the number of text classification research papers. Text classification has a significant influen...

Journal: Applied Sciences

Arabic Mispronunciation Recognition System Using LSTM Network

Abdelfatah Ahmed, Mohamed Bader, Ismail Shahin, Ali Bou Nassif, Naoufel Werghi and Mohammad Basel

The Arabic language has always been an immense source of attraction to various people from different ethnicities by virtue of the significant linguistic legacy that it possesses. Consequently, a multitude of people from all over the world are yearning to...

Journal: Information

Document-Level Relation Extraction with Local Relation and Global Inference

Yiming Liu, Hongtao Shan, Feng Nie, Gaoyu Zhang and George Xianzhi Yuan

The current popular approach to the extraction of document-level relations is mainly based on either a graph structure or serialization model method for the inference, but the graph structure method makes the model complicated, while the serialization mo...

Journal: Information

A Multimodal Deep Learning Model Using Text, Image, and Code Data for Improving Issue Classification Tasks

Changwon Kwak, Pilsu Jung and Seonah Lee

Issue reports are valuable resources for the continuous maintenance and improvement of software. Managing issue reports requires a significant effort from developers. To address this problem, many researchers have proposed automated techniques for classi...

Journal: Applied Sciences

Accuracy of the Sentence-BERT Semantic Search System for a Japanese Database of Closed Medical Malpractice Claims

Naofumi Fujishiro, Yasuhiro Otaki and Shoji Kawachi

In this study, we developed a similar text retrieval system using Sentence-BERT (SBERT) for our database of closed medical malpractice claims and investigated its retrieval accuracy. We assigned each case in the database a short Japanese summary of the a...

Journal: Applied Sciences
