JOURNAL
Applied Sciences

ARTICLE

TITLE

A Chinese Few-Shot Text Classification Method Utilizing Improved Prompt Learning and Unlabeled Data

Tingkai Hu

Zuqin Chen

Jike Ge

Zhaoxu Yang and Jichao Xu

Abstract

Insufficient labeled samples and poor generalization have become significant problems in natural language processing, drawing particular attention to few-shot text classification (FSTC). Advances in prompt learning have substantially improved FSTC performance. However, prompt learning methods typically rely on a pre-trained language model and on tokens from its vocabulary for model training, and because different language models use different token encoding structures, approaches developed for English cannot simply be reused to build effective Chinese prompt learning methods. In addition, most current prompt learning methods do not exploit available unlabeled data, which often leads to unsatisfactory performance in real-world applications. To address these limitations, we propose CIPLUD, a novel Chinese FSTC method that combines an improved prompt learning method with existing unlabeled data to classify small amounts of Chinese text. Using a Chinese pre-trained language model, we built two modules: the Multiple Masks Optimization-based Prompt Learning (MMOPL) module and the One-Class Support Vector Machine-based Unlabeled Data Leveraging (OCSVM-UDL) module. The former generates prompt prefixes with multiple masks and constructs suitable prompt templates for Chinese labels; it addresses the problem of random token combinations during label prediction by applying joint probability and length constraints. The latter builds an OCSVM model in the trained text vector space and uses it to select reasonable pseudo-labeled data for each category from a large pool of unlabeled data. The selected pseudo-labeled data are mixed with the original few-shot annotated data to form a new training set, and the steps of the two modules are then repeated as an iterative semi-supervised optimization process.
Experimental results on four Chinese FSTC benchmark datasets show that the proposed method outperformed other prompt learning methods, with an average accuracy improvement of 2.3%.
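The abstract does not give the exact MMOPL scoring rule, so the following is only a minimal Python sketch, under our own assumptions, of why scoring whole labels by joint probability avoids random token combinations. It assumes each character of a candidate Chinese label fills one [MASK] slot, treats "length constraint" as ruling out labels longer than the number of masks plus dividing the joint log-probability by label length, and uses a hypothetical input format `mask_probs`; `label_score` and `predict_label` are illustrative names, not the authors' code.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical input format: mask_probs[i] maps a candidate character to the
# masked language model's probability for it at the i-th [MASK] slot.
MaskProbs = Sequence[Dict[str, float]]


def label_score(label: str, mask_probs: MaskProbs, eps: float = 1e-12) -> float:
    """Length-normalized joint log-probability of a multi-character label.

    Assumptions (not from the paper): each character of the label fills one
    mask slot, left-aligned, and a label longer than the number of masks is
    ruled out entirely.
    """
    if not label or len(label) > len(mask_probs):
        return float("-inf")  # length constraint: the label cannot fit the masks
    # Joint probability of the whole label = product over its characters,
    # computed as a sum of logs; eps avoids log(0) for unseen characters.
    log_joint = sum(
        math.log(mask_probs[i].get(ch, 0.0) + eps) for i, ch in enumerate(label)
    )
    # Divide by length so shorter labels are not favored just for having
    # fewer probability factors below 1.
    return log_joint / len(label)


def predict_label(labels: List[str], mask_probs: MaskProbs) -> str:
    """Choose the complete label with the best score, rather than taking the
    argmax at each mask independently (which can produce character
    combinations that match no label)."""
    return max(labels, key=lambda lab: label_score(lab, mask_probs))
```

For example, with two masks whose distributions are {体: 0.5, 财: 0.3, 科: 0.2} and {经: 0.6, 育: 0.3, 技: 0.1}, taking the argmax at each mask independently yields 体经, which is not a label, whereas `predict_label(["体育", "财经", "科技"], mask_probs)` returns 财经 (joint probability 0.18 versus 0.15 for 体育).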
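The OCSVM-UDL selection step can likewise be sketched with scikit-learn's `OneClassSVM`. This is an illustration under assumptions the abstract does not state: the text vectors are already computed by the trained model, one RBF-kernel OCSVM is fit per category on that category's labeled vectors, each unlabeled vector is assigned to the category whose model gives it the highest decision score and is kept only if that score is positive, and at most `top_k` samples are kept per category. The function name `select_pseudo_labels` and the `top_k`, `nu`, and `gamma` defaults are hypothetical choices.

```python
from typing import List, Sequence, Tuple, Union

import numpy as np
from sklearn.svm import OneClassSVM


def select_pseudo_labels(
    labeled_vecs: np.ndarray,
    labels: Sequence[int],
    unlabeled_vecs: np.ndarray,
    top_k: int = 10,
    nu: float = 0.1,
    gamma: Union[str, float] = "scale",
) -> List[Tuple[int, int]]:
    """Return (unlabeled_index, pseudo_label) pairs.

    Assumed rule: one OneClassSVM per category; an unlabeled vector goes to
    the category whose model scores it highest and is kept only if that
    score is positive (inside the category's learned region).
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # scores[i, j]: signed distance of unlabeled sample i to class j's boundary
    scores = np.column_stack([
        OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
        .fit(labeled_vecs[labels == c])
        .decision_function(unlabeled_vecs)
        for c in classes
    ])
    best = scores.argmax(axis=1)
    best_score = scores.max(axis=1)

    selected: List[Tuple[int, int]] = []
    for j, c in enumerate(classes):
        # Candidates claimed by class c that also lie inside its boundary.
        idx = np.where((best == j) & (best_score > 0))[0]
        # Keep at most top_k per class, most confident first.
        idx = idx[np.argsort(-best_score[idx])][:top_k]
        selected.extend((int(i), int(c)) for i in idx)
    return selected
```

In the iterative loop the abstract describes, the selected (index, label) pairs would be added to the few-shot training set, the prompt model retrained, the text vectors recomputed, and the selection repeated; those steps depend on the underlying model and are left out of this sketch.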

Keywords

Chinese few-shot text classification - prompt learning - unlabeled data - pre-trained language model

ISSUE

Volume: 13 Issue: 5 (2023)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Applied Sciences
Information
Journal of Marine Science and Engineering

DOI

https://doi.org/10.3390/app13053334

Similar articles

Decomposed Two-Stage Prompt Learning for Few-Shot Named Entity Recognition

Feiyang Ye, Liang Huang, Senjie Liang and KaiKai Chi

Named entity recognition (NER) in a few-shot setting is an extremely challenging task, and most existing methods fail to account for the gap between NER tasks and pre-trained language models. Although prompt learning has been successfully applied in few-...

Journal: Information

A Comparison of Machine Learning Techniques for the Detection of Type-2 Diabetes Mellitus: Experiences from Bangladesh

Md. Jamal Uddin, Md. Martuza Ahamad, Md. Nesarul Hoque, Md. Abul Ala Walid, Sakifa Aktar, Naif Alotaibi, Salem A. Alyami, Muhammad Ashad Kabir and Mohammad Ali Moni

Diabetes is a chronic disease caused by a persistently high blood sugar level, causing other chronic diseases, including cardiovascular, kidney, eye, and nerve damage. Prompt detection plays a vital role in reducing the risk and severity associated with ...

Journal: Information

Exploring Prompts in Few-Shot Cross-Linguistic Topic Classification Scenarios

Zhipeng Zhang, Shengquan Liu and Jianming Cheng

In recent years, large-scale pretrained language models have become widely used in natural language processing tasks. On this basis, prompt learning has achieved excellent performance in specific few-shot classification scenarios. The core idea of prompt...

Journal: Applied Sciences

A Marine Hydrographic Station Networks Intrusion Detection Method Based on LCVAE and CNN-BiLSTM

Tianhao Hou, Hongyan Xing, Xinyi Liang, Xin Su and Zenghui Wang

Marine sensors are highly vulnerable to illegal access network attacks. Moreover, the nation's meteorological and hydrological information is at ever-increasing risk, which calls for a prompt and in-depth analysis of the network behavior and traffic to d...

Journal: Journal of Marine Science and Engineering

Breast Cancer Detection in Mammography Images: A CNN-Based Approach with Feature Selection

Zahra Jafari and Ebrahim Karami

The prompt and accurate diagnosis of breast lesions, including the distinction between cancer, non-cancer, and suspicious cancer, plays a crucial role in the prognosis of breast cancer. In this paper, we introduce a novel method based on feature extracti...

Journal: Information