Inicio  /  Applied Sciences  /  Vol: 13 Par: 5 (2023)  /  Artículo
ARTÍCULO
TITULO

A Chinese Few-Shot Text Classification Method Utilizing Improved Prompt Learning and Unlabeled Data

Tingkai Hu    
Zuqin Chen    
Jike Ge    
Zhaoxu Yang and Jichao Xu    

Resumen

Insufficiently labeled samples and low-generalization performance have become significant natural language processing problems, drawing significant concern for few-shot text classification (FSTC). Advances in prompt learning have significantly improved the performance of FSTC. However, prompt learning methods typically require the pre-trained language model and tokens of the vocabulary list for model training, while different language models have different token coding structures, making it impractical to build effective Chinese prompt learning methods from previous approaches related to English. In addition, a majority of current prompt learning methods do not make use of existing unlabeled data, thus often leading to unsatisfactory performance in real-world applications. To address the above limitations, we propose a novel Chinese FSTC method called CIPLUD that combines an improved prompt learning method and existing unlabeled data, which are used for the classification of a small amount of Chinese text data. We used the Chinese pre-trained language model to build two modules: the Multiple Masks Optimization-based Prompt Learning (MMOPL) module and the One-Class Support Vector Machine-based Unlabeled Data Leveraging (OCSVM-UDL) module. The former generates prompt prefixes with multiple masks and constructs suitable prompt templates for Chinese labels. It optimizes the random token combination problem during label prediction with joint probability and length constraints. The latter, by establishing an OCSVM model in the trained text vector space, selects reasonable pseudo-label data for each category from a large amount of unlabeled data. After selecting the pseudo-label data, we mixed them with the previous few-shot annotated data to obtain brand new training data and then repeated the steps of the two modules as an iterative semi-supervised optimization process. The experimental results on the four Chinese FSTC benchmark datasets demonstrate that our proposed solution outperformed other prompt learning methods with an average accuracy improvement of 2.3%.

 Artículos similares

       
 
Feiyang Ye, Liang Huang, Senjie Liang and KaiKai Chi    
Named entity recognition (NER) in a few-shot setting is an extremely challenging task, and most existing methods fail to account for the gap between NER tasks and pre-trained language models. Although prompt learning has been successfully applied in few-... ver más
Revista: Information

 
Md. Jamal Uddin, Md. Martuza Ahamad, Md. Nesarul Hoque, Md. Abul Ala Walid, Sakifa Aktar, Naif Alotaibi, Salem A. Alyami, Muhammad Ashad Kabir and Mohammad Ali Moni    
Diabetes is a chronic disease caused by a persistently high blood sugar level, causing other chronic diseases, including cardiovascular, kidney, eye, and nerve damage. Prompt detection plays a vital role in reducing the risk and severity associated with ... ver más
Revista: Information

 
Zhipeng Zhang, Shengquan Liu and Jianming Cheng    
In recent years, large-scale pretrained language models have become widely used in natural language processing tasks. On this basis, prompt learning has achieved excellent performance in specific few-shot classification scenarios. The core idea of prompt... ver más
Revista: Applied Sciences

 
Tianhao Hou, Hongyan Xing, Xinyi Liang, Xin Su and Zenghui Wang    
Marine sensors are highly vulnerable to illegal access network attacks. Moreover, the nation?s meteorological and hydrological information is at ever-increasing risk, which calls for a prompt and in depth analysis of the network behavior and traffic to d... ver más

 
Zahra Jafari and Ebrahim Karami    
The prompt and accurate diagnosis of breast lesions, including the distinction between cancer, non-cancer, and suspicious cancer, plays a crucial role in the prognosis of breast cancer. In this paper, we introduce a novel method based on feature extracti... ver más
Revista: Information