Applied Sciences, Vol. 7, No. 8 (August 2017)
ARTICLE
TITLE

Learning Word Embeddings with Chi-Square Weights for Healthcare Tweet Classification

Sicong Kuang and Brian D. Davison    

Abstract

Twitter is a popular source for monitoring healthcare information and public disease. However, tweets contain a great deal of noise: the presence of appropriate keywords in a tweet does not guarantee that the tweet is truly health-related. Thus, traditional keyword-based classification is largely ineffective. Algorithms for word embeddings have proved useful in many natural language processing (NLP) tasks. We introduce two algorithms based on an existing word embedding learning algorithm, the continuous bag-of-words model (CBOW), and apply them to the task of recognizing healthcare-related tweets. In the CBOW model, the vector representation of a word is learned from its contexts. To simplify the computation, the context is represented by the average of all words inside the context window. However, not all words in the context window contribute equally to the prediction of the target word. Greedily incorporating every word in the context window limits the contribution of the useful semantic words and brings noisy or irrelevant words into the learning process. Some existing word embedding algorithms also learn a weighted CBOW model, but their weights are based on pre-defined syntactic rules and ignore the task for which the embedding is learned. We propose learning weights based on the words' relative importance in the classification task. Our intuition is that such learned weights place more emphasis on words that contribute comparatively more to the later task. We evaluate the embeddings learned by our algorithms on two healthcare-related datasets. The experimental results demonstrate that embeddings learned by the proposed algorithms outperform existing techniques, with a relative accuracy improvement of over 9%.
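The idea of a task-weighted context can be illustrated with a minimal sketch. It is not the authors' implementation: it only assumes that each context word's weight is its chi-square statistic against a binary class label (computed from a 2x2 word-presence by class table), and that the CBOW context vector is the normalized weighted average of the context word vectors. The function names, the `eps` smoothing term, and the toy data are hypothetical, and the embedding training step itself is omitted.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Chi-square statistic of each word against a binary label (2x2 table)."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Count document frequency per class (each word once per document).
        target = pos_df if label == 1 else neg_df
        target.update(set(tokens))
    scores = {}
    for word in set(pos_df) | set(neg_df):
        a = pos_df[word]   # positive documents containing the word
        b = neg_df[word]   # negative documents containing the word
        c = n_pos - a      # positive documents without the word
        d = n_neg - b      # negative documents without the word
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[word] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def weighted_context(context, vectors, scores, eps=1e-6):
    """Weighted average of context word vectors; weights are normalized to sum to 1.

    eps (an assumption of this sketch) keeps the weights well defined when
    every context word has a chi-square score of zero.
    """
    weights = [scores.get(word, 0.0) + eps for word in context]
    total = sum(weights)
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for word, weight in zip(context, weights):
        for i, x in enumerate(vectors[word]):
            out[i] += (weight / total) * x
    return out


# Toy, made-up data: label 1 = health-related, 0 = not health-related.
docs = [
    ["flu", "shot", "today"],
    ["got", "flu", "fever"],
    ["bieber", "fever", "today"],
    ["nice", "weather", "today"],
]
labels = [1, 1, 0, 0]
scores = chi_square_scores(docs, labels)

# Hypothetical 2-dimensional embeddings for two context words.
vectors = {"flu": [1.0, 0.0], "fever": [0.0, 1.0]}
context_vector = weighted_context(["flu", "fever"], vectors, scores)
print(scores["flu"], scores["fever"], context_vector)
```

In this toy corpus "flu" appears only in the health-related tweets, while "fever" appears equally often in both classes, so the weighted context is dominated by the vector for "flu", whereas a plain average would weight both words equally.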

Similar articles

       
 
Nasrin Elhassan, Giuseppe Varone, Rami Ahmed, Mandar Gogate, Kia Dashtipour, Hani Almoamari, Mohammed A. El-Affendi, Bassam Naji Al-Tamimi, Faisal Albalwy and Amir Hussain    
Social media networks have grown exponentially over the last two decades, providing the opportunity for users of the internet to communicate and exchange ideas on a variety of topics. The outcome is that opinion mining plays a crucial role in analyzing u...
Journal: Computers

 
Gregorius Ryan, Pricillia Katarina and Derwin Suhartono    
The rise of social media as a platform for self-expression and self-understanding has led to increased interest in using the Myers-Briggs Type Indicator (MBTI) to explore human personalities. Despite this, there needs to be more research on how other wor...
Journal: Information

 
Yuanyuan Li, Yuan Huang, Weijian Huang, Junhao Yu and Zheng Huang    
An abstractive summarization model based on the joint-attention mechanism and a priori knowledge is proposed to address the problems of the inadequate semantic understanding of text and summaries that do not conform to human language habits in abstractiv...
Journal: Applied Sciences

 
Sergiu Zaharia, Traian Rebedea and Stefan Trausan-Matu    
The research presented in the paper aims at increasing the capacity to identify security weaknesses in programming languages that are less supported by specialized security analysis tools, based on the knowledge gathered from securing the popular ones, f...
Journal: Applied Sciences

 
Yixian Fu, Yuanyao Lu and Ran Ni    
Lip reading has attracted increasing attention recently due to advances in deep learning. However, most research targets English datasets. The study of Chinese lip-reading technology is still in its initial stage. Firstly, in this paper, we expand the na...
Journal: Applied Sciences