Applied Sciences, Vol. 10, Issue 19 (2020)

Towards Robust Word Embeddings for Noisy Texts

Yerai Doval, Jesús Vilares and Carlos Gómez-Rodríguez

Abstract

Research on word embeddings has mainly focused on improving their performance on standard corpora, disregarding the difficulties posed by noisy texts such as tweets and other types of non-standard writing from social media. In this work, we propose a simple extension to the skipgram model that introduces bridge-words: artificial words added to the model to strengthen the similarity between standard words and their noisy variants. Our new embeddings outperform baseline models on noisy texts across a wide range of intrinsic and extrinsic evaluation tasks, while retaining good performance on standard texts. To the best of our knowledge, this is the first explicit approach to dealing with these types of noisy texts at the word embedding level that goes beyond support for out-of-vocabulary words.
