Information, Vol. 12, Issue 12 (2021)

Multi-Keyword Classification: A Case Study in Finnish Social Sciences Data Archive

Erjon Skenderi, Jukka Huhtamäki and Kostas Stefanidis

Abstract

In this paper, we consider the task of assigning relevant labels to studies in the social science domain. Manual labelling is an expensive process and prone to human error. Various multi-label text classification machine learning approaches have been proposed to resolve this problem. We introduce a dataset obtained from the Finnish Social Science Archive and comprised of 2968 research studies' metadata. The metadata of each study includes attributes, such as the "abstract" and the "set of labels". We used the Bag of Words (BoW), TF-IDF term weighting and pretrained word embeddings obtained from FastText and BERT models to generate the text representations for each study's abstract field. Our selection of multi-label classification methods includes a Naive approach, Multi-label k Nearest Neighbours (ML-kNN), Multi-Label Random Forest (ML-RF), X-BERT and Parabel. The methods were combined with the text representation techniques and their performance was evaluated on our dataset. We measured the classification accuracy of the combinations using Precision, Recall and F1 metrics. In addition, we used the Normalized Discounted Cumulative Gain to measure the label ranking performance of the selected methods combined with the text representation techniques. The results showed that the ML-RF model achieved a higher classification accuracy with the TF-IDF features and, based on the ranking score, the Parabel model outperformed the other methods.
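
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the abstract does not say how Precision, Recall and F1 were averaged or which nDCG variant was used, so the micro-averaging and the binary-relevance nDCG@k below are assumptions, and the function names and toy labels are hypothetical.

```python
import math


def micro_prf(true_sets, pred_sets):
    """Micro-averaged precision, recall and F1 over all (study, label) pairs.

    Assumption: micro-averaging; the abstract does not state the averaging mode.
    """
    true_positives = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    n_predicted = sum(len(p) for p in pred_sets)
    n_true = sum(len(t) for t in true_sets)
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_true if n_true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def ndcg_at_k(true_labels, ranked_labels, k):
    """nDCG@k with binary relevance (assumption): a ranked label scores 1
    if it belongs to the study's true label set, 0 otherwise."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, label in enumerate(ranked_labels[:k])
        if label in true_labels
    )
    # Ideal ranking places all true labels (up to k of them) first.
    ideal_hits = min(k, len(true_labels))
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg else 0.0


# Hypothetical toy data: two studies with true and predicted label sets.
true_sets = [{"education", "youth"}, {"health"}]
pred_sets = [{"education", "politics"}, {"health"}]
precision, recall, f1 = micro_prf(true_sets, pred_sets)

# Hypothetical ranked label list for the first study.
ranking_score = ndcg_at_k(true_sets[0], ["education", "politics", "youth"], k=3)
```

Set-based metrics such as F1 only reward exact label hits, while nDCG also rewards placing correct labels near the top of a ranked list, which is why a method can lead on one measure and not the other.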
