ARTICLE
TITLE

The Advantages of Human Evaluation of Sociomedical Question Answering Systems

Victoria Firsanova    

Abstract

This paper studies the evaluation of question answering systems and asks whether human evaluation is necessary to assess the quality of a sociomedical dialogue system. The study draws on several natural language processing experiments conducted with a question answering dataset for the inclusion of people with autism spectrum disorder and with state-of-the-art Transformer-based models. It describes model-centric experiments on generative and extractive question answering and data-centric experiments on dataset tuning; both approaches aim to reach the highest F1-Score. Although F1-Score and Exact Match are well-known automated evaluation metrics for question answering, their reliability is questionable for sociomedical systems, whose outputs should be not only consistent but also psychologically safe. With this in mind, the author conducted a human evaluation of a dialogue system for inclusion developed in an earlier phase of the work. The study results in an analysis of the advantages and disadvantages of automated and human approaches to evaluating conversational artificial intelligence systems in which the psychological safety of the user is essential.
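
For readers unfamiliar with the two automated metrics named in the abstract, the following is a minimal sketch of Exact Match and token-level F1 as they are commonly computed for extractive question answering (SQuAD-style normalization is assumed here; the abstract does not state which implementation the author used). The function names and the example sentences are hypothetical and serve only to illustrate why overlap metrics alone may miss differences in tone.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized answers are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical answers: same token overlap with the reference, different tone.
reference = "Talk to a specialist"
polite = "Please talk to a specialist"
dismissive = "Obviously talk to a specialist"

f1_polite = token_f1(polite, reference)
f1_dismissive = token_f1(dismissive, reference)
print(f1_polite, f1_dismissive, exact_match(polite, reference))
```

In this sketch both hypothetical answers receive the same F1-Score, because the metric counts only shared tokens. A human rater, by contrast, could judge one of them as less appropriate for a vulnerable user, which is the kind of gap the study examines.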
