Journal: AI, Vol. 6, No. 1 (2025)
ARTICLE
TITLE

Beyond Text Generation: Assessing Large Language Models' Ability to Reason Logically and Follow Strict Rules

Zhiyong Han, Fortunato Battaglia, Kush Mansuria, Yoav Heyman and Stanley R. Terlecky

Abstract

The growing interest in advanced large language models (LLMs) like ChatGPT has sparked debate about how best to use them in various human activities. However, a neglected issue in the debate concerning the applications of LLMs is whether they can reason logically and follow rules in novel contexts, which are critical for our understanding and applications of LLMs. To address this knowledge gap, this study investigates five LLMs (ChatGPT-4o, Claude, Gemini, Meta AI, and Mistral) using word ladder puzzles to assess their logical reasoning and rule-adherence capabilities. Our two-phase methodology involves (1) explicit instructions about word ladder puzzles and rules regarding how to solve the puzzles and then evaluate rule understanding, followed by (2) assessing LLMs' ability to create and solve word ladder puzzles while adhering to rules. Additionally, we test their ability to implicitly recognize and avoid HIPAA privacy rule violations as an example of a real-world scenario. Our findings reveal that LLMs show a persistent lack of logical reasoning and systematically fail to follow puzzle rules. Furthermore, all LLMs except Claude prioritized task completion (text writing) over ethical considerations in the HIPAA test. Our findings expose critical flaws in LLMs' reasoning and rule-following capabilities, raising concerns about their reliability in critical tasks requiring strict rule-following and logical reasoning. Therefore, we urge caution when integrating LLMs into critical fields and highlight the need for further research into their capabilities and limitations to ensure responsible AI development.
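
For readers unfamiliar with word ladder puzzles, the following is a minimal sketch of a rule checker. The abstract does not list the exact rules given to the models, so the sketch assumes the classic ones: every word has the same length, each step changes exactly one letter, and every word must appear in an agreed dictionary. The function names and the tiny `WORDS` set are hypothetical and chosen only for illustration.

```python
def differs_by_one_letter(a: str, b: str) -> bool:
    """True when a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def ladder_violations(ladder, dictionary):
    """Return a list of rule violations; an empty list means the ladder is valid.

    Assumed rules (classic word ladder, not confirmed by the abstract):
    every word is in the dictionary, and consecutive words differ by one letter.
    """
    problems = []
    words = [w.lower() for w in ladder]
    if len(words) < 2:
        problems.append("a ladder needs a start word and an end word")
    for w in words:
        if w not in dictionary:
            problems.append(f"'{w}' is not in the dictionary")
    for prev, curr in zip(words, words[1:]):
        if not differs_by_one_letter(prev, curr):
            problems.append(f"'{prev}' -> '{curr}' does not change exactly one letter")
    return problems


# Hypothetical toy dictionary, just large enough for the example.
WORDS = {"cold", "cord", "card", "ward", "warm", "word", "worm"}

print(ladder_violations(["cold", "cord", "card", "ward", "warm"], WORDS))  # valid ladder
print(ladder_violations(["cold", "warm"], WORDS))  # skips the intermediate steps
```

A checker of this kind makes rule adherence mechanical to verify: a proposed ladder either produces an empty list or a concrete list of the steps that break a rule.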

Similar articles

We have prepared a selection of other articles that may be of interest to you
Melania Nitu and Mihai Dascalu    
Machine-generated content reshapes the landscape of digital information; hence, ensuring the authenticity of texts within digital libraries has become a paramount concern. This work introduces a corpus of approximately 60 k Romanian documents, including ...
Journal: Future Internet
Sven Tarp (pp. 175-188)
This text is the inaugural lecture presented by Professor Sven Tarp at the Aarhus School of Business on March 14, 2008. Firstly, the text provides a brief retrospect of the history of lexicography with emphasis on the experience of the big Chinese encycl...
Runqing Miao, Qingxuan Jia, Fuchun Sun, Gang Chen and Haiming Huang    
In the quest for intelligent robots, it is essential to enable them to understand tasks beyond mere manipulation. Achieving this requires a robust parsing mode that can be used to understand human cognition and semantics. However, the existing methods fo...
Journal: Actuators
Carlos E. Mendoza-Ramírez, Juan C. Tudon-Martinez, Luis C. Félix-Herrán, Jorge de J. Lozoya-Santos and Adriana Vargas-Martínez    
An Augmented Reality (AR) system is a technology that overlays digital information, such as images, sounds, or text, onto a user's view of the real world, providing an enriched and interactive experience of the surrounding environment. It has evolved int...
Journal: Applied Sciences
Moreno La Quatra and Luca Cagliero    
The emergence of attention-based architectures has led to significant improvements in the performance of neural sequence-to-sequence models for text summarization. Although these models have proved to be effective in summarizing English-written documents...
Journal: Future Internet