Inicio  /  Information  /  Vol: 15 Par: 2 (2024)  /  Artículo
ARTÍCULO
TITULO

SRBerta?A Transformer Language Model for Serbian Cyrillic Legal Texts

Milo? Bogdanovic    
Jelena Kocic and Leonid Stoimenov    

Resumen

Language is a unique ability of human beings. Although relatively simple for humans, the ability to understand human language is a highly complex task for machines. For a machine to learn a particular language, it must understand not only the words and rules used in a particular language, but also the context of sentences and the meaning that words take on in a particular context. In the experimental development we present in this paper, the goal was the development of the language model SRBerta?a language model designed to understand the formal language of Serbian legal documents. SRBerta is the first of its kind since it has been trained using Cyrillic legal texts contained within a dataset created specifically for this purpose. The main goal of SRBerta network development was to understand the formal language of Serbian legislation. The training process was carried out using minimal resources (single NVIDIA Quadro RTX 5000 GPU) and performed in two phases?base model training and fine-tuning. We will present the structure of the model, the structure of the training datasets, the training process, and the evaluation results. Further, we will explain the accuracy metric used in our case and demonstrate that SRBerta achieves a high level of accuracy for the task of masked language modeling in Serbian Cyrillic legal texts. Finally, SRBerta model and training datasets are publicly available for scientific and commercial purposes.

Palabras claves

 Artículos similares

       
 
Pascal Harth, Orlando Jähde, Sophia Schneider, Nils Horn and Rüdiger Buchkremer    
In this research, we present an algorithm that leverages language-transformer technologies to automate the generation of product requirements, utilizing E-Shop consumer reviews as a data source. Our methodology combines classical natural language process... ver más
Revista: Algorithms

 
Oscar Ondeng, Heywood Ouma and Peter Akuon    
Visual understanding is a research area that bridges the gap between computer vision and natural language processing. Image captioning is a visual understanding task in which natural language descriptions of images are automatically generated using visio... ver más
Revista: Applied Sciences

 
Fetoun Mansour AlZahrani and Maha Al-Yahya    
Authorship attribution (AA) is a field of natural language processing that aims to attribute text to its author. Although the literature includes several studies on Arabic AA in general, applying AA to classical Arabic texts has not gained similar attent... ver más
Revista: Applied Sciences

 
Marian Bucos and Bogdan Dragulescu    
Misinformation poses a significant challenge in the digital age, requiring robust methods to detect fake news. This study investigates the effectiveness of using Back Translation (BT) augmentation, specifically transformer-based models, to improve fake n... ver más
Revista: Applied Sciences

 
Dina Oralbekova, Orken Mamyrbayev, Mohamed Othman, Dinara Kassymova and Kuralai Mukhsina    
This article provides a comprehensive survey of contemporary language modeling approaches within the realm of natural language processing (NLP) tasks. This paper conducts an analytical exploration of diverse methodologies employed in the creation of lang... ver más
Revista: Applied Sciences