Information, Vol. 11, No. 3 (2020)
ARTICLE
TITLE

A Syllable-Based Technique for Uyghur Text Compression

Wayit Abliz    
Hao Wu    
Maihemuti Maimaiti    
Jiamila Wushouer    
Kahaerjiang Abiderexiti    
Tuergen Yibulayin and Aishan Wumaier    

Abstract

To improve the utilization of text storage resources and the efficiency of data transmission, we proposed two syllable-based Uyghur text compression coding schemes. First, based on statistics of syllable coverage in the corpus text, we constructed 12-bit and 16-bit syllable code tables and added commonly used symbols (such as punctuation marks and ASCII characters) to the code tables. To enable the coding schemes to process Uyghur texts mixed with symbols from other languages, we introduced a flag code in the compression process to distinguish Unicode encodings that were not in the code table. The experiments showed that the 12-bit coding scheme had an average compression ratio of 0.3 on Uyghur texts smaller than 4 KB and that the 16-bit coding scheme had an average compression ratio of 0.5 on texts smaller than 2 KB. Our compression schemes outperformed GZip, BZip2, and the LZW algorithm on short texts and could be effectively applied to the compression of Uyghur short texts for storage and applications.
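The idea of a fixed-width syllable code table with a flag code for out-of-table characters can be illustrated with a small sketch. This is not the authors' implementation: the toy table contents, the reserved flag value `0xFFF`, the 24-bit raw code point that follows the flag, and the greedy longest-match tokenization are all assumptions made only for illustration.

```python
CODE_BITS = 12   # width of one table code (the 12-bit scheme)
FLAG = 0xFFF     # assumed escape code: the next 24 bits hold a raw code point
RAW_BITS = 24    # assumed width of an escaped Unicode code point


def build_table(entries):
    """Assign consecutive codes to syllables/symbols, keeping FLAG reserved."""
    if len(entries) >= FLAG:
        raise ValueError("too many entries for a 12-bit table")
    return {s: i for i, s in enumerate(entries)}


def encode(text, table):
    """Greedy longest-match encoding into a packed byte string."""
    max_len = max(map(len, table), default=1)
    acc, nbits, i = 0, 0, 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            code = table.get(text[i:i + n])
            if code is not None:
                acc, nbits = (acc << CODE_BITS) | code, nbits + CODE_BITS
                i += n
                break
        else:
            # No table entry matches: emit the flag, then the raw code point.
            acc = (((acc << CODE_BITS) | FLAG) << RAW_BITS) | ord(text[i])
            nbits += CODE_BITS + RAW_BITS
            i += 1
    pad = -nbits % 8  # 0 or 4 bits, always shorter than one code
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def decode(data, table):
    """Invert encode(); trailing padding is too short to be read as a code."""
    inverse = {c: s for s, c in table.items()}
    acc, nbits, pos, out = int.from_bytes(data, "big"), len(data) * 8, 0, []

    def read(width):
        nonlocal pos
        value = (acc >> (nbits - pos - width)) & ((1 << width) - 1)
        pos += width
        return value

    while nbits - pos >= CODE_BITS:
        code = read(CODE_BITS)
        out.append(chr(read(RAW_BITS)) if code == FLAG else inverse[code])
    return "".join(out)


# Toy table: two Uyghur syllables and a space (illustrative entries only).
syllables = ["مەك", "تەپ", " "]
table = build_table(syllables)
text = syllables[0] + syllables[1] + " 中"   # "中" is not in the table
packed = encode(text, table)
ratio = len(packed) / len(text.encode("utf-8"))  # compressed / original size
```

In this sketch the ratio is measured against the UTF-8 size of the input, which is also an assumption. It shows why escaped characters are costly: each one takes 36 bits, so the scheme pays off only when most of the text is covered by the table.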

Similar articles

       
 
Isamu Furuya, Takuya Takagi, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai and Takuya Kida    
This study presents an analysis of RePair, a grammar compression algorithm known for its simple scheme and its practical effectiveness. First, we show that the main process of RePair, that is, the step-by-step substitution of the most fr...
Journal: Algorithms

 
Chang-Min Kim, Ellen J. Hong, Kyungyong Chung and Roy C. Park    
Recently, demand for handwriting recognition, such as automation of mail sorting, license plate recognition, and electronic memo pads, has increased exponentially in various industrial fields. In addition, in the image recognition field, methods using ar...
Journal: Applied Sciences

 
Jérémy Barbay    
We describe an algorithm computing an optimal prefix-free code for n unsorted positive weights in time within O(n(1 + lg α)) ⊆ O(n lg n), where the alternation α ∈ [1..n−1] approximates the minimal...
Journal: Algorithms

 
Liat Rozenberg, Sagi Lotan and Dan Feldman    
Whether the source is an autonomous car, a robotic vacuum cleaner, or a quadcopter, signals from sensors tend to have some hidden patterns that repeat themselves. For example, typical GPS traces from a smartphone contain periodic trajectories such as "home, w...
Journal: Algorithms

 
Muhammed Oguzhan Külekci and Yasin Öztürk    
Non-uniquely-decodable (non-UD) codes can be defined as codes that cannot be uniquely decoded without additional disambiguation information. These are mainly the class of non-prefix-free codes, where a code-word can be a prefix of other(s), and thus,...
Journal: Algorithms