Inicio  /  Algorithms  /  Vol: 13 Par: 4 (2020)  /  Artículo
ARTÍCULO
TITULO

Practical Grammar Compression Based on Maximal Repeats

Isamu Furuya    
Takuya Takagi    
Yuto Nakashima    
Shunsuke Inenaga    
Hideo Bannai and Takuya Kida    

Resumen

This study presents an analysis of RePair, which is a grammar compression algorithm known for its simple scheme, while also being practically effective. First, we show that the main process of RePair, that is, the step by step substitution of the most frequent symbol pairs, works within the corresponding most frequent maximal repeats. Then, we reveal the relation between maximal repeats and grammars constructed by RePair. On the basis of this analysis, we further propose a novel variant of RePair, called MR-RePair, which considers the one-time substitution of the most frequent maximal repeats instead of the consecutive substitution of the most frequent pairs. The results of the experiments comparing the size of constructed grammars and execution time of RePair and MR-RePair on several text corpora demonstrate that MR-RePair constructs more compact grammars than RePair does, especially for highly repetitive texts.

 Artículos similares