Inicio  /  Algorithms  /  Vol: 14 Par: 1 (2021)  /  Artículo
ARTÍCULO
TITULO

Re-Pair in Small Space

Dominik Köppl    
Tomohiro I    
Isamu Furuya    
Yoshimasa Takabatake    
Kensuke Sakai and Keisuke Goto    

Resumen

Re-Pairis a grammar compression scheme with favorably good compression rates. The computation of Re-Pair comes with the cost of maintaining large frequency tables, which makes it hard to compute Re-Pair on large-scale data sets. As a solution for this problem, we present, given a text of length n whose characters are drawn from an integer alphabet with size ??=????(1) s = n O ( 1 ) , an ??(min(??2,??2lglog????lglglg??/log????)) O ( min ( n 2 , n 2 lg log t n lg lg lg n / log t n ) ) time algorithm computing Re-Pair with max((??/??)lg??, max ( ( n / c ) lg n , ???lg???)+??(lg??) n lg t ) + O ( lg n ) bits of working space including the text space, where ??=1 c = 1 is a fixed user-defined constant and ?? t is the sum of ?? s and the number of non-terminals. We give variants of our solution working in parallel or in the external memory model. Unfortunately, the algorithm seems not practical since a preliminary version already needs roughly one hour for computing Re-Pair on one megabyte of text.