Inicio  /  Algorithms  /  Vol: 13 Par: 5 (2020)  /  Artículo
ARTÍCULO
TITULO

Mining Sequential Patterns with VC-Dimension and Rademacher Complexity

Diego Santoro    
Andrea Tonon and Fabio Vandin    

Resumen

Sequential pattern mining is a fundamental data mining task with application in several domains. We study two variants of this task?the first is the extraction of frequent sequential patterns, whose frequency in a dataset of sequential transactions is higher than a user-provided threshold; the second is the mining of true frequent sequential patterns, which appear with probability above a user-defined threshold in transactions drawn from the generative process underlying the data. We present the first sampling-based algorithm to mine, with high confidence, a rigorous approximation of the frequent sequential patterns from massive datasets. We also present the first algorithms to mine approximations of the true frequent sequential patterns with rigorous guarantees on the quality of the output. Our algorithms are based on novel applications of Vapnik-Chervonenkis dimension and Rademacher complexity, advanced tools from statistical learning theory, to sequential pattern mining. Our extensive experimental evaluation shows that our algorithms provide high-quality approximations for both problems we consider.

 Artículos similares

       
 
Sami Diaf and Ulrich Fritsche    
This paper proposes a new methodology to study sequential corpora by implementing a two-stage algorithm that learns time-based topics with respect to a scale of document positions and introduces the concept of Topic Scaling, which ranks learned topics wi... ver más
Revista: Algorithms

 
Consolata Gakii, Paul O. Mireji and Richard Rimiru    
Analysis of high-dimensional data, with more features (p" role="presentation">??p p ) than observations (N" role="presentation">??N N ) (p>N" role="presentation">??>??p>N p > N ), places significant demand in cost and memory computational usage at... ver más
Revista: Algorithms

 
Xinhua Wang, Xuemeng Yu, Lei Guo, Fangai Liu and Liancheng Xu    
As students? behaviors are important factors that can reflect their learning styles and living habits on campus, extracting useful features of them plays a helpful role in understanding the students? learning process, which is an important step towards p... ver más
Revista: Information

 
Andrea Brunello, Enrico Marzano, Angelo Montanari and Guido Sciavicco    
Temporal information plays a very important role in many analysis tasks, and can be encoded in at least two different ways. It can be modeled by discrete sequences of events as, for example, in the business intelligence domain, with the aim of tracking t... ver más
Revista: Computers

 
Ivy Ordanel, Proceso Fernandez, Jr. and Henry Adorna    
The Poset Cover Problem is an optimization problem where the goal is to determine a minimum set of posets that covers a given set of linear orders. This problem is relevant in the field of data mining, specifically in determining directed networks or mod... ver más
Revista: Algorithms