Future Internet, Vol. 14, Part 7 (2022)
ARTICLE
TITLE

Zero-Inflated Patent Data Analysis Using Generating Synthetic Samples

Daiho Uhm and Sunghae Jun    

Abstract

With the expansion of the internet, we encounter many types of big data, such as web documents and sensing data. Compared with traditional small data such as experimental samples, big data offer more opportunities to find hidden and novel patterns through analysis with statistics and machine learning algorithms. However, the growing use of big data also brings problems. One of them is the zero-inflated problem in structured data preprocessed from big data: when text is converted to keyword counts, most count values are zero because a given word appears in only some documents. Patent data are especially affected because most of them take the form of text documents. In this paper, we focus on patent keyword analysis as a form of text big data analysis, where the zero-inflated problem arises just as it does in other text data. To address it, we propose generating synthetic samples using statistical inference and a tree structure. We verify the performance and validity of the proposed method using patent documents and simulation data.
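
To make the zero-inflated problem concrete, the following minimal sketch builds a document-keyword count matrix and measures the share of zero cells. The corpus, the keyword list, and the helper names `build_count_matrix` and `zero_fraction` are all hypothetical and chosen only for illustration; this is not the authors' synthetic-sample method, only a picture of the data problem it targets.

```python
from collections import Counter

# Hypothetical toy corpus: these "documents" and keywords are invented
# for illustration and do not come from the paper's patent data.
docs = [
    "sensor network data transmission",
    "battery electrode material",
    "network packet routing sensor",
    "image compression codec",
]
keywords = ["sensor", "network", "battery", "image", "routing", "codec"]


def build_count_matrix(documents, vocab):
    """Return a document-keyword matrix of raw occurrence counts."""
    matrix = []
    for text in documents:
        counts = Counter(text.split())
        # Counter returns 0 for words that never occur in the document.
        matrix.append([counts[word] for word in vocab])
    return matrix


def zero_fraction(matrix):
    """Share of cells in the matrix that are exactly zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


matrix = build_count_matrix(docs, keywords)
ratio = zero_fraction(matrix)
print(f"zero cells: {ratio:.2%}")
```

Even in this tiny example most cells are zero, because each keyword occurs in only one or two documents. With a realistic vocabulary the proportion of zeros is typically far higher.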

Similar articles

       
 
Kenneth David Strang    
A critical worldwide problem is that ransomware cyberattacks can be costly to organizations. Moreover, accidental employee cybercrime risk can be challenging to prevent, even by leveraging advanced computer science techniques. This exploratory project us...

 
Wei-Ling Hsu, Yi-Jheng Chang, Lin Mou, Juan-Wen Huang and Hsin-Lung Liu    
Historic urban areas are the foundations of urban development. Due to rapid urbanization, the sustainable development of historic urban areas has become challenging for many cities. Elements of tourism and tourism service facilities play an important rol...

 
Shaopan Li, Yiping Lin and Hong Huang    
Estimating disaster relief supplies is crucial for governments coordinating and executing disaster relief operations. Rapid and accurate estimation of disaster relief supplies can assist the government to optimize the allocation of resources and better o...

 
Andreas F. Gkontzis, Sotiris Kotsiantis, Georgios Feretzakis and Vassilios S. Verykios    
In an epoch characterized by the swift pace of digitalization and urbanization, the essence of community well-being hinges on the efficacy of urban management. As cities burgeon and transform, the need for astute strategies to navigate the complexities o...

 
Lei Zhou, Weiye Xiao, Chen Wang, Haoran Wang     pp. 143-161
Human mobility datasets, such as traffic flow data, reveal the connections between urban spaces. A novel framework is proposed to explore the spatial association between urban commercial and residential spaces via consumption travel flows in Shanghai. A ...