ARTICLE
TITLE

Towards Automatic Points of Interest Matching

Mateusz Piech    
Aleksander Smywinski-Pohl    
Robert Marcjan and Leszek Siwik    

Abstract

Information about particular points, places, or institutions, so-called Points of Interest (POIs), can be complemented by matching data from the growing number of geospatial databases, including Foursquare, OpenStreetMap, Yelp, and Facebook Places. Doing so can yield more accurate and more complete information about POIs than extracting it from any one of these systems alone.
Problem: Matching Points of Interest, and developing an algorithm that does so automatically, is challenging because of differing data structures, data incompleteness, conflicting information, naming differences, data inaccuracy, and cultural and language differences. In short, the difficulty of obtaining complementary information about a POI from different sources stems partly from the lack of standardization among POI descriptions and partly from the vast and rapidly growing amount of data to be assessed on each occasion.
Research design and contributions: To propose an efficient algorithm for automatic Points of Interest matching, we: (1) analyzed the available data sources (their structures, models, attributes, number of objects, data quality in terms of the number of missing attributes, etc.) and defined a unified POI model; (2) prepared a fairly large experimental dataset consisting of 50,000 matching and 50,000 non-matching points, taken from different geographical, cultural, and language areas; (3) comprehensively reviewed metrics that can be used for assessing the similarity between Points of Interest; (4) proposed and verified different strategies for dealing with missing or incomplete attributes; (5) reviewed and analyzed six different classifiers for Points of Interest matching, conducting experiments and follow-up comparisons to determine the most effective combination of similarity metric, strategy for dealing with missing data, and POI matching classifier; and (6) presented an algorithm for automatic Points of Interest matching, detailing its accuracy and carrying out a complexity analysis.
Results and conclusions: The main results of the research are: (1) comprehensive experimental verification and numerical comparisons of the crucial Points of Interest matching components (similarity metrics, approaches for dealing with missing data, and classifiers), indicating that the best Points of Interest matching classifier is the random forest algorithm combined with marking of missing data and a mix of different similarity metrics for different POI attributes; and (2) an efficient greedy algorithm for automatic POI matching. At a cost of just 3.5% in accuracy, it reduces POI matching time complexity by two orders of magnitude compared with the exact algorithm.
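
To make the ideas in the abstract more concrete, the following Python sketch illustrates two of them: building a per-pair feature vector that uses a different similarity metric for each attribute and marks missing attributes explicitly, and a generic greedy one-to-one assignment over the resulting scores. It is not the authors' algorithm, whose details the abstract does not give. The attribute names (`name`, `location`, `phone`), the `MISSING` sentinel, the 300 m distance cutoff, the 0.7 threshold, and the averaging `score` function (a stand-in for a trained classifier such as a random forest) are all assumptions made for illustration.

```python
import math
from difflib import SequenceMatcher

MISSING = -1.0  # assumed sentinel: attribute absent in at least one of the two POIs


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def geo_similarity(a, b, max_dist_m=300.0):
    """Map distance to [0, 1]: 1 at the same spot, 0 at max_dist_m or beyond."""
    d = haversine_m(a[0], a[1], b[0], b[1])
    return max(0.0, 1.0 - d / max_dist_m)


def phone_similarity(a, b):
    """1.0 if the digit sequences are identical, else 0.0."""
    def digits(s):
        return "".join(ch for ch in s if ch.isdigit())
    return 1.0 if digits(a) == digits(b) else 0.0


# A different metric per attribute (hypothetical attribute names).
METRICS = {
    "name": name_similarity,
    "location": geo_similarity,
    "phone": phone_similarity,
}


def pair_features(poi_a, poi_b):
    """One feature per attribute; MISSING marks attributes that cannot be compared."""
    feats = []
    for attr, metric in METRICS.items():
        va, vb = poi_a.get(attr), poi_b.get(attr)
        feats.append(MISSING if va is None or vb is None else metric(va, vb))
    return feats


def score(feats):
    """Stand-in for a trained classifier: mean of the non-missing features."""
    present = [f for f in feats if f != MISSING]
    return sum(present) / len(present) if present else 0.0


def greedy_match(pois_a, pois_b, threshold=0.7):
    """Accept candidate pairs in descending score order, each POI used at most once."""
    candidates = []
    for i, a in enumerate(pois_a):
        for j, b in enumerate(pois_b):
            s = score(pair_features(a, b))
            if s >= threshold:
                candidates.append((s, i, j))
    candidates.sort(reverse=True)
    used_a, used_b, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            matches.append((i, j))
            used_a.add(i)
            used_b.add(j)
    return matches
```

In a setup closer to the one the abstract describes, the vectors returned by `pair_features` would be fed to a trained classifier in place of `score`. Keeping the missing marker as a distinct value lets the classifier distinguish "attribute absent" from "attribute present but dissimilar". This sketch still scores every pair, so it does not reproduce the complexity reduction reported for the authors' greedy algorithm.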
