Related papers: Mining Points of Interest via Address Embeddings: An Unsupervised Approach

URL: http://arxiv.org/abs/2109.04467v1
Date: Thu, 9 Sep 2021 17:59:45 GMT
Title: Mining Points of Interest via Address Embeddings: An Unsupervised Approach
Authors: Abhinav Ganesan, Anubhav Gupta, and Jose Mathew
Abstract summary: We propose an end-to-end unsupervised system design for obtaining polygon representations of points of interest (PoI) from address locations and address texts. The proposed algorithm achieves a median area precision of 98 %, a median area recall of 8 %, and a median F-score of 0.15.
Score: 0.7646713951724009
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Digital maps are commonly used across the globe for exploring places that users are interested in, commonly referred to as points of interest (PoI). In online food delivery platforms, PoIs could represent any major private compounds where customers could order from such as hospitals, residential complexes, office complexes, educational institutes and hostels. In this work, we propose an end-to-end unsupervised system design for obtaining polygon representations of PoIs (PoI polygons) from address locations and address texts. We preprocess the address texts using locality names and generate embeddings for the address texts using a deep learning-based architecture, viz. RoBERTa, trained on our internal address dataset. The PoI candidates are identified by jointly clustering the anonymised customer phone GPS locations (obtained during address onboarding) and the embeddings of the address texts. The final list of PoI polygons is obtained from these PoI candidates using novel post-processing steps. This algorithm identified 74.8 % more PoIs than those obtained using the Mummidi-Krumm baseline algorithm run on our internal dataset. The proposed algorithm achieves a median area precision of 98 %, a median area recall of 8 %, and a median F-score of 0.15. In order to improve the recall of the algorithmic polygons, we post-process them using building footprint polygons from the OpenStreetMap (OSM) database. The post-processing algorithm involves reshaping the algorithmic polygon using intersecting polygons and closed private roads from the OSM database, and accounting for intersection with public roads on the OSM database. We achieve a median area recall of 70 %, a median area precision of 69 %, and a median F-score of 0.69 on these post-processed polygons.
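The abstract reports area precision, area recall, and F-score without defining them. The sketch below shows one common reading, assumed here rather than taken from the paper: precision is the overlap area divided by the predicted polygon's area, recall is the overlap area divided by the ground-truth polygon's area, and the F-score is their harmonic mean. Axis-aligned rectangles stand in for real PoI polygons to keep the example dependency-free; `Rect`, `intersection_area`, and `area_scores` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle used as a stand-in for a PoI polygon (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def intersection_area(a: Rect, b: Rect) -> float:
    """Area of the overlap between two rectangles (0.0 if they are disjoint)."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(0.0, width) * max(0.0, height)


def area_scores(predicted: Rect, truth: Rect) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) under the assumed area-based definitions."""
    overlap = intersection_area(predicted, truth)
    precision = overlap / predicted.area() if predicted.area() > 0 else 0.0
    recall = overlap / truth.area() if truth.area() > 0 else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_score


# A small predicted polygon lying fully inside a large ground-truth compound:
# everything predicted is correct (high precision), but most of the compound
# is missed (low recall), which is the pattern the abstract describes.
truth = Rect(0.0, 0.0, 10.0, 10.0)
predicted = Rect(2.0, 2.0, 4.0, 4.0)
precision, recall, f_score = area_scores(predicted, truth)
```

Under these assumed definitions, a harmonic mean of 0.98 and 0.08 is roughly 0.15, and a harmonic mean of 0.69 and 0.70 is roughly 0.69, which is consistent with the reported figures. The median of per-polygon F-scores need not equal the F-score of the medians, so this is only a plausibility check.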

Related papers

Boundary Detection Algorithm Inspired by Locally Linear Embedding [8.259071011958254]
We propose a method for detecting boundary points inspired by the widely used locally linear embedding algorithm. We implement this method using two nearest neighborhood search schemes: the $\epsilon$-radius ball scheme and the $K$-nearest neighbor scheme.
arXiv Detail & Related papers (2024-06-26T16:05:57Z)
Progressive Evolution from Single-Point to Polygon for Scene Text [79.29097971932529]
We introduce Point2Polygon, which can efficiently transform single points into compact polygons. Our method uses a coarse-to-fine process, starting with creating anchor points based on recognition confidence, then vertically and horizontally refining the polygon. In training detectors with polygons generated by our method, we attained 86% of the accuracy relative to training with ground truth (GT). Additionally, the proposed Point2Polygon can be seamlessly integrated to empower single-point spotters to generate polygons.
arXiv Detail & Related papers (2023-12-21T12:08:27Z)
Deepparse : An Extendable, and Fine-Tunable State-Of-The-Art Library for Parsing Multinational Street Addresses [0.0]
This paper presents Deepparse, an open-source, extendable, fine-tunable Python address parsing solution under the LGPL-3.0 licence. It can parse addresses written in any language and use any address standard. The library supports fine-tuning with new data to generate a custom address parser.
arXiv Detail & Related papers (2023-11-20T15:37:33Z)
Dominating Set Database Selection for Visual Place Recognition [2.6641546039481527]
This paper presents an approach for creating a visual place recognition (VPR) database for localization in indoor environments from RGBD scanning sequences. The proposed approach is formulated in terms of a dominating set problem on a graph constructed from spatial information, and is referred to as DominatingSet. The paper also presents a fully automated pipeline for VPR database creation from RGBD scanning sequences, as well as a set of metrics for VPR database evaluation.
arXiv Detail & Related papers (2023-03-09T09:12:21Z)
HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images [58.720142291102135]
We present a novel dataset named HPointLoc, specially designed for exploring the capabilities of visual place recognition in indoor environments. The dataset is based on the popular Habitat simulator, in which indoor scenes can be generated using both one's own sensor data and open datasets.
arXiv Detail & Related papers (2022-12-30T12:20:56Z)
A Metaheuristic Algorithm for Large Maximum Weight Independent Set Problems [58.348679046591265]
Given a node-weighted graph, find a set of independent (mutually nonadjacent) nodes whose node-weight sum is maximum. Some of the graphs arising in this application are large, having hundreds of thousands of nodes and hundreds of millions of edges. We develop a new local search algorithm, which is a metaheuristic in the greedy randomized adaptive search framework.
arXiv Detail & Related papers (2022-03-28T21:34:16Z)
End-to-End Segmentation via Patch-wise Polygons Prediction [93.91375268580806]
The leading segmentation methods represent the output map as a pixel grid. We study an alternative representation in which the object edges are modeled, per image patch, as a polygon with $k$ vertices that is coupled with per-patch label probabilities.
arXiv Detail & Related papers (2021-12-05T10:42:40Z)
PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds [79.99653758293277]
PCAM is a neural network whose key element is a pointwise product of cross-attention matrices. We show that PCAM achieves state-of-the-art results among methods which, like us, solve steps (a) and (b) jointly via deepnets.
arXiv Detail & Related papers (2021-10-04T09:23:27Z)
An End-to-end Point of Interest (POI) Conflation Framework [0.966840768820136]
Point of interest (POI) data serves as a valuable source of semantic information for places of interest. This study proposes a novel end-to-end POI conflation framework consisting of six steps.
arXiv Detail & Related papers (2021-09-13T15:50:48Z)
An Improved Approach for Estimating Social POI Boundaries With Textual Attributes on Social Media [3.590202054885437]
It has been insufficiently explored how to perform density-based clustering by exploiting textual attributes on social media. We present a new approach and algorithm, built upon our earlier work on social POI boundary estimation (SoBEst). Our study is motivated by the following empirical observation: the fixed representative coordinate that SoBEst assumes for each POI may be far from the centroid of the estimated social POI boundary for certain POIs.
arXiv Detail & Related papers (2020-12-18T00:41:44Z)
Sketch and Scale: Geo-distributed tSNE and UMAP [75.44887265789056]
Running machine learning analytics over geographically distributed datasets is a rapidly arising problem. We introduce a novel framework: Sketch and Scale (SnS). It leverages a Count Sketch data structure to compress the data on the edge nodes, aggregates the reduced-size sketches on the master node, and runs vanilla tSNE or UMAP on the summary. We show this technique to be fully parallel and to scale linearly in time and logarithmically in memory and communication, making it possible to analyze datasets with many millions, potentially billions, of data points spread across several data centers around the globe.
arXiv Detail & Related papers (2020-11-11T22:32:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.