Mining Points of Interest via Address Embeddings: An Unsupervised Approach
- URL: http://arxiv.org/abs/2109.04467v1
- Date: Thu, 9 Sep 2021 17:59:45 GMT
- Title: Mining Points of Interest via Address Embeddings: An Unsupervised Approach
- Authors: Abhinav Ganesan, Anubhav Gupta, and Jose Mathew
- Abstract summary: We propose an end-to-end unsupervised system design for obtaining polygon representations of points of interest (PoI) from address locations and address texts.
The proposed algorithm achieves a median area precision of 98 %, a median area recall of 8 %, and a median F-score of 0.15.
- Score: 0.7646713951724009
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Digital maps are widely used across the globe for exploring places that
users are interested in, commonly referred to as points of interest (PoI). In
online food delivery platforms, PoIs could represent any major private
compound from which customers order, such as hospitals, residential
complexes, office complexes, educational institutes and hostels. In this work,
we propose an end-to-end unsupervised system design for obtaining polygon
representations of PoIs (PoI polygons) from address locations and address
texts. We preprocess the address texts using locality names and generate
embeddings for the address texts using a deep learning-based architecture, viz.
RoBERTa, trained on our internal address dataset. The PoI candidates are
identified by jointly clustering the anonymised customer phone GPS locations
(obtained during address onboarding) and the embeddings of the address texts.
The final list of PoI polygons is obtained from these PoI candidates using
novel post-processing steps. This algorithm identified 74.8 % more PoIs than
those obtained using the Mummidi-Krumm baseline algorithm run on our internal
dataset. The proposed algorithm achieves a median area precision of 98 %, a
median area recall of 8 %, and a median F-score of 0.15. In order to improve
the recall of the algorithmic polygons, we post-process them using building
footprint polygons from the OpenStreetMap (OSM) database. The post-processing
algorithm involves reshaping the algorithmic polygon using intersecting
polygons and closed private roads from the OSM database, and accounting for
intersection with public roads on the OSM database. We achieve a median area
recall of 70 %, a median area precision of 69 %, and a median F-score of 0.69
on these post-processed polygons.
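The abstract says PoI candidates come from jointly clustering the customer GPS locations and the address-text embeddings, but it does not state the clustering rule. Purely as an illustrative assumption, the sketch below uses a DBSCAN-style procedure in which two addresses count as neighbours only when they are close on the ground and their embeddings are similar. The function `joint_dbscan` and the thresholds `eps_m`, `min_sim` and `min_pts` are hypothetical, not the authors' method, and the short vectors merely stand in for RoBERTa embeddings.

```python
import math
from typing import List, Optional, Sequence, Tuple

LatLon = Tuple[float, float]

def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    r = 6_371_000.0  # mean Earth radius in metres
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def joint_dbscan(locs: List[LatLon], embs: List[Sequence[float]],
                 eps_m: float = 100.0, min_sim: float = 0.8,
                 min_pts: int = 3) -> List[int]:
    """Hypothetical joint clustering: DBSCAN whose neighbourhood requires BOTH
    geographic proximity (<= eps_m metres) AND text similarity (cosine >= min_sim).
    Returns one label per address; -1 marks noise."""
    n = len(locs)

    def neighbours(i: int) -> List[int]:
        # Includes i itself, as in the usual DBSCAN convention.
        return [j for j in range(n)
                if haversine_m(locs[i], locs[j]) <= eps_m
                and cosine(embs[i], embs[j]) >= min_sim]

    labels: List[Optional[int]] = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise joins as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point, so keep expanding
    return [label if label is not None else -1 for label in labels]
```

For example, two groups of addresses captured at the same gate but with dissimilar texts end up in separate candidates, while an isolated address is labelled noise:

```python
locs = [(12.9716, 77.5946), (12.97165, 77.59465), (12.97162, 77.59458)] * 2 + [(13.0, 77.7)]
embs = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.05],
        [0.0, 1.0], [0.1, 0.99], [0.05, 0.98], [1.0, 0.0]]
print(joint_dbscan(locs, embs))  # [0, 0, 0, 1, 1, 1, -1]
```

Requiring both conditions is one design choice among several; a weighted sum of a geographic and a text distance would be another.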
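The abstract reports median area precision, area recall and F-score without defining them. A common reading, assumed here, compares an algorithmic polygon P with a reference polygon G: area precision = area(P ∩ G) / area(P), area recall = area(P ∩ G) / area(G), and F is their harmonic mean. The sketch below computes these with the shoelace formula and Sutherland-Hodgman clipping, which is only correct when G is convex; real PoI and OSM footprints are often non-convex and would need a general geometry library. All function names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def signed_area(poly: List[Point]) -> float:
    """Shoelace formula: positive for counter-clockwise vertex order."""
    n = len(poly)
    return sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
               for i in range(n)) / 2.0

def _inside(p: Point, a: Point, b: Point) -> bool:
    # p is left of (or on) the directed edge a->b of a counter-clockwise clip polygon.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

def _intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
    # Intersection of segment p-q with the infinite line through a and b.
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def clip(subject: List[Point], convex: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: intersection of `subject` with a CONVEX polygon."""
    if signed_area(convex) < 0:
        convex = convex[::-1]  # the clipping test assumes counter-clockwise order
    output = list(subject)
    for i in range(len(convex)):
        a, b = convex[i], convex[(i + 1) % len(convex)]
        inp, output = output, []
        if not inp:
            break
        prev = inp[-1]
        for cur in inp:
            if _inside(cur, a, b):
                if not _inside(prev, a, b):
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif _inside(prev, a, b):
                output.append(_intersect(prev, cur, a, b))
            prev = cur
    return output

def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

def area_scores(pred: List[Point], truth: List[Point]) -> Tuple[float, float, float]:
    """(area precision, area recall, F-score) of `pred` against a convex `truth`."""
    inter = clip(pred, truth)
    inter_area = abs(signed_area(inter)) if len(inter) >= 3 else 0.0
    precision = inter_area / abs(signed_area(pred))
    recall = inter_area / abs(signed_area(truth))
    return precision, recall, f_score(precision, recall)
```

A small polygon lying entirely inside a large reference polygon has perfect precision but low recall, the pattern the abstract reports before OSM post-processing. The reported medians are also roughly consistent with the harmonic mean, although the median of per-polygon F-scores need not equal the F-score of the median precision and recall:

```python
print(area_scores([(0, 0), (1, 0), (1, 1), (0, 1)], [(0, 0), (4, 0), (4, 4), (0, 4)]))
# (1.0, 0.0625, 0.11764705882352941)
print(round(f_score(0.98, 0.08), 3), round(f_score(0.69, 0.70), 3))  # 0.148 0.695
```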
Related papers
- Boundary Detection Algorithm Inspired by Locally Linear Embedding [8.259071011958254]
We propose a method for detecting boundary points inspired by the widely used locally linear embedding algorithm.
We implement this method using two nearest-neighbor search schemes: the $\epsilon$-radius ball scheme and the $K$-nearest neighbor scheme.
(2024-06-26T16:05:57Z)
- Progressive Evolution from Single-Point to Polygon for Scene Text [79.29097971932529]
We introduce Point2Polygon, which can efficiently transform single-points into compact polygons.
Our method uses a coarse-to-fine process, starting with creating anchor points based on recognition confidence, then vertically and horizontally refining the polygon.
In training detectors with polygons generated by our method, we attained 86% of the accuracy relative to training with ground truth (GT). Additionally, the proposed Point2Polygon can be seamlessly integrated to empower single-point spotters to generate polygons.
(2023-12-21T12:08:27Z)
- Deepparse : An Extendable, and Fine-Tunable State-Of-The-Art Library for Parsing Multinational Street Addresses [0.0]
This paper presents Deepparse, a Python open-source, extendable, fine-tunable address parsing solution under LGPL-3.0 licence.
It can parse addresses written in any language and use any address standard.
The library supports fine-tuning with new data to generate a custom address parser.
(2023-11-20T15:37:33Z)
- Dominating Set Database Selection for Visual Place Recognition [2.6641546039481527]
This paper presents an approach for creating a visual place recognition database for localization in indoor environments from RGBD scanning sequences.
The proposed approach formulates database selection as a dominating set problem on a graph constructed from spatial information, and is referred to as DominatingSet.
The paper also presents a fully automated pipeline for VPR database creation from RGBD scanning sequences, as well as a set of metrics for VPR database evaluation.
(2023-03-09T09:12:21Z)
- HPointLoc: Point-based Indoor Place Recognition using Synthetic RGB-D Images [58.720142291102135]
We present a novel dataset named HPointLoc, specially designed for exploring the capabilities of visual place recognition in indoor environments.
The dataset is based on the popular Habitat simulator, in which indoor scenes can be generated from both custom sensor data and open datasets.
(2022-12-30T12:20:56Z)
- A Metaheuristic Algorithm for Large Maximum Weight Independent Set Problems [58.348679046591265]
Given a node-weighted graph, find a set of independent (mutually nonadjacent) nodes whose node-weight sum is maximum.
Some of the graphs arising in this application are large, having hundreds of thousands of nodes and hundreds of millions of edges.
We develop a new local search algorithm, which is a metaheuristic in the greedy randomized adaptive search framework.
(2022-03-28T21:34:16Z)
- End-to-End Segmentation via Patch-wise Polygons Prediction [93.91375268580806]
The leading segmentation methods represent the output map as a pixel grid.
We study an alternative representation in which the object edges are modeled, per image patch, as a polygon with $k$ vertices that is coupled with per-patch label probabilities.
(2021-12-05T10:42:40Z)
- PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds [79.99653758293277]
PCAM is a neural network whose key element is a pointwise product of cross-attention matrices.
We show that PCAM achieves state-of-the-art results among methods which, like us, solve steps (a) and (b) jointly via deepnets.
(2021-10-04T09:23:27Z)
- An End-to-end Point of Interest (POI) Conflation Framework [0.966840768820136]
Point of interest (POI) data serves as a valuable source of semantic information for places of interest.
This study proposes a novel end-to-end POI conflation framework consisting of six steps.
(2021-09-13T15:50:48Z)
- An Improved Approach for Estimating Social POI Boundaries With Textual Attributes on Social Media [3.590202054885437]
How to perform density-based clustering that exploits textual attributes on social media has been insufficiently explored.
We present a new approach and algorithm, built upon our earlier work on social POI boundary estimation (SoBEst).
Our study is motivated by the following empirical observation: a fixed representative coordinate of each POI that SoBEst basically assumes may be far away from the centroid of the estimated social POI boundary for certain POIs.
(2020-12-18T00:41:44Z)
- Sketch and Scale: Geo-distributed tSNE and UMAP [75.44887265789056]
Running machine learning analytics over geographically distributed datasets is a rapidly emerging problem.
We introduce a novel framework: Sketch and Scale (SnS).
It leverages a Count Sketch data structure to compress the data on the edge nodes, aggregates the reduced size sketches on the master node, and runs vanilla tSNE or UMAP on the summary.
We show this technique to be fully parallel and to scale linearly in time and logarithmically in memory and communication, making it possible to analyze datasets with many millions, potentially billions, of data points spread across several data centers around the globe.
(2020-11-11T22:32:21Z)
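The boundary-detection entry at the top of this list names two neighbourhood search schemes, the $\epsilon$-radius ball and the $K$-nearest neighbour scheme. The brute-force sketch below only contrasts the two ways of choosing a point's neighbours; it is not that paper's implementation and does not include the LLE-based boundary test itself. The function names are illustrative.

```python
import math
from typing import List, Sequence

def eps_ball(points: List[Sequence[float]], i: int, eps: float) -> List[int]:
    """Indices of all other points within Euclidean distance eps of points[i]."""
    return [j for j, p in enumerate(points) if j != i and math.dist(points[i], p) <= eps]

def knn(points: List[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k other points closest to points[i] (ties keep input order)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]
```

The schemes behave differently for sparse regions: an isolated point can have an empty $\epsilon$-ball, whereas the $K$-nearest neighbour scheme always returns $K$ neighbours, however far away they are:

```python
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(eps_ball(pts, 0, 1.5), eps_ball(pts, 3, 1.5))  # [1] []
print(knn(pts, 0, 2), knn(pts, 3, 1))                # [1, 2] [2]
```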
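The Sketch and Scale entry relies on a Count Sketch built on each edge node and aggregated on a master node. A minimal sketch of the data structure follows, assuming SHA-256-derived per-row hash and sign functions shared by all nodes; it illustrates why aggregation works (the sketch is linear, so tables built with the same hash functions can simply be added) but does not reproduce the SnS pipeline or its tSNE/UMAP step. The class name and the `depth`/`width` defaults are illustrative choices.

```python
import hashlib
from statistics import median
from typing import List, Tuple

class CountSketch:
    """Count Sketch: depth rows of width counters; each row has its own
    bucket hash h_r(x) and sign hash s_r(x) in {+1, -1}."""

    def __init__(self, depth: int = 5, width: int = 64) -> None:
        self.depth, self.width = depth, width
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _hash(self, row: int, item: str) -> Tuple[int, int]:
        # Deterministic across processes, so every node agrees on h_r and s_r.
        d = hashlib.sha256(f"{row}:{item}".encode()).digest()
        bucket = int.from_bytes(d[:8], "little") % self.width
        sign = 1 if d[8] & 1 else -1
        return bucket, sign

    def add(self, item: str, count: int = 1) -> None:
        for r in range(self.depth):
            b, s = self._hash(r, item)
            self.table[r][b] += s * count

    def estimate(self, item: str) -> float:
        # Median of the signed per-row counters dampens collision noise.
        vals = []
        for r in range(self.depth):
            b, s = self._hash(r, item)
            vals.append(s * self.table[r][b])
        return median(vals)

    def merge(self, other: "CountSketch") -> "CountSketch":
        # Linearity: the sum of two sketches is the sketch of the combined stream.
        if (self.depth, self.width) != (other.depth, other.width):
            raise ValueError("sketches must share depth and width")
        out = CountSketch(self.depth, self.width)
        out.table = [[a + b for a, b in zip(ra, rb)]
                     for ra, rb in zip(self.table, other.table)]
        return out
```

For example, two nodes sketch their local counts, and the merged table is identical to sketching all updates in one place:

```python
node_a, node_b = CountSketch(), CountSketch()
node_a.add("x", 10); node_a.add("y", 3)
node_b.add("x", 5)
merged = node_a.merge(node_b)
```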