Is ChatGPT a game changer for geocoding -- a benchmark for geocoding
address parsing techniques
- URL: http://arxiv.org/abs/2310.14360v4
- Date: Fri, 15 Dec 2023 08:19:59 GMT
- Title: Is ChatGPT a game changer for geocoding -- a benchmark for geocoding
address parsing techniques
- Authors: Zhengcong Yin, Diya Li, Daniel W. Goldberg
- Abstract summary: We introduce a benchmark dataset of low-quality address descriptions synthesized from human input patterns mined from the actual input logs of a geocoding system in production.
This dataset covers 21 different input errors and variations and contains over 239,000 address records uniquely selected from streets across all 50 U.S. states and D.C.
We train and gauge the performance of the GPT-3 model in extracting address components, contrasting its performance with transformer-based and LSTM-based models.
- Score: 3.759936323189418
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The remarkable success of GPT models across various tasks, including toponym
recognition, motivates us to assess the performance of the GPT-3 model in the
geocoding address parsing task. To ensure that the evaluation more accurately
mirrors performance in real-world scenarios with diverse user input quality,
and to address the pressing need for a 'gold standard' evaluation dataset for
geocoding systems, we introduce a benchmark dataset of low-quality address
descriptions synthesized from human input patterns mined from the actual input
logs of a geocoding system in production. This dataset covers 21 different input
errors and variations; contains over 239,000 address records uniquely
selected from streets across all 50 U.S. states and D.C.; and consists of three
subsets to be used as training, validation, and testing sets. Building on this,
we train and gauge the performance of the GPT-3 model in extracting address
components, contrasting its performance with transformer-based and LSTM-based
models. The evaluation results indicate that the Bidirectional LSTM-CRF model
achieves the best performance, ahead of the transformer-based models and the
GPT-3 model. The transformer-based models deliver results comparable to
the Bidirectional LSTM-CRF model. The GPT-3 model, though trailing in
performance, shows potential in the address parsing task with few-shot
examples, leaving room for improvement with additional fine-tuning. We open
source the code and data of this benchmark so that researchers can
use it for future model development or extend it to evaluate similar tasks,
such as document geocoding.
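As a rough illustration of what synthesizing low-quality address descriptions can look like, the sketch below applies two perturbations to a clean, labeled address while keeping each token's component label attached. This is a minimal sketch under stated assumptions, not the authors' pipeline: the label names, the helper functions, the sample address, and the two error types (an omitted street suffix and a transposed-character typo) are all hypothetical stand-ins for the 21 errors and variations in the benchmark.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (token text, address component label)

# Hypothetical clean address; the label names are illustrative only.
CLEAN: List[Token] = [
    ("123", "house_number"),
    ("Main", "street_name"),
    ("Street", "street_suffix"),
    ("College", "city"),
    ("Station", "city"),
    ("TX", "state"),
    ("77843", "zip"),
]


def drop_component(tokens: List[Token], label: str) -> List[Token]:
    """Omit every token carrying `label`, e.g. a missing street suffix."""
    return [tok for tok in tokens if tok[1] != label]


def swap_adjacent_chars(tokens: List[Token], index: int, pos: int) -> List[Token]:
    """Transpose the characters at `pos` and `pos + 1` in one token (a typo)."""
    text, label = tokens[index]
    if pos < 0 or pos + 1 >= len(text):
        return list(tokens)  # nothing to swap; return an unchanged copy
    typo = text[:pos] + text[pos + 1] + text[pos] + text[pos + 2:]
    return tokens[:index] + [(typo, label)] + tokens[index + 1:]


def to_text(tokens: List[Token]) -> str:
    """Join token texts into the raw string a user might have typed."""
    return " ".join(text for text, _ in tokens)


# Chain the two perturbations: drop the suffix, then misspell the street name.
noisy = swap_adjacent_chars(drop_component(CLEAN, "street_suffix"), 1, 1)
print(to_text(noisy))
print([label for _, label in noisy])
```

Because the labels travel with the tokens, each noisy string remains usable as a supervised example for a sequence tagger or as a few-shot demonstration for a prompted model.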
Related papers
- Telco-DPR: A Hybrid Dataset for Evaluating Retrieval Models of 3GPP Technical Specifications [0.8999666725996975]
This paper proposes a Question-Answering (QA) system for the telecom domain using 3rd Generation Partnership Project technical documents.
A hybrid dataset, Telco-DPR, is presented, combining text and tables, and includes a set of synthetic question/answer pairs.
The retrieval models are evaluated and compared using top-K accuracy and Mean Reciprocal Rank (MRR).
The proposed QA system, using the developed RAG model and the Generative Pretrained Transformer (GPT)-4, achieves a 14% improvement in answer accuracy.
arXiv Detail & Related papers (2024-10-15T16:37:18Z)
- cDVGAN: One Flexible Model for Multi-class Gravitational Wave Signal and Glitch Generation [0.7853804618032806]
We present a novel conditional model in the Generative Adversarial Network framework for simulating multiple classes of time-domain observations.
Our proposed cDVGAN outperforms 4 different baseline GAN models in replicating the features of the three classes.
Our experiments show that training convolutional neural networks with our cDVGAN-generated data improves the detection of samples embedded in detector noise.
arXiv Detail & Related papers (2024-01-29T17:59:26Z)
- Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates results from six popular and recent graph recommendation models.
We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations.
By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
- TSI-GAN: Unsupervised Time Series Anomaly Detection using Convolutional Cycle-Consistent Generative Adversarial Networks [2.4469484645516837]
Anomaly detection is widely used in network intrusion detection, autonomous driving, medical diagnosis, credit card fraud detection, etc.
This paper proposes TSI-GAN, an unsupervised anomaly detection model for time-series that can learn complex temporal patterns automatically.
We evaluate TSI-GAN using 250 well-curated and harder-than-usual datasets and compare with 8 state-of-the-art baseline methods.
arXiv Detail & Related papers (2023-03-22T23:24:47Z)
- TopoBERT: Plug and Play Toponym Recognition Module Harnessing Fine-tuned BERT [11.446721140340575]
TopoBERT, a toponym recognition module based on a one-dimensional Convolutional Neural Network (CNN1D) and Bidirectional Encoder Representations from Transformers (BERT), is proposed and fine-tuned.
TopoBERT achieves state-of-the-art performance compared to the other five baseline models and can be applied to diverse toponym recognition tasks without additional training.
arXiv Detail & Related papers (2023-01-31T13:44:34Z)
- Transformers for End-to-End InfoSec Tasks: A Feasibility Study [6.847381178288385]
We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files.
We show that our URL transformer model requires a different training approach to reach high performance levels.
We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets.
arXiv Detail & Related papers (2022-12-05T23:50:46Z)
- Gait Recognition in the Wild: A Large-scale Benchmark and NAS-based Baseline [95.88825497452716]
Gait benchmarks empower the research community to train and evaluate high-performance gait recognition systems.
GREW is the first large-scale dataset for gait recognition in the wild.
SPOSGait is the first NAS-based gait recognition model.
arXiv Detail & Related papers (2022-05-05T14:57:39Z)
- Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on.
We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
- When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable.
To achieve better accuracy, we propose two lightweight modules.
DQInit dynamically initializes the queries of the decoder from the inputs, enabling the model to match the accuracy of models with multiple decoder layers.
QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)