Related papers: Is ChatGPT a game changer for geocoding -- a benchmark for geocoding address parsing techniques

Is ChatGPT a game changer for geocoding -- a benchmark for geocoding address parsing techniques

URL: http://arxiv.org/abs/2310.14360v4
Date: Fri, 15 Dec 2023 08:19:59 GMT
Title: Is ChatGPT a game changer for geocoding -- a benchmark for geocoding address parsing techniques
Authors: Zhengcong Yin, Diya Li, Daniel W. Goldberg
Abstract summary: We introduce a benchmark dataset of low-quality address descriptions synthesized based on human input patterns mining from actual input logs of a geocoding system in production. This dataset has 21 different input errors and variations; contains over 239,000 address records that are uniquely selected from streets across all U.S. 50 states and D.C. We train and gauge the performance of the GPT-3 model in extracting address components, contrasting its performance with transformer-based and LSTM-based models.
Score: 3.759936323189418
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The remarkable success of GPT models across various tasks, including toponymy recognition motivates us to assess the performance of the GPT-3 model in the geocoding address parsing task. To ensure that the evaluation more accurately mirrors performance in real-world scenarios with diverse user input qualities and resolve the pressing need for a 'gold standard' evaluation dataset for geocoding systems, we introduce a benchmark dataset of low-quality address descriptions synthesized based on human input patterns mining from actual input logs of a geocoding system in production. This dataset has 21 different input errors and variations; contains over 239,000 address records that are uniquely selected from streets across all U.S. 50 states and D.C.; and consists of three subsets to be used as training, validation, and testing sets. Building on this, we train and gauge the performance of the GPT-3 model in extracting address components, contrasting its performance with transformer-based and LSTM-based models. The evaluation results indicate that Bidirectional LSTM-CRF model has achieved the best performance over these transformer-based models and GPT-3 model. Transformer-based models demonstrate very comparable results compared to the Bidirectional LSTM-CRF model. The GPT-3 model, though trailing in performance, showcases potential in the address parsing task with few-shot examples, exhibiting room for improvement with additional fine-tuning. We open source the code and data of this presented benchmark so that researchers can utilize it for future model development or extend it to evaluate similar tasks, such as document geocoding.

Related papers

Telco-DPR: A Hybrid Dataset for Evaluating Retrieval Models of 3GPP Technical Specifications [0.8999666725996975]
This paper proposes a Question-Answering (QA) system for the telecom domain using 3rd Generation Partnership Project technical documents. A hybrid dataset, Telco-DPR, is presented, combining text and tables, and includes a set of synthetic question/answer pairs. The retrieval models are evaluated and compared using top-K accuracy and Mean Reciprocal Rank (MRR) The proposed QA system, using the developed RAG model and the Generative Pretrained Transformer (GPT)-4, achieves a 14% improvement in answer accuracy.
arXiv Detail & Related papers (2024-10-15T16:37:18Z)
cDVGAN: One Flexible Model for Multi-class Gravitational Wave Signal and Glitch Generation [0.7853804618032806]
We present a novel conditional model in the Generative Adrial Network framework for simulating multiple classes of time-domain observations. Our proposed cDVGAN outperforms 4 different baseline GAN models in replicating the features of the three classes. Our experiments show that training convolutional neural networks with our cDVGAN-generated data improves the detection of samples embedded in detector noise.
arXiv Detail & Related papers (2024-01-29T17:59:26Z)
Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present a code that successfully replicates results from six popular and recent graph recommendation models. We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations. By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
TSI-GAN: Unsupervised Time Series Anomaly Detection using Convolutional Cycle-Consistent Generative Adversarial Networks [2.4469484645516837]
Anomaly detection is widely used in network intrusion detection, autonomous driving, medical diagnosis, credit card frauds, etc. This paper proposes TSI-GAN, an unsupervised anomaly detection model for time-series that can learn complex temporal patterns automatically. We evaluate TSI-GAN using 250 well-curated and harder-than-usual datasets and compare with 8 state-of-the-art baseline methods.
arXiv Detail & Related papers (2023-03-22T23:24:47Z)
TopoBERT: Plug and Play Toponym Recognition Module Harnessing Fine-tuned BERT [11.446721140340575]
TopoBERT, a toponym recognition module based on a one dimensional Convolutional Neural Network (CNN1D) and Bidirectional Representation from Transformers (BERT), is proposed and fine-tuned. TopoBERT achieves state-of-the-art performance compared to the other five baseline models and can be applied to diverse toponym recognition tasks without additional training.
arXiv Detail & Related papers (2023-01-31T13:44:34Z)
Transformers for End-to-End InfoSec Tasks: A Feasibility Study [6.847381178288385]
We implement transformer models for two distinct InfoSec data formats - specifically URLs and PE files. We show that our URL transformer model requires a different training approach to reach high performance levels. We demonstrate that this approach performs comparably to well-established malware detection models on benchmark PE file datasets.
arXiv Detail & Related papers (2022-12-05T23:50:46Z)
Gait Recognition in the Wild: A Large-scale Benchmark and NAS-based Baseline [95.88825497452716]
Gait benchmarks empower the research community to train and evaluate high-performance gait recognition systems. GREW is the first large-scale dataset for gait recognition in the wild. SPOSGait is the first NAS-based gait recognition model.
arXiv Detail & Related papers (2022-05-05T14:57:39Z)
Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection [85.11649974840758]
3D object detection networks tend to be biased towards the data they are trained on. We propose a single-frame approach for source-free, unsupervised domain adaptation of lidar-based 3D object detectors.
arXiv Detail & Related papers (2021-11-30T18:42:42Z)
When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable. In order to achieve a better accuracy, we propose two lightweight modules. DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers. QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings. We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data. We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words" Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.