Deep Contextual Embeddings for Address Classification in E-commerce
- URL: http://arxiv.org/abs/2007.03020v1
- Date: Mon, 6 Jul 2020 19:06:34 GMT
- Title: Deep Contextual Embeddings for Address Classification in E-commerce
- Authors: Shreyas Mangalgi, Lakshya Kumar and Ravindra Babu Tallamraju
- Abstract summary: E-commerce customers in developing nations like India tend to follow no fixed format while entering shipping addresses.
It is imperative to understand the language of addresses, so that shipments can be routed without delays.
We propose a novel approach towards understanding customer addresses by deriving motivation from recent advances in Natural Language Processing (NLP).
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: E-commerce customers in developing nations like India tend to follow no fixed
format while entering shipping addresses. Parsing such addresses is challenging
because of a lack of inherent structure or hierarchy. It is imperative to
understand the language of addresses, so that shipments can be routed without
delays. In this paper, we propose a novel approach towards understanding
customer addresses by deriving motivation from recent advances in Natural
Language Processing (NLP). We also formulate different pre-processing steps for
addresses using a combination of edit distance and phonetic algorithms. Then we
approach the task of creating vector representations for addresses using
Word2Vec with TF-IDF weighting, Bi-LSTM and BERT-based approaches. We
compare these approaches on a sub-region classification task for North and
South Indian cities. Through experiments, we demonstrate the effectiveness
of a generalized RoBERTa model pre-trained on a large address corpus for the
language modelling task. Our proposed RoBERTa model achieves a classification
accuracy of around 90% with minimal text preprocessing on the sub-region
classification task, outperforming all other approaches. Once pre-trained,
the RoBERTa model can be fine-tuned for various downstream tasks in the
supply chain, such as pincode suggestion and geo-coding, and it generalizes
well to such tasks even with limited labelled data. To the best of our
knowledge, this is the first work to propose understanding customer addresses
in the e-commerce domain by pre-training language models and fine-tuning them
for different purposes.
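
The pre-processing the abstract describes combines edit distance with
phonetic matching to canonicalise noisy address tokens (e.g. "naagar" vs.
"nagar"). Below is a minimal sketch of such a step; the vocabulary, the
distance threshold and the choice of Soundex as the phonetic algorithm are
illustrative assumptions, not the paper's exact recipe.

```python
# Sketch: canonicalise noisy address tokens with phonetic + edit-distance
# matching. CANONICAL, max_dist and Soundex are illustrative assumptions.
CANONICAL = ["nagar", "layout", "cross", "main", "road", "apartment"]

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def soundex(word: str) -> str:
    """Standard 4-character Soundex code."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    out, last = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != last:
            out += code
        if ch not in "hw":   # h and w do not reset the previous code
            last = code
    return (out + "000")[:4]

def canonicalise(token: str, max_dist: int = 2) -> str:
    """Map a noisy token to its closest canonical form, if close enough."""
    phonetic = [w for w in CANONICAL if soundex(w) == soundex(token)]
    candidates = phonetic or CANONICAL
    best = min(candidates, key=lambda w: levenshtein(token, w))
    return best if levenshtein(token, best) <= max_dist else token

print(canonicalise("naagar"))   # -> nagar
print(canonicalise("rood"))     # -> road
```

Restricting the edit-distance search to phonetically matching candidates
first keeps the correction conservative on short, noisy tokens.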
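For the Word2Vec-with-TF-IDF baseline, a common construction is to average a
token's Word2Vec vectors weighted by IDF. The sketch below uses gensim and
scikit-learn with a toy three-address corpus; the dimensions and
hyper-parameters are placeholders, not the paper's settings.

```python
# Sketch: IDF-weighted Word2Vec address embeddings (toy corpus and
# hyper-parameters; assumes gensim 4.x and scikit-learn >= 1.0).
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer

addresses = [
    "12 3rd cross jayanagar bangalore",
    "flat 4b brigade road bangalore",
    "45 anna nagar main road chennai",
]
tokens = [a.split() for a in addresses]

# Skip-gram Word2Vec over the address corpus.
w2v = Word2Vec(sentences=tokens, vector_size=50, window=3,
               min_count=1, sg=1, epochs=100, seed=7)

# IDF weights estimated from the same corpus.
tfidf = TfidfVectorizer(token_pattern=r"\S+").fit(addresses)
idf = dict(zip(tfidf.get_feature_names_out(), tfidf.idf_))

def embed(address: str) -> np.ndarray:
    """IDF-weighted average of token vectors for one address."""
    vecs, weights = [], []
    for tok in address.split():
        if tok in w2v.wv and tok in idf:
            vecs.append(w2v.wv[tok])
            weights.append(idf[tok])
    if not vecs:
        return np.zeros(w2v.vector_size)
    return np.average(vecs, axis=0, weights=weights)

print(embed("brigade road bangalore").shape)  # (50,)
```

IDF weighting down-weights ubiquitous tokens such as city names so that the
discriminative locality tokens dominate the address vector.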
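The core contribution is a RoBERTa model pre-trained on a large address
corpus with a masked language modelling objective. The sketch below shows
what that stage could look like with the HuggingFace transformers library;
the small from-scratch configuration and the two-address "corpus" are
hypothetical stand-ins for the paper's setup.

```python
# Sketch: masked-language-model pre-training on addresses (toy corpus;
# model size and masking rate are illustrative, not the paper's).
from transformers import (AutoTokenizer, DataCollatorForLanguageModeling,
                          RobertaConfig, RobertaForMaskedLM)

tok = AutoTokenizer.from_pretrained("roberta-base")

# A small RoBERTa trained from scratch on the address corpus.
config = RobertaConfig(vocab_size=tok.vocab_size, num_hidden_layers=4,
                       hidden_size=256, num_attention_heads=4,
                       intermediate_size=512)
model = RobertaForMaskedLM(config)

addresses = ["12 3rd cross jayanagar bangalore",
             "flat 4b brigade road bangalore"]
enc = tok(addresses, padding=True, truncation=True)

# The collator randomly masks tokens and builds the MLM labels.
collator = DataCollatorForLanguageModeling(tokenizer=tok,
                                           mlm_probability=0.15)
batch = collator([{"input_ids": ids} for ids in enc["input_ids"]])

out = model(**batch)   # batch holds masked input_ids plus MLM labels
print(out.loss)        # masked-token cross-entropy for one toy step
```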
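Once pre-trained, the encoder is fine-tuned for sub-region classification
(and, per the abstract, for downstream tasks like pincode suggestion). A
hedged fine-tuning sketch follows; "roberta-base" stands in for the paper's
address-pre-trained checkpoint, and the labels and number of sub-regions are
invented for illustration.

```python
# Sketch: fine-tuning a RoBERTa encoder for sub-region classification.
# "roberta-base" is a placeholder for an address-pre-trained checkpoint.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_SUBREGIONS = 5  # illustrative; one label per sub-region of a city

tok = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=NUM_SUBREGIONS)

addresses = ["12 3rd cross jayanagar bangalore",
             "flat 4b brigade road bangalore"]
labels = torch.tensor([0, 1])  # toy sub-region ids

batch = tok(addresses, padding=True, truncation=True, return_tensors="pt")
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few toy steps; real training iterates a dataset
    out = model(**batch, labels=labels)  # cross-entropy computed internally
    out.loss.backward()
    optim.step()
    optim.zero_grad()

model.eval()
with torch.no_grad():
    pred = model(**batch).logits.argmax(dim=-1)
print(pred)  # predicted sub-region ids
```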
Related papers
- A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus [71.77214818319054]
Natural language inference is a proxy for natural language understanding.
There is no publicly available NLI corpus for the Romanian language.
We introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs.
arXiv Detail & Related papers (2024-05-20T08:41:15Z) - A Parameter-Efficient Learning Approach to Arabic Dialect Identification
with Pre-Trained General-Purpose Speech Model [9.999900422312098]
We develop a token-level label mapping to condition the GSM for Arabic Dialect Identification (ADI)
We achieve new state-of-the-art accuracy on the ADI-17 dataset by vanilla fine-tuning.
Our study demonstrates how to identify Arabic dialects using a small dataset and limited resources, with open-source code and pre-trained models.
arXiv Detail & Related papers (2023-05-18T18:15:53Z) - Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z) - Bridging Cross-Lingual Gaps During Leveraging the Multilingual
Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z) - DSGPT: Domain-Specific Generative Pre-Training of Transformers for Text
Generation in E-commerce Title and Review Summarization [14.414693156937782]
We propose a novel domain-specific generative pre-training (DS-GPT) method for text generation.
We apply it to the product title and review summarization problems on E-commerce mobile display.
arXiv Detail & Related papers (2021-12-15T19:02:49Z) - Multinational Address Parsing: A Zero-Shot Evaluation [0.3211619859724084]
Address parsing consists of identifying the segments that make up an address, such as a street name or a postal code.
Previous work on neural networks has only focused on parsing addresses from a single source country.
This paper explores the possibility of transferring the address parsing knowledge acquired by training deep learning models on some countries' addresses to others.
arXiv Detail & Related papers (2021-12-07T21:40:43Z) - Structured Prediction as Translation between Augmented Natural Languages [109.50236248762877]
We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks.
Instead of tackling the problem by training task-specific discriminative classifiers, we frame it as a translation task between augmented natural languages.
Our approach can match or outperform task-specific models on all tasks, and in particular, achieves new state-of-the-art results on joint entity and relation extraction.
arXiv Detail & Related papers (2021-01-14T18:32:21Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Leveraging Subword Embeddings for Multinational Address Parsing [0.0764671395172401]
We build a single model capable of learning to parse addresses from multiple countries at the same time.
We achieve accuracies of around 99% on the countries used for training, with no pre-processing or post-processing needed.
We explore the possibility of transferring the address parsing knowledge obtained by training on some countries' addresses to others with no further training in a zero-shot transfer learning setting.
arXiv Detail & Related papers (2020-06-29T16:14:27Z) - Parameter Space Factorization for Zero-Shot Learning across Tasks and
Languages [112.65994041398481]
We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
arXiv Detail & Related papers (2020-01-30T16:58:56Z)