TextGNN: Improving Text Encoder via Graph Neural Network in Sponsored
Search
- URL: http://arxiv.org/abs/2101.06323v2
- Date: Tue, 9 Feb 2021 17:29:32 GMT
- Title: TextGNN: Improving Text Encoder via Graph Neural Network in Sponsored
Search
- Authors: Jason Yue Zhu, Yanling Cui, Yuming Liu, Hao Sun, Xue Li, Markus
Pelger, Tianqi Yang, Liangjie Zhang, Ruofei Zhang, Huasha Zhao
- Abstract summary: We propose a TextGNN model that naturally extends the strong twin tower structured encoders with the complementary graph information from user historical behaviors.
In offline experiments, the model achieves a 0.14% overall increase in ROC-AUC and a 1% increase in accuracy for long-tail, low-frequency Ads.
In online A/B testing, the model shows a 2.03% increase in Revenue Per Mille and a 2.32% decrease in Ad defect rate.
- Score: 11.203006652211075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text encoders based on C-DSSM or transformers have demonstrated strong
performance in many Natural Language Processing (NLP) tasks. Low latency
variants of these models have also been developed in recent years in order to
apply them in the field of sponsored search, which has strict computational
constraints. However, these models are not a panacea for all Natural Language
Understanding (NLU) challenges, as purely semantic information in the data is
not sufficient to fully identify user intents. We propose the
TextGNN model that naturally extends the strong twin tower structured encoders
with the complementary graph information from user historical behaviors, which
serves as a natural guide to help us better understand the intents and hence
generate better language representations. The model inherits all the benefits
of twin tower models such as C-DSSM and TwinBERT so that it can still be used
in the low-latency environment while achieving a significant performance gain
over the strong encoder-only baseline models in both offline evaluations and
the online production system. In offline experiments, the model achieves a
0.14% overall increase in ROC-AUC and a 1% increase in accuracy for long-tail,
low-frequency Ads, and in online A/B testing, the model shows a 2.03% increase
in Revenue Per Mille with a 2.32% decrease in Ad defect rate.
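As a rough illustration of the architecture described above, the sketch below shows one way a twin-tower text encoder could be extended with aggregated neighbor information from a user-behavior graph. It is a minimal sketch under stated assumptions: the class names, dimensions, mean-pooling aggregation, and the bag-of-words stand-in for C-DSSM/TwinBERT are illustrative choices, not the authors' implementation.

```python
# Minimal PyTorch sketch of a TextGNN-style twin-tower scorer (illustrative only).
# All names and hyperparameters here are assumptions; the text encoder is a plain
# bag-of-words embedding stand-in for C-DSSM/TwinBERT, and the "GNN" part is a
# single mean aggregation over the texts of graph neighbors drawn from a
# user-behavior click graph.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TowerEncoder(nn.Module):
    """One tower: encodes a text and fuses it with aggregated neighbor encodings."""

    def __init__(self, vocab_size: int = 30522, dim: int = 128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")  # stand-in text encoder
        self.fuse = nn.Linear(2 * dim, dim)                         # combine self + neighbor vectors

    def encode_text(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> (batch, dim)
        return self.embed(token_ids)

    def forward(self, token_ids: torch.Tensor, neighbor_ids: torch.Tensor) -> torch.Tensor:
        # neighbor_ids: (batch, num_neighbors, seq_len) -- texts of graph neighbors
        b, k, s = neighbor_ids.shape
        self_vec = self.encode_text(token_ids)                       # (batch, dim)
        nbr_vecs = self.encode_text(neighbor_ids.reshape(b * k, s))  # (batch*k, dim)
        nbr_agg = nbr_vecs.reshape(b, k, -1).mean(dim=1)             # one-hop mean aggregation
        return torch.tanh(self.fuse(torch.cat([self_vec, nbr_agg], dim=-1)))


class TextGNNTwinTower(nn.Module):
    """Query tower + ad/keyword tower, scored with cosine similarity."""

    def __init__(self):
        super().__init__()
        self.query_tower = TowerEncoder()
        self.ad_tower = TowerEncoder()

    def forward(self, q_ids, q_nbr_ids, a_ids, a_nbr_ids) -> torch.Tensor:
        q = self.query_tower(q_ids, q_nbr_ids)
        a = self.ad_tower(a_ids, a_nbr_ids)
        return F.cosine_similarity(q, a, dim=-1)  # relevance score per (query, ad) pair


if __name__ == "__main__":
    # Toy batch: 2 query-ad pairs, 3 graph neighbors each, sequences of 8 token ids.
    model = TextGNNTwinTower()
    q = torch.randint(0, 30522, (2, 8))
    a = torch.randint(0, 30522, (2, 8))
    q_nbrs = torch.randint(0, 30522, (2, 3, 8))
    a_nbrs = torch.randint(0, 30522, (2, 3, 8))
    print(model(q, q_nbrs, a, a_nbrs))  # two relevance scores in [-1, 1]
```

Because each tower depends only on its own text and precomputed neighbor texts, the ad-side vectors can still be computed offline and cached, which is what keeps this family of twin-tower models usable under the low-latency constraints of sponsored search.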
Related papers
- Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens [53.99177152562075]
Scaling up autoregressive models in vision has not proven as beneficial as in large language models.
We focus on two critical factors: whether models use discrete or continuous tokens, and whether tokens are generated in a random or fixed order using BERT- or GPT-like transformer architectures.
Our results show that while all models scale effectively in terms of validation loss, their evaluation performance -- measured by FID, GenEval score, and visual quality -- follows different trends.
arXiv Detail & Related papers (2024-10-17T17:59:59Z)
- Camouflage is all you need: Evaluating and Enhancing Language Model Robustness Against Camouflage Adversarial Attacks [53.87300498478744]
Adversarial attacks represent a substantial challenge in Natural Language Processing (NLP).
This study undertakes a systematic exploration of this challenge in two distinct phases: vulnerability evaluation and resilience enhancement.
Results suggest a trade-off between performance and robustness, with some models maintaining similar performance while gaining robustness.
arXiv Detail & Related papers (2024-02-15T10:58:22Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and the scarcity of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose, yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z)
- 2D Self-Organized ONN Model For Handwritten Text Recognition [4.66970207245168]
This study proposes 2D Self-Organized ONNs (Self-ONNs) at the core of a novel network model.
Deformable convolutions, which have recently been shown to better handle variations in writing styles, are also utilized in this study.
Results show that the proposed model with the operational layers of Self-ONNs significantly improves Character Error Rate (CER) and Word Error Rate (WER).
arXiv Detail & Related papers (2022-07-17T11:18:20Z)
- Characterizing and Understanding the Behavior of Quantized Models for Reliable Deployment [32.01355605506855]
Quantization-aware training can produce more stable models than standard, adversarial, and Mixup training.
Disagreements often have closer top-1 and top-2 output probabilities, and $Margin$ is a better indicator than the other uncertainty metrics to distinguish disagreements (a minimal sketch of this margin score appears after this list).
We opensource our code and models as a new benchmark for further studying the quantized models.
arXiv Detail & Related papers (2022-04-08T11:19:16Z)
- A Likelihood Ratio based Domain Adaptation Method for E2E Models [10.510472957585646]
End-to-end (E2E) automatic speech recognition models like the Recurrent Neural Network Transducer (RNN-T) are becoming a popular choice for streaming ASR applications like voice assistants.
While E2E models are very effective at learning representations of the data they are trained on, their accuracy on unseen domains remains a challenging problem.
In this work, we explore a contextual biasing approach using likelihood-ratio that leverages text data sources to adapt RNN-T model to new domains and entities.
arXiv Detail & Related papers (2022-01-10T21:22:39Z)
- Error Detection in Large-Scale Natural Language Understanding Systems Using Transformer Models [0.0]
Large-scale conversational assistants like Alexa, Siri, Cortana and Google Assistant process every utterance using multiple models for domain, intent and named entity recognition.
We address the challenge of detecting domain classification errors using offline Transformer models.
We combine utterance encodings from a RoBERTa model with the N-best hypotheses produced by the production system. We then fine-tune end-to-end in a multitask setting using a small dataset of human-annotated utterances with domain classification errors.
arXiv Detail & Related papers (2021-09-04T00:10:48Z)
- WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition [59.975078145303605]
We propose a novel framework, namely WNARS, using hybrid CTC-attention AED models and weighted finite-state transducers.
On the AISHELL-1 task, our WNARS achieves a character error rate of 5.22% with 640 ms latency, which, to the best of our knowledge, is state-of-the-art performance for online ASR.
arXiv Detail & Related papers (2021-04-08T07:56:03Z)
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention [119.77305080520718]
We propose a new model architecture DeBERTa that improves the BERT and RoBERTa models using two novel techniques.
We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks.
arXiv Detail & Related papers (2020-06-05T19:54:34Z)
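The $Margin$ indicator mentioned in the quantized-models entry above is commonly defined as the gap between the top-1 and top-2 predicted probabilities. The snippet below is a minimal sketch of that common definition, assumed here for illustration; it is not taken from the paper's code.

```python
# Minimal sketch of a top-1/top-2 margin uncertainty score (a common definition,
# assumed here; not the paper's exact code). Small margins flag inputs on which a
# quantized model and its full-precision counterpart are more likely to disagree.
import torch
import torch.nn.functional as F


def margin_score(logits: torch.Tensor) -> torch.Tensor:
    """logits: (batch, num_classes) -> (batch,) margin = p_top1 - p_top2."""
    probs = F.softmax(logits, dim=-1)
    top2 = probs.topk(2, dim=-1).values  # (batch, 2), sorted in descending order
    return top2[:, 0] - top2[:, 1]


if __name__ == "__main__":
    logits = torch.tensor([[2.0, 1.9, -1.0],   # near-tie -> small margin (uncertain)
                           [4.0, 0.1, -2.0]])  # confident -> large margin
    print(margin_score(logits))
```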
This list is automatically generated from the titles and abstracts of the papers on this site.