Cross-Domain Generalization Through Memorization: A Study of Nearest
Neighbors in Neural Duplicate Question Detection
- URL: http://arxiv.org/abs/2011.11090v1
- Date: Sun, 22 Nov 2020 19:19:33 GMT
- Title: Cross-Domain Generalization Through Memorization: A Study of Nearest
Neighbors in Neural Duplicate Question Detection
- Authors: Yadollah Yaghoobzadeh, Alexandre Rochette and Timothy J. Hazen
- Abstract summary: Duplicate question detection (DQD) is important for increasing the efficiency of community and automatic question answering systems.
We leverage neural representations and study nearest neighbors for cross-domain generalization in DQD.
We observe robust performance of this method in different cross-domain scenarios of StackExchange, Spring and Quora datasets.
- Score: 72.01292864036087
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Duplicate question detection (DQD) is important for increasing the
efficiency of community and automatic question answering systems. Unfortunately, gathering
supervised data in a domain is time-consuming and expensive, and our ability to
leverage annotations across domains is minimal. In this work, we leverage
neural representations and study nearest neighbors for cross-domain
generalization in DQD. We first encode question pairs of the source and target
domains in a rich representation space and then, using a k-nearest neighbour
retrieval-based method, we aggregate the neighbors' labels and distances to
rank pairs. We observe robust performance of this method in different
cross-domain scenarios of StackExchange, Spring and Quora datasets,
outperforming cross-entropy classification in multiple cases.
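The retrieval step described in the abstract, retrieving the k nearest labeled source-domain pairs and aggregating their labels and distances into a ranking score, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the inverse-distance weighting and all names (`knn_rank_score`, `source_vecs`, etc.) are assumptions for the sketch.

```python
import numpy as np

def knn_rank_score(query_vec, source_vecs, source_labels, k=5):
    """Score a target-domain question pair by aggregating the labels and
    distances of its k nearest labeled source-domain pairs."""
    # Euclidean distance from the query pair to every labeled source pair
    dists = np.linalg.norm(source_vecs - query_vec, axis=1)
    nearest = np.argsort(dists)[:k]
    # Weight each neighbor's label (1 = duplicate, 0 = not) by inverse distance,
    # so closer neighbors contribute more to the ranking score
    weights = 1.0 / (dists[nearest] + 1e-8)
    return float(np.dot(weights, source_labels[nearest]) / weights.sum())

# Usage: score a candidate pair against randomly generated "labeled" pairs;
# candidate pairs would then be ranked by descending score
rng = np.random.default_rng(0)
source_vecs = rng.normal(size=(100, 16))
source_labels = rng.integers(0, 2, size=100)
query = rng.normal(size=16)
score = knn_rank_score(query, source_vecs, source_labels, k=5)
```

Because the score is a weighted average of binary labels, it always lies in [0, 1] and can be compared across candidate pairs to produce a ranking.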
Related papers
- Context-aware Domain Adaptation for Time Series Anomaly Detection [69.3488037353497]
Time series anomaly detection is a challenging task with a wide range of real-world applications.
Recent efforts have been devoted to time series domain adaptation to leverage knowledge from similar domains.
We propose a framework that combines context sampling and anomaly detection into a joint learning procedure.
arXiv Detail & Related papers (2023-04-15T02:28:58Z)
- Exploiting Graph Structured Cross-Domain Representation for Multi-Domain Recommendation [71.45854187886088]
Multi-domain recommender systems benefit from cross-domain representation learning and positive knowledge transfer.
We use temporal intra- and inter-domain interactions as contextual information for our method called MAGRec.
We perform experiments on publicly available datasets in different scenarios where MAGRec consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-12T19:51:32Z)
- Efficient Hierarchical Domain Adaptation for Pretrained Language Models [77.02962815423658]
Generative language models are trained on diverse, general domain corpora.
We introduce a method to scale domain adaptation to many diverse domains using a computationally efficient adapter approach.
arXiv Detail & Related papers (2021-12-16T11:09:29Z)
- Adaptive Methods for Aggregated Domain Generalization [26.215904177457997]
In many settings, privacy concerns prohibit obtaining domain labels for the training data samples.
We propose a domain-adaptive approach to this problem, which operates in two steps.
Our approach achieves state-of-the-art performance on a variety of domain generalization benchmarks without using domain labels.
arXiv Detail & Related papers (2021-12-09T08:57:01Z)
- MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed MD-CSDNetwork, that combines features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z)
- Universal Cross-Domain Retrieval: Generalizing Across Classes and Domains [27.920212868483702]
We propose SnMpNet, which incorporates two novel losses to account for the unseen classes and domains encountered during testing.
Specifically, we introduce a novel Semantic Neighborhood loss to bridge the knowledge gap between seen and unseen classes.
We also introduce a mix-up based supervision at image-level as well as semantic-level of the data for training with the Mixture Prediction loss.
arXiv Detail & Related papers (2021-08-18T19:21:04Z)
- SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval [24.77260903221371]
We introduce SPARTA, a novel neural retrieval method that shows great promise in performance, generalization, and interpretability for open-domain question answering.
SPARTA learns a sparse representation that can be efficiently implemented as an Inverted Index.
We validated our approaches on 4 open-domain question answering (OpenQA) tasks and 11 retrieval question answering (ReQA) tasks.
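The summary's point that a sparse representation "can be efficiently implemented as an Inverted Index" can be illustrated with a minimal sketch. This is not SPARTA's actual code: the term weights and names (`build_inverted_index`, `score`) are made up for illustration, standing in for weights a model would learn.

```python
from collections import defaultdict

def build_inverted_index(doc_term_weights):
    """Map each term to the documents (and weights) where it is nonzero,
    so query scoring only touches matching postings."""
    index = defaultdict(list)
    for doc_id, terms in doc_term_weights.items():
        for term, w in terms.items():
            if w > 0:
                index[term].append((doc_id, w))
    return index

def score(index, query_terms):
    """Accumulate per-document scores by summing posting weights of query terms."""
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, w in index.get(term, []):
            scores[doc_id] += w
    return dict(scores)

# Hypothetical sparse term weights a learned encoder might assign
docs = {"d1": {"python": 1.2, "loop": 0.4}, "d2": {"java": 0.9, "loop": 0.7}}
idx = build_inverted_index(docs)
ranking = score(idx, ["python", "loop"])
```

Because only documents sharing at least one nonzero query term are touched, retrieval cost scales with posting-list sizes rather than with the corpus size, which is the efficiency argument behind inverted indexes.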
arXiv Detail & Related papers (2020-09-28T02:11:02Z)
- Mind the Gap: Enlarging the Domain Gap in Open Set Domain Adaptation [65.38975706997088]
Open set domain adaptation (OSDA) assumes the presence of unknown classes in the target domain.
We show that existing state-of-the-art methods suffer a considerable performance drop in the presence of larger domain gaps.
We propose a novel framework to specifically address the larger domain gaps.
arXiv Detail & Related papers (2020-03-08T14:20:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.