Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions
- URL: http://arxiv.org/abs/2208.12731v2
- Date: Mon, 23 Oct 2023 13:27:16 GMT
- Title: Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions
- Authors: Leonidas Tsepenekas, Ivan Brugere, Freddy Lecue, Daniele Magazzeni
- Abstract summary: We present an efficient sampling framework that learns such across-groups similarity functions.
We show analytical results with rigorous theoretical bounds, and empirically validate our algorithms.
- Score: 6.906621279967866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Similarity functions measure how comparable pairs of elements are, and play a
key role in a wide variety of applications, e.g., notions of Individual
Fairness abiding by the seminal paradigm of Dwork et al., as well as Clustering
problems. However, access to an accurate similarity function should not always
be considered guaranteed, and this point was even raised by Dwork et al. For
instance, it is reasonable to assume that when the elements to be compared are
produced by different distributions, or in other words belong to different
"demographic" groups, knowledge of their true similarity might be very
difficult to obtain. In this work, we present an efficient sampling framework
that learns these across-groups similarity functions, using only a limited
amount of experts' feedback. We show analytical results with rigorous
theoretical bounds, and empirically validate our algorithms via a large suite
of experiments.
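The abstract describes the framework only at a high level. As a rough illustration of that pipeline (not the paper's actual algorithm), the Python sketch below samples a small budget of cross-group pairs, queries a stand-in expert oracle for their true similarity, and fits a simple regressor to generalize to unseen pairs; the oracle, the two Gaussian groups, and the k-NN model are all hypothetical choices.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Illustrative sketch only: every name below is an assumption,
# not the paper's construction.
rng = np.random.default_rng(0)

def expert_similarity(x, y):
    # Stand-in for costly expert feedback on the pair (x, y).
    return float(np.exp(-np.linalg.norm(x - y)))

# Two "demographic" groups produced by different distributions.
group_a = rng.normal(0.0, 1.0, size=(500, 4))
group_b = rng.normal(1.0, 2.0, size=(500, 4))

# Spend a limited query budget on sampled cross-group pairs.
budget = 200
idx_a = rng.integers(0, len(group_a), budget)
idx_b = rng.integers(0, len(group_b), budget)
pairs = np.hstack([group_a[idx_a], group_b[idx_b]])
labels = np.array([expert_similarity(a, b)
                   for a, b in zip(group_a[idx_a], group_b[idx_b])])

# Fit a regressor that predicts across-group similarity for unseen pairs.
model = KNeighborsRegressor(n_neighbors=5).fit(pairs, labels)
print(model.predict(np.hstack([group_a[0], group_b[0]])[None, :]))
```

The point of the sketch is only the interface the paper works with: a bounded number of oracle calls producing labeled cross-group pairs, followed by standard supervised regression.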
Related papers
- Collaborative Learning with Different Labeling Functions [7.228285747845779]
We study a variant of Collaborative PAC Learning, in which we aim to learn an accurate classifier for each of the $n$ data distributions.
We show that, when the data distributions satisfy a weaker realizability assumption, sample-efficient learning is still feasible.
arXiv Detail & Related papers (2024-02-16T04:32:22Z)
- Comparing Feature Importance and Rule Extraction for Interpretability on Text Data [7.893831644671976]
We show that using different methods can lead to unexpectedly different explanations.
To quantify this effect, we propose a new approach to compare explanations produced by different methods.
arXiv Detail & Related papers (2022-07-04T13:54:55Z)
- Attributable Visual Similarity Learning [90.69718495533144]
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images.
Motivated by human semantic similarity cognition, we propose a generalized similarity learning paradigm that represents the similarity between two images with a graph.
Experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods.
arXiv Detail & Related papers (2022-03-28T17:35:31Z) - Instance Similarity Learning for Unsupervised Feature Representation [83.31011038813459]
We propose an instance similarity learning (ISL) method for unsupervised feature representation.
We employ Generative Adversarial Networks (GANs) to mine the underlying feature manifold.
Experiments on image classification demonstrate the superiority of our method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-08-05T16:42:06Z) - Investigate the Essence of Long-Tailed Recognition from a Unified
Perspective [11.080317683184363]
Deep recognition models often suffer from long-tailed data distributions due to heavily imbalanced sample numbers across categories.
In this work, we demonstrate that long-tailed recognition suffers from both sample imbalance and category similarity.
arXiv Detail & Related papers (2021-07-08T11:08:40Z) - An Empirical Comparison of Instance Attribution Methods for NLP [62.63504976810927]
We evaluate the degree to which different instance attribution methods agree with respect to the importance of training samples.
We find that simple retrieval methods yield training instances that differ from those identified via gradient-based methods.
arXiv Detail & Related papers (2021-04-09T01:03:17Z) - Unsupervised Feature Learning by Cross-Level Instance-Group
Discrimination [68.83098015578874]
We integrate between-instance similarity into contrastive learning, not directly by instance grouping, but by cross-level discrimination (CLD).
CLD effectively brings unsupervised learning closer to natural data and real-world applications.
It sets a new state of the art on self-supervision, semi-supervision, and transfer learning benchmarks, beating MoCo v2 and SimCLR on every reported metric.
arXiv Detail & Related papers (2020-08-09T21:13:13Z) - Few-shot Visual Reasoning with Meta-analogical Contrastive Learning [141.2562447971]
We propose to solve a few-shot (or low-shot) visual reasoning problem, by resorting to analogical reasoning.
We extract structural relationships between elements in both domains, and enforce them to be as similar as possible with analogical learning.
We validate our method on the RAVEN dataset, on which it outperforms the state-of-the-art method, with larger gains when training data is scarce.
arXiv Detail & Related papers (2020-07-23T14:00:34Z) - Pairwise Supervision Can Provably Elicit a Decision Boundary [84.58020117487898]
Similarity learning is the problem of eliciting useful representations by predicting the relationship between pairs of patterns.
We show that similarity learning can solve binary classification by directly eliciting a decision boundary (a toy sketch follows the list below).
arXiv Detail & Related papers (2020-06-11T05:35:16Z) - Building and Interpreting Deep Similarity Models [0.0]
We propose to make similarities interpretable by augmenting them with an explanation in terms of input features.
We develop BiLRP, a scalable and theoretically founded method to systematically decompose similarity scores on pairs of input features.
arXiv Detail & Related papers (2020-03-11T17:46:55Z)
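One entry above (Pairwise Supervision Can Provably Elicit a Decision Boundary) makes a concrete claim worth unpacking: "same class?" labels on pairs alone can determine a binary classifier. Here is a toy sketch under an assumed linear boundary, our own illustration rather than the cited paper's construction: two points share a class exactly when (w·x_i)(w·x_j) > 0, a bilinear function of the pair, so a linear model on outer-product features can fit the pairwise labels, and the top eigenvector of the learned bilinear form recovers w up to sign, which is the known ambiguity of pairwise supervision.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy illustration (assumed linear boundary; not the cited paper's method).
rng = np.random.default_rng(1)
d = 3
w_true = rng.normal(size=d)
X = rng.normal(size=(2000, d))
y = (X @ w_true > 0).astype(int)  # hidden labels, used only to simulate the oracle

# Pairwise supervision only: for random pairs, observe whether the two
# points share a class, not what the classes are.
i, j = rng.integers(0, len(X), size=(2, 20000))
same = (y[i] == y[j]).astype(int)

# Same class <=> (w.x_i)(w.x_j) > 0, which is linear in the outer product
# x_i x_j^T, so logistic regression on those d*d features can fit the labels.
feats = (X[i][:, :, None] * X[j][:, None, :]).reshape(len(same), -1)
clf = LogisticRegression(max_iter=1000).fit(feats, same)

# The learned bilinear form approximates c * w w^T; its top eigenvector
# recovers the decision direction up to a global sign flip.
W = clf.coef_.reshape(d, d)
w_hat = np.linalg.eigh((W + W.T) / 2)[1][:, -1]
cosine = abs(w_hat @ w_true) / np.linalg.norm(w_true)
print(f"|cos(w_hat, w_true)| = {cosine:.3f}")  # close to 1
```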
This list is automatically generated from the titles and abstracts of the papers in this site.