Unsupervised Learning under Latent Label Shift
- URL: http://arxiv.org/abs/2207.13179v1
- Date: Tue, 26 Jul 2022 20:52:53 GMT
- Title: Unsupervised Learning under Latent Label Shift
- Authors: Manley Roberts, Pranav Mani, Saurabh Garg, Zachary C. Lipton
- Abstract summary: We introduce unsupervised learning under Latent Label Shift (LLS).
We show that our algorithm can leverage domain information to improve state-of-the-art unsupervised classification methods.
- Score: 21.508249151557244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What sorts of structure might enable a learner to discover classes from
unlabeled data? Traditional approaches rely on feature-space similarity and
heroic assumptions on the data. In this paper, we introduce unsupervised
learning under Latent Label Shift (LLS), where we have access to unlabeled data
from multiple domains such that the label marginals $p_d(y)$ can shift across
domains but the class conditionals $p(\mathbf{x}|y)$ do not. This work
instantiates a new principle for identifying classes: elements that shift
together group together. For finite input spaces, we establish an isomorphism
between LLS and topic modeling: inputs correspond to words, domains to
documents, and labels to topics. Addressing continuous data, we prove that when
each label's support contains a separable region, analogous to an anchor word,
oracle access to $p(d|\mathbf{x})$ suffices to identify $p_d(y)$ and
$p_d(y|\mathbf{x})$ up to permutation. Thus motivated, we introduce a practical
algorithm that leverages domain-discriminative models as follows: (i) push
examples through domain discriminator $p(d|\mathbf{x})$; (ii) discretize the
data by clustering examples in $p(d|\mathbf{x})$ space; (iii) perform
non-negative matrix factorization on the discrete data; (iv) combine the
recovered $p(y|d)$ with the discriminator outputs $p(d|\mathbf{x})$ to compute
$p_d(y|\mathbf{x}) \; \forall d$. With semi-synthetic experiments, we show that our
algorithm can leverage domain information to improve state-of-the-art
unsupervised classification methods. We reveal a failure mode of standard
unsupervised classification methods when feature-space similarity does not
indicate true groupings, and show empirically that our method better handles
this case. Our results establish a deep connection between distribution shift
and topic modeling, opening promising lines for future work.
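To make steps (i)-(iv) above concrete, the following is a minimal sketch under stated assumptions: a domain discriminator has already been trained (step (i)), its posteriors $p(d|\mathbf{x})$ are available as an array, and the number of classes is known. The function name, hyperparameters, and the final combination rule are illustrative assumptions rather than the paper's exact estimator.

```python
# Illustrative sketch of steps (i)-(iv); names and hyperparameters are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF

def lls_estimate(disc_probs, domain_ids, n_classes, n_clusters=100, seed=0):
    """disc_probs: (n, n_domains) domain-discriminator posteriors p(d|x) (step i);
    domain_ids: (n,) integer domain index of each example."""
    n, n_domains = disc_probs.shape

    # (ii) Discretize: cluster examples in p(d|x) space.
    clusters = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(disc_probs)

    # Build the cluster-by-domain matrix p(cluster|d), the analogue of
    # word-document counts in topic modeling.
    counts = np.zeros((n_clusters, n_domains))
    np.add.at(counts, (clusters, domain_ids), 1.0)
    p_cluster_given_d = counts / (counts.sum(axis=0, keepdims=True) + 1e-12)

    # (iii) Non-negative matrix factorization: p(cluster|d) ~ sum_y p(cluster|y) p(y|d).
    nmf = NMF(n_components=n_classes, init="nndsvda", max_iter=500, random_state=seed)
    p_cluster_given_y = nmf.fit_transform(p_cluster_given_d)  # (n_clusters, n_classes)
    p_y_given_d = nmf.components_                              # (n_classes, n_domains)
    p_cluster_given_y /= p_cluster_given_y.sum(axis=0, keepdims=True) + 1e-12
    p_y_given_d /= p_y_given_d.sum(axis=0, keepdims=True) + 1e-12

    # (iv) Combine p(y|d) with the discretized representation of each example to
    # score p_d(y|x); this simple product rule is an assumption, not the paper's
    # exact combination of p(y|d) with p(d|x).
    scores = p_cluster_given_y[clusters] * p_y_given_d[:, domain_ids].T  # (n, n_classes)
    p_y_given_x_d = scores / (scores.sum(axis=1, keepdims=True) + 1e-12)
    return p_y_given_d, p_y_given_x_d
```

Note that both recovered factors are identified only up to a shared permutation of the label indices, consistent with the identifiability statement in the abstract.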
Related papers
- IT$^3$: Idempotent Test-Time Training [95.78053599609044]
This paper introduces Idempotent Test-Time Training (IT$^3$), a novel approach to addressing the challenge of distribution shift.
IT$^3$ is based on the universal property of idempotence.
We demonstrate the versatility of our approach across various tasks, including corrupted image classification.
arXiv Detail & Related papers (2024-10-05T15:39:51Z)
- Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
Learning conditional distributions is a central problem in machine learning.
We propose a new learning paradigm that integrates both paired and unpaired data.
Our approach also connects intriguingly with inverse entropic optimal transport (OT).
arXiv Detail & Related papers (2024-10-03T16:12:59Z)
- One-Bit Quantization and Sparsification for Multiclass Linear Classification with Strong Regularization [18.427215139020625]
We show that the best classification performance is achieved when $f(\cdot) = \|\cdot\|_2^2$ and $\lambda \to \infty$.
It is often possible to find sparse and one-bit solutions that perform almost as well as the one corresponding to $f(\cdot) = \|\cdot\|_\infty$ in the large-$\lambda$ regime.
arXiv Detail & Related papers (2024-02-16T06:39:40Z) - Testable Learning with Distribution Shift [9.036777309376697]
We define a new model called testable learning with distribution shift.
We obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution.
We give several positive results for learning concept classes such as halfspaces, intersections of halfspaces, and decision trees.
arXiv Detail & Related papers (2023-11-25T23:57:45Z) - Statistical learning on measures: an application to persistence diagrams [0.0]
We consider a binary supervised classification problem where, instead of observing data in a finite-dimensional Euclidean space, we observe measures on a compact space $\mathcal{X}$.
We show that this framework allows for greater flexibility and diversity in the input data we can handle.
While such a framework has many possible applications, this work places strong emphasis on classifying data via topological descriptors called persistence diagrams.
arXiv Detail & Related papers (2023-03-15T09:01:37Z) - HappyMap: A Generalized Multi-calibration Method [23.086009024383024]
Multi-calibration is a powerful and evolving concept originating in the field of algorithmic fairness.
In this work, we view the term $(f(x)-y)$ as just one specific mapping, and explore the power of an enriched class of mappings.
We propose HappyMap, a generalization of multi-calibration, which yields a wide range of new applications.
arXiv Detail & Related papers (2023-03-08T05:05:01Z)
- Multi-Instance Partial-Label Learning: Towards Exploiting Dual Inexact Supervision [53.530957567507365]
In some real-world tasks, each training sample is associated with a candidate label set that contains one ground-truth label and some false positive labels.
In this paper, we formalize such problems as multi-instance partial-label learning (MIPL).
Existing multi-instance learning algorithms and partial-label learning algorithms are suboptimal for solving MIPL problems.
arXiv Detail & Related papers (2022-12-18T03:28:51Z)
- Beyond Invariance: Test-Time Label-Shift Adaptation for Distributions with "Spurious" Correlations [44.99833362998488]
Changes in the data distribution at test time can have deleterious effects on the performance of predictive models.
We propose a test-time label-shift correction that adapts to changes in the joint distribution $p(y, z)$ using EM applied to unlabeled samples (a generic EM-style sketch of label-shift correction appears after this list).
arXiv Detail & Related papers (2022-11-28T18:52:33Z)
- Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making a few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
- Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning [175.34232468746245]
We introduce a parameterization method called Neural Bayes.
It allows computing statistical quantities that are in general difficult to compute.
We show two independent use cases for this parameterization.
arXiv Detail & Related papers (2020-02-20T22:28:53Z)
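As referenced in the test-time label-shift entry above, the following is a generic, hypothetical sketch of EM-style label-shift correction on unlabeled test data; it adapts only the label marginal $p(y)$, not the full joint $p(y, z)$ treated in that paper.

```python
# Hypothetical EM-style label-shift correction (Saerens-style prior adaptation);
# the cited paper's method over the joint p(y, z) is not reproduced here.
import numpy as np

def em_label_shift(source_posteriors, source_prior, n_iter=100, tol=1e-8):
    """source_posteriors: (n, k) source-classifier outputs p_s(y|x) on unlabeled test points;
    source_prior: (k,) label marginal of the source/training distribution."""
    target_prior = source_prior.copy()
    adapted = source_posteriors
    for _ in range(n_iter):
        # E-step: reweight posteriors by the current prior ratio and renormalize.
        adapted = source_posteriors * (target_prior / source_prior)
        adapted /= adapted.sum(axis=1, keepdims=True)
        # M-step: re-estimate the target label marginal from the soft assignments.
        new_prior = adapted.mean(axis=0)
        if np.max(np.abs(new_prior - target_prior)) < tol:
            target_prior = new_prior
            break
        target_prior = new_prior
    return target_prior, adapted
```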
This list is automatically generated from the titles and abstracts of the papers on this site.