Semi-Supervised Clustering with Inaccurate Pairwise Annotations
- URL: http://arxiv.org/abs/2104.02146v1
- Date: Mon, 5 Apr 2021 20:37:00 GMT
- Title: Semi-Supervised Clustering with Inaccurate Pairwise Annotations
- Authors: Daniel Gribel, Michel Gendreau, Thibaut Vidal
- Abstract summary: This paper presents a clustering model that incorporates pairwise annotations in the form of must-link and cannot-link relations.
We also extend the model to integrate prior knowledge of experts' accuracy and discuss circumstances in which the use of this knowledge is beneficial.
- Score: 3.7384509727711923
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pairwise relational information is a useful way of providing partial
supervision in domains where class labels are difficult to acquire. This work
presents a clustering model that incorporates pairwise annotations in the form
of must-link and cannot-link relations and considers possible annotation
inaccuracies (i.e., a common setting when experts provide pairwise
supervision). We propose a generative model that assumes Gaussian-distributed
data samples along with must-link and cannot-link relations generated by
stochastic block models. We adopt a maximum-likelihood approach and demonstrate
that, even when supervision is weak and inaccurate, accounting for relational
information significantly improves clustering performance. Relational
information also helps to detect meaningful groups in real-world datasets that
do not fit the original data-distribution assumptions. Additionally, we extend
the model to integrate prior knowledge of experts' accuracy and discuss
circumstances in which the use of this knowledge is beneficial.
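As a rough illustration of the kind of objective the abstract describes, the sketch below scores one candidate cluster assignment by adding a Gaussian data term to an annotation term. It is a simplified stand-in, not the authors' formulation: it assumes spherical Gaussians with a shared variance, and it replaces the stochastic block models with a single hypothetical `accuracy` parameter giving the probability that an annotated pair is labeled correctly. All function and variable names are illustrative.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log-density of a spherical Gaussian N(mean, var * I) at point x."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq_dist / var)


def joint_log_likelihood(points, labels, means, var,
                         must_link, cannot_link, accuracy):
    """Data term plus annotation term for one candidate assignment.

    Assumption: each annotation is correct with probability `accuracy`
    (strictly between 0 and 1), independently of the other annotations.
    """
    # Data term: each point is drawn from the Gaussian of its cluster.
    ll = sum(gaussian_log_pdf(x, means[z], var)
             for x, z in zip(points, labels))
    log_ok, log_bad = math.log(accuracy), math.log(1.0 - accuracy)
    # A must-link agrees with the assignment when both points share a cluster.
    for i, j in must_link:
        ll += log_ok if labels[i] == labels[j] else log_bad
    # A cannot-link agrees with the assignment when the clusters differ.
    for i, j in cannot_link:
        ll += log_ok if labels[i] != labels[j] else log_bad
    return ll


points = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9)]
means = [(0.1, 0.05), (3.05, 2.95)]
must_link = [(0, 1), (2, 3)]
cannot_link = [(0, 2)]

good = joint_log_likelihood(points, [0, 0, 1, 1], means, 1.0,
                            must_link, cannot_link, 0.9)
bad = joint_log_likelihood(points, [0, 1, 0, 1], means, 1.0,
                           must_link, cannot_link, 0.9)
print(good > bad)
```

Because a violated annotation only costs `log(1 - accuracy)` rather than ruling an assignment out, an inaccurate annotation can be outweighed by the data term. This is the intuition behind treating annotations as noisy evidence instead of hard constraints.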
Related papers
- Out of spuriousity: Improving robustness to spurious correlations without group annotations [2.592470112714595]
We propose an approach to extract a subnetwork from a fully trained network that does not rely on spurious correlations.
The increase in the worst-group performance of our approach contributes to strengthening the hypothesis that there exists a subnetwork in a fully trained dense network.
arXiv Detail & Related papers (2024-07-20T20:24:14Z)
- Unsupervised Concept Discovery Mitigates Spurious Correlations [45.48778210340187]
Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases.
In this paper, we establish a novel connection between unsupervised object-centric learning and mitigation of spurious correlations.
We introduce CoBalT: a concept balancing technique that effectively mitigates spurious correlations without requiring human labeling of subgroups.
arXiv Detail & Related papers (2024-02-20T20:48:00Z)
- Open Set Relation Extraction via Unknown-Aware Training [72.10462476890784]
We propose an unknown-aware training method, regularizing the model by dynamically synthesizing negative instances.
Inspired by text adversarial attacks, we adaptively apply small but critical perturbations to original training instances.
Experimental results show that this method achieves SOTA unknown relation detection without compromising the classification of known relations.
arXiv Detail & Related papers (2023-06-08T05:45:25Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- A Relation-Oriented Clustering Method for Open Relation Extraction [18.20811491136624]
We propose a relation-oriented clustering model and use it to identify the novel relations in the unlabeled data.
We minimize the distance between instances with the same relation by gathering the instances towards their corresponding relation centroids.
Experimental results show that our method reduces the error rate by 29.2% and 15.7% on two datasets, respectively.
arXiv Detail & Related papers (2021-09-15T10:46:39Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Deep Conditional Gaussian Mixture Model for Constrained Clustering [7.070883800886882]
Constrained clustering can leverage prior information on a growing amount of only partially labeled data.
We propose a novel framework for constrained clustering that is intuitive, interpretable, and can be trained efficiently in the framework of gradient variational inference.
arXiv Detail & Related papers (2021-06-11T13:38:09Z)
- Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles [66.15398165275926]
We propose a method that can automatically detect and ignore dataset-specific patterns, which we call dataset biases.
Our method trains a lower capacity model in an ensemble with a higher capacity model.
We show improvement in all settings, including a 10 point gain on the visual question answering dataset.
arXiv Detail & Related papers (2020-11-07T22:20:03Z)
- Clustering-based Unsupervised Generative Relation Extraction [3.342376225738321]
We propose a Clustering-based Unsupervised generative Relation Extraction framework (CURE).
We use an "Encoder-Decoder" architecture to perform self-supervised learning so the encoder can extract relation information.
Our model performs better than state-of-the-art models on both New York Times (NYT) and United Nations Parallel Corpus (UNPC) standard datasets.
arXiv Detail & Related papers (2020-09-26T20:36:40Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- SelfORE: Self-supervised Relational Feature Learning for Open Relation Extraction [60.08464995629325]
Open-domain relation extraction is the task of extracting open-domain relation facts from natural language sentences.
We propose a self-supervised framework named SelfORE, which exploits weak, self-supervised signals.
Experimental results on three datasets show the effectiveness and robustness of SelfORE.
arXiv Detail & Related papers (2020-04-06T07:23:17Z)