Achieving Unbiased Multi-Instance Learning via Balanced Fine-Grained Positive-Unlabeled Learning
- URL: http://arxiv.org/abs/2503.13562v2
- Date: Tue, 17 Jun 2025 04:34:01 GMT
- Title: Achieving Unbiased Multi-Instance Learning via Balanced Fine-Grained Positive-Unlabeled Learning
- Authors: Lin-Han Jia, Lan-Zhe Guo, Zhi Zhou, Si-Ye Han, Zi-Wen Li, Yu-Feng Li
- Abstract summary: In real-world applications, it is often challenging to detect anomalous samples when the information they contain is extremely limited. In this study, we observe that the MIL problem can be transformed into a fine-grained Positive-Unlabeled (PU) learning problem. This transformation allows us to address the imbalance issue in an unbiased manner using a micro-level balancing mechanism.
- Score: 46.44686264442672
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In real-world applications, it is often challenging to detect anomalous samples when the anomalous information they contain is extremely limited. In such cases, both macro-level and micro-level detection using multi-instance learning (MIL) encounter significant difficulties. The former struggles because normal and anomalous samples are highly similar and hard to distinguish at the macro level, while the latter is limited by the lack of labels at the micro level. In MIL, micro-level labels are inferred from macro-level labels, which can lead to severe bias. Moreover, the more imbalanced the distribution between normal and anomalous samples, the more pronounced these limitations become. In this study, we observe that the MIL problem can be elegantly transformed into a fine-grained Positive-Unlabeled (PU) learning problem. This transformation allows us to address the imbalance issue in an unbiased manner using a micro-level balancing mechanism. To this end, we propose a novel framework, Balanced Fine-Grained Positive-Unlabeled (BFGPU), built on rigorous theoretical foundations to address the challenges above. Extensive experiments on both public and real-world datasets demonstrate the effectiveness of BFGPU, which outperforms existing methods, even in extreme scenarios where both macro and micro-level distributions are highly imbalanced. The code is open-sourced at https://github.com/BFGPU/BFGPU.
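For intuition, the MIL-to-PU reduction described in the abstract can be sketched as follows: instances from bags whose label determines every instance inside them act as the labeled class, while instances from the remaining bags are treated as unlabeled, and an instance-level classifier is then trained with a class-prior-weighted PU risk. The snippet below is a minimal illustrative sketch using the standard non-negative PU risk estimator of Kiryo et al.; the prior `pi`, the logistic surrogate loss, and the function name are assumptions for illustration and do not reproduce the paper's exact micro-level balancing mechanism.

```python
import torch
import torch.nn.functional as F

def nn_pu_risk(scores_labeled, scores_unlabeled, prior):
    """Non-negative PU risk (Kiryo et al., 2017), used here only as an
    illustrative stand-in for BFGPU's balanced fine-grained objective.

    scores_labeled   : classifier outputs g(x) for instances whose label is
                       fixed by their bag label (the "labeled" class).
    scores_unlabeled : outputs for instances from the other bags, treated as unlabeled.
    prior            : assumed prior of the labeled class within the unlabeled pool.
    """
    # Logistic surrogate: l(z, +1) = softplus(-z), l(z, -1) = softplus(z).
    risk_labeled_pos = F.softplus(-scores_labeled).mean()    # E_p[l(g(x), +1)]
    risk_unl_neg = F.softplus(scores_unlabeled).mean()        # E_u[l(g(x), -1)]
    risk_labeled_neg = F.softplus(scores_labeled).mean()      # E_p[l(g(x), -1)]
    # Clamp the estimated negative-class risk at zero so it cannot go negative
    # when the classifier overfits the labeled instances.
    negative_part = torch.clamp(risk_unl_neg - prior * risk_labeled_neg, min=0.0)
    return prior * risk_labeled_pos + negative_part
```

In a BFGPU-style pipeline, every instance of every bag would pass through the same instance-level network, its score would enter this risk according to its bag label, and the two risk terms would additionally be reweighted to counteract macro- and micro-level imbalance; deriving such balancing weights without introducing bias is the contribution claimed by the paper.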
Related papers
- FairSAM: Fair Classification on Corrupted Data Through Sharpness-Aware Minimization [12.178322948983263]
Image classification models trained on clean data often suffer from significant performance degradation when exposed to corrupted test data.
This degradation not only impacts overall performance but also disproportionately affects various demographic subgroups, raising critical algorithmic bias concerns.
Existing fairness-aware machine learning methods aim to reduce performance disparities but struggle to maintain robust and equitable accuracy when faced with data corruption.
We propose FairSAM, a new framework that integrates fairness-oriented strategies into Sharpness-Aware Minimization (SAM) to deliver equalized performance across demographic groups under corrupted conditions.
arXiv Detail & Related papers (2025-03-29T01:51:59Z) - Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy [17.610305828703957]
Language models are strong few-shot learners and achieve good overall accuracy in text classification tasks. We propose a post-hoc nonlinear integer programming-based debiasing method to enable flexible rectifications of class probabilities. Our approach achieves state-of-the-art overall accuracy gains with balanced class accuracies.
arXiv Detail & Related papers (2025-03-07T05:34:31Z) - Rethinking Multiple Instance Learning: Developing an Instance-Level Classifier via Weakly-Supervised Self-Training [14.16923025335549]
The multiple instance learning (MIL) problem is currently solved from either a bag-classification or an instance-classification perspective.
We formulate MIL as a semi-supervised instance classification problem, so that all the labeled and unlabeled instances can be fully utilized.
We propose a weakly-supervised self-training method, in which we utilize the positive bag labels to construct a global constraint.
arXiv Detail & Related papers (2024-08-09T01:53:41Z) - MAPL: Memory Augmentation and Pseudo-Labeling for Semi-Supervised Anomaly Detection [0.0]
A new methodology for detecting surface defects in industrial settings is introduced, referred to as Memory Augmentation and Pseudo-Labeling (MAPL). The methodology first introduces an anomaly simulation strategy, which significantly improves the model's ability to recognize rare or unknown anomaly types. An end-to-end learning framework is employed by MAPL to identify the abnormal regions directly from the input data.
arXiv Detail & Related papers (2024-05-10T02:26:35Z) - Learning with Imbalanced Noisy Data by Preventing Bias in Sample
Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z) - Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
Complementary-label learning is a weakly supervised learning problem.
We propose a consistent approach that does not rely on the uniform distribution assumption.
We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
arXiv Detail & Related papers (2023-11-27T02:59:17Z) - MSFlow: Multi-Scale Flow-based Framework for Unsupervised Anomaly
Detection [124.52227588930543]
Unsupervised anomaly detection (UAD) attracts a lot of research interest and drives widespread applications.
Normalizing flows, an inconspicuous yet powerful class of statistical models, are well suited to anomaly detection and localization in an unsupervised fashion.
We propose a novel Multi-Scale Flow-based framework dubbed MSFlow composed of asymmetrical parallel flows followed by a fusion flow.
Our MSFlow achieves a new state-of-the-art with a detection AUROC score of up to 99.7%, a localization AUROC score of 98.8%, and a PRO score of 97.1%.
arXiv Detail & Related papers (2023-08-29T13:38:35Z) - RoSAS: Deep Semi-Supervised Anomaly Detection with
Contamination-Resilient Continuous Supervision [21.393509817509464]
This paper proposes a novel semi-supervised anomaly detection method, which devises contamination-resilient continuous supervisory signals.
Our approach significantly outperforms state-of-the-art competitors by 20%-30% in AUC-PR.
arXiv Detail & Related papers (2023-07-25T04:04:49Z) - Revisiting Class Imbalance for End-to-end Semi-Supervised Object
Detection [1.6249267147413524]
Semi-supervised object detection (SSOD) has made significant progress with the development of pseudo-label-based end-to-end methods.
Many methods face challenges due to class imbalance, which hinders the effectiveness of the pseudo-label generator.
In this paper, we examine the root causes of low-quality pseudo-labels and present novel learning mechanisms to improve the label generation quality.
arXiv Detail & Related papers (2023-06-04T06:01:53Z) - Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label
Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z) - Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly
Detection [74.80595632328094]
Multiple Instance Learning (MIL) is the prevailing paradigm in Weakly Supervised Video Anomaly Detection (WSVAD).
We propose a new MIL framework: Unbiased MIL (UMIL), to learn unbiased anomaly features that improve WSVAD.
arXiv Detail & Related papers (2023-03-22T08:11:22Z) - SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised
Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z) - Augment to Detect Anomalies with Continuous Labelling [10.646747658653785]
Anomaly detection aims to recognize samples that differ in some respect from the training observations.
Recent state-of-the-art deep learning-based anomaly detection methods suffer from high computational cost, complexity, unstable training procedures, and non-trivial implementation.
We leverage a simple learning procedure that trains a lightweight convolutional neural network, reaching state-of-the-art performance in anomaly detection.
arXiv Detail & Related papers (2022-07-03T20:11:51Z) - Uncertainty-aware Pseudo-label Selection for Positive-Unlabeled Learning [10.014356492742074]
We propose to tackle the issues of imbalanced datasets and model calibration in a positive-unlabeled learning setting.
By boosting the signal from the minority class, pseudo-labeling expands the labeled dataset with new samples from the unlabeled set.
Within a series of experiments, PUUPL yields substantial performance gains in highly imbalanced settings.
arXiv Detail & Related papers (2022-01-31T12:55:47Z) - SLA$^2$P: Self-supervised Anomaly Detection with Adversarial
Perturbation [77.71161225100927]
Anomaly detection is a fundamental yet challenging problem in machine learning.
We propose a novel and powerful framework, dubbed SLA$^2$P, for unsupervised anomaly detection.
arXiv Detail & Related papers (2021-11-25T03:53:43Z) - Toward Deep Supervised Anomaly Detection: Reinforcement Learning from
Partially Labeled Anomaly Data [150.9270911031327]
We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset.
Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data.
We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies.
arXiv Detail & Related papers (2020-09-15T03:05:39Z) - Towards Discriminability and Diversity: Batch Nuclear-norm Maximization
under Label Insufficient Situations [154.51144248210338]
Batch Nuclear-norm Maximization (BNM) is proposed to boost learning under label-insufficient scenarios.
BNM outperforms competitors and works well with existing well-known methods.
arXiv Detail & Related papers (2020-03-27T05:04:24Z) - On Positive-Unlabeled Classification in GAN [130.43248168149432]
This paper defines a positive and unlabeled classification problem for standard GANs.
It then leads to a novel technique to stabilize the training of the discriminator in GANs.
arXiv Detail & Related papers (2020-02-04T05:59:37Z)