Joint Enhancement and Classification using Coupled Diffusion Models of Signals and Logits
- URL: http://arxiv.org/abs/2602.15405v1
- Date: Tue, 17 Feb 2026 07:25:41 GMT
- Title: Joint Enhancement and Classification using Coupled Diffusion Models of Signals and Logits
- Authors: Gilad Nurko, Roi Benita, Yehoshua Dissen, Tomohiro Nakatani, Marc Delcroix, Shoko Araki, Joseph Keshet
- Abstract summary: We propose a general, domain-agnostic framework that integrates two interacting diffusion models. We introduce three strategies to effectively model the joint distribution of the input and the logit. The proposed framework surpasses traditional sequential enhancement baselines, delivering robust and flexible improvements in classification accuracy under diverse noise conditions.
- Score: 45.99435851580745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robust classification in noisy environments remains a fundamental challenge in machine learning. Standard approaches typically treat signal enhancement and classification as separate, sequential stages: first enhancing the signal and then applying a classifier. This approach fails to leverage the semantic information in the classifier's output during denoising. In this work, we propose a general, domain-agnostic framework that integrates two interacting diffusion models: one operating on the input signal and the other on the classifier's output logits, without requiring any retraining or fine-tuning of the classifier. This coupled formulation enables mutual guidance, where the enhancing signal refines the class estimation and, conversely, the evolving class logits guide the signal reconstruction towards discriminative regions of the manifold. We introduce three strategies to effectively model the joint distribution of the input and the logit. We evaluated our joint enhancement method for image classification and automatic speech recognition. The proposed framework surpasses traditional sequential enhancement baselines, delivering robust and flexible improvements in classification accuracy under diverse noise conditions.
Related papers
- Self-Ensemble Post Learning for Noisy Domain Generalization [18.4218677759831]
This paper explores how to make existing methods work again in the presence of noise. We find that the latent features inside the model have certain discriminative capabilities. We propose the Self-Ensemble Post Learning approach to diversify the features that can be leveraged.
arXiv Detail & Related papers (2025-12-11T17:09:35Z) - EReLiFM: Evidential Reliability-Aware Residual Flow Meta-Learning for Open-Set Domain Generalization under Noisy Labels [85.78886153628663]
Open-Set Domain Generalization aims to enable deep learning models to recognize unseen categories in new domains. Label noise hinders open-set domain generalization by corrupting source-domain knowledge. We propose Evidential Reliability-Aware Residual Flow Meta-Learning (EReLiFM) to bridge domain gaps.
arXiv Detail & Related papers (2025-10-14T16:23:11Z) - DiscRec: Disentangled Semantic-Collaborative Modeling for Generative Recommendation [33.152693125551785]
Generative recommendation is emerging as a powerful paradigm that directly generates item predictions. Current methods face two key challenges: token-item misalignment and semantic-collaborative signal entanglement. We propose DiscRec, a novel framework that enables Disentangled Semantic-Collaborative signal modeling.
arXiv Detail & Related papers (2025-06-18T15:53:47Z) - Studying Classifier(-Free) Guidance From a Classifier-Centric Perspective [100.54185280153753]
We find that both classifier guidance and classifier-free guidance achieve conditional generation by pushing the denoising diffusion trajectories away from decision boundaries. We propose a generic postprocessing step built upon flow-matching to shrink the gap between the learned distribution for a pretrained denoising diffusion model and the real data distribution.
arXiv Detail & Related papers (2025-03-13T17:59:59Z) - Audio Contrastive-based Fine-tuning: Decoupling Representation Learning and Classification [26.82307246813389]
We propose a disentangled two-stage framework that separates representation refinement from downstream evaluation. First, we employ a "contrastive-tuning" stage to explicitly improve the geometric structure of the model's embedding space. We then introduce a dual-probe evaluation protocol to assess the quality of these refined representations from a geometric perspective.
arXiv Detail & Related papers (2023-09-21T08:59:13Z) - DiffAug: A Diffuse-and-Denoise Augmentation for Training Robust Classifiers [6.131022957085439]
We introduce DiffAug, a simple and efficient diffusion-based augmentation technique to train image classifiers.
Applying DiffAug to a given example consists of one forward-diffusion step followed by one reverse-diffusion step.
arXiv Detail & Related papers (2023-06-15T15:19:25Z) - Visualizing Classifier Adjacency Relations: A Case Study in Speaker
Verification and Voice Anti-Spoofing [72.4445825335561]
We propose a simple method to derive 2D representation from detection scores produced by an arbitrary set of binary classifiers.
Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores.
While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems.
arXiv Detail & Related papers (2021-06-11T13:03:33Z) - Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal.
In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective.
Using the same approach, we propose to learn an infinite-dimensional latent code that improves state-of-the-art models on semi-supervised image classification.
arXiv Detail & Related papers (2021-05-29T09:26:02Z) - Handling Class Imbalance in Low-Resource Dialogue Systems by Combining
Few-Shot Classification and Interpolation [19.988400064884825]
Utterance classification performance in low-resource dialogue systems is constrained by an inevitably high degree of data imbalance in class labels.
We present a new end-to-end pairwise learning framework that induces a few-shot classification capability in the utterance representations and augments data through interpolation of utterance representations.
arXiv Detail & Related papers (2020-10-28T17:05:24Z) - Semi-supervised Neural Chord Estimation Based on a Variational
Autoencoder with Latent Chord Labels and Features [18.498244371257304]
This paper describes a statistically-principled semi-supervised method of automatic chord estimation.
It can make effective use of music signals regardless of the availability of chord annotations.
arXiv Detail & Related papers (2020-05-14T15:58:36Z) - Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.