Breaking the Adversarial Robustness-Performance Trade-off in Text Classification via Manifold Purification
- URL: http://arxiv.org/abs/2511.07888v1
- Date: Wed, 12 Nov 2025 01:26:29 GMT
- Title: Breaking the Adversarial Robustness-Performance Trade-off in Text Classification via Manifold Purification
- Authors: Chenhao Dang, Jing Ma
- Abstract summary: A persistent challenge in text classification (TC) is that enhancing model robustness against adversarial attacks typically degrades performance on clean data. We argue that this challenge can be resolved by modeling the distribution of clean samples in the encoder embedding manifold. We propose the Manifold-Correcting Causal Flow (MC^2F), a two-module system that operates directly on sentence embeddings.
- Score: 7.908790490702219
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A persistent challenge in text classification (TC) is that enhancing model robustness against adversarial attacks typically degrades performance on clean data. We argue that this challenge can be resolved by modeling the distribution of clean samples in the encoder embedding manifold. To this end, we propose the Manifold-Correcting Causal Flow (MC^2F), a two-module system that operates directly on sentence embeddings. A Stratified Riemannian Continuous Normalizing Flow (SR-CNF) learns the density of the clean data manifold. It identifies out-of-distribution embeddings, which are then corrected by a Geodesic Purification Solver. This solver projects adversarial points back onto the learned manifold via the shortest path, restoring a clean, semantically coherent representation. We conducted extensive evaluations on text classification (TC) across three datasets and multiple adversarial attacks. The results demonstrate that our method, MC^2F, not only establishes a new state-of-the-art in adversarial robustness but also fully preserves performance on clean data, even yielding modest gains in accuracy.
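The detect-then-project pipeline described in the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' method: a single Gaussian fitted to clean embeddings stands in for the SR-CNF density model, and a radial projection in whitened coordinates (the shortest path under the Mahalanobis metric of that Gaussian) stands in for the Geodesic Purification Solver. The class name `GaussianPurifier`, the `quantile` threshold, and the synthetic embeddings are all assumptions made for illustration.

```python
import numpy as np


class GaussianPurifier:
    """Toy stand-in for density-based purification (assumed, not from the paper)."""

    def fit(self, clean, quantile=0.99):
        # Estimate a Gaussian density over clean embeddings of shape (n, d).
        self.mean = clean.mean(axis=0)
        cov = np.cov(clean, rowvar=False) + 1e-6 * np.eye(clean.shape[1])
        self.chol = np.linalg.cholesky(cov)  # cov = chol @ chol.T
        # Points whose whitened norm exceeds this radius count as out-of-distribution.
        self.radius = np.quantile(self._whitened_norm(clean), quantile)
        return self

    def _whiten(self, x):
        # Solve chol @ z = (x - mean) so that z has identity covariance.
        return np.linalg.solve(self.chol, (x - self.mean).T).T

    def _whitened_norm(self, x):
        return np.linalg.norm(self._whiten(x), axis=1)

    def is_ood(self, x):
        # Low density under the Gaussian is equivalent to a large whitened norm.
        return self._whitened_norm(x) > self.radius

    def purify(self, x):
        # Shrink out-of-distribution points radially onto the density level set;
        # in-distribution points get scale 1 and are returned unchanged.
        z = self._whiten(x)
        norms = np.linalg.norm(z, axis=1, keepdims=True)
        scale = np.minimum(1.0, self.radius / np.maximum(norms, 1e-12))
        return self.mean + (z * scale) @ self.chol.T


# Usage with synthetic "embeddings" (hypothetical data, not a real encoder).
rng = np.random.default_rng(0)
clean_embeddings = rng.normal(size=(500, 4))
purifier = GaussianPurifier().fit(clean_embeddings)
perturbed = clean_embeddings[:5] + 50.0
restored = purifier.purify(perturbed)
```

Because in-distribution points pass through unchanged, a purifier of this shape leaves clean inputs untouched, which mirrors the abstract's claim that clean accuracy is preserved. A learned flow on a curved manifold would replace both the Gaussian and the radial projection.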
Related papers
- DVD: A Robust Method for Detecting Variant Contamination in Large Language Model Evaluation [24.086354908256293]
DVD is a single-sample detector that models the local output distribution induced by temperature sampling. We construct the first benchmark for variant contamination across two domains, Omni-MATH and SuperGPQA. DVD consistently outperforms perplexity-based, Min-$k$%++, edit-distance (CDD), and embedding-similarity baselines.
arXiv Detail & Related papers (2026-01-08T12:48:40Z)
- Latent Sculpting for Zero-Shot Generalization: A Manifold Learning Approach to Out-of-Distribution Anomaly Detection [2.8547732086436306]
A fundamental limitation of supervised deep learning is "Generalization Collapse". We propose Latent Sculpting, a hierarchical two-stage representation learning framework. We report an 88.89% detection rate on "Infiltration" scenarios.
arXiv Detail & Related papers (2025-12-19T11:37:02Z)
- Latent Iterative Refinement Flow: A Geometric-Constrained Approach for Few-Shot Generation [5.062604189239418]
We introduce Latent Iterative Refinement Flow (LIRF), a novel approach to few-shot generation. LIRF establishes a stable latent space using an autoencoder trained with our novel manifold-preservation loss. Within this cycle, candidate samples are refined by a geometric correction operator, a provably contractive mapping.
arXiv Detail & Related papers (2025-09-24T08:57:21Z)
- Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds [48.37843602248313]
Deep neural networks (DNNs) are vulnerable to adversarial samples crafted by adding imperceptible perturbations to clean data, potentially leading to incorrect and dangerous predictions. We propose Consistency Model-based Adversarial Purification (CMAP), which optimizes vectors within the latent space of a pre-trained consistency model to generate samples for restoring clean data. CMAP significantly enhances robustness against strong adversarial attacks while preserving high natural accuracy.
arXiv Detail & Related papers (2024-12-11T14:14:02Z)
- Rectified Diffusion Guidance for Conditional Generation [94.83538269086613]
We revisit the theory behind CFG and rigorously confirm that the improper combination coefficients bring about an expectation shift of the generative distribution. We show that our approach enjoys a closed-form solution given the strength. Empirical evidence on real-world data demonstrates the compatibility of our design with existing state-of-the-art diffusion models.
arXiv Detail & Related papers (2024-10-24T13:41:32Z)
- Improving Vector-Quantized Image Modeling with Latent Consistency-Matching Diffusion [55.185588994883226]
We introduce VQ-LCMD, a continuous-space latent diffusion framework within the embedding space that stabilizes training. VQ-LCMD uses a novel training objective combining the joint embedding-diffusion variational lower bound with a consistency-matching (CM) loss. Experiments show that the proposed VQ-LCMD yields superior results on FFHQ, LSUN Churches, and LSUN Bedrooms compared to discrete-state latent diffusion models.
arXiv Detail & Related papers (2024-10-18T09:12:33Z)
- Adversarial Vulnerability as a Consequence of On-Manifold Inseparibility [16.998477658358773]
We consider classification tasks and characterize the data distribution as a low-dimensional manifold.
We argue that clean training experiences poor convergence in the off-manifold direction caused by the ill-conditioning.
We perform experiments and exhibit tremendous robustness improvements in clean training through long training and the employment of second-order methods.
arXiv Detail & Related papers (2024-10-09T14:18:52Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z)
- Hard-label Manifolds: Unexpected Advantages of Query Efficiency for Finding On-manifold Adversarial Examples [67.23103682776049]
Recent zeroth order hard-label attacks on image classification models have shown comparable performance to their first-order, gradient-level alternatives.
It was recently shown in the gradient-level setting that regular adversarial examples leave the data manifold, while their on-manifold counterparts are in fact generalization errors.
We propose an information-theoretic argument based on a noisy manifold distance oracle, which leaks manifold information through the adversary's gradient estimate.
arXiv Detail & Related papers (2021-03-04T20:53:06Z)
- MMCGAN: Generative Adversarial Network with Explicit Manifold Prior [78.58159882218378]
We propose to employ explicit manifold learning as a prior to alleviate mode collapse and stabilize the training of GANs.
Our experiments on both the toy data and real datasets show the effectiveness of MMCGAN in alleviating mode collapse, stabilizing training, and improving the quality of generated samples.
arXiv Detail & Related papers (2020-06-18T07:38:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.