An Experimental Study of Semantic Continuity for Deep Learning Models
- URL: http://arxiv.org/abs/2011.09789v1
- Date: Thu, 19 Nov 2020 12:23:28 GMT
- Title: An Experimental Study of Semantic Continuity for Deep Learning Models
- Authors: Shangxi Wu and Jitao Sang and Xian Zhao and Lizhang Chen
- Abstract summary: We argue that semantic discontinuity results from inappropriate training targets and contributes to notorious issues such as adversarial robustness, interpretability, etc.
We first conduct data analysis to provide evidence of semantic discontinuity in existing deep learning models, and then design a simple semantic continuity constraint which theoretically enables models to obtain smooth gradients and learn semantic-oriented features.
- Score: 11.883949320223078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models suffer from the problem of semantic discontinuity: small
perturbations in the input space tend to cause semantic-level interference to
the model output. We argue that semantic discontinuity results from
inappropriate training targets and contributes to notorious issues such as
adversarial robustness, interpretability, etc. We first conduct data analysis
to provide evidence of semantic discontinuity in existing deep learning models,
and then design a simple semantic continuity constraint which theoretically
enables models to obtain smooth gradients and learn semantic-oriented features.
Qualitative and quantitative experiments prove that semantically continuous
models successfully reduce the use of non-semantic information, which further
contributes to the improvement in adversarial robustness, interpretability,
model transfer, and machine bias.
Related papers
- Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage.
Models may behave unreliably due to poorly explored failure modes.
Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
- Bridging Interpretability and Robustness Using LIME-Guided Model Refinement [0.0]
Local Interpretable Model-Agnostic Explanations (LIME) systematically enhance model robustness.
Empirical evaluations on multiple benchmark datasets demonstrate that LIME-guided refinement not only improves interpretability but also significantly enhances resistance to adversarial perturbations and generalization to out-of-distribution data.
arXiv Detail & Related papers (2024-12-25T17:32:45Z)
- Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach [2.4427666827706074]
This paper presents a novel approach to imitation learning from observations, where an autoregressive mixture of experts model is deployed to fit the underlying policy.
The effectiveness of the proposed framework is validated using two autonomous driving datasets collected from human demonstrations.
arXiv Detail & Related papers (2024-11-12T22:56:28Z)
- Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness [68.69369585600698]
Deep learning models often suffer from a lack of interpretability due to polysemanticity.
Recent advances in monosemanticity, where neurons correspond to consistent and distinct semantics, have significantly improved interpretability.
We show that monosemantic features not only enhance interpretability but also bring concrete gains in model performance.
arXiv Detail & Related papers (2024-10-27T18:03:20Z)
- Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales [3.242050660144211]
Saliency post-hoc explainability methods are important tools for understanding increasingly complex NLP models.
We present a methodology for incorporating rationales, which are text annotations explaining human decisions, into text classification models.
arXiv Detail & Related papers (2024-04-03T22:39:33Z)
- The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning [31.8260779160424]
We investigate how popular algorithms perform as the learned dynamics model is improved.
We propose Reach-Aware Value Learning (RAVL), a simple and robust method that directly addresses the edge-of-reach problem.
arXiv Detail & Related papers (2024-02-19T20:38:00Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Influence Tuning: Demoting Spurious Correlations via Instance Attribution and Instance-Driven Updates [26.527311287924995]
We show that in a controlled setup, influence tuning can help deconfound the model from spurious patterns in data.
arXiv Detail & Related papers (2021-10-07T06:59:46Z)
- Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning [57.4036085386653]
We show that prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inferences based on lexical overlap.
We then show that adding a regularization that preserves pretraining weights is effective in mitigating this destructive tendency of few-shot finetuning.
arXiv Detail & Related papers (2021-09-09T10:10:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.