An Experimental Study of Semantic Continuity for Deep Learning Models
- URL: http://arxiv.org/abs/2011.09789v1
- Date: Thu, 19 Nov 2020 12:23:28 GMT
- Title: An Experimental Study of Semantic Continuity for Deep Learning Models
- Authors: Shangxi Wu and Jitao Sang and Xian Zhao and Lizhang Chen
- Abstract summary: We argue that semantic discontinuity results from inappropriate training targets and contributes to notorious issues such as adversarial robustness, interpretability, etc.
We first conduct data analysis to provide evidence of semantic discontinuity in existing deep learning models, and then design a simple semantic continuity constraint which theoretically enables models to obtain smooth gradients and learn semantic-oriented features.
- Score: 11.883949320223078
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models suffer from the problem of semantic discontinuity: small
perturbations in the input space tend to cause semantic-level interference to
the model output. We argue that the semantic discontinuity results from these
inappropriate training targets and contributes to notorious issues such as
adversarial robustness, interpretability, etc. We first conduct data analysis
to provide evidence of semantic discontinuity in existing deep learning models,
and then design a simple semantic continuity constraint which theoretically
enables models to obtain smooth gradients and learn semantic-oriented features.
Qualitative and quantitative experiments prove that semantically continuous
models successfully reduce the use of non-semantic information, which further
contributes to the improvement in adversarial robustness, interpretability,
model transfer, and machine bias.
Related papers
- Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness [68.69369585600698]
Deep learning models often suffer from a lack of interpretability due to polysemanticity.
Recent advances in monosemanticity, where neurons correspond to consistent and distinct semantics, have significantly improved interpretability.
We show that monosemantic features not only enhance interpretability but also bring concrete gains in model performance.
arXiv Detail & Related papers (2024-10-27T18:03:20Z) - Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales [3.242050660144211]
Saliency post-hoc explainability methods are important tools for understanding increasingly complex NLP models.
We present a methodology for incorporating rationales, which are text annotations explaining human decisions, into text classification models.
arXiv Detail & Related papers (2024-04-03T22:39:33Z) - The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning [37.387280102209274]
offline reinforcement learning aims to enable agents to be trained from pre-collected datasets, however, this comes with the added challenge of estimating the value of behavior not covered in the dataset.
Model-based methods offer a solution by allowing agents to collect additional synthetic data via rollouts in a learned dynamics model.
However, if the learned dynamics model is replaced by the true error-free dynamics, existing model-based methods completely fail.
We propose Reach-Aware Value Learning (RAVL), a simple and robust method that directly addresses the edge-of-reach problem.
arXiv Detail & Related papers (2024-02-19T20:38:00Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z) - Explain, Edit, and Understand: Rethinking User Study Design for
Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z) - Multicriteria interpretability driven Deep Learning [0.0]
Deep Learning methods are renowned for their performances, yet their lack of interpretability prevents them from high-stakes contexts.
Recent model methods address this problem by providing post-hoc interpretability methods by reverse-engineering the model's inner workings.
We propose a Multicriteria agnostic technique that allows to control the feature effects on the model's outcome by injecting knowledge in the objective function.
arXiv Detail & Related papers (2021-11-28T09:41:13Z) - Influence Tuning: Demoting Spurious Correlations via Instance
Attribution and Instance-Driven Updates [26.527311287924995]
influence tuning can help deconfounding the model from spurious patterns in data.
We show that in a controlled setup, influence tuning can help deconfounding the model from spurious patterns in data.
arXiv Detail & Related papers (2021-10-07T06:59:46Z) - Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning [57.4036085386653]
We show that prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inferences based on lexical overlap.
We then show that adding a regularization that preserves pretraining weights is effective in mitigating this destructive tendency of few-shot finetuning.
arXiv Detail & Related papers (2021-09-09T10:10:29Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.