Related papers: Harnessing the Power of Explanations for Incremental Training: A LIME-Based Approach

URL: http://arxiv.org/abs/2211.01413v2
Date: Tue, 11 Jul 2023 20:22:08 GMT
Title: Harnessing the Power of Explanations for Incremental Training: A LIME-Based Approach
Authors: Arnab Neelim Mazumder, Niall Lyons, Ashutosh Pandey, Avik Santra, and Tinoosh Mohsenin
Abstract summary: In this work, model explanations are fed back to the feed-forward training to help the model generalize better. The framework incorporates the custom weighted loss with Elastic Weight Consolidation (EWC) to maintain performance in sequential testing sets. The proposed custom training procedure results in a consistent enhancement of accuracy ranging from 0.5% to 1.5% throughout all phases of the incremental learning setup.
Score: 6.244905619201076
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Explainability of neural network prediction is essential to understand feature importance and gain interpretable insight into neural network performance. However, explanations of neural network outcomes are mostly limited to visualization, and little work uses these explanations as feedback to improve model performance. In this work, model explanations are fed back to the feed-forward training to help the model generalize better. To this end, a custom weighted loss is proposed, where the weights are generated by considering the Euclidean distances between true LIME (Local Interpretable Model-Agnostic Explanations) explanations and model-predicted LIME explanations. Also, in practical training scenarios, not all the training data is available at once, so a solution that helps the model learn sequentially without losing information on previous data distributions is imperative. Thus, the framework incorporates the custom weighted loss with Elastic Weight Consolidation (EWC) to maintain performance in sequential testing sets. The proposed custom training procedure results in a consistent enhancement of accuracy ranging from 0.5% to 1.5% throughout all phases of the incremental learning setup compared to traditional loss-based training methods for the keyword spotting task using the Google Speech Commands dataset.
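
The two ingredients named in the abstract, an explanation-weighted loss and an EWC penalty, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not give the exact weighting formula, so the mapping from Euclidean distance to weight used here (1 plus the distance divided by the batch mean distance) is an assumption. The function names explanation_weights, weighted_cross_entropy, and ewc_penalty are hypothetical. The EWC term uses the standard quadratic form, (lambda / 2) * sum_i F_i * (theta_i - theta_old_i)^2, with a diagonal Fisher estimate.

```python
import numpy as np


def explanation_weights(true_expl, pred_expl):
    """Per-sample weights from the Euclidean distance between explanation vectors.

    Assumption (not from the paper): weight = 1 + distance / mean batch distance,
    so samples whose explanations disagree more contribute more to the loss.
    """
    dist = np.linalg.norm(true_expl - pred_expl, axis=1)
    return 1.0 + dist / (dist.mean() + 1e-12)


def weighted_cross_entropy(probs, labels, weights):
    """Weighted average of the per-sample negative log-likelihood."""
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return float(np.sum(weights * nll) / np.sum(weights))


def ewc_penalty(params, old_params, fisher, lam):
    """Standard EWC quadratic penalty with a diagonal Fisher estimate."""
    return 0.5 * lam * float(np.sum(fisher * (params - old_params) ** 2))


# Toy usage with made-up numbers (illustrative only).
true_expl = np.array([[0.0, 0.0], [0.0, 0.0]])
pred_expl = np.array([[3.0, 4.0], [0.0, 0.0]])
probs = np.array([[0.5, 0.5], [0.5, 0.5]])
labels = np.array([0, 1])

w = explanation_weights(true_expl, pred_expl)
total_loss = weighted_cross_entropy(probs, labels, w) + ewc_penalty(
    params=np.array([1.0, 2.0]),
    old_params=np.array([0.0, 0.0]),
    fisher=np.array([1.0, 0.5]),
    lam=2.0,
)
```

In a real incremental-learning loop, the Fisher values and old parameters would be recorded after training on the previous data phase, and the explanation vectors would come from a LIME explainer run on each sample.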

Related papers

Reasoning to Learn from Latent Thoughts [45.59740535714148]
We show that explicitly modeling and inferring the latent thoughts that underlie the text generation process can significantly improve pretraining data efficiency. We show that a 1B LM can bootstrap its performance across at least three iterations and significantly outperform baselines trained on raw data. The gains from inference scaling and EM iterations suggest new opportunities for scaling data-constrained pretraining.
arXiv Detail & Related papers (2025-03-24T16:41:23Z)
Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance. We introduce novel algorithms for dynamic, instance-level data reweighting. Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z)
Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training [51.41246396610475]
This paper aims to predict performance in closed-book question answering (QA) without the help of external tools. We conduct large-scale retrieval and semantic analysis across the pre-training corpora of 21 publicly available and 3 custom-trained large language models. Building on these foundations, we propose Size-dependent Mutual Information (SMI), an information-theoretic metric that linearly correlates pre-training data characteristics.
arXiv Detail & Related papers (2025-02-06T13:23:53Z)
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy. By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
Improve Vision Language Model Chain-of-thought Reasoning [86.83335752119741]
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. We show that training VLMs on short answers does not generalize well to reasoning tasks that require more detailed responses.
arXiv Detail & Related papers (2024-10-21T17:00:06Z)
Bayes' Power for Explaining In-Context Learning Generalizations [46.17844703369127]
In this paper, we argue that a more useful interpretation of neural network behavior in this era is as an approximation of the true posterior. We show how models become robust in-context learners by effectively composing knowledge from their training data.
arXiv Detail & Related papers (2024-10-02T14:01:34Z)
Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance. Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning. Our framework then promotes model learning by paying closer attention to those training samples with a high difference in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
Leveraging Angular Information Between Feature and Classifier for Long-tailed Learning: A Prediction Reformulation Approach [90.77858044524544]
We reformulate the recognition probabilities through included angles without re-balancing the classifier weights. Inspired by the performance improvement of the predictive form reformulation, we explore the different properties of this angular prediction. Our method is able to obtain the best performance among peer methods without pretraining on CIFAR10/100-LT and ImageNet-LT.
arXiv Detail & Related papers (2022-12-03T07:52:48Z)
Efficient Augmentation for Imbalanced Deep Learning [8.38844520504124]
We study a convolutional neural network's internal representation of imbalanced image data. We measure the generalization gap between a model's feature embeddings in the training and test sets, showing that the gap is wider for minority classes. This insight enables us to design an efficient three-phase CNN training framework for imbalanced data.
arXiv Detail & Related papers (2022-07-13T09:43:17Z)
Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications. Recent studies have empirically shown that training from scratch achieves final performance that is no worse than this pre-training strategy. We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
Retrieval Augmentation to Improve Robustness and Interpretability of Deep Neural Networks [3.0410237490041805]
In this work, we actively exploit the training data to improve the robustness and interpretability of deep neural networks. Specifically, the proposed approach uses the target of the nearest input example to initialize the memory state of an LSTM model or to guide attention mechanisms. Results show the effectiveness of the proposed models for the two tasks, on the widely used Flickr8 and IMDB datasets.
arXiv Detail & Related papers (2021-02-25T17:38:31Z)
Self-Adaptive Training: Bridging the Supervised and Self-Supervised Learning [16.765461276790944]
Self-adaptive training is a unified training algorithm that dynamically calibrates and enhances the training process by model predictions without incurring extra computational cost. We analyze the training dynamics of deep networks on training data corrupted by, e.g., random noise and adversarial examples. Our analysis shows that model predictions are able to magnify useful underlying information in data and this phenomenon occurs broadly even in the absence of any label information.
arXiv Detail & Related papers (2021-01-21T17:17:30Z)
Explanation-Guided Training for Cross-Domain Few-Shot Classification [96.12873073444091]
The cross-domain few-shot classification task (CD-FSC) combines few-shot classification with the requirement to generalize across domains represented by datasets. We introduce a novel training approach for existing FSC models. We show that explanation-guided training effectively improves the model generalization.
arXiv Detail & Related papers (2020-07-17T07:28:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.