Exploiting Representation Bias for Data Distillation in Abstractive Text Summarization
- URL: http://arxiv.org/abs/2312.06022v2
- Date: Wed, 20 Dec 2023 15:07:59 GMT
- Authors: Yash Kumar Atri, Vikram Goyal, Tanmoy Chakraborty
- Abstract summary: We show that deep models fail to capture the diversity of the input space.
We employ clustering techniques to learn the diversity of a model's sample space.
We devise a metric to filter out redundant data points to make the model more robust and less data hungry.
- Score: 25.467836837575742
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Training sets for abstractive text summarization keep growing to meet
the data demands of deep learning models. These models tend to exploit the
training data representations to achieve superior performance by improving the
quantitative scores of the resulting summaries. However, enlarging the training
set is not always the best way to maximize performance; the quality of training
samples and the learning protocol of deep learning models therefore need to be
revisited. In this paper, we aim to discretize the vector space of abstractive
text summarization models to understand the characteristics learned between
the input embedding space and the models' encoder space. We show that deep
models fail to capture the diversity of the input space. Further, the
distribution of data points in the encoder space indicates that an unchecked
increase in training samples does not add value; rather, the data samples need
to be pruned to make the models focus on variability and faithfulness. We
employ clustering techniques to learn the diversity of a model's sample space
and how data points are mapped from the embedding space to the encoder space
and vice versa. Additionally, we devise a metric to filter out redundant data
points to make the model more robust and less data-hungry. We benchmark our
proposed method using quantitative metrics, such as ROUGE, and qualitative
metrics, such as BERTScore, FEQA and Pyramid score. We also quantify the
factors that prevent the models from learning the diversity of the varied
input samples.
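The abstract does not specify which clustering algorithm or redundancy metric the authors use. As a rough illustration only, the sketch below assumes plain k-means (Lloyd's algorithm) over per-sample vectors and uses a within-cluster cosine-similarity threshold as a stand-in redundancy filter, plus normalized cluster-size entropy as one simple diversity proxy. The names `kmeans`, `filter_redundant`, `cluster_entropy` and the 0.995 threshold are hypothetical; this is not the paper's metric.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Lloyd's k-means: returns one cluster id per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise centroids with k points sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def filter_redundant(points: List[Vector], labels: List[int], threshold: float = 0.995) -> List[int]:
    """Hypothetical redundancy filter: a greedy pass in input order that drops a
    point if its cosine similarity to an already-kept point in the same cluster
    is >= threshold. Returns the indices of the kept points."""
    kept: List[int] = []
    kept_by_cluster: Dict[int, List[int]] = {}
    for i, (p, lab) in enumerate(zip(points, labels)):
        bucket = kept_by_cluster.setdefault(lab, [])
        if all(cosine(p, points[j]) < threshold for j in bucket):
            bucket.append(i)
            kept.append(i)
    return kept


def cluster_entropy(labels: List[int], k: int) -> float:
    """Normalized Shannon entropy of cluster sizes: 1.0 means a perfectly even
    spread over k clusters; values near 0 mean one cluster dominates."""
    n = len(labels)
    h = -sum((c / n) * math.log(c / n) for c in Counter(labels).values())
    return h / math.log(k) if k > 1 else 0.0


if __name__ == "__main__":
    # Toy 2-D vectors: two well-separated groups, each with a near-duplicate.
    data = [[1.0, 0.0], [0.99, 0.01], [0.9, 0.1], [0.0, 1.0], [0.01, 0.99], [0.1, 0.9]]
    labels = kmeans(data, k=2)
    print("clusters:", labels)
    print("kept indices:", filter_redundant(data, labels))  # [0, 2, 3, 5]
    print("normalized cluster entropy:", cluster_entropy(labels, k=2))  # ~1.0
```

The greedy, order-dependent pass keeps the sketch short; a practical pipeline would more likely use vectorized libraries and vectors taken from a real model. Running the clustering separately on embedding-layer vectors and on encoder outputs, then comparing the two entropy values, is one illustrative way to probe whether an encoder collapses the diversity present in its inputs.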
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
(2024-07-20T01:34:13Z)
- Dataset Quantization with Active Learning based Adaptive Sampling [11.157462442942775]
We show that maintaining performance is feasible even with uneven sample distributions.
We propose a novel active learning based adaptive sampling strategy to optimize the sample selection.
Our approach outperforms the state-of-the-art dataset compression methods.
(2024-07-09T23:09:18Z)
- Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models [36.05242956018461]
In this paper, we establish a bridge between identifying detrimental training samples via influence functions and outlier gradient detection.
We first validate the hypothesis of our proposed outlier gradient analysis approach on synthetic datasets.
We then demonstrate its effectiveness in detecting mislabeled samples in vision models and selecting data samples for improving performance of natural language processing transformer models.
(2024-05-06T21:34:46Z)
- Reinforcement Learning with Generative Models for Compact Support Sets [10.041289551532804]
We propose a framework utilizing reinforcement learning as a control for foundation models.
Our framework produced excellent results, increasing classification accuracy by significant margins for no additional labelling or data cost.
(2024-04-25T02:48:16Z)
- Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks [75.42002070547267]
We propose a self-evolution learning (SE) based mixup approach for data augmentation in text classification.
We introduce a novel instance-specific label smoothing approach, which linearly interpolates the model's output and the one-hot labels of the original samples to generate new soft labels for mixup.
(2023-05-22T23:43:23Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
(2022-10-07T17:56:53Z)
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
(2022-05-30T13:34:46Z)
- ClusterQ: Semantic Feature Distribution Alignment for Data-Free Quantization [111.12063632743013]
We propose a new and effective data-free quantization method termed ClusterQ.
To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics.
We also incorporate the intra-class variance to solve class-wise mode collapse.
(2022-04-30T06:58:56Z)
- Semi-supervised Deep Learning for Image Classification with Distribution Mismatch: A Survey [1.5469452301122175]
Deep learning models rely on an abundance of labelled observations for training.
Gathering labelled observations is expensive, which makes deep learning models less practical.
In many situations, different unlabelled data sources may be available.
This raises the risk of a significant distribution mismatch between the labelled and unlabelled datasets.
(2022-03-01T02:46:00Z)
- Entropy optimized semi-supervised decomposed vector-quantized variational autoencoder model based on transfer learning for multiclass text classification and generation [3.9318191265352196]
We propose a semi-supervised discrete latent variable model for multi-class text classification and text generation.
The proposed model employs the concept of transfer learning for training a quantized transformer model.
Experimental results indicate that the proposed model substantially outperforms state-of-the-art models.
(2021-11-10T07:07:54Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
(2021-04-11T12:14:04Z)