Does the dataset meet your expectations? Explaining sample
representation in image data
- URL: http://arxiv.org/abs/2012.08642v1
- Date: Sun, 6 Dec 2020 18:16:28 GMT
- Title: Does the dataset meet your expectations? Explaining sample
representation in image data
- Authors: Dhasarathy Parthasarathy, Anton Johansson
- Abstract summary: A neural network model is affected adversely by a lack of diversity in training data.
We present a method that identifies and explains such deficiencies.
We then apply the method to examine a dataset of geometric shapes.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since the behavior of a neural network model is adversely affected by a lack
of diversity in training data, we present a method that identifies and explains
such deficiencies. When a dataset is labeled, we note that annotations alone
are capable of providing a human interpretable summary of sample diversity.
This allows explaining any lack of diversity as the mismatch found when
comparing the \textit{actual} distribution of annotations in the dataset with
an \textit{expected} distribution of annotations, specified manually to capture
essential label diversity. While, in many practical cases, labeling (samples
$\rightarrow$ annotations) is expensive, its inverse, simulation (annotations
$\rightarrow$ samples), can be cheaper. By mapping the expected distribution of
annotations into test samples using parametric simulation, we present a method
that explains sample representation using the mismatch in diversity between
simulated and collected data. We then apply the method to examine a dataset of
geometric shapes to qualitatively and quantitatively explain sample
representation in terms of comprehensible aspects such as size, position, and
pixel brightness.
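The core comparison the abstract describes can be sketched numerically: estimate the actual distribution of an annotation attribute from the dataset, specify an expected distribution by hand, and quantify the mismatch. The bin counts, attribute (shape size), and the choice of KL divergence below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical example: one annotation attribute (e.g. shape size)
# discretized into 4 bins. "expected" is specified manually to capture
# desired label diversity; "actual" is estimated from the dataset.
expected = np.array([0.25, 0.25, 0.25, 0.25])  # uniform over size bins
actual = np.array([0.55, 0.30, 0.10, 0.05])    # empirical bin frequencies

# One common mismatch measure: KL divergence D(actual || expected).
# The paper's exact divergence choice may differ; this is illustrative.
eps = 1e-12
kl = np.sum(actual * np.log((actual + eps) / (expected + eps)))
print(f"KL(actual || expected) = {kl:.4f}")  # larger => worse representation

# Per-bin shortfall points at which annotation values are under-represented,
# which is what makes the explanation human-interpretable.
shortfall = expected - actual
print("under-represented bins:", np.where(shortfall > 0)[0])
```

Under-represented bins would then be filled by simulating samples from the expected annotation values (annotations $\rightarrow$ samples), which is the cheap direction in the paper's setup.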
Related papers
- Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
- Stylist: Style-Driven Feature Ranking for Robust Novelty Detection [8.402607231390606]
We propose a formalization that separates semantic (content) changes, which are relevant to the task, from style changes, which are irrelevant.
Within this formalization, we define robust novelty detection as the task of finding semantic changes while being robust to style distributional shifts.
We show that our selection manages to remove features responsible for spurious correlations and improve novelty detection performance.
arXiv Detail & Related papers (2023-10-05T17:58:32Z)
- Sample-Specific Debiasing for Better Image-Text Models [6.301766237907306]
Self-supervised representation learning on image-text data facilitates crucial medical applications, such as image classification, visual grounding, and cross-modal retrieval.
One common approach involves contrasting semantically similar (positive) and dissimilar (negative) pairs of data points.
Drawing negative samples uniformly from the training data set introduces false negatives, i.e., samples that are treated as dissimilar but belong to the same class.
In healthcare data, the underlying class distribution is nonuniform, implying that false negatives occur at a highly variable rate.
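The variability this summary describes is easy to make concrete: when negatives are drawn uniformly, an anchor's false-negative rate equals its class's prevalence, so a skewed class distribution yields highly uneven rates. The class probabilities below are a made-up illustration, not data from the paper.

```python
import numpy as np

# Hypothetical skewed class distribution (e.g. common vs. rare findings).
class_probs = np.array([0.70, 0.20, 0.07, 0.03])

# For an anchor of class c, a uniformly drawn "negative" actually shares
# the anchor's class with probability p_c, i.e. it is a false negative.
fn_rate_per_class = class_probs
print("false-negative rate per anchor class:", fn_rate_per_class)

# Averaging over anchors drawn from the same distribution gives the
# overall false-negative rate under uniform negative sampling.
avg_fn = np.sum(class_probs ** 2)
print(f"average false-negative rate: {avg_fn:.4f}")
```

With this distribution, anchors from the majority class see a false negative 70% of the time while rare-class anchors see one only 3% of the time, which is the "highly variable rate" the summary refers to.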
arXiv Detail & Related papers (2023-04-25T22:23:41Z)
- Learning to Imagine: Diversify Memory for Incremental Learning using Unlabeled Data [69.30452751012568]
We develop a learnable feature generator to diversify exemplars by adaptively generating diverse counterparts of exemplars.
We introduce semantic contrastive learning to enforce the generated samples to be semantic consistent with exemplars.
Our method does not bring any extra inference cost and outperforms state-of-the-art methods on two benchmarks.
arXiv Detail & Related papers (2022-04-19T15:15:18Z) - Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated
Label Mixing [104.630875328668]
The Mixup scheme suggests mixing a pair of samples to create an augmented training sample.
We present a novel, yet simple Mixup-variant that captures the best of both worlds.
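For context, the plain Mixup baseline that this paper builds on can be sketched in a few lines. This is vanilla Mixup (Zhang et al.), not the Saliency Grafting variant; the toy shapes and default `alpha` are illustrative.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Vanilla Mixup: convex combination of two samples and their
    one-hot labels, with the mixing weight drawn from Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# Toy usage with 4x4 "images" and 3-class one-hot labels.
x1, x2 = np.ones((4, 4)), np.zeros((4, 4))
y1, y2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
x_mix, y_mix = mixup(x1, y1, x2, y2)
print(y_mix.sum())  # mixed label still sums to 1
```

Saliency Grafting differs by using attribution maps to decide which regions to mix and by calibrating the mixed label accordingly, rather than mixing whole images with a single scalar weight.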
arXiv Detail & Related papers (2021-12-16T11:27:48Z)
- Information Symmetry Matters: A Modal-Alternating Propagation Network for Few-Shot Learning [118.45388912229494]
We propose a Modal-Alternating Propagation Network (MAP-Net) to supplement the absent semantic information of unlabeled samples.
We design a Relation Guidance (RG) strategy to guide the visual relation vectors via semantics so that the propagated information is more beneficial.
Our proposed method achieves promising performance and outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2021-09-03T03:43:53Z)
- Support-set bottlenecks for video-text representation learning [131.4161071785107]
The dominant paradigm for learning video-text representations -- noise contrastive learning -- is too strict.
We propose a novel method that alleviates this by leveraging a generative model to naturally push these related samples together.
Our proposed method outperforms others by a large margin on MSR-VTT, VATEX and ActivityNet, and MSVD for video-to-text and text-to-video retrieval.
arXiv Detail & Related papers (2020-10-06T15:38:54Z)
- Null-sampling for Interpretable and Fair Representations [8.654168514863649]
We learn invariant representations, in the data domain, to achieve interpretability in algorithmic fairness.
By placing the representations into the data domain, the changes made by the model are easily examinable by human auditors.
arXiv Detail & Related papers (2020-08-12T11:49:01Z)
- The Bures Metric for Generative Adversarial Networks [10.69910379275607]
Generative Adversarial Networks (GANs) are performant generative methods yielding high-quality samples.
We propose to match the real batch diversity to the fake batch diversity.
We observe that diversity matching reduces mode collapse substantially and has a positive effect on the sample quality.
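The Bures metric between positive semi-definite matrices (such as the feature covariances of a real batch and a fake batch) has a closed form, which is what makes it usable as a diversity-matching objective. The sketch below computes the squared Bures distance with a NumPy-only matrix square root; batch and feature details are illustrative, not the paper's exact setup.

```python
import numpy as np

def psd_sqrt(m):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def bures_distance_sq(a, b):
    """Squared Bures metric between PSD matrices A and B:
    tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2})."""
    a_sqrt = psd_sqrt(a)
    return np.trace(a) + np.trace(b) - 2.0 * np.trace(psd_sqrt(a_sqrt @ b @ a_sqrt))

# Identical covariances give zero distance; diversity matching drives
# the fake-batch covariance toward the real-batch covariance.
print(bures_distance_sq(np.diag([1.0, 2.0]), np.diag([1.0, 2.0])))  # ~0.0
print(bures_distance_sq(np.diag([1.0, 1.0]), np.diag([4.0, 4.0])))  # 2.0
```

Because the distance vanishes only when the two covariances coincide, minimizing it penalizes a generator whose samples collapse onto a low-diversity mode.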
arXiv Detail & Related papers (2020-06-16T12:04:41Z)
- On conditional versus marginal bias in multi-armed bandits [105.07190334523304]
The bias of the sample means of the arms in multi-armed bandits is an important issue in adaptive data analysis.
We characterize the sign of the conditional bias of monotone functions of the rewards, including the sample mean.
Our results hold for arbitrary conditioning events and leverage natural monotonicity properties of the data collection policy.
arXiv Detail & Related papers (2020-02-19T20:16:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.