Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset
- URL: http://arxiv.org/abs/2110.07575v1
- Date: Thu, 14 Oct 2021 17:38:20 GMT
- Title: Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset
- Authors: Ian Palmer, Andrew Rouditchenko, Andrei Barbu, Boris Katz, James Glass
- Abstract summary: Spoken ObjectNet is designed to remove some of these biases and provide a way to better evaluate how effectively models will perform in real-world scenarios.
This dataset expands upon ObjectNet, which is a bias-controlled image dataset that features similar image classes to those present in ImageNet.
Results show that models trained on other datasets and then evaluated on Spoken ObjectNet tend to perform poorly due to biases in other datasets that the models have learned.
- Score: 14.44921491933053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visually-grounded spoken language datasets can enable models to learn
cross-modal correspondences with very weak supervision. However, modern
audio-visual datasets contain biases that undermine the real-world performance
of models trained on that data. We introduce Spoken ObjectNet, which is
designed to remove some of these biases and provide a way to better evaluate
how effectively models will perform in real-world scenarios. This dataset
expands upon ObjectNet, which is a bias-controlled image dataset that features
similar image classes to those present in ImageNet. We detail our data
collection pipeline, which features several methods to improve caption quality,
including automated language model checks. Lastly, we show baseline results on
image retrieval and audio retrieval tasks. These results show that models
trained on other datasets and then evaluated on Spoken ObjectNet tend to
perform poorly due to biases in other datasets that the models have learned. We
also show evidence that the performance decrease is due to the dataset
controls, and not the transfer setting.
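The baseline image- and audio-retrieval results mentioned above are typically scored with recall@k: the fraction of queries whose true cross-modal match appears among the top-k ranked candidates. The sketch below is illustrative, not from the paper; it assumes a precomputed caption-image similarity matrix where caption i pairs with image i.

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, k: int) -> float:
    """Fraction of queries whose true match (assumed to sit on the
    diagonal, i.e. caption i pairs with image i) ranks in the top k."""
    # Sort candidates for each query row, highest similarity first.
    ranked = np.argsort(-similarity, axis=1)
    # A hit if the query's own index appears among its top-k candidates.
    hits = (ranked[:, :k] == np.arange(len(similarity))[:, None]).any(axis=1)
    return float(hits.mean())

# Toy 4x4 similarity matrix: rows are spoken-caption queries,
# columns are candidate images. Values are hypothetical.
rng = np.random.default_rng(0)
sim = rng.standard_normal((4, 4))
np.fill_diagonal(sim, 5.0)  # force each true pair to score highest
print(recall_at_k(sim, 1))  # -> 1.0 for this contrived matrix
```

In practice the similarity matrix would come from the model's audio and image embeddings, and recall@1/5/10 would be reported for both retrieval directions.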
Related papers
- ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing [45.14977000707886]
Higher accuracy on ImageNet usually leads to better robustness against different corruptions.
We create a toolkit for object editing with controls of backgrounds, sizes, positions, and directions.
We evaluate the performance of current deep learning models, including both convolutional neural networks and vision transformers.
arXiv Detail & Related papers (2023-03-30T02:02:32Z)
- Perceptual Grouping in Contrastive Vision-Language Models [59.1542019031645]
We show how vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
We propose a minimal set of modifications that results in models that uniquely learn both semantic and spatial information.
arXiv Detail & Related papers (2022-10-18T17:01:35Z)
- Prefix Conditioning Unifies Language and Label Supervision [84.11127588805138]
We show that dataset biases negatively affect pre-training by reducing the generalizability of learned representations.
In experiments, we show that this simple technique improves zero-shot image recognition accuracy and robustness to image-level distribution shift.
arXiv Detail & Related papers (2022-06-02T16:12:26Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Contemplating real-world object classification [53.10151901863263]
We reanalyze the ObjectNet dataset recently proposed by Barbu et al., which contains objects in daily-life situations.
We find that applying deep models to the isolated objects, rather than to the entire scene as in the original paper, yields around a 20-30% performance improvement.
arXiv Detail & Related papers (2021-03-08T23:29:59Z)
- Rethinking Natural Adversarial Examples for Classification Models [43.87819913022369]
ImageNet-A is a famous dataset of natural adversarial examples.
We validated the hypothesis by reducing the background influence in ImageNet-A examples with object detection techniques.
Experiments showed that the object detection models with various classification models as backbones obtained much higher accuracy than their corresponding classification models.
arXiv Detail & Related papers (2021-02-23T14:46:48Z)
- From ImageNet to Image Classification: Contextualizing Progress on Benchmarks [99.19183528305598]
We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset.
Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for.
arXiv Detail & Related papers (2020-05-22T17:39:16Z)
- ObjectNet Dataset: Reanalysis and Correction [47.64219291655723]
Recently, Barbu et al. introduced ObjectNet, a dataset of objects in daily-life situations.
They showed a dramatic performance drop of the state of the art object recognition models on this dataset.
We highlight a major problem with their work: object recognizers are applied to scenes containing multiple objects rather than to isolated objects.
arXiv Detail & Related papers (2020-04-04T22:45:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.