Benchmarking a Benchmark: How Reliable is MS-COCO?
- URL: http://arxiv.org/abs/2311.02709v1
- Date: Sun, 5 Nov 2023 16:55:40 GMT
- Title: Benchmarking a Benchmark: How Reliable is MS-COCO?
- Authors: Eric Zimmermann, Justin Szeto, Jerome Pasquero, Frederic Ratle
- Abstract summary: Sama-COCO, a re-annotation of MS-COCO, is used to discover potential biases by leveraging a shape analysis pipeline.
A model is trained and evaluated on both datasets to examine the impact of different annotation conditions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benchmark datasets are used to profile and compare algorithms across a
variety of tasks, ranging from image classification to segmentation, and also
play a large role in image pretraining algorithms. Emphasis is placed on
results with little regard to the actual content within the dataset. It is
important to question what kind of information is being learned from these
datasets and what are the nuances and biases within them. In the following
work, Sama-COCO, a re-annotation of MS-COCO, is used to discover potential
biases by leveraging a shape analysis pipeline. A model is trained and
evaluated on both datasets to examine the impact of different annotation
conditions. Results demonstrate that annotation styles are important and that
annotation pipelines should closely consider the task of interest. The dataset
is made publicly available at https://www.sama.com/sama-coco-dataset/ .
Related papers
- Diffusion Models as Data Mining Tools [87.77999285241219]
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining.
We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure.
This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
arXiv Detail & Related papers (2024-07-20T17:14:31Z)
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
- Comparing Importance Sampling Based Methods for Mitigating the Effect of Class Imbalance [0.0]
We compare three techniques that derive from importance sampling: loss reweighting, undersampling, and oversampling.
We find that up-weighting the loss for and undersampling has a negligible effect on the performance on underrepresented classes.
Our findings also indicate that there may exist some redundancy in data in the Planet dataset.
arXiv Detail & Related papers (2024-02-28T22:52:27Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- TrueDeep: A systematic approach of crack detection with less data [0.0]
We show that by incorporating domain knowledge along with deep learning architectures, we can achieve similar performance with less data.
Our algorithms, developed with 23% of the overall data, have a similar performance on the test data and significantly better performance on multiple blind datasets.
arXiv Detail & Related papers (2023-05-30T14:51:58Z)
- A Bag-of-Prototypes Representation for Dataset-Level Applications [24.629132557336312]
This work investigates dataset vectorization for two dataset-level tasks: assessing training set suitability and test set difficulty.
We propose a bag-of-prototypes (BoP) dataset representation that extends the image-level bag consisting of patch descriptors to dataset-level bag consisting of semantic prototypes.
BoP consistently shows its advantage over existing representations on a series of benchmarks for two dataset-level tasks.
arXiv Detail & Related papers (2023-03-23T13:33:58Z)
- Urban Scene Semantic Segmentation with Low-Cost Coarse Annotation [107.72926721837726]
Coarse annotation is a low-cost but highly effective alternative for training semantic segmentation models.
We propose a coarse-to-fine self-training framework that generates pseudo labels for unlabeled regions of coarsely annotated data.
Our method achieves a significantly better performance vs annotation cost tradeoff, yielding a comparable performance to fully annotated data with only a small fraction of the annotation budget.
arXiv Detail & Related papers (2022-12-15T15:43:42Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Free Lunch for Co-Saliency Detection: Context Adjustment [14.688461235328306]
We propose a "cost-free" group-cut-paste (GCP) procedure to leverage images from off-the-shelf saliency detection datasets and synthesize new samples.
We collect a novel dataset called Context Adjustment Training. The two variants of our dataset, i.e., CAT and CAT+, consist of 16,750 and 33,500 images, respectively.
arXiv Detail & Related papers (2021-08-04T14:51:37Z)
- A Critical Assessment of State-of-the-Art in Entity Alignment [1.7725414095035827]
We investigate two state-of-the-art (SotA) methods for the task of Entity Alignment in Knowledge Graphs.
We first carefully examine the benchmarking process and identify several shortcomings, which make the results reported in the original works not always comparable.
arXiv Detail & Related papers (2020-10-30T15:09:19Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.