Related papers: ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts

ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts

URL: http://arxiv.org/abs/2510.26186v1
Date: Thu, 30 Oct 2025 06:46:17 GMT
Title: ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts
Authors: Jinho Choi, Hyesu Lim, Steffen Schneider, Jaegul Choo,
Abstract summary: ConceptScope is a scalable and automated framework for analyzing visual datasets.<n>It categorizes concepts into target, context, and bias types based on their semantic relevance and statistical correlation to class labels.<n>It reliably detects known biases and uncovers previously unannotated ones.
Score: 54.60525564599342
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Dataset bias, where data points are skewed to certain concepts, is ubiquitous in machine learning datasets. Yet, systematically identifying these biases is challenging without costly, fine-grained attribute annotations. We present ConceptScope, a scalable and automated framework for analyzing visual datasets by discovering and quantifying human-interpretable concepts using Sparse Autoencoders trained on representations from vision foundation models. ConceptScope categorizes concepts into target, context, and bias types based on their semantic relevance and statistical correlation to class labels, enabling class-level dataset characterization, bias identification, and robustness evaluation through concept-based subgrouping. We validate that ConceptScope captures a wide range of visual concepts, including objects, textures, backgrounds, facial attributes, emotions, and actions, through comparisons with annotated datasets. Furthermore, we show that concept activations produce spatial attributions that align with semantically meaningful image regions. ConceptScope reliably detects known biases (e.g., background bias in Waterbirds) and uncovers previously unannotated ones (e.g, co-occurring objects in ImageNet), offering a practical tool for dataset auditing and model diagnostics.

Related papers

Insight: Interpretable Semantic Hierarchies in Vision-Language Encoders [52.94006363830628]
Language-aligned vision foundation models perform strongly across diverse downstream tasks.<n>Recent works decompose these representations into human-interpretable concepts, but provide poor spatial grounding and are limited to image classification tasks.<n>We propose Insight, a language-aligned concept foundation model that provides fine-grained concepts, which are human-interpretable and spatially grounded in the input image.
arXiv Detail & Related papers (2026-01-20T09:57:26Z)
Prompt the Unseen: Evaluating Visual-Language Alignment Beyond Supervision [22.712690974750007]
Vision-Language Models (VLMs) combine a vision encoder and a large language model (LLM) through alignment training.<n>Despite its importance, the projection layer's ability to generalize to unseen visual concepts has not been systematically evaluated.<n>This study introduces a new evaluation framework for alignment generalization.
arXiv Detail & Related papers (2025-08-31T05:00:51Z)
Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization. We introduce a benchmark comprising eight different synthetic and real-world datasets. We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
Knowledge graphs for empirical concept retrieval [1.06378109904813]
Concept-based explainable AI is promising as a tool to improve the understanding of complex models at the premises of a given user. Here, we present a workflow for user-driven data collection in both text and image domains. We test the retrieved concept datasets on two concept-based explainability methods, namely concept activation vectors (CAVs) and concept activation regions (CARs)
arXiv Detail & Related papers (2024-04-10T13:47:22Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data. We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
Discovering Concepts in Learned Representations using Statistical Inference and Interactive Visualization [0.76146285961466]
Concept discovery is important for bridging the gap between non-deep learning experts and model end-users. Current approaches include hand-crafting concept datasets and then converting them to latent space directions. In this study, we offer another two approaches to guide user discovery of meaningful concepts, one based on multiple hypothesis testing, and another on interactive visualization.
arXiv Detail & Related papers (2022-02-09T22:29:48Z)
Translational Concept Embedding for Generalized Compositional Zero-shot Learning [73.60639796305415]
Generalized compositional zero-shot learning means to learn composed concepts of attribute-object pairs in a zero-shot fashion. This paper introduces a new approach, termed translational concept embedding, to solve these two difficulties in a unified framework.
arXiv Detail & Related papers (2021-12-20T21:27:51Z)
Object Recognition as Classification of Visual Properties [5.1652563977194434]
We present an object recognition process based on Ranganathan's four-phased faceted knowledge organization process. We briefly introduce the ongoing project MultiMedia UKC, whose aim is to build an object recognition resource.
arXiv Detail & Related papers (2021-12-20T13:50:07Z)
Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis. We learn sentiment, aspect> joint topic embeddings in the word embedding space. We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.