Related papers: Evaluating Object-Centric Models beyond Object Discovery

URL: http://arxiv.org/abs/2602.07532v1
Date: Sat, 07 Feb 2026 13:07:48 GMT
Title: Evaluating Object-Centric Models beyond Object Discovery
Authors: Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth
Abstract summary: Object-centric learning (OCL) aims to learn structured scene representations that support compositional generalization and robustness to out-of-distribution data. Most prior work focuses on evaluating OCL models solely through object discovery and simple reasoning tasks. We introduce a unified evaluation task and metric that jointly assess localization (where) and representation usefulness.
Score: 19.133368391349393
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Object-centric learning (OCL) aims to learn structured scene representations that support compositional generalization and robustness to out-of-distribution (OOD) data. However, OCL models are often not evaluated regarding these goals. Instead, most prior work focuses on evaluating OCL models solely through object discovery and simple reasoning tasks, such as probing the representation via image classification. We identify two limitations in existing benchmarks: (1) They provide limited insights on the representation usefulness of OCL models, and (2) localization and representation usefulness are assessed using disjoint metrics. To address (1), we use instruction-tuned VLMs as evaluators, enabling scalable benchmarking across diverse VQA datasets to measure how well VLMs leverage OCL representations for complex reasoning tasks. To address (2), we introduce a unified evaluation task and metric that jointly assess localization (where) and representation usefulness (what), thereby eliminating inconsistencies introduced by disjoint evaluation. Finally, we include a simple multi-feature reconstruction baseline as a reference point.

Related papers

CoT Referring: Improving Referring Expression Tasks with Grounded Reasoning [67.18702329644526]
CoT Referring enhances model reasoning across modalities through a structured, chain-of-thought training data structure. We restructure the training data to enforce a new output form, providing new annotations for existing datasets. We also integrate detection and segmentation capabilities into a unified MLLM framework, training it with a novel adaptive weighted loss to optimize performance.
arXiv Detail & Related papers (2025-10-03T08:50:21Z)
Multi-Rationale Explainable Object Recognition via Contrastive Conditional Inference [1.2309843977641421]
We introduce a multi-rationale explainable object recognition benchmark comprising datasets in which each image is annotated with multiple ground-truth rationales. We propose a contrastive conditional inference framework that explicitly models the probabilistic relationships among image embeddings, category labels, and rationales. Our approach achieves state-of-the-art results on the multi-rationale explainable object recognition benchmark, including strong zero-shot performance.
arXiv Detail & Related papers (2025-08-19T21:28:12Z)
Are We Done with Object-Centric Learning? [65.67948794110212]
Object-centric learning (OCL) seeks to learn representations that only encode an object, isolated from other objects or background cues in a scene. With recent sample-efficient segmentation models, we can separate objects in the pixel space and encode them independently. We address the OOD generalization challenge caused by spurious background cues through the lens of OCL.
arXiv Detail & Related papers (2025-04-09T17:59:05Z)
Vector-Quantized Vision Foundation Models for Object-Centric Learning [29.417271736114454]
We propose a unified architecture, Vector-Quantized VFMs for Object-Centric Learning (VVO). The key to our unification is simply sharing quantized VFM representations in OCL aggregation and decoding. Our VVO consistently outperforms baselines in object discovery and recognition, as well as downstream visual prediction and reasoning.
arXiv Detail & Related papers (2025-02-27T16:51:13Z)
A Survey on Class-Agnostic Counting: Advancements from Reference-Based to Open-World Text-Guided Approaches [6.356364436395916]
We present the first comprehensive review of class-agnostic counting (CAC) methodologies. We propose a taxonomy to categorize CAC approaches into three paradigms: reference-based, reference-less, and open-world text-guided. We present results on the FSC-147 dataset, setting a leaderboard using gold-standard metrics, and on the CARPK dataset to assess generalization capabilities.
arXiv Detail & Related papers (2025-01-31T14:47:09Z)
Mind the Prompt: A Novel Benchmark for Prompt-based Class-Agnostic Counting [8.000723123087473]
Class-agnostic counting (CAC) counts instances of arbitrary object classes never seen during model training. We introduce the Prompt-Aware Counting benchmark to measure the robustness and trustworthiness of prompt-based CAC models. We evaluate state-of-the-art methods and demonstrate that, although some achieve impressive results on standard class-specific counting metrics, they exhibit a significant deficiency in understanding the input prompt.
arXiv Detail & Related papers (2024-09-24T10:35:42Z)
Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles of subjective question correction, we propose a new evaluation method, SQC-Score. Results on three information extraction tasks show that human annotators prefer SQC-Score over the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability. CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means. We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
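The Cluster Learnability idea summarized in the entry above (K-means pseudo-labels, then a KNN that tries to predict them) can be illustrated with a small sketch. This is not the authors' implementation: the hold-out split, the choice of 3 neighbors, the toy 2-D data, and every function name (`kmeans_labels`, `knn_predict`, `cluster_learnability`) are assumptions made for illustration, and the paper's exact evaluation protocol may differ.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest current center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def knn_predict(train_x, train_y, query, n_neighbors=3):
    """Majority vote among the n_neighbors nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: sq_dist(query, train_x[i]))
    votes = [train_y[i] for i in nearest[:n_neighbors]]
    return max(set(votes), key=votes.count)


def cluster_learnability(points, k, n_neighbors=3, seed=0):
    """Hold-out KNN accuracy on K-means pseudo-labels (assumed, simplified protocol)."""
    labels = kmeans_labels(points, k, seed=seed)
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    split = len(order) // 2
    train, test = order[:split], order[split:]
    train_x = [points[i] for i in train]
    train_y = [labels[i] for i in train]
    hits = sum(
        knn_predict(train_x, train_y, points[i], n_neighbors) == labels[i]
        for i in test
    )
    return hits / len(test)


# Toy stand-in for learned representations: two well-separated 2-D blobs.
rng = random.Random(1)
blob_a = [[rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)] for _ in range(20)]
blob_b = [[rng.gauss(5.0, 0.1), rng.gauss(5.0, 0.1)] for _ in range(20)]
score = cluster_learnability(blob_a + blob_b, k=2)
print(f"cluster learnability (toy data): {score:.2f}")
```

A score near 1 means the cluster structure of the representations is easy for a KNN to pick up from few examples; representations with muddier structure would score lower under the same protocol.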
Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions. We show the recognition backbone can be substantially enhanced for more robust representation learning. Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.