Adaptive Contextual Perception: How to Generalize to New Backgrounds and
Ambiguous Objects
- URL: http://arxiv.org/abs/2306.05963v2
- Date: Fri, 27 Oct 2023 19:35:32 GMT
- Title: Adaptive Contextual Perception: How to Generalize to New Backgrounds and
Ambiguous Objects
- Authors: Zhuofan Ying, Peter Hase, Mohit Bansal
- Abstract summary: We investigate how vision models adaptively use context for out-of-distribution generalization.
We show that models that excel in one setting tend to struggle in the other.
To replicate the generalization abilities of biological vision, computer vision models must have factorized object vs. background representations.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Biological vision systems make adaptive use of context to recognize objects
in new settings with novel contexts as well as occluded or blurry objects in
familiar settings. In this paper, we investigate how vision models adaptively
use context for out-of-distribution (OOD) generalization and leverage our
analysis results to improve model OOD generalization. First, we formulate two
distinct OOD settings where the contexts are either irrelevant
(Background-Invariance) or beneficial (Object-Disambiguation), reflecting the
diverse contextual challenges faced in biological vision. We then analyze model
performance in these two different OOD settings and demonstrate that models
that excel in one setting tend to struggle in the other. Notably, prior works
on learning causal features improve on one setting but hurt in the other. This
underscores the importance of generalizing across both OOD settings, as this
ability is crucial for both human cognition and robust AI systems. Next, to
better understand the model properties contributing to OOD generalization, we
use representational geometry analysis and our own probing methods to examine a
population of models, and we discover that those with more factorized
representations and appropriate feature weighting are more successful in
handling Background-Invariance and Object-Disambiguation tests. We further
validate these findings through causal intervention on representation
factorization and feature weighting to demonstrate their causal effect on
performance. Lastly, we propose new augmentation methods to enhance model
generalization. These methods outperform strong baselines, yielding
improvements in both in-distribution and OOD tests. In conclusion, to replicate
the generalization abilities of biological vision, computer vision models must
have factorized object vs. background representations and appropriately weight
both kinds of features.
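The abstract's notion of "factorized object vs. background representations" can be illustrated with a toy probing analysis. The sketch below is hypothetical and is not the authors' actual representational-geometry or probing method: it builds synthetic features in which separate dimensions encode object and background labels, then scores how strongly each dimension specializes for one factor over the other (a score near 1 would indicate fully factorized features).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1000 samples, each with independent object and
# background labels, embedded in a 16-d feature space.
n, d = 1000, 16
obj = rng.integers(0, 4, size=n)   # 4 object classes
bg = rng.integers(0, 4, size=n)    # 4 background classes

# A "factorized" encoder: dims 0-7 carry the object, dims 8-15 the background.
feats = rng.normal(scale=0.1, size=(n, d))
feats[:, :8] += np.eye(4)[obj].repeat(2, axis=1)
feats[:, 8:] += np.eye(4)[bg].repeat(2, axis=1)

def variance_explained(x, labels):
    """Per-dimension ratio of between-class variance to total variance."""
    overall = x.var(axis=0)
    class_means = np.stack([x[labels == c].mean(axis=0)
                            for c in np.unique(labels)])
    return class_means.var(axis=0) / overall

v_obj = variance_explained(feats, obj)
v_bg = variance_explained(feats, bg)

# Specialization per dimension: 1 when a dimension encodes only one factor,
# 0 when it encodes both equally. The mean is a crude factorization score.
specialization = np.abs(v_obj - v_bg) / (v_obj + v_bg + 1e-9)
print(f"factorization score: {specialization.mean():.2f}")
```

On this synthetic encoder the score is high because each dimension carries exactly one factor; an entangled encoder that mixes object and background into the same dimensions would drive the score toward zero, which is the property the abstract links to joint Background-Invariance and Object-Disambiguation performance.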
Related papers
- From Efficiency to Equity: Measuring Fairness in Preference Learning [3.2132738637761027]
We evaluate fairness in preference learning models inspired by economic theories of inequality and Rawlsian justice.
We propose metrics adapted from the Gini Coefficient, Atkinson Index, and Kuznets Ratio to quantify fairness in these models.
arXiv Detail & Related papers (2024-10-24T15:25:56Z)
- Benchmarking the Attribution Quality of Vision Models [13.255247017616687]
We propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol.
This allows us to evaluate 23 attribution methods and how eight different design choices of popular vision models affect their attribution quality.
We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit a higher attribution quality than what is known from previous work.
arXiv Detail & Related papers (2024-07-16T17:02:20Z)
- Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms [91.19304518033144]
We aim to align vision models with human aesthetic standards in a retrieval system.
We propose a preference-based reinforcement learning method that fine-tunes vision models to better align them with human aesthetics.
arXiv Detail & Related papers (2024-06-13T17:59:20Z)
- Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation [87.50120181861362]
VisionPrefer is a high-quality and fine-grained preference dataset that captures multiple preference aspects.
We train a reward model VP-Score over VisionPrefer to guide the training of text-to-image generative models and the preference prediction accuracy of VP-Score is comparable to human annotators.
arXiv Detail & Related papers (2024-04-23T14:53:15Z)
- Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
- Generalization properties of contrastive world models [10.806958747213976]
We conduct an extensive study of the generalization properties of contrastive world models.
Our experiments show that the contrastive world model fails to generalize under the different OOD tests.
Our work highlights the importance of object-centric representations for generalization and shows that current models are limited in their capacity to learn the representations required for human-level generalization.
arXiv Detail & Related papers (2023-12-29T19:25:34Z)
- Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models [58.46926334842161]
This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps.
We propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores.
Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability.
arXiv Detail & Related papers (2023-12-10T22:07:42Z)
- Spurious Feature Diversification Improves Out-of-distribution Generalization [43.84284578270031]
Generalization to out-of-distribution (OOD) data is a critical challenge in machine learning.
We study WiSE-FT, a popular weight space ensemble method that interpolates between a pre-trained and a fine-tuned model.
We observe an unexpected "FalseFalseTrue" phenomenon, in which WiSE-FT successfully corrects many cases where each individual model makes incorrect predictions.
arXiv Detail & Related papers (2023-09-29T13:29:22Z)
- On the Robustness of Aspect-based Sentiment Analysis: Rethinking Model, Data, and Training [109.9218185711916]
Aspect-based sentiment analysis (ABSA) aims at automatically inferring the specific sentiment polarities toward certain aspects of products or services behind social media texts or reviews.
We propose to enhance the ABSA robustness by systematically rethinking the bottlenecks from all possible angles, including model, data, and training.
arXiv Detail & Related papers (2023-04-19T11:07:43Z)
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.