Related papers: LOCL: Learning Object-Attribute Composition using Localization

URL: http://arxiv.org/abs/2210.03780v1
Date: Fri, 7 Oct 2022 18:48:45 GMT
Title: LOCL: Learning Object-Attribute Composition using Localization
Authors: Satish Kumar, ASM Iftekhar, Ekta Prashnani, B.S.Manjunath
Abstract summary: This paper describes LOCL, which generalizes compositional zero-shot learning to objects in cluttered and more realistic settings. The key contribution is a modular approach to localizing objects and attributes of interest in a weakly supervised context.
Score: 13.820889273887454
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper describes LOCL (Learning Object Attribute Composition using Localization), which generalizes compositional zero-shot learning to objects in cluttered and more realistic settings. The problem of unseen Object Attribute (OA) associations has been well studied in the field; however, the performance of existing methods is limited in challenging scenes. In this context, our key contribution is a modular approach to localizing objects and attributes of interest in a weakly supervised context that generalizes robustly to unseen configurations. Localization coupled with a composition classifier significantly outperforms state of the art (SOTA) methods, with an improvement of about 12% on currently available challenging datasets. Further, the modularity enables the localized feature extractor to be used with existing OA compositional learning methods to improve their overall performance.

Related papers

Structure-aware Prompt Adaptation from Seen to Unseen for Open-Vocabulary Compositional Zero-Shot Learning [86.58227205147546]
The goal of Open-Vocabulary Compositional Zero-Shot Learning (OV-CZSL) is to recognize attribute-object compositions in the open-vocabulary setting. We propose the Structure-aware Prompt Adaptation (SPA) method, which enables models to generalize from seen to unseen attributes and objects.
arXiv Detail & Related papers (2026-03-04T07:54:28Z)
A Conditional Probability Framework for Compositional Zero-shot Learning [86.86063926727489]
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen combinations of known objects and attributes by leveraging knowledge from previously seen compositions. Traditional approaches primarily focus on disentangling attributes and objects, treating them as independent entities during learning. We adopt a Conditional Probability Framework (CPF) to explicitly model attribute-object dependencies.
arXiv Detail & Related papers (2025-07-23T10:20:52Z)
Real Classification by Description: Extending CLIP's Limits of Part Attributes Recognition [1.2499537119440243]
We tackle zero shot "real" classification by description, a novel task that evaluates the ability of Vision-Language Models (VLMs) to classify objects based solely on descriptive attributes, excluding object class names. We release description data for six popular fine-grained benchmarks, which omit object names to encourage genuine zero-shot learning. We introduce a modified CLIP architecture that leverages multiple resolutions to improve the detection of fine-grained part attributes.
arXiv Detail & Related papers (2024-12-18T15:28:08Z)
Compositional Zero-Shot Learning with Contextualized Cues and Adaptive Contrastive Training [17.893694262999826]
This paper introduces a novel framework, Understanding and Linking Attributes and Objects (ULAO), for Compositional Zero-Shot Learning (CZSL). ULAO comprises two innovative modules. The Understanding Attributes and Objects (UAO) module improves primitive understanding by sequential primitive prediction and leveraging recognized objects as contextual hints for attribute classification. The Linking Attributes and Objects (LAO) module improves the attribute-object linkage understanding through a new contrastive learning strategy that incorporates tailored hard negative generation and adaptive loss adjustments.
arXiv Detail & Related papers (2024-12-10T03:41:20Z)
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation [47.047267066525265]
We introduce a novel approach that incorporates object-level contextual knowledge within images. Our proposed approach achieves state-of-the-art performance with strong generalizability across diverse datasets.
arXiv Detail & Related papers (2024-11-26T06:34:48Z)
Composable Part-Based Manipulation [61.48634521323737]
We propose composable part-based manipulation (CPM) to improve learning and generalization of robotic manipulation skills. CPM comprises a collection of composable diffusion models, where each model captures a different inter-object correspondence. We validate our approach in both simulated and real-world scenarios, demonstrating its effectiveness in achieving robust and generalized manipulation capabilities.
arXiv Detail & Related papers (2024-05-09T16:04:14Z)
Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities. We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment. We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
arXiv Detail & Related papers (2023-03-20T18:19:36Z)
Disentangling Visual Embeddings for Attributes and Objects [38.27308243429424]
We study the problem of compositional zero-shot learning for object-attribute recognition. Prior works use visual features extracted with a backbone network, pre-trained for object classification. We propose a novel architecture that can disentangle attribute and object features in the visual space.
arXiv Detail & Related papers (2022-05-17T17:59:36Z)
Unveiling the Potential of Structure-Preserving for Weakly Supervised Object Localization [71.79436685992128]
We propose a two-stage approach, termed structure-preserving activation (SPA), towards fully leveraging the structure information incorporated in convolutional features for WSOL. In the first stage, a restricted activation module (RAM) is designed to alleviate the structure-missing issue caused by the classification network. In the second stage, we propose a post-process approach, termed self-correlation map generating (SCG) module to obtain structure-preserving localization maps.
arXiv Detail & Related papers (2021-03-08T03:04:14Z)
Local Context Attention for Salient Object Segmentation [5.542044768017415]
We propose a novel Local Context Attention Network (LCANet) to generate locally reinforced feature maps in a uniform representational architecture. The proposed network introduces an Attentional Correlation Filter (ACF) module to generate explicit local attention by calculating the correlation feature map between coarse prediction and global context. Comprehensive experiments are conducted on several salient object segmentation datasets, demonstrating the superior performance of the proposed LCANet against the state-of-the-art methods.
arXiv Detail & Related papers (2020-09-24T09:20:06Z)
Simple and effective localized attribute representations for zero-shot learning [48.053204004771665]
Zero-shot learning (ZSL) aims to discriminate images from unseen classes by exploiting relations to seen classes via their semantic descriptions. We propose localizing representations in the semantic/attribute space, with a simple but effective pipeline where localization is implicit. Our method can be implemented easily and can be used as a new baseline for zero-shot learning.
arXiv Detail & Related papers (2020-06-10T16:46:12Z)
Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization [53.99850033746663]
We study the problem of learning a localization model on target classes with weakly supervised image labels. In this work, we argue that learning only an objectness function is a weak form of knowledge transfer. Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of a pairwise similarity function.
arXiv Detail & Related papers (2020-03-18T17:53:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.