TaCo: Textual Attribute Recognition via Contrastive Learning
- URL: http://arxiv.org/abs/2208.10180v1
- Date: Mon, 22 Aug 2022 09:45:34 GMT
- Title: TaCo: Textual Attribute Recognition via Contrastive Learning
- Authors: Chang Nie, Yiqing Hu, Yanqiu Qu, Hao Liu, Deqiang Jiang, Bo Ren
- Abstract summary: TaCo is a contrastive framework for textual attribute recognition tailored toward the most common document scenes.
We design the learning paradigm from three perspectives: 1) generating attribute views, 2) extracting subtle but crucial details, and 3) exploiting valued view pairs for learning.
Experiments show that TaCo surpasses the supervised counterparts and advances the state-of-the-art remarkably on multiple attribute recognition tasks.
- Score: 9.042957048594825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As textual attributes like font are core design elements of document format
and page style, automatic attribute recognition facilitates a wide range of practical
applications. Existing approaches already yield satisfactory performance in
differentiating disparate attributes, but they still struggle to distinguish
similar attributes with only subtle differences. Moreover, their performance
drops severely in real-world scenarios where unexpected and obvious imaging
distortions appear. In this paper, we aim to tackle these problems by proposing
TaCo, a contrastive framework for textual attribute recognition tailored toward
the most common document scenes. Specifically, TaCo leverages contrastive
learning to dispel the ambiguity trap arising from vague and open-ended
attributes. To realize this goal, we design the learning paradigm from three
perspectives: 1) generating attribute views, 2) extracting subtle but crucial
details, and 3) exploiting valued view pairs for learning, to fully unlock the
pre-training potential. Extensive experiments show that TaCo surpasses the
supervised counterparts and advances the state-of-the-art remarkably on
multiple attribute recognition tasks. Online services of TaCo will be made
available.
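The abstract does not spell out TaCo's exact objective, but the contrastive pre-training it describes, pulling together embeddings of two views of the same text attribute and pushing apart views of different ones, is commonly instantiated with an InfoNCE (NT-Xent) loss. The sketch below is an illustrative assumption, not the authors' implementation; the function name, temperature, and view-pairing convention are all hypothetical.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE / NT-Xent loss over a batch of paired views (illustrative sketch).

    z1, z2: (N, D) arrays holding embeddings of two augmented views of the
    same N samples; row i of z1 and row i of z2 form a positive pair.
    """
    # L2-normalize so the dot product below is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)        # (2N, D) stacked views
    sim = z @ z.T / temperature                 # (2N, 2N) similarity matrix
    np.fill_diagonal(sim, -np.inf)              # exclude self-similarity
    # the positive for row i is row i+N, and vice versa
    targets = np.concatenate([np.arange(n) + n, np.arange(n)])
    # numerically stable row-wise log-softmax, then pick the positive's log-prob
    sim = sim - sim.max(axis=1, keepdims=True)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), targets].mean()
```

Under this formulation, identical views yield a near-zero loss while unrelated views do not, which is the property a contrastive framework exploits to separate visually similar attributes.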
Related papers
- MARS: Paying more attention to visual attributes for text-based person search [6.438244172631555]
This paper presents a novel TBPS architecture named MARS (Mae-Attribute-Relation-Sensitive)
It enhances current state-of-the-art models by introducing two key components: a Visual Reconstruction Loss and an Attribute Loss.
Experiments on three commonly used datasets, namely CUHK-PEDES, ICFG-PEDES, and RSTPReid, report performance improvements.
arXiv Detail & Related papers (2024-07-05T06:44:43Z) - DualFocus: Integrating Plausible Descriptions in Text-based Person Re-identification [6.381155145404096]
We introduce DualFocus, a unified framework that integrates plausible descriptions to enhance the interpretative accuracy of vision-language models in Person Re-identification tasks.
To achieve a balance between coarse and fine-grained alignment of visual and textual embeddings, we propose the Dynamic Tokenwise Similarity (DTS) loss.
In comprehensive experiments on CUHK-PEDES, ICFG-PEDES, and RSTPReid, DualFocus demonstrates superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2024-05-13T04:21:00Z) - Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models [64.24227572048075]
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence in which external knowledge is usually incorporated into recognizing novel categories of objects.
arXiv Detail & Related papers (2023-08-22T04:24:45Z) - Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe$2$) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe$2$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z) - DualCoOp++: Fast and Effective Adaptation to Multi-Label Recognition
with Limited Annotations [79.433122872973]
Multi-label image recognition in the low-label regime is a task of great challenge and practical significance.
We leverage the powerful alignment between textual and visual features pretrained with millions of auxiliary image-text pairs.
We introduce an efficient and effective framework called Evidence-guided Dual Context Optimization (DualCoOp++)
arXiv Detail & Related papers (2023-08-03T17:33:20Z) - Learning Transferable Pedestrian Representation from Multimodal
Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z) - Semantic Prompt for Few-Shot Image Recognition [76.68959583129335]
We propose a novel Semantic Prompt (SP) approach for few-shot learning.
The proposed approach achieves promising results, improving the 1-shot learning accuracy by 3.67% on average.
arXiv Detail & Related papers (2023-03-24T16:32:19Z) - OvarNet: Towards Open-vocabulary Object Attribute Recognition [42.90477523238336]
We propose a naive two-stage approach for open-vocabulary object detection and attribute classification, termed CLIP-Attr.
The candidate objects are first proposed with an offline RPN and later classified for semantic category and attributes.
We show that recognition of semantic category and attributes is complementary for visual scene understanding.
arXiv Detail & Related papers (2023-01-23T15:59:29Z) - Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene
Text Recognition [60.36540008537054]
In this work, we excavate the implicit task of character counting within traditional text recognition, without additional annotation cost.
We design a two-branch reciprocal feature learning framework in order to adequately utilize the features from both the tasks.
Experiments on 7 benchmarks show the advantages of the proposed method in both text recognition and the newly built character counting task.
arXiv Detail & Related papers (2021-05-13T12:27:35Z) - Semi-supervised Learning with a Teacher-student Network for Generalized
Attribute Prediction [7.462336024223667]
This paper presents a study on semi-supervised learning to solve the visual attribute prediction problem.
Our method achieves competitive performance on various benchmarks for fashion attribute prediction.
arXiv Detail & Related papers (2020-07-14T02:06:24Z) - ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural
Language [36.319953919737245]
Person search by natural language aims at retrieving a specific person in a large-scale image pool that matches the given textual descriptions.
We propose an attribute-aligning perspective that allows grounding specific attribute phrases to the corresponding visual regions.
We achieve performance boosts through robust feature learning.
arXiv Detail & Related papers (2020-05-15T02:22:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.