Contextual Representation Learning beyond Masked Language Modeling
- URL: http://arxiv.org/abs/2204.04163v1
- Date: Fri, 8 Apr 2022 16:18:06 GMT
- Title: Contextual Representation Learning beyond Masked Language Modeling
- Authors: Zhiyi Fu, Wangchunshu Zhou, Jingjing Xu, Hao Zhou, Lei Li
- Abstract summary: We analyze how masked language models (MLMs) such as BERT learn contextual representations.
To address these issues, we propose TACO, a representation learning approach to directly model global semantics.
TACO extracts and aligns contextual semantics hidden in contextualized representations to encourage models to attend to global semantics.
- Score: 45.46220173487394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How do masked language models (MLMs) such as BERT learn contextual
representations? In this work, we analyze the learning dynamics of MLMs. We
find that MLMs adopt sampled embeddings as anchors to estimate and inject
contextual semantics into representations, which limits the efficiency and
effectiveness of MLMs. To address these issues, we propose TACO, a simple yet
effective representation learning approach to directly model global semantics.
TACO extracts and aligns contextual semantics hidden in contextualized
representations to encourage models to attend to global semantics when generating
contextualized representations. Experiments on the GLUE benchmark show that
TACO achieves up to 5x speedup and up to 1.2 points average improvement over
existing MLMs. The code is available at https://github.com/FUZHIYI/TACO.
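The abstract describes TACO only at a high level. As a rough illustration, the sketch below shows one hypothetical way a token-to-global alignment objective of this flavor could be added on top of an MLM's hidden states in PyTorch. The function name, mean pooling, and contrastive formulation are assumptions made for illustration, not the authors' implementation; the official code at https://github.com/FUZHIYI/TACO is the authoritative reference.

```python
# Hypothetical sketch of a TACO-style auxiliary objective (not the authors'
# exact method; see https://github.com/FUZHIYI/TACO for the official code).
import torch
import torch.nn.functional as F

def taco_style_alignment_loss(hidden_states, attention_mask, temperature=0.07):
    """Align per-token contextual semantics with a global sentence semantic.

    hidden_states:  (batch, seq_len, dim) contextualized representations.
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding.
    """
    mask = attention_mask.unsqueeze(-1).float()
    # Global semantics: mean-pooled representation over non-padding tokens.
    global_repr = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)

    token_repr = F.normalize(hidden_states, dim=-1)   # (B, L, D)
    global_repr = F.normalize(global_repr, dim=-1)    # (B, D)

    # Contrastive alignment: each token should be closer to its own sentence's
    # global vector than to the global vectors of other sentences in the batch.
    logits = torch.einsum("bld,cd->blc", token_repr, global_repr) / temperature
    labels = torch.arange(hidden_states.size(0), device=logits.device)
    labels = labels.unsqueeze(1).expand(-1, hidden_states.size(1))  # (B, L)

    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        reduction="none",
    )
    # Average only over real (non-padding) token positions.
    loss = (loss * attention_mask.reshape(-1).float()).sum() / attention_mask.sum()
    return loss
```

In a training loop, such a term would typically be weighted and added to the standard MLM loss; the weighting scheme here is likewise an assumption rather than the paper's reported setup.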
Related papers
- ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models [11.997499811414837]
Masked Language Models (MLMs) are trained by randomly masking portions of the input sequences with [MASK] tokens and learning to reconstruct the original content based on the remaining context.
arXiv Detail & Related papers (2025-01-23T05:46:50Z) - Knowing Where to Focus: Attention-Guided Alignment for Text-based Person Search [64.15205542003056]
We introduce the Attention-Guided Alignment (AGA) framework, featuring two innovative components: Attention-Guided Mask (AGM) Modeling and a Text Enrichment Module (TEM).
AGA achieves new state-of-the-art results with Rank-1 accuracy reaching 78.36%, 67.31%, and 67.4% on CUHK-PEDES, ICFG-PEDES, and RSTP, respectively.
arXiv Detail & Related papers (2024-12-19T17:51:49Z) - Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings [69.35226485836641]
Excessive use of visual tokens in existing Multimodal Large Language Models (MLLMs) often exhibits obvious redundancy and incurs prohibitively expensive computation.
We propose a simple yet effective method to improve the efficiency of MLLMs, termed dynamic visual-token exit (DyVTE)
DyVTE uses lightweight hyper-networks to perceive the text token status and decide the removal of all visual tokens after a certain layer.
arXiv Detail & Related papers (2024-11-29T11:24:23Z) - ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models [73.34709921061928]
We propose a training-free method to inject visual prompts into Multimodal Large Language Models (MLLMs)
We optimize a learnable latent variable based on an energy function, enhancing the strength of referring regions in the attention map.
Our method offers a promising direction for integrating referring abilities into MLLMs, and supports referring with box, mask, scribble and point.
arXiv Detail & Related papers (2024-07-31T11:40:29Z) - NoteLLM-2: Multimodal Large Representation Models for Recommendation [71.87790090964734]
Large Language Models (LLMs) have demonstrated exceptional proficiency in text understanding and embedding tasks.
Their potential in multimodal representation, particularly for item-to-item (I2I) recommendations, remains underexplored.
We propose an end-to-end fine-tuning method that customizes the integration of any existing LLMs and vision encoders for efficient multimodal representation.
arXiv Detail & Related papers (2024-05-27T03:24:01Z) - MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models [73.86954509967416]
A Multimodal Large Language Model (MLLM) relies on a powerful LLM to perform multimodal tasks.
This paper presents MME, the first comprehensive MLLM evaluation benchmark.
It measures both perception and cognition abilities on a total of 14 subtasks.
arXiv Detail & Related papers (2023-06-23T09:22:36Z) - Masked and Permuted Implicit Context Learning for Scene Text Recognition [8.742571493814326]
Scene Text Recognition (STR) is difficult because of variations in text styles, shapes, and backgrounds.
We propose a masked and permuted implicit context learning network for STR within a single decoder.
arXiv Detail & Related papers (2023-05-25T15:31:02Z) - Universal Sentence Representation Learning with Conditional Masked Language Model [7.334766841801749]
We present Conditional Masked Language Modeling (CMLM) to effectively learn sentence representations.
Our English CMLM model achieves state-of-the-art performance on SentEval.
As a fully unsupervised learning method, CMLM can be conveniently extended to a broad range of languages and domains.
arXiv Detail & Related papers (2020-12-28T18:06:37Z)