Towards Human Cognition: Visual Context Guides Syntactic Priming in Fusion-Encoded Models
- URL: http://arxiv.org/abs/2502.17669v1
- Date: Mon, 24 Feb 2025 21:33:27 GMT
- Title: Towards Human Cognition: Visual Context Guides Syntactic Priming in Fusion-Encoded Models
- Authors: Bushi Xiao, Michael Bennie, Jayetri Bardhan, Daisy Zhe Wang
- Abstract summary: PRISMATIC is the first multimodal structural priming dataset. This work provides new insights into evaluating and understanding how syntactic information is processed in multimodal language models.
- Score: 3.2327374287192314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduced PRISMATIC, the first multimodal structural priming dataset, and proposed a reference-free evaluation metric that assesses priming effects without predefined target sentences. Using this metric, we constructed and tested models with different multimodal encoding architectures (dual encoder and fusion encoder) to investigate their structural preservation capabilities. Our findings show that models with both encoding methods demonstrate comparable syntactic priming effects. However, only fusion-encoded models exhibit robust positive correlations between priming effects and visual similarity, suggesting a cognitive process more aligned with human psycholinguistic patterns. This work provides new insights into evaluating and understanding how syntactic information is processed in multimodal language models.
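The abstract does not spell out the metric here, but one way to picture a reference-free priming measure is to score the structural overlap between the prime sentence and whatever the model generates, then correlate that score with visual similarity across items. A minimal Python sketch, assuming a POS-sequence LCS overlap as a stand-in for PRISMATIC's actual metric:

```python
# Hedged sketch: a reference-free "structural overlap" score between a prime
# sentence and a model-generated description, plus the correlation with visual
# similarity that the paper reports for fusion-encoded models. The LCS-over-POS
# proxy below is an assumption, not PRISMATIC's actual metric.
import numpy as np

def lcs_length(a, b):
    """Longest common subsequence length via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def structural_overlap(prime_pos, generated_pos):
    """Similarity of two POS-tag sequences in [0, 1]; no target sentence needed."""
    return 2 * lcs_length(prime_pos, generated_pos) / (len(prime_pos) + len(generated_pos))

# Toy items: (prime POS tags, generated POS tags, visual similarity of the images).
items = [
    (["DET", "NOUN", "VERB", "DET", "NOUN", "ADP", "DET", "NOUN"],
     ["DET", "NOUN", "VERB", "DET", "NOUN", "ADP", "DET", "NOUN"], 0.9),
    (["DET", "NOUN", "VERB", "DET", "NOUN", "ADP", "DET", "NOUN"],
     ["DET", "NOUN", "VERB", "DET", "NOUN", "DET", "NOUN"], 0.5),
    (["DET", "NOUN", "VERB", "DET", "NOUN", "ADP", "DET", "NOUN"],
     ["NOUN", "VERB", "ADP", "DET", "NOUN"], 0.2),
]
priming = np.array([structural_overlap(p, g) for p, g, _ in items])
visual = np.array([v for _, _, v in items])
r = np.corrcoef(priming, visual)[0, 1]  # fusion encoders: expect positive r
print(f"priming effects: {priming}, correlation with visual similarity: {r:.2f}")
```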
Related papers
- Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
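For intuition, a "user token" annotator model of the kind referenced above typically prepends a dedicated annotator token to each input, so one shared classifier can condition on who is labeling. A minimal sketch with Hugging Face Transformers; the base model and token names are illustrative assumptions, not the paper's exact configuration:

```python
# Hedged sketch of a user-token annotator model: each annotator gets a special
# token prepended to the text, and its embedding is learned jointly with the task.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

annotators = ["[ANN_001]", "[ANN_002]", "[ANN_003]"]  # hypothetical annotator ids
tokenizer.add_special_tokens({"additional_special_tokens": annotators})
model.resize_token_embeddings(len(tokenizer))  # new embedding rows for the annotator tokens

def encode(annotator_id: str, text: str):
    # The annotator token is ordinary input; self-attention does the personalization.
    return tokenizer(f"{annotator_id} {text}", return_tensors="pt")

batch = encode("[ANN_002]", "This joke is offensive.")
logits = model(**batch).logits  # per-annotator prediction from one shared model
```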
- Collaborative decoding of critical tokens for boosting factuality of large language models [57.504894664689]
Finetuned and aligned models show improved instruction following and safer generation.
The common practice of using sampling during generation also increases chances of hallucination.
We introduce a collaborative decoding framework to harness the high factuality within pretrained models through the concept of critical tokens.
arXiv Detail & Related papers (2024-02-28T01:53:37Z)
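As a rough illustration of the critical-token idea above, the sketch below routes decoding steps where the two models disagree strongly to the pretrained model, and all other steps to the aligned model. The disagreement test is an assumed stand-in for the paper's notion of critical tokens:

```python
# Hedged sketch of collaborative decoding: "critical" steps use the pretrained
# model (better factuality), the rest use the aligned model (better instruction
# following). Assumes Hugging Face causal LMs and batch size 1.
import torch
import torch.nn.functional as F

def collaborative_decode(aligned, pretrained, input_ids, max_new_tokens=32, tau=0.5):
    for _ in range(max_new_tokens):
        p_aligned = F.softmax(aligned(input_ids).logits[:, -1], dim=-1)
        p_pretrained = F.softmax(pretrained(input_ids).logits[:, -1], dim=-1)
        # Total-variation distance as a crude criticality signal (an assumption).
        disagreement = 0.5 * (p_aligned - p_pretrained).abs().sum(-1)
        dist = p_pretrained if disagreement.item() > tau else p_aligned
        next_id = dist.argmax(-1, keepdim=True)  # greedy; sampling also works
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return input_ids
```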
- How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
- A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data.
The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept.
The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
- Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image.
We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z)
- Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models [39.479912987123214]
Self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks.
We introduce Fusioner, a lightweight, transformer-based fusion module that pairs frozen visual representations with language concepts.
We show that the proposed fusion approach is effective for any pair of visual and language models, even those pre-trained on uni-modal corpora.
arXiv Detail & Related papers (2022-10-27T02:57:26Z)
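A minimal sketch of what a Fusioner-style module could look like: a small cross-attention block that lets frozen language-concept embeddings query frozen visual patch features. The dimensions and the similarity-based segmentation head are assumptions, not the paper's exact architecture:

```python
# Hedged sketch of a lightweight fusion module over frozen encoders: only the
# fusion weights would train; visual and language backbones stay frozen.
import torch
import torch.nn as nn

class LightweightFusion(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * 4),
                                 nn.GELU(), nn.Linear(dim * 4, dim))

    def forward(self, concept_emb, patch_feats):
        # Language concepts query the frozen visual patch features.
        fused, _ = self.cross_attn(concept_emb, patch_feats, patch_feats)
        fused = fused + self.ffn(fused)
        # Segmentation logits: similarity between fused concepts and patches.
        return torch.einsum("bcd,bpd->bcp", fused, patch_feats)

fusion = LightweightFusion()
concepts = torch.randn(1, 3, 512)   # e.g., frozen text embeddings of 3 class names
patches = torch.randn(1, 196, 512)  # e.g., frozen ViT features for 14x14 patches
logits = fusion(concepts, patches)  # (1, 3, 196): per-class patch scores
```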
- TAGPRIME: A Unified Framework for Relational Structure Extraction [71.88926365652034]
TAGPRIME is a sequence tagging model that appends priming words describing the given condition to the input text.
Through the self-attention mechanism of pre-trained language models, the priming words steer the output contextualized representations to carry more information about the given condition.
Extensive experiments and analyses on three different tasks that cover ten datasets across five different languages demonstrate the generality and effectiveness of TAGPRIME.
arXiv Detail & Related papers (2022-05-25T08:57:46Z)
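To make the priming-words idea above concrete, here is a toy input-construction sketch for event argument extraction; the separator token and the way the condition is verbalized are assumptions, not TAGPRIME's exact recipe:

```python
# Hedged sketch of TAGPRIME-style input construction: priming words describing
# the condition (here, a trigger and its event type) are appended to the
# sentence before sequence tagging, so self-attention can specialize the
# token representations to that condition.
def build_primed_input(tokens, trigger, event_type):
    # The tagger emits one BIO label per original token; the appended priming
    # words are encoded but receive no labels.
    priming_words = [trigger, event_type.lower().replace(":", " ")]
    return tokens + ["[SEP]"] + priming_words

sentence = ["The", "company", "hired", "two", "engineers", "in", "May"]
primed = build_primed_input(sentence, trigger="hired", event_type="Personnel:Hire")
print(primed)
# ['The', 'company', 'hired', ..., 'May', '[SEP]', 'hired', 'personnel hire']
```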
- Model-Based Counterfactual Synthesizer for Interpretation [40.01787107375103]
We propose a Model-based Counterfactual Synthesizer (MCS) framework for interpreting machine learning models.
We first analyze the model-based counterfactual process and construct a base synthesizer using a conditional generative adversarial net (CGAN).
To better approximate the counterfactual universe for rare queries, we employ the umbrella sampling technique when training the MCS framework.
arXiv Detail & Related papers (2021-06-16T17:09:57Z)
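A minimal sketch of the CGAN component described above: a generator that proposes counterfactual feature vectors conditioned on the query instance and the desired outcome. The umbrella-sampling training procedure is omitted, and all layer sizes are assumptions:

```python
# Hedged sketch of a conditional counterfactual generator; a discriminator on
# (features, label) pairs would complete the CGAN as usual.
import torch
import torch.nn as nn

class CounterfactualGenerator(nn.Module):
    def __init__(self, feat_dim=16, noise_dim=8, n_classes=2):
        super().__init__()
        self.label_emb = nn.Embedding(n_classes, 4)
        self.net = nn.Sequential(
            nn.Linear(noise_dim + feat_dim + 4, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),  # counterfactuals live in feature space
        )

    def forward(self, noise, query, desired_label):
        cond = torch.cat([noise, query, self.label_emb(desired_label)], dim=-1)
        return query + self.net(cond)  # generate an offset from the query

gen = CounterfactualGenerator()
query = torch.randn(5, 16)                # instances to explain
want = torch.ones(5, dtype=torch.long)    # flip predictions to class 1
cf = gen(torch.randn(5, 8), query, want)  # candidate counterfactuals
```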
- Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data.
We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations.
Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
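For intuition, a common way to realize relation prototypes is to average the contextual embeddings of each relation's instances and classify by nearest prototype. A minimal sketch under that assumption (the paper's actual training objective is not reproduced):

```python
# Hedged sketch of prototype-based relation representations over
# distantly-labeled data; the contrastive objective is omitted.
import torch
import torch.nn.functional as F

def build_prototypes(embeddings, labels, n_relations):
    """embeddings: (N, d) instance encodings; labels: (N,) relation ids."""
    protos = torch.stack([embeddings[labels == r].mean(0) for r in range(n_relations)])
    return F.normalize(protos, dim=-1)  # unit sphere, as prototype methods often use

def classify(instance_emb, prototypes):
    # Nearest prototype by cosine similarity.
    return (F.normalize(instance_emb, dim=-1) @ prototypes.T).argmax(-1)

emb = torch.randn(100, 768)        # e.g., entity-pair contextual encodings
lab = torch.randint(0, 5, (100,))  # 5 relations, distant labels
protos = build_prototypes(emb, lab, 5)
pred = classify(torch.randn(10, 768), protos)
```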
- Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
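A minimal sketch of the two-stage recipe above, assuming a beta-VAE-style penalty in stage one and a conditional refiner in stage two; both architectures and the frozen-VAE interface are placeholders:

```python
# Hedged sketch of multi-stage disentangled modeling: stage 1 buys
# disentanglement with a heavy KL penalty (at the cost of blurry output);
# stage 2 trains a second generative model to restore the lost detail.
import torch
import torch.nn.functional as F

def stage1_loss(x, x_hat, mu, logvar, beta=4.0):
    # beta > 1 encourages statistically independent latent factors.
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

def stage2_loss(x, refiner, frozen_vae):
    # The refiner conditions on the frozen disentangled code and models the
    # correlated structure the first stage could not keep.
    with torch.no_grad():
        z, blurry = frozen_vae(x)  # assumed API: returns code and reconstruction
    return F.mse_loss(refiner(z, blurry), x, reduction="sum")
```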
- A Systematic Assessment of Syntactic Generalization in Neural Language Models [20.589737524626745]
We present a systematic evaluation of the syntactic knowledge of neural language models.
We find substantial differences in syntactic generalization performance by model architecture.
Our results also reveal a dissociation between perplexity and syntactic generalization performance.
arXiv Detail & Related papers (2020-05-07T18:35:25Z)
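A typical way to run such a targeted syntactic evaluation is with minimal pairs: the model passes an item if it assigns higher log-probability to the grammatical variant, and aggregate accuracy can dissociate from perplexity, as the paper reports. A minimal sketch with GPT-2 as a stand-in model (the paper's test suites are not reproduced here):

```python
# Hedged sketch of a minimal-pair syntactic evaluation with a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def logprob(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt").input_ids
    # With labels=ids, HF returns mean per-token NLL; rescale to a summed log-prob.
    nll = lm(ids, labels=ids).loss
    return -nll.item() * (ids.shape[1] - 1)

pairs = [("The keys to the cabinet are here.", "The keys to the cabinet is here.")]
accuracy = sum(logprob(good) > logprob(bad) for good, bad in pairs) / len(pairs)
print(f"syntactic generalization accuracy: {accuracy:.2f}")
```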