Related papers: Towards Human Cognition: Visual Context Guides Syntactic Priming in Fusion-Encoded Models

Towards Human Cognition: Visual Context Guides Syntactic Priming in Fusion-Encoded Models

URL: http://arxiv.org/abs/2502.17669v2
Date: Thu, 16 Oct 2025 20:51:47 GMT
Title: Towards Human Cognition: Visual Context Guides Syntactic Priming in Fusion-Encoded Models
Authors: Bushi Xiao, Michael Bennie, Jayetri Bardhan, Daisy Zhe Wang,
Abstract summary: Structural priming is a cognitive phenomenon where exposure to a particular syntactic structure increases the likelihood of producing the same structure in subsequent utterances.<n>We introduce PRISMATIC, the first multimodal structural priming dataset.<n>We propose the Syntactic Preservation Index (SPI), a novel reference-free evaluation metric designed specifically to assess structural priming effects in sentence level.
Score: 3.63819860423174
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Structural priming is a cognitive phenomenon where exposure to a particular syntactic structure increases the likelihood of producing the same structure in subsequent utterances. While humans consistently demonstrate structural priming effects across various linguistic contexts, it remains unclear whether multimodal large language models (MLLMs) exhibit similar syntactic preservation behaviors. We introduce PRISMATIC, the first multimodal structural priming dataset, which advances computational linguistics by providing a standardized benchmark for investigating syntax-vision interactions. We propose the Syntactic Preservation Index (SPI), a novel reference-free evaluation metric designed specifically to assess structural priming effects in sentence level. Using this metric, we constructed and tested models with two different multimodal encoding architectures to investigate their structural preservation capabilities. Our experimental results demonstrate that models with both encoding methods show comparable syntactic priming effects. However, only fusion-encoded models exhibit robust positive correlations between priming effects and visual similarity, suggesting a cognitive process more aligned with human psycholinguistic patterns. This work provides new insights into evaluating and understanding how syntactic information is processed in multimodal language models.

Related papers

Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures [49.19753720526998]
We derive theoretical scaling laws for neural network performance on synthetic datasets.<n>We validate that convolutional networks, whose structure aligns with that of the generative process through locality and weight sharing, enjoy a faster scaling of performance.<n>This finding clarifies the architectural biases underlying neural scaling laws and highlights how representation learning is shaped by the interaction between model architecture and the statistical properties of data.
arXiv Detail & Related papers (2025-05-11T17:44:14Z)
Analysis and Visualization of Linguistic Structures in Large Language Models: Neural Representations of Verb-Particle Constructions in BERT [0.0]
This study investigates the internal representations of verb-particle combinations within large language models (LLMs)<n>We analyse the representational efficacy of its layers for various verb-particle constructions such as 'agree on', 'come back', and 'give up'<n>Results show that BERT's middle layers most effectively capture syntactic structures, with significant variability in representational accuracy across different verb categories.
arXiv Detail & Related papers (2024-12-19T09:21:39Z)
Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers. We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models. Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models. Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z)
Collaborative decoding of critical tokens for boosting factuality of large language models [57.504894664689]
Finetuned and aligned models show improved abilities of instruction following and safe generation. The common practice of using sampling during generation also increases chances of hallucination. We introduce a collaborative decoding framework to harness the high factuality within pretrained models through the concept of critical tokens.
arXiv Detail & Related papers (2024-02-28T01:53:37Z)
How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored. Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges. We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z)
A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data [77.34726150561087]
This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex sensory data. The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept. The model is able to produce fairly rich yet human-readable conceptual representations without training.
arXiv Detail & Related papers (2023-07-16T15:59:13Z)
Multimodal Relation Extraction with Cross-Modal Retrieval and Synthesis [89.04041100520881]
This research proposes to retrieve textual and visual evidence based on the object, sentence, and whole image. We develop a novel approach to synthesize the object-level, image-level, and sentence-level information for better reasoning between the same and different modalities.
arXiv Detail & Related papers (2023-05-25T15:26:13Z)
Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models [39.479912987123214]
Self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks. We introduce Fusioner, with a lightweight, transformer-based fusion module, that pairs the frozen visual representation with language concept. We show that, the proposed fusion approach is effective to any pair of visual and language models, even those pre-trained on a corpus of uni-modal data.
arXiv Detail & Related papers (2022-10-27T02:57:26Z)
TAGPRIME: A Unified Framework for Relational Structure Extraction [71.88926365652034]
TAGPRIME is a sequence tagging model that appends priming words about the information of the given condition to the input text. With the self-attention mechanism in pre-trained language models, the priming words make the output contextualized representations contain more information about the given condition. Extensive experiments and analyses on three different tasks that cover ten datasets across five different languages demonstrate the generality and effectiveness of TAGPRIME.
arXiv Detail & Related papers (2022-05-25T08:57:46Z)
Oracle Linguistic Graphs Complement a Pretrained Transformer Language Model: A Cross-formalism Comparison [13.31232311913236]
We examine the extent to which, in principle, linguistic graph representations can complement and improve neural language modeling. We find that, overall, semantic constituency structures are most useful to language modeling performance.
arXiv Detail & Related papers (2021-12-15T04:29:02Z)
Syntactic Persistence in Language Models: Priming as a Window into Abstract Language Representations [0.38498574327875945]
We investigate the extent to which modern, neural language models are susceptible to syntactic priming. We introduce a novel metric and release Prime-LM, a large corpus where we control for various linguistic factors which interact with priming strength. We report surprisingly strong priming effects when priming with multiple sentences, each with different words and meaning but with identical syntactic structure.
arXiv Detail & Related papers (2021-09-30T10:38:38Z)
Model-Based Counterfactual Synthesizer for Interpretation [40.01787107375103]
We propose a Model-based Counterfactual Synthesizer (MCS) framework for interpreting machine learning models. We first analyze the model-based counterfactual process and construct a base synthesizer using a conditional generative adversarial net (CGAN) To better approximate the counterfactual universe for those rare queries, we novelly employ the umbrella sampling technique to conduct the MCS framework training.
arXiv Detail & Related papers (2021-06-16T17:09:57Z)
Prototypical Representation Learning for Relation Extraction [56.501332067073065]
This paper aims to learn predictive, interpretable, and robust relation representations from distantly-labeled data. We learn prototypes for each relation from contextual information to best explore the intrinsic semantics of relations. Results on several relation learning tasks show that our model significantly outperforms the previous state-of-the-art relational models.
arXiv Detail & Related papers (2021-03-22T08:11:43Z)
Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors. We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method. Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z)
Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models [47.42249565529833]
Humans can learn structural properties about a word from minimal experience. We assess the ability of modern neural language models to reproduce this behavior in English.
arXiv Detail & Related papers (2020-10-12T14:12:37Z)
Exploiting Syntactic Structure for Better Language Modeling: A Syntactic Distance Approach [78.77265671634454]
We make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called "syntactic distances" Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees with better quality.
arXiv Detail & Related papers (2020-05-12T15:35:00Z)
A Systematic Assessment of Syntactic Generalization in Neural Language Models [20.589737524626745]
We present a systematic evaluation of the syntactic knowledge of neural language models. We find substantial differences in syntactic generalization performance by model architecture. Our results also reveal a dissociation between perplexity and syntactic generalization performance.
arXiv Detail & Related papers (2020-05-07T18:35:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.