Related papers: The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis

Related papers

Abstractive Visual Understanding of Multi-modal Structured Knowledge: A New Perspective for MLLM Evaluation [48.462734327375536]
Multi-modal large language models (MLLMs) incorporate heterogeneous modalities into LLMs, enabling a comprehensive understanding of diverse scenarios and objects.<n>Despite the proliferation of evaluation benchmarks and leaderboards for MLLMs, they predominantly overlook the critical capacity of MLLMs to comprehend world knowledge with structured abstractions that appear in visual form.<n>We propose M3STR, an innovative benchmark grounded in the Multi-Modal Map for STRuctured understanding.<n>Our findings reveal persistent deficiencies in processing abstractive visual information with structured knowledge, thereby charting a pivotal trajectory for advancing MLLMs' holistic reasoning capacities.
arXiv Detail & Related papers (2025-06-02T04:00:35Z)
How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices? [1.130790932059036]
We show that large language models (LLMs) reinforce the Matthew effect in citations by consistently favoring highly cited papers.<n>We analyze 274,951 references generated by GPT-4o for 10,000 papers.
arXiv Detail & Related papers (2025-04-03T17:04:56Z)
LATex: Leveraging Attribute-based Text Knowledge for Aerial-Ground Person Re-Identification [63.07563443280147]
We propose a novel framework named LATex for AG-ReID. It adopts prompt-tuning strategies to leverage attribute-based text knowledge. Our framework can fully leverage attribute-based text knowledge to improve the AG-ReID.
arXiv Detail & Related papers (2025-03-31T04:47:05Z)
MATHGLANCE: Multimodal Large Language Models Do Not Know Where to Look in Mathematical Diagrams [65.02628814094639]
Diagrams serve as a fundamental form of visual language, representing complex concepts and their inter-relationships through structured symbols, shapes, and spatial arrangements.<n>Current benchmarks conflate perceptual and reasoning tasks, making it difficult to assess whether Multimodal Large Language Models genuinely understand mathematical diagrams beyond superficial pattern recognition.<n>We introduce MATHGLANCE, a benchmark specifically designed to isolate and evaluate mathematical perception in MLLMs.<n>We construct GeoPeP, a perception-oriented dataset of 200K structured geometry image-text annotated with geometric primitives and precise spatial relationships.
arXiv Detail & Related papers (2025-03-26T17:30:41Z)
Steered Generation via Gradient Descent on Sparse Features [1.534667887016089]
We modify the internal structure of large language models (LLMs) by training sparse autoencoders to learn a sparse representation of the query embedding. We demonstrate that manipulating this sparse representation effectively transforms the output toward different stylistic and cognitive targets.
arXiv Detail & Related papers (2025-02-25T21:06:14Z)
The Complexity of Learning Sparse Superposed Features with Feedback [0.9838799448847586]
We investigate whether the underlying learned features of a model can be efficiently retrieved through feedback from an agent.<n>We analyze the feedback complexity associated with learning a feature matrix in sparse settings.<n>Our results establish tight bounds when the agent is permitted to construct activations and demonstrate strong upper bounds in sparse scenarios.
arXiv Detail & Related papers (2025-02-08T01:54:23Z)
Do Large Language Models Truly Understand Geometric Structures? [15.915781154075615]
We introduce the GeomRel dataset to evaluate large language models' understanding of geometric structures.<n>We propose the Geometry Chain-of-Thought (GeoCoT) method, which enhances LLMs' ability to identify geometric relationships.
arXiv Detail & Related papers (2025-01-23T15:52:34Z)
Instruction-Guided Fusion of Multi-Layer Visual Features in Large Vision-Language Models [50.98559225639266]
We investigate the contributions of visual features from different encoder layers using 18 benchmarks spanning 6 task categories.<n>Our findings reveal that multilayer features provide complementary strengths with varying task dependencies, and uniform fusion leads to suboptimal performance.<n>We propose the instruction-guided vision aggregator, a module that dynamically integrates multi-layer visual features based on textual instructions.
arXiv Detail & Related papers (2024-12-26T05:41:31Z)
Knowledge Boundary of Large Language Models: A Survey [75.67848187449418]
Large language models (LLMs) store vast amount of knowledge in their parameters, but they still have limitations in the memorization and utilization of certain knowledge.<n>This highlights the critical need to understand the knowledge boundary of LLMs, a concept that remains inadequately defined in existing research.<n>We propose a comprehensive definition of the LLM knowledge boundary and introduce a formalized taxonomy categorizing knowledge into four distinct types.
arXiv Detail & Related papers (2024-12-17T02:14:02Z)
Hybrid Discriminative Attribute-Object Embedding Network for Compositional Zero-Shot Learning [83.10178754323955]
Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network is proposed to solve the problem of complex interactions between attributes and object visual representations.<n>To increase the variability of training data, HDA-OE introduces an attribute-driven data synthesis (ADDS) module.<n>To further improve the discriminative ability of the model, HDA-OE introduces the subclass-driven discriminative embedding (SDDE) module.<n>The proposed model has been evaluated on three benchmark datasets, and the results verify its effectiveness and reliability.
arXiv Detail & Related papers (2024-11-28T09:50:25Z)
Probing Ranking LLMs: Mechanistic Interpretability in Information Retrieval [22.875174888476295]
We study the workings of state-of-the-art, fine-tuning-based passage-reranking transformer networks. Our approach involves a probing-based, layer-by-layer analysis of neurons within ranking LLMs. We identify individual or groups of known human-engineered and semantic features within the network's activations.
arXiv Detail & Related papers (2024-10-24T08:20:10Z)
Grouped Discrete Representation Guides Object-Centric Learning [18.44580501357929]
Transformer-based Object-Centric Discrete Learning can abstract dense images or textures into sparse object-level features.<n>We propose textitGrouped Representation (GDR) to address these issues by grouping features into attributes and indexing them with numbers.
arXiv Detail & Related papers (2024-07-01T19:00:40Z)
Can Large Language Models Understand DL-Lite Ontologies? An Empirical Study [10.051572826948762]
Large models (LLMs) have shown significant achievements in solving a wide range of tasks. We empirically analyze the LLMs' capability of understanding Description Logic (DL-Lite) We find that LLMs understand formal syntax and model-theoretic semantics of concepts and roles.
arXiv Detail & Related papers (2024-06-25T13:16:34Z)
Dual Relation Mining Network for Zero-Shot Learning [48.89161627050706]
We propose a Dual Relation Mining Network (DRMN) to enable effective visual-semantic interactions and learn semantic relationship among attributes for knowledge transfer. Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion. For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images.
arXiv Detail & Related papers (2024-05-06T16:31:19Z)
Exploring Concept Depth: How Large Language Models Acquire Knowledge and Concept at Different Layers? [57.04803703952721]
Large language models (LLMs) have shown remarkable performances across a wide range of tasks.<n>However, the mechanisms by which these models encode tasks of varying complexities remain poorly understood.<n>We introduce the idea of "Concept Depth" to suggest that more complex concepts are typically acquired in deeper layers.
arXiv Detail & Related papers (2024-04-10T14:56:40Z)
Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs) We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
Scientific Large Language Models: A Survey on Biological & Chemical Domains [47.97810890521825]
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration.
arXiv Detail & Related papers (2024-01-26T05:33:34Z)
Evaluating Spatial Understanding of Large Language Models [26.436450329727645]
Large language models show remarkable capabilities across a variety of tasks. Recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts. We design natural-language navigation tasks and evaluate the ability of LLMs to represent and reason about spatial structures.
arXiv Detail & Related papers (2023-10-23T03:44:40Z)
MechGPT, a language-based strategy for mechanics and materials modeling that connects knowledge across scales, disciplines and modalities [0.0]
We use a Large Language Model (LLM) to distill question-answer pairs from raw sources followed by fine-tuning. The resulting MechGPT LLM foundation model is used in a series of computational experiments to explore its capacity for knowledge retrieval, various language tasks, hypothesis generation, and connecting knowledge across disparate areas.
arXiv Detail & Related papers (2023-10-16T14:29:35Z)
RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation. Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-RValModal. We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
The geometry of hidden representations of large transformer models [43.16765170255552]
Large transformers are powerful architectures used for self-supervised data analysis across various data types. We show that the semantic structure of the dataset emerges from a sequence of transformations between one representation and the next. We show that the semantic information of the dataset is better expressed at the end of the first peak, and this phenomenon can be observed across many models trained on diverse datasets.
arXiv Detail & Related papers (2023-02-01T07:50:26Z)
Probing Linguistic Features of Sentence-Level Representations in Neural Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE) We use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets. We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.