Related papers: Contextures: Representations from Contexts

URL: http://arxiv.org/abs/2505.01557v1
Date: Fri, 02 May 2025 19:50:56 GMT
Title: Contextures: Representations from Contexts
Authors: Runtian Zhai, Kai Yang, Che-Ping Tsai, Burak Varici, Zico Kolter, Pradeep Ravikumar
Abstract summary: We show that a large class of representation learning methods can be characterized as learning from the association between the input and a context variable. We show that many popular methods aim to approximate the top-d singular functions of the expectation operator induced by the context. We also prove that the representations that learn the contexture are optimal on those tasks that are compatible with the context.
Score: 31.21738569535616
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite the empirical success of foundation models, we do not have a systematic characterization of the representations that these models learn. In this paper, we establish the contexture theory. It shows that a large class of representation learning methods can be characterized as learning from the association between the input and a context variable. Specifically, we show that many popular methods aim to approximate the top-d singular functions of the expectation operator induced by the context, in which case we say that the representation learns the contexture. We demonstrate the generality of the contexture theory by proving that representation learning within various learning paradigms -- supervised, self-supervised, and manifold learning -- can all be studied from such a perspective. We also prove that the representations that learn the contexture are optimal on those tasks that are compatible with the context. One important implication of the contexture theory is that once the model is large enough to approximate the top singular functions, further scaling up the model size yields diminishing returns. Therefore, scaling is not all we need, and further improvement requires better contexts. To this end, we study how to evaluate the usefulness of a context without knowing the downstream tasks. We propose a metric and show by experiments that it correlates well with the actual performance of the encoder on many real datasets.
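Illustration: the "top-d singular functions of the expectation operator" mentioned in the abstract can be made concrete for a small discrete case. The sketch below is not taken from the paper. It assumes X and A take finitely many values, that their joint probability table is known with strictly positive marginals, and that the operator maps g to E[g(A) | X = x]. The function name top_singular_functions, the example table, and the choice to drop the constant singular function are all assumptions made for illustration.

```python
import numpy as np


def top_singular_functions(joint, d):
    """Top-d nontrivial singular values/functions of g -> E[g(A) | X = x].

    joint[i, j] is assumed to be P(X = x_i, A = a_j), with all row and
    column sums strictly positive (hypothetical setup for illustration).
    """
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)  # marginal distribution of X
    p_a = joint.sum(axis=0)  # marginal distribution of A
    # Matrix of the operator in orthonormal bases of L2(P_X) and L2(P_A).
    q = joint / np.sqrt(np.outer(p_x, p_a))
    u, s, _ = np.linalg.svd(q, full_matrices=False)
    # Rescale columns so each is a function of x with unit L2(P_X) norm.
    functions = u / np.sqrt(p_x)[:, None]
    # Index 0 is the constant function (singular value 1); skip it.
    return s[1:d + 1], functions[:, 1:d + 1]


# Hypothetical joint table: 3 input values, 2 context values.
joint = np.array([[0.20, 0.05],
                  [0.05, 0.20],
                  [0.25, 0.25]])
values, functions = top_singular_functions(joint, d=1)
print(values)
print(functions[:, 0])
```

In this toy setting, an encoder whose d output dimensions span the returned functions would be one that "learns the contexture" in the sense described above. Real inputs are not finite tables, so this only shows what the target object looks like.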

Related papers

Contextures: The Mechanism of Representation Learning [3.7257289916860152]
This dissertation establishes the contexture theory to mathematically characterize the mechanism of representation learning, or pretraining. The central argument is that a representation is learned from the association between the input X and a context variable A. We demonstrate that many pretraining objectives can learn the contexture, including supervised learning, self-supervised learning, generative models, etc.
arXiv Detail & Related papers (2025-04-28T13:36:28Z)
How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations [69.72654127617058]
Post-hoc importance attribution methods are a popular tool for "explaining" Deep Neural Networks (DNNs). In this work we bring forward empirical evidence that challenges this very notion. We discover a strong dependency on the training details of a pre-trained model's classification layer and demonstrate that these details play a crucial role.
arXiv Detail & Related papers (2025-03-01T22:25:11Z)
Specify Robust Causal Representation from Mixed Observations [35.387451486213344]
Learning representations purely from observations concerns the problem of learning a low-dimensional, compact representation that is beneficial to prediction models. We develop a method to learn such representations from observational data by regularizing the learning procedure with mutual information measures. We theoretically and empirically show that the models trained with the learned causal representations are more robust under adversarial attacks and distribution shifts.
arXiv Detail & Related papers (2023-10-21T02:18:35Z)
Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations. We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z)
Relate to Predict: Towards Task-Independent Knowledge Representations for Reinforcement Learning [11.245432408899092]
Reinforcement Learning can enable agents to learn complex tasks. However, it is difficult to interpret the knowledge and reuse it across tasks. In this paper, we introduce an inductive bias for explicit object-centered knowledge separation. We show that the degree of explicitness in knowledge separation correlates with faster learning, better accuracy, better generalization, and better interpretability.
arXiv Detail & Related papers (2022-12-10T13:33:56Z)
Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP). What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
Learning from Context or Names? An Empirical Study on Neural Relation Extraction [112.06614505580501]
We study the effect of two main information sources in text: textual context and entity mentions (names). We propose an entity-masked contrastive pre-training framework for relation extraction (RE). Our framework can improve the effectiveness and robustness of neural models in different RE scenarios.
arXiv Detail & Related papers (2020-10-05T11:21:59Z)
Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations. Our framework preserves the relations between samples well. By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parser and adapt typical context modeling methods on top of it. We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
