You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
- URL: http://arxiv.org/abs/2502.05475v1
- Date: Sat, 08 Feb 2025 07:24:04 GMT
- Title: You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
- Authors: Simon Pepin Lehalleur, Jesse Hoogland, Matthew Farrugia-Roberts, Susan Wei, Alexander Gietelink Oldenziel, George Wang, Liam Carroll, Daniel Murfet
- Abstract summary: We argue that understanding the relation between structure in the data distribution and structure in trained models is central to AI alignment.
Standard testing and evaluation are insufficient for obtaining assurances of safety for widely deployed generally intelligent systems.
- Score: 35.44688262764995
- Abstract: In this position paper, we argue that understanding the relation between structure in the data distribution and structure in trained models is central to AI alignment. First, we discuss how two neural networks can have equivalent performance on the training set but compute their outputs in essentially different ways and thus generalise differently. For this reason, standard testing and evaluation are insufficient for obtaining assurances of safety for widely deployed generally intelligent systems. We argue that to progress beyond evaluation to a robust mathematical science of AI alignment, we need to develop statistical foundations for an understanding of the relation between structure in the data distribution, internal structure in models, and how these structures underlie generalisation.
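The first claim is easy to make concrete. Below is a toy sketch of our own (not from the paper), with polynomials standing in for neural networks: two models that fit the same training points exactly can still compute essentially different functions, which only becomes visible off-distribution.

```python
# Toy illustration (ours, not the paper's): two models with identical
# training-set behaviour that compute different functions off-distribution.
import numpy as np

x_train = np.array([-1.0, 0.0, 1.0, 2.0])
y_train = x_train ** 2  # underlying target: f(x) = x^2

model_a = np.polyfit(x_train, y_train, deg=3)  # unique cubic interpolant
model_b = np.polyfit(x_train, y_train, deg=9)  # rank-deficient fit; numpy
                                               # may emit a RankWarning

x_test = np.linspace(-3.0, 4.0, 8)  # inputs outside the training range

print(np.abs(np.polyval(model_a, x_train) - y_train).max())  # ~0 on train
print(np.abs(np.polyval(model_b, x_train) - y_train).max())  # ~0 on train
print(np.abs(np.polyval(model_a, x_test) - np.polyval(model_b, x_test)).max())
# large: equivalent on the training set, different generalisation
```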
Related papers
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
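As a hedged illustration of what "the embedding layer encodes the topical structure" can mean (a synthetic toy of our own, not the paper's setup): after training, words from the same topic should have higher cosine similarity than words from different topics.

```python
# Synthetic illustration: topic structure as clustering in embedding space.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical trained embeddings: each topic clusters around its own centre.
topic_centres = rng.normal(size=(2, 16))
embeddings = {
    "goal":  topic_centres[0] + 0.1 * rng.normal(size=16),  # topic: sports
    "match": topic_centres[0] + 0.1 * rng.normal(size=16),
    "stock": topic_centres[1] + 0.1 * rng.normal(size=16),  # topic: finance
    "bond":  topic_centres[1] + 0.1 * rng.normal(size=16),
}

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cos(embeddings["goal"], embeddings["match"]))  # high: same topic
print(cos(embeddings["goal"], embeddings["stock"]))  # low: different topic
```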
- Principled and Efficient Motif Finding for Structure Learning of Lifted Graphical Models [5.317624228510748]
Structure learning is a core problem in AI, central to neuro-symbolic AI and statistical relational learning.
We present the first principled approach for mining structural motifs in lifted graphical models.
We show that we outperform state-of-the-art structure learning approaches by up to 6% in accuracy and up to 80% in runtime.
arXiv Detail & Related papers (2023-02-09T12:21:55Z)
- Isometric Representations in Neural Networks Improve Robustness [0.0]
We train neural networks to perform classification while simultaneously maintaining within-class metric structure.
We verify that isometric regularization improves robustness to adversarial attacks on MNIST.
arXiv Detail & Related papers (2022-11-02T16:18:18Z)
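A minimal sketch, assuming PyTorch, of the kind of within-class isometry regulariser the summary describes; the function and variable names are ours, and the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def within_class_isometry_penalty(x, z, y):
    """Penalise mismatch between input-space and representation-space
    pairwise distances, restricted to pairs from the same class."""
    d_in = torch.cdist(x.flatten(1), x.flatten(1))  # (B, B) input distances
    d_rep = torch.cdist(z, z)                       # (B, B) latent distances
    same_class = (y[:, None] == y[None, :]).float()
    return ((d_rep - d_in) ** 2 * same_class).sum() / same_class.sum()

# Hypothetical training step: `encoder` and `classifier` are ordinary modules.
# z = encoder(x)
# loss = F.cross_entropy(classifier(z), y) + lam * within_class_isometry_penalty(x, z, y)
```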
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
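A hedged sketch of the amortised setup (our own simplification, not the paper's architecture): a network consumes an entire dataset and outputs edge probabilities in one forward pass, replacing per-dataset structure search.

```python
import torch
import torch.nn as nn

class AmortisedStructureModel(nn.Module):
    def __init__(self, num_vars: int, hidden: int = 64):
        super().__init__()
        self.row_encoder = nn.Sequential(
            nn.Linear(num_vars, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.edge_head = nn.Linear(hidden, num_vars * num_vars)
        self.num_vars = num_vars

    def forward(self, data: torch.Tensor) -> torch.Tensor:
        # data: (n_samples, num_vars); mean-pool over rows so the summary is
        # permutation-invariant in the samples, then predict all edges at once.
        summary = self.row_encoder(data).mean(dim=0)
        logits = self.edge_head(summary).view(self.num_vars, self.num_vars)
        return torch.sigmoid(logits)  # P(edge i -> j | data)

# model = AmortisedStructureModel(num_vars=5)
# probs = model(torch.randn(200, 5))  # one forward pass per dataset
```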
- Nested Named Entity Recognition as Holistic Structure Parsing [92.8397338250383]
This work models all the nested NEs in a sentence as a holistic structure, then proposes a holistic structure parsing algorithm to disclose them all at once.
Experiments show that our model yields promising results on widely used benchmarks, approaching or even achieving the state of the art.
arXiv Detail & Related papers (2022-04-17T12:48:20Z)
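To illustrate the holistic view (a toy of our own; spans and labels are invented): every nested entity of a sentence lives in one structure and is handled together, rather than layer by layer.

```python
# Nested NEs as one holistic structure of (start, end, label) spans,
# with end-exclusive token indices.
sentence = ["The", "University", "of", "California", "campus"]

holistic_structure = [
    (1, 4, "ORG"),  # "University of California"
    (3, 4, "GPE"),  # "California", nested inside the ORG span
]

def nested_pairs(spans):
    """Return pairs (outer, inner) where one entity span contains another."""
    return [
        (a, b)
        for a in spans
        for b in spans
        if a != b and a[0] <= b[0] and b[1] <= a[1]
    ]

print(nested_pairs(holistic_structure))  # the ORG span contains the GPE span
```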
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in an end-to-end learned manner.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and transfer to new tasks in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
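A minimal sketch, under our own simplifications, of the routing idea the summary describes: a learned router softly assigns each token to a set of function modules and mixes their outputs. The actual architecture is more elaborate; the class and names here are ours.

```python
import torch
import torch.nn as nn

class SoftRoutedModules(nn.Module):
    def __init__(self, dim: int, num_functions: int = 4):
        super().__init__()
        self.functions = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_functions)
        ])
        self.router = nn.Linear(dim, num_functions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.router(x).softmax(dim=-1)           # (tokens, F)
        outputs = torch.stack([f(x) for f in self.functions], dim=-1)
        return (outputs * weights.unsqueeze(-2)).sum(-1)   # route and mix

# layer = SoftRoutedModules(dim=32)
# y = layer(torch.randn(10, 32))  # 10 tokens routed through 4 functions
```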
- Understanding Dynamics of Nonlinear Representation Learning and Its Application [12.697842097171119]
We study the dynamics of implicit nonlinear representation learning.
We show that the data-architecture alignment condition is sufficient for global convergence.
We derive a new training framework, which satisfies the data-architecture alignment condition without assuming it.
arXiv Detail & Related papers (2021-06-28T16:31:30Z)
- Sheaves as a Framework for Understanding and Interpreting Model Fit [2.867517731896504]
We argue that sheaves can provide a natural framework to analyze how well a statistical model fits at the local level.
The sheaf-based approach is general enough to be useful in a range of applications.
arXiv Detail & Related papers (2021-05-21T15:34:09Z)