You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
- URL: http://arxiv.org/abs/2502.05475v1
- Date: Sat, 08 Feb 2025 07:24:04 GMT
- Title: You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
- Authors: Simon Pepin Lehalleur, Jesse Hoogland, Matthew Farrugia-Roberts, Susan Wei, Alexander Gietelink Oldenziel, George Wang, Liam Carroll, Daniel Murfet
- Abstract summary: We argue that understanding the relation between structure in the data distribution and structure in trained models is central to AI alignment.
Standard testing and evaluation are insufficient for obtaining assurances of safety for widely deployed generally intelligent systems.
- Score: 35.44688262764995
- Abstract: In this position paper, we argue that understanding the relation between structure in the data distribution and structure in trained models is central to AI alignment. First, we discuss how two neural networks can have equivalent performance on the training set but compute their outputs in essentially different ways and thus generalise differently. For this reason, standard testing and evaluation are insufficient for obtaining assurances of safety for widely deployed generally intelligent systems. We argue that to progress beyond evaluation to a robust mathematical science of AI alignment, we need to develop statistical foundations for an understanding of the relation between structure in the data distribution, internal structure in models, and how these structures underlie generalisation.
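The first claim is easy to make concrete. Below is a toy sketch of our own (not from the paper), with polynomials standing in for neural networks: two models that fit the same training points exactly can still compute essentially different functions, which only becomes visible off-distribution.

```python
# Toy illustration (ours, not the paper's): two models with identical
# training-set behaviour that compute different functions off-distribution.
import numpy as np

x_train = np.array([-1.0, 0.0, 1.0, 2.0])
y_train = x_train ** 2  # underlying target: f(x) = x^2

model_a = np.polyfit(x_train, y_train, deg=3)  # unique cubic interpolant
model_b = np.polyfit(x_train, y_train, deg=9)  # rank-deficient fit; numpy
                                               # may emit a RankWarning

x_test = np.linspace(-3.0, 4.0, 8)  # inputs outside the training range

print(np.abs(np.polyval(model_a, x_train) - y_train).max())  # ~0 on train
print(np.abs(np.polyval(model_b, x_train) - y_train).max())  # ~0 on train
print(np.abs(np.polyval(model_a, x_test) - np.polyval(model_b, x_test)).max())
# large: equivalent on the training set, different generalisation
```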
Related papers
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
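As a hedged illustration of what "the embedding layer encodes the topical structure" can mean (a synthetic toy of our own, not the paper's setup): after training, words from the same topic should have higher cosine similarity than words from different topics.

```python
# Synthetic illustration: topic structure as clustering in embedding space.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical trained embeddings: each topic clusters around its own centre.
topic_centres = rng.normal(size=(2, 16))
embeddings = {
    "goal":  topic_centres[0] + 0.1 * rng.normal(size=16),  # topic: sports
    "match": topic_centres[0] + 0.1 * rng.normal(size=16),
    "stock": topic_centres[1] + 0.1 * rng.normal(size=16),  # topic: finance
    "bond":  topic_centres[1] + 0.1 * rng.normal(size=16),
}

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cos(embeddings["goal"], embeddings["match"]))  # high: same topic
print(cos(embeddings["goal"], embeddings["stock"]))  # low: different topic
```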
- Principled and Efficient Motif Finding for Structure Learning of Lifted Graphical Models [5.317624228510748]
Structure learning is a core problem in AI, central to neuro-symbolic AI and statistical relational learning.
We present the first principled approach for mining structural motifs in lifted graphical models.
We show that we outperform state-of-the-art structure learning approaches by up to 6% in accuracy and up to 80% in runtime.
arXiv Detail & Related papers (2023-02-09T12:21:55Z)
- Isometric Representations in Neural Networks Improve Robustness [0.0]
We train neural networks to perform classification while simultaneously maintaining within-class metric structure.
We verify that isometric regularization improves robustness to adversarial attacks on MNIST.
arXiv Detail & Related papers (2022-11-02T16:18:18Z)
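A minimal sketch, assuming PyTorch, of the kind of within-class isometry regulariser the summary describes; the function and variable names are ours, and the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def within_class_isometry_penalty(x, z, y):
    """Penalise mismatch between input-space and representation-space
    pairwise distances, restricted to pairs from the same class."""
    d_in = torch.cdist(x.flatten(1), x.flatten(1))  # (B, B) input distances
    d_rep = torch.cdist(z, z)                       # (B, B) latent distances
    same_class = (y[:, None] == y[None, :]).float()
    return ((d_rep - d_in) ** 2 * same_class).sum() / same_class.sum()

# Hypothetical training step: `encoder` and `classifier` are ordinary modules.
# z = encoder(x)
# loss = F.cross_entropy(classifier(z), y) + lam * within_class_isometry_penalty(x, z, y)
```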
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
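A hedged sketch of the amortised setup (our own simplification, not the paper's architecture): a network consumes an entire dataset and outputs edge probabilities in one forward pass, replacing per-dataset structure search.

```python
import torch
import torch.nn as nn

class AmortisedStructureModel(nn.Module):
    def __init__(self, num_vars: int, hidden: int = 64):
        super().__init__()
        self.row_encoder = nn.Sequential(
            nn.Linear(num_vars, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.edge_head = nn.Linear(hidden, num_vars * num_vars)
        self.num_vars = num_vars

    def forward(self, data: torch.Tensor) -> torch.Tensor:
        # data: (n_samples, num_vars); mean-pool over rows so the summary is
        # permutation-invariant in the samples, then predict all edges at once.
        summary = self.row_encoder(data).mean(dim=0)
        logits = self.edge_head(summary).view(self.num_vars, self.num_vars)
        return torch.sigmoid(logits)  # P(edge i -> j | data)

# model = AmortisedStructureModel(num_vars=5)
# probs = model(torch.randn(200, 5))  # one forward pass per dataset
```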
- Nested Named Entity Recognition as Holistic Structure Parsing [92.8397338250383]
This work models all the nested NEs in a sentence as a holistic structure, then proposes a holistic structure parsing algorithm to disclose them all at once.
Experiments show that our model yields promising results on widely used benchmarks, approaching or even achieving the state of the art.
arXiv Detail & Related papers (2022-04-17T12:48:20Z)
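To illustrate the holistic view (a toy of our own; spans and labels are invented): every nested entity of a sentence lives in one structure and is handled together, rather than layer by layer.

```python
# Nested NEs as one holistic structure of (start, end, label) spans,
# with end-exclusive token indices.
sentence = ["The", "University", "of", "California", "campus"]

holistic_structure = [
    (1, 4, "ORG"),  # "University of California"
    (3, 4, "GPE"),  # "California", nested inside the ORG span
]

def nested_pairs(spans):
    """Return pairs (outer, inner) where one entity span contains another."""
    return [
        (a, b)
        for a in spans
        for b in spans
        if a != b and a[0] <= b[0] and b[1] <= a[1]
    ]

print(nested_pairs(holistic_structure))  # the ORG span contains the GPE span
```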
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in an end-to-end learned manner.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and transfer to new tasks in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
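A minimal sketch, under our own simplifications, of the routing idea the summary describes: a learned router softly assigns each token to a set of function modules and mixes their outputs. The actual architecture is more elaborate; the class and names here are ours.

```python
import torch
import torch.nn as nn

class SoftRoutedModules(nn.Module):
    def __init__(self, dim: int, num_functions: int = 4):
        super().__init__()
        self.functions = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(num_functions)
        ])
        self.router = nn.Linear(dim, num_functions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.router(x).softmax(dim=-1)           # (tokens, F)
        outputs = torch.stack([f(x) for f in self.functions], dim=-1)
        return (outputs * weights.unsqueeze(-2)).sum(-1)   # route and mix

# layer = SoftRoutedModules(dim=32)
# y = layer(torch.randn(10, 32))  # 10 tokens routed through 4 functions
```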
- Understanding Dynamics of Nonlinear Representation Learning and Its Application [12.697842097171119]
We study the dynamics of implicit nonlinear representation learning.
We show that the data-architecture alignment condition is sufficient for global convergence.
We derive a new training framework, which satisfies the data-architecture alignment condition without assuming it.
arXiv Detail & Related papers (2021-06-28T16:31:30Z)
- Sheaves as a Framework for Understanding and Interpreting Model Fit [2.867517731896504]
We argue that sheaves can provide a natural framework to analyze how well a statistical model fits at the local level.
The sheaf-based approach is general enough to be useful in a range of applications.
arXiv Detail & Related papers (2021-05-21T15:34:09Z)