Related papers: Towards Atoms of Large Language Models

URL: http://arxiv.org/abs/2509.20784v1
Date: Thu, 25 Sep 2025 06:13:05 GMT
Title: Towards Atoms of Large Language Models
Authors: Chenhui Hu, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
Abstract summary: The fundamental units of internal representations in large language models (LLMs) remain undefined, limiting further understanding of their mechanisms. We propose the Atoms Theory, which defines such units as atoms. We train threshold-activated SAEs on Gemma2-2B, Gemma2-9B, and Llama3.1-8B, achieving 99.9% sparse reconstruction across layers on average.
Score: 33.04392302606777
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The fundamental units of internal representations in large language models (LLMs) remain undefined, limiting further understanding of their mechanisms. Neurons or features are often regarded as such units, yet neurons suffer from polysemy, while features face concerns of unreliable reconstruction and instability. To address this issue, we propose the Atoms Theory, which defines such units as atoms. We introduce the atomic inner product (AIP) to correct representation shifting, formally define atoms, and prove the conditions under which atoms satisfy the Restricted Isometry Property (RIP), ensuring stable sparse representations over the atom set and linking to compressed sensing. Under stronger conditions, we further establish the uniqueness and exact $\ell_1$ recoverability of the sparse representations, and provide guarantees that single-layer sparse autoencoders (SAEs) with threshold activations can reliably identify the atoms. To validate the Atoms Theory, we train threshold-activated SAEs on Gemma2-2B, Gemma2-9B, and Llama3.1-8B, achieving 99.9% sparse reconstruction across layers on average, and more than 99.8% of atoms satisfy the uniqueness condition, compared to 0.5% for neurons and 68.2% for features, showing that atoms more faithfully capture intrinsic representations of LLMs. Scaling experiments further reveal the link between SAE size and recovery capacity. Overall, this work systematically introduces and validates the Atoms Theory of LLMs, providing a theoretical framework for understanding internal representations and a foundation for mechanistic interpretability. Code available at https://github.com/ChenhuiHu/towards_atoms.
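
Illustration: the abstract refers to single-layer sparse autoencoders with threshold activations. The toy sketch below shows what such a forward pass could look like. It is not taken from the paper or its repository: the function names, the tied encoder/decoder weights, the threshold values, and the rule "keep a pre-activation only if it exceeds its threshold" are all assumptions made for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def threshold_activation(pre: Vector, theta: Vector) -> Vector:
    """Keep a pre-activation only where it exceeds its threshold, else output 0."""
    return [p if p > t else 0.0 for p, t in zip(pre, theta)]


def sae_forward(x: Vector, w_enc: Matrix, b_enc: Vector,
                theta: Vector, w_dec: Matrix) -> Tuple[Vector, Vector]:
    """Hypothetical single-layer SAE: encode, threshold, then decode.

    Each row of w_dec plays the role of one dictionary direction (an "atom"
    in this toy setting); the reconstruction is a sparse combination of rows.
    """
    pre = [p + b for p, b in zip(matvec(w_enc, x), b_enc)]
    codes = threshold_activation(pre, theta)
    x_hat = [sum(codes[j] * w_dec[j][i] for j in range(len(codes)))
             for i in range(len(x))]
    return codes, x_hat


# Toy dictionary of three unit-norm directions in 2D (assumed values).
atoms: Matrix = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
x: Vector = [2.0, 0.0]  # exactly 2 * atoms[0]

codes, x_hat = sae_forward(
    x,
    w_enc=atoms,             # tied weights: an assumption of this sketch
    b_enc=[0.0, 0.0, 0.0],
    theta=[1.5, 1.5, 1.5],   # thresholds chosen by hand for the example
    w_dec=atoms,
)
print(codes, x_hat)
```

In this example the third direction overlaps with the input (pre-activation 1.2), but the threshold suppresses it, so only the first code stays active and the input is reconstructed from a single direction.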

Related papers

Acceleration of Atomistic NEGF: Algorithms, Parallelization, and Machine Learning [61.12861060232382]
The Non-equilibrium Green's function (NEGF) formalism is a powerful method to simulate the quantum transport properties of nanoscale devices. This paper summarizes key (algorithmic) achievements that have allowed us to bring DFT+NEGF simulations closer to the dimensions and functionality of realistic systems.
arXiv Detail & Related papers (2026-02-03T12:01:39Z)
AtomDisc: An Atom-level Tokenizer that Boosts Molecular LLMs and Reveals Structure--Property Associations [11.856011146903889]
We introduce AtomDisc, a framework that quantizes atom-level local environments into structure-aware tokens embedded in large language models. Our experiments show that AtomDisc, in a data-driven way, can distinguish chemically meaningful structural features that reveal structure-property associations.
arXiv Detail & Related papers (2025-11-28T02:42:17Z)
Repeated ancilla reuse for logical computation on a neutral atom quantum computer [0.13703179370841895]
Quantum processors based on neutral atoms trapped in optical tweezers are inherently prone to atom loss. We demonstrate the ability to replace lost atoms during a quantum computation while maintaining coherence in other atoms. This is a key step towards execution of logical computations that last longer than the lifetime of an atom in the system.
arXiv Detail & Related papers (2025-06-11T16:58:17Z)
Tokenizing Electron Cloud in Protein-Ligand Interaction Learning [51.74909649330779]
ECBind is a method for tokenizing electron cloud signals into quantized embeddings. It helps uncover binding modes that cannot be fully represented by atom-level models. To extend its applicability to a wider range of scenarios, we utilize knowledge distillation to develop an electron-cloud-agnostic prediction model.
arXiv Detail & Related papers (2025-05-25T07:36:50Z)
Annealing for prediction of grand canonical crystal structures: Efficient implementation of n-body atomic interactions [0.0]
We propose a scheme usable on modern Ising machines for crystal structure prediction (CSP). We take into account the general n-body atomic interactions, and in particular three-body interactions, which are necessary to simulate covalent bonds. The crystal structure is represented by discretizing a unit cell and placing binary variables which express the existence or non-existence of an atom on every grid point.
arXiv Detail & Related papers (2023-07-06T16:49:06Z)
Efficient Approximations of Complete Interatomic Potentials for Crystal Property Prediction [63.4049850776926]
A crystal structure consists of a minimal unit cell that is repeated infinitely in 3D space. Current methods construct graphs by establishing edges only between nearby nodes. We propose to model physics-principled interatomic potentials directly instead of only using distances.
arXiv Detail & Related papers (2023-06-12T07:19:01Z)
Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA). ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information. Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z)
Completeness of Atomic Structure Representations [0.0]
We present a novel approach to construct descriptors of finite correlations based on the relative arrangement of particle triplets. Our strategy is demonstrated on a class of atomic arrangements that are specifically built to defy a broad class of conventional symmetric descriptors.
arXiv Detail & Related papers (2023-02-28T17:11:42Z)
Optical reconstruction of collective density matrix of qutrit [0.0]
Reconstruction of a quantum state is of prime importance for quantum-information science. We present a method of reconstruction of a collective density matrix of an atomic ensemble, consisting of atoms with an $F=1$ ground state.
arXiv Detail & Related papers (2021-07-08T15:54:49Z)
Maximum refractive index of an atomic medium [58.720142291102135]
All optical materials with a positive refractive index have a value of index that is of order unity. Despite the giant response of an isolated atom, we find that the maximum index does not indefinitely grow with increasing density. We propose an explanation based upon strong-disorder renormalization group theory.
arXiv Detail & Related papers (2020-06-02T14:57:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
