Position: Foundation Models Need Digital Twin Representations
- URL: http://arxiv.org/abs/2505.03798v1
- Date: Thu, 01 May 2025 22:17:41 GMT
- Title: Position: Foundation Models Need Digital Twin Representations
- Authors: Yiqing Shen, Hao Ding, Lalithkumar Seenivasan, Tianmin Shu, Mathias Unberath
- Abstract summary: Current foundation models (FMs) rely on token representations that directly fragment continuous real-world multimodal data into discrete tokens. This paper argues that the machine learning community should consider digital twin (DT) representations as an alternative to the token representation for building FMs.
- Score: 17.86589733662011
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current foundation models (FMs) rely on token representations that directly fragment continuous real-world multimodal data into discrete tokens. These token representations limit FMs to learning real-world knowledge and relationships purely through statistical correlation rather than through explicit domain knowledge. Consequently, current FMs struggle to maintain semantic coherence across modalities, capture fine-grained spatial-temporal dynamics, and perform causal reasoning. These limitations cannot be overcome by simply scaling up model size or expanding datasets. This position paper argues that the machine learning community should consider digital twin (DT) representations, which are outcome-driven digital representations that serve as building blocks for creating virtual replicas of physical processes, as an alternative to token representations for building FMs. We then discuss how DT representations can address these challenges by providing physically grounded representations that explicitly encode domain knowledge and preserve the continuous nature of real-world processes.
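To make the fragmentation argument concrete, here is a minimal sketch (our own illustration, with assumed image shape and patch size; nothing here is from the paper) of ViT-style patch tokenization, where a continuous 2D signal becomes a set of isolated fragments:

```python
# Hypothetical sketch (not from the paper): how ViT-style tokenization
# fragments a continuous 2D signal into discrete, order-agnostic patches.
import numpy as np

def patchify(image: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flat (N, patch*patch*C) token vectors."""
    h, w, c = image.shape
    tokens = (
        image.reshape(h // patch, patch, w // patch, patch, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(-1, patch * patch * c)
    )
    return tokens

image = np.random.rand(224, 224, 3)   # a stand-in for continuous sensor data
tokens = patchify(image)              # (196, 768): 196 isolated fragments
print(tokens.shape)
# Gradients, object boundaries, and context that span patch borders now exist
# only implicitly, to be re-learned from statistical co-occurrence.
```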
Related papers
- Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds. Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures. We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z) - Continuous Representation Methods, Theories, and Applications: An Overview and Perspectives [55.22101595974193]
Continuous representation methods have recently emerged as novel paradigms that characterize the intrinsic structures of real-world data. This review focuses on three aspects: (i) continuous representation method designs such as basis function representation, statistical modeling, tensor function decomposition, and implicit neural representation; (ii) theoretical foundations of continuous representations such as approximation error analysis, convergence properties, and implicit regularization; and (iii) real-world applications of continuous representations drawn from computer vision, graphics, bioinformatics, and remote sensing. A minimal sketch of the first aspect follows.
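Here is a small illustration (ours, not code from the survey) of a basis-function representation: a discretely sampled signal is fit as f(x) = sum_k c_k * phi_k(x) over a Fourier basis, after which it can be evaluated at arbitrary continuous coordinates rather than only on the sampling grid.

```python
# A minimal sketch (our illustration, not the survey's code) of a
# basis-function continuous representation fit by least squares.
import numpy as np

x = np.linspace(0, 1, 50)             # sparse, discrete samples
y = np.sin(2 * np.pi * x) + 0.5 * np.cos(6 * np.pi * x)

K = 8                                 # number of Fourier basis functions (assumed)
def design(x):
    return np.column_stack(
        [np.ones_like(x)]
        + [np.sin(2 * np.pi * k * x) for k in range(1, K)]
        + [np.cos(2 * np.pi * k * x) for k in range(1, K)]
    )

coef, *_ = np.linalg.lstsq(design(x), y, rcond=None)
x_fine = np.linspace(0, 1, 1000)      # query anywhere in the domain
y_cont = design(x_fine) @ coef        # continuous reconstruction
```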
arXiv Detail & Related papers (2025-05-21T07:50:19Z) - Let Synthetic Data Shine: Domain Reassembly and Soft-Fusion for Single Domain Generalization [68.41367635546183]
Single Domain Generalization aims to train models with consistent performance across diverse scenarios using data from a single source. We propose Discriminative Domain Reassembly and Soft-Fusion (DRSF), a training framework leveraging synthetic data to improve model generalization.
arXiv Detail & Related papers (2025-03-17T18:08:03Z) - Transfer Learning through Enhanced Sufficient Representation: Enriching Source Domain Knowledge with Target Data [2.308168896770315]
We introduce a novel method for transfer learning called Transfer learning through Enhanced Sufficient Representation (TESR). Our approach begins by estimating a sufficient and invariant representation from the source domains. This representation is then enhanced with an independent component derived from the target data, ensuring that it is sufficient for the target domain and adaptable to its specific characteristics.
arXiv Detail & Related papers (2025-02-22T13:18:28Z) - Exploring Representation-Aligned Latent Space for Better Generation [86.45670422239317]
We introduce ReaLS, which integrates semantic priors to improve generation performance. We show that fundamental DiT and SiT models trained on ReaLS achieve a 15% improvement in the FID metric. The enhanced semantic latent space enables perceptual downstream tasks such as segmentation and depth estimation.
arXiv Detail & Related papers (2025-02-01T07:42:12Z) - Transformers Use Causal World Models in Maze-Solving Tasks [49.67445252528868]
We identify World Models in transformers trained on maze-solving tasks. We find that it is easier to activate features than to suppress them. Positional encoding schemes appear to influence how World Models are structured within the model's residual stream.
arXiv Detail & Related papers (2024-12-16T15:21:04Z) - Latent Communication in Artificial Neural Networks [2.5947832846531886]
This dissertation focuses on the universality and reusability of neural representations.
A salient observation from our research is the emergence of similarities in latent representations.
arXiv Detail & Related papers (2024-06-16T17:13:58Z) - Generalizable Implicit Neural Representation As a Universal Spatiotemporal Traffic Data Learner [46.866240648471894]
Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system.
We present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation.
We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales.
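A rough sketch of the paradigm described above (our construction, not the authors' code): traffic state is parameterized as an implicit neural representation, a network mapping continuous (location, time) coordinates to a measurement, so the field can be queried at any point from corridor to network scale.

```python
# Hypothetical implicit neural representation for traffic data (our sketch).
import torch
import torch.nn as nn

class TrafficINR(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),   # (x, y, t) coordinates in
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),              # e.g., traffic speed out
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.net(coords)

model = TrafficINR()
coords = torch.rand(1024, 3)                   # observed (x, y, t) samples
speeds = torch.rand(1024, 1)                   # their measured values (dummy)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):                           # fit the field to observations
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(coords), speeds)
    loss.backward()
    opt.step()
```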
arXiv Detail & Related papers (2024-06-13T02:03:22Z) - Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning [80.44084021062105]
We propose a novel latent partial causal model for multimodal data, featuring two latent coupled variables, connected by an undirected edge, to represent the transfer of knowledge across modalities. Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by multimodal contrastive learning correspond to the latent coupled variables up to a trivial transformation. Experiments on a pre-trained CLIP model show that it embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets.
arXiv Detail & Related papers (2024-02-09T07:18:06Z) - Discrete, compositional, and symbolic representations through attractor dynamics [51.20712945239422]
We introduce a novel neural systems model that integrates attractor dynamics with symbolic representations to model cognitive processes akin to the probabilistic language of thought (PLoT).
Our model segments the continuous representational space into discrete basins, with attractor states corresponding to symbolic sequences that reflect the semanticity and compositionality characteristic of symbolic systems, learned through unsupervised learning rather than relying on pre-defined primitives.
This approach establishes a unified framework that integrates symbolic and sub-symbolic processing through neural dynamics, a neuroplausible substrate with proven expressivity in AI, offering a more comprehensive model that mirrors the complex duality of cognitive operations.
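As a toy illustration of how attractor dynamics can carve a continuous space into discrete basins (our own sketch, not the paper's model), a small Hopfield-style network stores a few binary "symbol" patterns, and a noisy continuous state relaxes onto one of them:

```python
# Toy attractor-dynamics sketch (ours, not the paper's model): continuous
# states settle into discrete basins around stored symbolic patterns.
import numpy as np

rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(3, 64))     # 3 "symbols", dim 64
W = patterns.T @ patterns / 64                       # Hebbian weights
np.fill_diagonal(W, 0)

state = patterns[0] + 0.8 * rng.standard_normal(64)  # noisy continuous input
for _ in range(20):                                  # attractor dynamics
    state = np.tanh(4.0 * (W @ state))               # smooth relaxation step

# The state has settled into the basin of the nearest stored pattern:
print(int(np.argmax(patterns @ np.sign(state))))     # expected: 0
```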
arXiv Detail & Related papers (2023-10-03T05:40:56Z) - From latent dynamics to meaningful representations [0.5728954513076778]
We propose a purely dynamics-constrained representation learning framework.
We show this is a more natural constraint for representation learning in dynamical systems.
We validate our framework for different systems including a real-world fluorescent DNA movie dataset.
arXiv Detail & Related papers (2022-09-02T09:27:37Z) - Continuous-Time and Multi-Level Graph Representation Learning for Origin-Destination Demand Prediction [52.0977259978343]
This paper proposes a Continuous-time and Multi-level dynamic graph representation learning method for Origin-Destination demand prediction (CMOD).
The state vectors keep historical transaction information and are continuously updated according to the most recent transactions.
Experiments are conducted on two real-world datasets from Beijing Subway and New York Taxi, and the results demonstrate the superiority of our model against the state-of-the-art approaches.
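The continuously updated state vectors can be illustrated with a small, hypothetical simplification (ours, not the CMOD implementation): each station holds a state vector that decays with the time gap since its last event and is refreshed by a recurrent cell whenever a new transaction arrives, so the embedding is defined in continuous time rather than on fixed snapshots.

```python
# Hypothetical simplification of continuously updated state vectors (ours).
import torch
import torch.nn as nn

class StationState(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.cell = nn.GRUCell(dim, dim)   # folds a transaction into the state
        self.decay = nn.Linear(1, dim)     # time-gap-dependent forgetting
        self.state = torch.zeros(1, dim)
        self.last_t = 0.0

    def update(self, transaction: torch.Tensor, t: float) -> torch.Tensor:
        gap = torch.tensor([[t - self.last_t]])
        decayed = self.state * torch.sigmoid(-self.decay(gap))  # fade old info
        self.state = self.cell(transaction, decayed)
        self.last_t = t
        return self.state

station = StationState()
for t, trx in [(1.0, torch.rand(1, 32)), (4.5, torch.rand(1, 32))]:
    h = station.update(trx, t)             # state after each transaction event
```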
arXiv Detail & Related papers (2022-06-30T03:37:50Z) - Cross-domain Imitation from Observations [50.669343548588294]
Imitation learning seeks to circumvent the difficulty in designing proper reward functions for training agents by utilizing expert behavior.
In this paper, we study the problem of how to imitate tasks when there exist discrepancies between the expert and agent MDPs.
We present a novel framework to learn correspondences across such domains.
arXiv Detail & Related papers (2021-05-20T21:08:25Z) - On the Transfer of Disentangled Representations in Realistic Settings [44.367245337475445]
We introduce a new high-resolution dataset with 1M simulated images and over 1,800 annotated real-world images.
We propose new architectures in order to scale disentangled representation learning to realistic high-resolution settings.
arXiv Detail & Related papers (2020-10-27T16:15:24Z)