Related papers: Capacity Matters: a Proof-of-Concept for Transformer Memorization on Real-World Data

Related papers

Revisiting the Generic Transformer: Deconstructing a Strong Baseline for Time Series Foundation Models [18.841505010078112]
We investigate the potential of a standard patch Transformer, demonstrating that it achieves state-of-the-art zero-shot forecasting performance. We conduct a comprehensive ablation study that covers model scaling, data composition, and training techniques to isolate the essential ingredients for high performance.
arXiv Detail & Related papers (2026-02-06T18:01:44Z)
Data Value in the Age of Scaling: Understanding LLM Scaling Dynamics Under Real-Synthetic Data Mixtures [32.89034139737846]
Large language models (LLMs) are built on datasets that blend real and synthetic data. Synthetic data offers scalability and cost-efficiency, but it often introduces systematic distributional discrepancies. We propose an effective yet efficient data valuation method that scales to large-scale datasets.
arXiv Detail & Related papers (2025-11-17T17:53:12Z)
Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data [53.040873127309766]
We propose a token disentanglement process within the transformer architecture, enhancing feature separation and ensuring more effective learning. Our method outperforms existing models on both in-dataset and cross-dataset evaluations.
arXiv Detail & Related papers (2025-09-08T17:58:06Z)
Learning Causal Structure Distributions for Robust Planning [53.753366558072806]
We find that learning the functional relationships while accounting for the uncertainty about the structural information leads to more robust dynamics models. This is in contrast with common model-learning methods that ignore the causal structure and fail to leverage the sparsity of interactions in robotic systems. We show that our model can be used to learn the dynamics of a robot, which together with a sampling-based planner can be used to perform new tasks in novel environments.
arXiv Detail & Related papers (2025-08-08T22:43:17Z)
High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations [51.90920900332569]
Implicit neural representations (INRs) offer a compact and continuous framework for modeling spatially structured data. Recent approaches address this by introducing additional features along rigid geometric structures. We propose a simple yet effective alternative: Feature-Adaptive INR (FA-INR).
arXiv Detail & Related papers (2025-06-07T16:45:17Z)
Transformers Meet Relational Databases [0.0]
Transformer models have continuously expanded into all machine learning domains convertible to the underlying sequence-to-sequence representation. We introduce a modular neural message-passing scheme that closely adheres to the formal relational model. Our results demonstrate a superior performance of this newly proposed class of neural architectures.
arXiv Detail & Related papers (2024-12-06T17:48:43Z)
MOE-Enhanced Explanable Deep Manifold Transformation for Complex Data Embedding and Visualization [47.4136073281818]
Dimensionality reduction (DR) plays a crucial role in various fields, including data engineering and visualization. DR methods face a trade-off between precision and transparency, where optimizing for performance can lead to reduced explainability. This work introduces the MOE-based Explainable Deep Manifold Transformation (DMT-ME).
arXiv Detail & Related papers (2024-10-25T12:11:32Z)
A Survey on Deep Tabular Learning [0.0]
Tabular data presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure. This survey reviews the evolution of deep learning models for tabular data, from early fully connected networks (FCNs) to advanced architectures like TabNet, SAINT, TabTranSELU, and MambaNet.
arXiv Detail & Related papers (2024-10-15T20:08:08Z)
Generative Expansion of Small Datasets: An Expansive Graph Approach [13.053285552524052]
We introduce an Expansive Synthesis model generating large-scale, information-rich datasets from minimal samples. An autoencoder with self-attention layers and optimal transport refines distributional consistency. Results show comparable performance, demonstrating the model's potential to augment training data effectively.
arXiv Detail & Related papers (2024-06-25T02:59:02Z)
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations [98.7450564309923]
This paper takes initial steps on understanding in-context learning (ICL) in more complex scenarios, by studying learning with representations. We construct synthetic in-context learning problems with a compositional structure, where the label depends on the input through a possibly complex but fixed representation function. We show theoretically the existence of transformers that approximately implement such algorithms with mild depth and size.
arXiv Detail & Related papers (2023-10-16T17:40:49Z)
Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations. We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z)
Solving Reasoning Tasks with a Slot Transformer [7.966351917016229]
We present the Slot Transformer, an architecture that leverages slot attention, transformers and iterative variational inference on video scene data to infer representations. We evaluate the effectiveness of key components of the architecture, the model's representational capacity and its ability to predict from incomplete input.
arXiv Detail & Related papers (2022-10-20T16:40:30Z)
SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization [59.732036564862796]
We propose the Structure Information Modeling Transformer (SIM-Trans) to incorporate object structure information into the transformer for enhancing discriminative representation learning. The two proposed modules are lightweight and can be plugged into any transformer network and trained end-to-end easily. Experiments and analyses demonstrate that the proposed SIM-Trans achieves state-of-the-art performance on fine-grained visual categorization benchmarks.
arXiv Detail & Related papers (2022-08-31T03:00:07Z)
CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning. The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery. The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture. We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions. We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z)
PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context. We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
Multilinear Compressive Learning with Prior Knowledge [106.12874293597754]
The Multilinear Compressive Learning (MCL) framework combines Multilinear Compressive Sensing and Machine Learning into an end-to-end system. The key idea behind MCL is the assumption of the existence of a tensor subspace which can capture the essential features from the signal for the downstream learning task. In this paper, we propose a novel solution to address both of the aforementioned requirements, i.e., How to find those tensor subspaces in which the signals of interest are highly separable?
arXiv Detail & Related papers (2020-02-17T19:06:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.