XY Neural Networks
- URL: http://arxiv.org/abs/2103.17244v1
- Date: Wed, 31 Mar 2021 17:47:10 GMT
- Title: XY Neural Networks
- Authors: Nikita Stroev and Natalia G. Berloff
- Abstract summary: We show how to build complex structures for machine learning based on the XY model's nonlinear blocks.
The final target is to reproduce deep learning architectures that can perform complicated tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The classical XY model is a lattice model of statistical mechanics, notable
for its universality across a rich hierarchy of optical, laser and condensed
matter systems. We show how to build complex structures for machine learning
based on the XY model's nonlinear blocks. The final target is to reproduce
deep learning architectures that can perform the complicated tasks usually
attributed to them: speech recognition, visual processing, or other complex
classification types, with high quality. We develop a robust and transparent
approach for the construction of such models, which has universal
applicability (i.e. it is not tied to any particular physical system) and
allows many possible extensions while preserving the simplicity of the
methodology.
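To make the building block concrete: the classical XY energy for phases theta_i coupled by J_ij is H(theta) = -1/2 * sum_ij J_ij cos(theta_i - theta_j), and a nonlinear block can be realized by letting the phases relax toward a minimum of H. The sketch below is a minimal NumPy illustration of that relaxation; the random couplings, gradient-descent dynamics, and step size are assumptions for demonstration, not the paper's construction of deep architectures from such blocks.

```python
import numpy as np

def xy_energy(theta, J):
    """Classical XY energy H = -1/2 * sum_ij J_ij cos(theta_i - theta_j)."""
    diff = theta[:, None] - theta[None, :]
    return -0.5 * np.sum(J * np.cos(diff))

def relax_xy_block(J, theta0, lr=0.05, steps=2000):
    """Relax the phases toward a local minimum of the XY energy.

    For symmetric J, dH/dtheta_i = sum_j J_ij sin(theta_i - theta_j).
    """
    theta = theta0.copy()
    for _ in range(steps):
        diff = theta[:, None] - theta[None, :]
        theta -= lr * np.sum(J * np.sin(diff), axis=1)
    return np.mod(theta, 2 * np.pi)

rng = np.random.default_rng(0)
n = 8
J = rng.normal(size=(n, n))
J = 0.5 * (J + J.T)                 # symmetric couplings (illustrative choice)
np.fill_diagonal(J, 0.0)
theta0 = rng.uniform(0.0, 2.0 * np.pi, size=n)
theta = relax_xy_block(J, theta0)
print("energy before:", xy_energy(theta0, J), "after:", xy_energy(theta, J))
```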
Related papers
- AXLearn: Modular Large Model Training on Heterogeneous Infrastructure [64.33868455931301]
AXLearn is a production deep learning system that facilitates scalable and high-performance training of large deep learning models.
Compared to other state-of-the-art deep learning systems, AXLearn has a unique focus on modularity and support for heterogeneous hardware infrastructure.
arXiv Detail & Related papers (2025-07-07T18:50:58Z)
- Broad Spectrum Structure Discovery in Large-Scale Higher-Order Networks [1.7273380623090848]
We introduce a class of probabilistic models that efficiently represents and discovers a broad spectrum of mesoscale structure in large-scale hypergraphs.
By modeling observed node interactions through latent interactions among classes using low-rank representations, our approach tractably captures rich structural patterns.
Our model improves link prediction over state-of-the-art methods and discovers interpretable structures in diverse real-world systems.
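The key mechanism here, as the summary states it, is that interactions among nodes are explained by latent interactions among classes with a low-rank parameterization. A toy scoring rule along those lines is sketched below; the membership matrix, the affinity factorization, and the pairwise-sum score are illustrative assumptions, not the paper's probabilistic model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_classes, rank = 100, 5, 2

# Soft node-to-class memberships and a low-rank class-affinity matrix.
theta = rng.dirichlet(np.ones(n_classes), size=n_nodes)   # (n_nodes, n_classes)
U = rng.normal(size=(n_classes, rank))
W = U @ U.T                                               # rank-2 class affinities

def hyperedge_score(nodes):
    """Sum of pairwise latent class affinities inside a candidate hyperedge."""
    M = theta[nodes] @ W @ theta[nodes].T
    return np.triu(M, k=1).sum()

print(hyperedge_score([0, 1, 2, 3]))   # higher score = more plausible hyperedge
```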
arXiv Detail & Related papers (2025-05-27T20:34:58Z)
- Hierarchical-embedding autoencoder with a predictor (HEAP) as efficient architecture for learning long-term evolution of complex multi-scale physical systems [41.94295877935867]
Structures of various scales that dynamically emerge in the system interact with each other only locally.
The hierarchical fully-convolutional autoencoder transforms the state of a physical system into a series of embedding layers.
Interactions between features of various scales are modeled using a combination of convolutional operators.
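A compressed caricature of that design, assuming PyTorch and invented layer sizes: a convolutional encoder produces embeddings at two scales, a convolutional (hence spatially local) predictor advances the coarse embedding, and a transposed-convolution decoder maps back to the physical state. The actual HEAP architecture is deeper and more elaborate.

```python
import torch
import torch.nn as nn

# Two-scale convolutional encoder, a spatially local convolutional predictor
# on the coarse embedding, and a transposed-convolution decoder. All layer
# sizes are invented for this sketch.
enc1 = nn.Conv1d(1, 4, 3, stride=2, padding=1)      # fine-scale embedding
enc2 = nn.Conv1d(4, 8, 3, stride=2, padding=1)      # coarse-scale embedding
pred = nn.Conv1d(8, 8, 3, padding=1)                # local-in-space predictor
dec = nn.Sequential(nn.ConvTranspose1d(8, 4, 4, stride=2, padding=1), nn.ReLU(),
                    nn.ConvTranspose1d(4, 1, 4, stride=2, padding=1))

x = torch.randn(2, 1, 64)                           # batch of 1-D physical fields
h = enc2(torch.relu(enc1(x)))                       # (2, 8, 16)
x_next = dec(torch.relu(pred(h)))                   # predicted next state, (2, 1, 64)
print(x_next.shape)
```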
arXiv Detail & Related papers (2025-05-24T20:27:16Z)
- Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures [49.19753720526998]
We derive theoretical scaling laws for neural network performance on synthetic datasets.
We validate that convolutional networks, whose structure aligns with that of the generative process through locality and weight sharing, enjoy a faster scaling of performance.
This finding clarifies the architectural biases underlying neural scaling laws and highlights how representation learning is shaped by the interaction between model architecture and the statistical properties of data.
arXiv Detail & Related papers (2025-05-11T17:44:14Z)
- Manifold meta-learning for reduced-complexity neural system identification [1.0276024900942875]
We propose a meta-learning framework that discovers a low-dimensional manifold.
This manifold is learned from a meta-dataset of input-output sequences generated by a class of related dynamical systems.
Unlike bilevel meta-learning approaches, our method employs an auxiliary neural network to map datasets directly onto the learned manifold.
arXiv Detail & Related papers (2025-04-16T06:49:56Z)
- Generalized Factor Neural Network Model for High-dimensional Regression [50.554377879576066]
We tackle the challenges of modeling high-dimensional data sets with latent low-dimensional structures hidden within complex, non-linear, and noisy relationships.
Our approach enables a seamless integration of concepts from non-parametric regression, factor models, and neural networks for high-dimensional regression.
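Read literally, that integration suggests a model of the rough form y ≈ g(Bᵀx): a factor step compresses the high-dimensional input, and a neural network regresses on the factors. The forward pass below is a minimal sketch of this composition with invented dimensions; the paper's actual model and fitting procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 50, 3, 200                        # ambient dim, factors, samples

B = rng.normal(size=(p, k)) / np.sqrt(p)    # factor loadings: p -> k
W1 = rng.normal(size=(k, 16))               # small network on the factors
W2 = rng.normal(size=(16, 1))

X = rng.normal(size=(n, p))                 # high-dimensional inputs
Z = X @ B                                   # latent low-dimensional factors
y_hat = np.tanh(Z @ W1) @ W2                # non-linear regression on factors
print(y_hat.shape)                          # (200, 1)
```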
arXiv Detail & Related papers (2025-02-16T23:13:55Z)
- Learning to Walk from Three Minutes of Real-World Data with Semi-structured Dynamics Models [9.318262213262866]
We introduce a novel framework for learning semi-structured dynamics models for contact-rich systems.
We make accurate long-horizon predictions with substantially less data than prior methods.
We validate our approach on a real-world Unitree Go1 quadruped robot.
arXiv Detail & Related papers (2024-10-11T18:11:21Z)
- On The Specialization of Neural Modules [16.83151955540625]
We study the ability of network modules to specialize to useful structures in a dataset and achieve systematic generalization.
Our results shed light on the difficulty of module specialization, what is required for modules to successfully specialize, and the necessity of modular architectures to achieve systematicity.
arXiv Detail & Related papers (2024-09-23T12:58:11Z)
- Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling [4.190836962132713]
This paper introduces Orchid, a novel architecture designed to address the quadratic complexity of traditional attention mechanisms.
At the core of this architecture lies a new data-dependent global convolution layer, which contextually adapts its kernel conditioned on the input sequence.
We evaluate the proposed model across multiple domains, including language modeling and image classification, to highlight its performance and generality.
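One way to picture a data-dependent global convolution, sketched under our own assumptions (a toy tanh conditioning map standing in for Orchid's kernel-generating network): derive a kernel from the input itself, then apply a circular convolution over the full sequence via FFT, which keeps the cost quasi-linear in sequence length.

```python
import numpy as np

def data_dependent_global_conv(x, w):
    """Circular global convolution with an input-conditioned kernel.

    x: (L, d) input sequence. The kernel k = tanh(x @ w) is a toy
    conditioning map, not the paper's kernel network.
    """
    k = np.tanh(x @ w)                                  # (L, d) data-dependent kernel
    X = np.fft.rfft(x, axis=0)                          # FFT along the sequence axis
    K = np.fft.rfft(k, axis=0)
    return np.fft.irfft(X * K, n=x.shape[0], axis=0)    # O(L log L) per channel

rng = np.random.default_rng(0)
L, d = 64, 8
x = rng.normal(size=(L, d))
w = rng.normal(size=(d, d)) / np.sqrt(d)
print(data_dependent_global_conv(x, w).shape)           # (64, 8)
```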
arXiv Detail & Related papers (2024-02-28T17:36:45Z)
- Mechanistic Neural Networks for Scientific Machine Learning [58.99592521721158]
We present Mechanistic Neural Networks, a neural network design for machine learning applications in the sciences.
It incorporates a new Mechanistic Block in standard architectures to explicitly learn governing differential equations as representations.
Central to our approach is a novel Relaxed Linear Programming solver (NeuRLP) inspired by a technique that reduces solving linear ODEs to solving linear programs.
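The reduction mentioned in the last sentence can be illustrated on the simplest case. Below, the scalar linear ODE x' = a·x, discretized by the trapezoidal rule, is written as the equality constraints of a linear program and handed to scipy.optimize.linprog; this generic toy is our own illustration of the idea, not the NeuRLP solver.

```python
import numpy as np
from scipy.optimize import linprog

# Toy reduction: the linear ODE x' = a*x on [0, T], trapezoidal rule,
# written as equality constraints of a linear program (feasibility problem).
a, T, n = -1.0, 2.0, 50
h = T / n

# Variables x_0..x_n. Row 0: x_0 = 1. Rows t+1:
#   (1 - h*a/2) * x_{t+1} - (1 + h*a/2) * x_t = 0
A_eq = np.zeros((n + 1, n + 1))
b_eq = np.zeros(n + 1)
A_eq[0, 0], b_eq[0] = 1.0, 1.0
for t in range(n):
    A_eq[t + 1, t] = -(1.0 + h * a / 2.0)
    A_eq[t + 1, t + 1] = 1.0 - h * a / 2.0

res = linprog(c=np.zeros(n + 1), A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * (n + 1))
print("x(T) ~", res.x[-1], "exact:", np.exp(a * T))
```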
arXiv Detail & Related papers (2024-02-20T15:23:24Z)
- Enhancing Representations through Heterogeneous Self-Supervised Learning [61.40674648939691]
We propose Heterogeneous Self-Supervised Learning (HSSL), which forces a base model to learn from an auxiliary head whose architecture is heterogeneous from the base model.
HSSL endows the base model with new representational characteristics without structural changes.
HSSL is compatible with various self-supervised methods, achieving superior performance on various downstream tasks.
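A minimal sketch of that training signal, assuming PyTorch and invented module sizes: a convolutional base and an architecturally different MLP head embed the same input, and a similarity loss couples the two representations. The real HSSL objective and architectures may differ.

```python
import torch
import torch.nn as nn

# Convolutional base model and an architecturally heterogeneous MLP head;
# all sizes are invented for this sketch.
base = nn.Sequential(nn.Conv1d(1, 8, 3, padding=1), nn.ReLU(),
                     nn.AdaptiveAvgPool1d(1), nn.Flatten())      # -> (B, 8)
aux = nn.Sequential(nn.Flatten(), nn.Linear(32, 16), nn.ReLU(),
                    nn.Linear(16, 8))                            # -> (B, 8)

opt = torch.optim.Adam(list(base.parameters()) + list(aux.parameters()), lr=1e-3)
x = torch.randn(4, 1, 32)                  # a batch of toy 1-D signals
opt.zero_grad()
loss = 1.0 - nn.functional.cosine_similarity(base(x), aux(x)).mean()
loss.backward()                            # pull the two representations together
opt.step()
print(float(loss))
```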
arXiv Detail & Related papers (2023-10-08T10:44:05Z)
- Interpretable learning of effective dynamics for multiscale systems [5.754251195342313]
We propose a novel framework, Interpretable Learning of Effective Dynamics (iLED).
iLED offers accuracy comparable to state-of-the-art recurrent neural network-based approaches.
Our results show that the iLED framework can generate accurate predictions and obtain interpretable dynamics.
arXiv Detail & Related papers (2023-09-11T20:29:38Z)
- Leveraging the structure of dynamical systems for data-driven modeling [111.45324708884813]
We consider the impact of the training set and its structure on the quality of the long-term prediction.
We show how an informed design of the training set, based on invariants of the system and the structure of the underlying attractor, significantly improves the resulting models.
arXiv Detail & Related papers (2021-12-15T20:09:20Z)
- Towards a Predictive Processing Implementation of the Common Model of Cognition [79.63867412771461]
We describe an implementation of the common model of cognition grounded in neural generative coding and holographic associative memory.
The proposed system creates the groundwork for developing agents that learn continually from diverse tasks as well as model human performance at larger scales.
arXiv Detail & Related papers (2021-05-15T22:55:23Z)
- S2RMs: Spatially Structured Recurrent Modules [105.0377129434636]
We take a step towards exploiting dynamic structure, with models capable of simultaneously exploiting both modular and temporal structures.
We find our models to be robust to the number of available views and better able to generalize to novel tasks without additional training.
arXiv Detail & Related papers (2020-07-13T17:44:30Z)
- Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
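A toy rendition of that pipeline, under our own simplifications: here the bijective feature map is a fixed elementwise arcsinh (standing in for the paper's learned bijective networks), after which plain linear ICA recovers the sources.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S = rng.laplace(size=(2000, 3))       # independent non-Gaussian sources
A = rng.normal(size=(3, 3))
X = np.sinh(S @ A.T)                  # non-linearly distorted linear mixtures

Z = np.arcsinh(X)                     # bijective feature map: undoes the sinh
ica = FastICA(n_components=3, random_state=0)
S_hat = ica.fit_transform(Z)          # linear ICA in the feature space
print(S_hat.shape)                    # sources recovered up to permutation/scale
```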
arXiv Detail & Related papers (2020-02-18T17:58:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.