Multiway Storage Modification Machines
- URL: http://arxiv.org/abs/2111.06757v1
- Date: Fri, 12 Nov 2021 15:06:48 GMT
- Title: Multiway Storage Modification Machines
- Authors: J.-M. Chauvet
- Abstract summary: We present a parallel version of Schönhage's Storage Modification Machine, the Multiway Storage Modification Machine (MWSMM).
Like the alternative Association Storage Modification Machine of Tromp and van Emde Boas, MWSMMs recognize in polynomial time what Turing Machines recognize in polynomial space.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a parallel version of Schönhage's Storage Modification Machine,
the Multiway Storage Modification Machine (MWSMM). Like the alternative
Association Storage Modification Machine of Tromp and van Emde Boas, MWSMMs
recognize in polynomial time what Turing Machines recognize in polynomial
space. Falling thus into the Second Machine Class, the MWSMM is a parallel
machine model conforming to the Parallel Computation Thesis. We illustrate
MWSMMs by a simple implementation of Wolfram's String Substitution System.
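The paper's illustration lends itself to a compact sketch. Below is a minimal multiway string substitution system in Python (my own rendering, not the paper's MWSMM implementation): at each step, every rule is applied at every matching position, so one string branches into the full set of its successors, mirroring the lockstep branching a multiway machine performs.

```python
# Minimal sketch of a multiway string substitution system, assuming
# standard Wolfram-style rules (e.g. "A" -> "AB", "B" -> "A"). Not the
# paper's actual code: a hedged illustration of the multiway evolution
# the MWSMM is said to implement.

def successors(s, rules):
    """All strings reachable from s by one application of one rule."""
    out = set()
    for lhs, rhs in rules:
        start = s.find(lhs)
        while start != -1:
            out.add(s[:start] + rhs + s[start + len(lhs):])
            start = s.find(lhs, start + 1)
    return out

def multiway_evolution(init, rules, steps):
    """Evolve the set of all reachable strings, generation by generation."""
    generations = [{init}]
    for _ in range(steps):
        frontier = set()
        for s in generations[-1]:
            frontier |= successors(s, rules)
        generations.append(frontier)
    return generations

if __name__ == "__main__":
    # A standard example rule set from Wolfram's writings.
    for i, gen in enumerate(multiway_evolution("A", [("A", "AB"), ("B", "A")], 4)):
        print(i, sorted(gen))
```

Because every applicable rewrite is taken at once, each generation is a set of strings rather than a single string; this branching set is exactly what a parallel machine model can explore in one step.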
Related papers
- Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models [92.36510016591782]
We present a method that distills a pretrained Transformer architecture into alternative architectures such as state space models (SSMs).
Our method, called MOHAWK, distills a Mamba-2 variant based on the Phi-1.5 architecture (Phi-Mamba) using only 3B tokens, and a hybrid version (Hybrid Phi-Mamba) using 5B tokens.
Despite using less than 1% of the training data typically used to train models from scratch, Phi-Mamba boasts substantially stronger performance compared to all past open-source non-Transformer models.
arXiv Detail & Related papers (2024-08-19T17:48:11Z) - Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training [78.93900796545523]
Mini-Sequence Transformer (MsT) is a methodology for highly efficient and accurate LLM training with extremely long sequences.
MsT partitions input sequences and iteratively processes mini-sequences to reduce intermediate memory usage.
Integrated with the Hugging Face library, MsT successfully extends the maximum context length of Qwen, Mistral, and Gemma-2 by 12-24x.
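The partitioning idea summarized above can be illustrated with a toy sketch (an assumed simplification in Python/NumPy, not MsT's actual code): for a position-wise layer such as a Transformer MLP, each output row depends only on its own input row, so the sequence can be processed in mini-sequences and the large intermediate activation only ever exists for one chunk at a time.

```python
import numpy as np

# Toy sketch of mini-sequence processing (assumed names and shapes).

def mlp_full(x, w1, w2):
    # Materializes the (seq_len, 4*d) intermediate for the whole sequence.
    return np.maximum(x @ w1, 0.0) @ w2

def mlp_mini_sequences(x, w1, w2, chunk=128):
    # Identical output; peak intermediate memory scales with `chunk`.
    return np.concatenate(
        [np.maximum(x[i:i + chunk] @ w1, 0.0) @ w2
         for i in range(0, x.shape[0], chunk)],
        axis=0,
    )

rng = np.random.default_rng(0)
d = 64
x = rng.standard_normal((1000, d))
w1 = rng.standard_normal((d, 4 * d))
w2 = rng.standard_normal((4 * d, d))
assert np.allclose(mlp_full(x, w1, w2), mlp_mini_sequences(x, w1, w2))
```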
arXiv Detail & Related papers (2024-07-22T01:52:30Z) - MoEUT: Mixture-of-Experts Universal Transformers [75.96744719516813]
Universal Transformers (UTs) have advantages over standard Transformers in learning compositional generalizations.
Layer-sharing drastically reduces the parameter count compared to the non-shared model with the same dimensionality.
No previous work has succeeded in proposing a shared-layer Transformer design that is competitive in parameter count-dominated tasks such as language modeling.
arXiv Detail & Related papers (2024-05-25T03:24:32Z) - Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely *hidden transfer*, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z) - Multithreaded parallelism for heterogeneous clusters of QPUs [0.0]
We present MILQ, a scheduler and circuit cutter for heterogeneous clusters of quantum devices, modeled as unrelated parallel machines.
It optimizes the total execution time of a batch of circuits scheduled across multiple quantum devices.
Our results show a total improvement of up to 26% compared to a baseline approach.
arXiv Detail & Related papers (2023-11-29T09:54:04Z) - Support matrix machine: A review [0.0]
The support matrix machine (SMM) is one of the emerging methodologies tailored to matrix-valued input data.
This article provides the first in-depth analysis of the development of the SMM model.
We discuss numerous SMM variants, such as robust, sparse, class imbalance, and multi-class classification models.
arXiv Detail & Related papers (2023-10-30T16:46:23Z) - Least Squares Maximum and Weighted Generalization-Memorization Machines [14.139758779594667]
We propose a new way of remembering by introducing a memory influence mechanism for the least squares support vector machine (LSSVM).
The maximum memory impact model (MIMM) and the weighted impact memory model (WIMM) are then proposed.
arXiv Detail & Related papers (2023-08-31T04:48:59Z) - In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z) - Over-the-Air Split Machine Learning in Wireless MIMO Networks [56.27831295707334]
In split machine learning (ML), different partitions of a neural network (NN) are executed by different computing nodes.
To ease the communication burden, over-the-air computation (OAC) can efficiently implement all or part of the computation concurrently with the communication itself.
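As a hedged illustration of the split itself (a toy sketch with assumed names and shapes, not the paper's system): the network's layers are partitioned between two nodes, and only the cut-layer activation crosses the wireless link. The paper's over-the-air scheme additionally performs part of the computation within the analog transmission, which this toy noisy channel does not model.

```python
import numpy as np

rng = np.random.default_rng(1)
w_a = rng.standard_normal((16, 32))   # partition held by node A (device)
w_b = rng.standard_normal((32, 4))    # partition held by node B (server)

def node_a(x):
    return np.maximum(x @ w_a, 0.0)   # first NN partition, on the device

def channel(activation):
    # Wireless MIMO link, abstracted here to additive Gaussian noise.
    return activation + 0.01 * rng.standard_normal(activation.shape)

def node_b(activation):
    return activation @ w_b           # remaining NN partition, at the server

x = rng.standard_normal((1, 16))
y = node_b(channel(node_a(x)))        # end-to-end split forward pass
```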
arXiv Detail & Related papers (2022-10-07T15:39:11Z) - Compiling Turing Machines into Storage Modification Machines [0.0]
It is well known that Schönhage's Storage Modification Machines (SMM) can simulate Turing Machines (TM).
We propose a simple transformation of TM into SMM, setting the base for a straightforward TM-to-SMM compiler.
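For readers unfamiliar with the target model, here is a minimal sketch of SMM storage in Python (an assumed toy rendering of Schönhage's definition, not the paper's compiler): storage is a directed graph whose nodes carry one outgoing pointer per alphabet symbol, and a word over the alphabet addresses the node reached by following its symbols from a distinguished center node.

```python
# Toy rendering of Schönhage's SMM storage structure (an assumption-laden
# sketch; names and the exact instruction set are simplified).

ALPHABET = ("0", "1")

class SMM:
    def __init__(self):
        self.center = self._new_node()

    def _new_node(self):
        # A fresh node; every pointer initially loops back to itself.
        node = {}
        for a in ALPHABET:
            node[a] = node
        return node

    def walk(self, word):
        """Follow word's symbols from the center; return the addressed node."""
        node = self.center
        for a in word:
            node = node[a]
        return node

    def new(self, word):
        """Roughly 'new w': create a node, redirect w's last pointer to it."""
        self.walk(word[:-1])[word[-1]] = self._new_node()

    def set(self, word_w, word_v):
        """Roughly 'set w to v': redirect w's last pointer to the node at v."""
        self.walk(word_w[:-1])[word_w[-1]] = self.walk(word_v)

# A TM-to-SMM compilation might encode the tape as a chain of such nodes,
# with pointers for left/right neighbours and the scanned symbol.
smm = SMM()
smm.new("0")            # allocate a node reachable as center.0
smm.new("00")           # extend the chain: center.0.0
smm.set("1", "00")      # alias: center.1 now addresses the same node
assert smm.walk("1") is smm.walk("00")
```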
arXiv Detail & Related papers (2021-09-28T10:38:05Z) - Counterfactual Explanations for Machine Learning on Multivariate Time
Series Data [0.9274371635733836]
This paper proposes a novel explainability technique for providing counterfactual explanations for supervised machine learning frameworks.
The proposed method outperforms state-of-the-art explainability methods on several different ML frameworks and data sets in metrics such as faithfulness and robustness.
arXiv Detail & Related papers (2020-08-25T02:04:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.