Related papers: ARCO:Adaptive Multi-Agent Reinforcement Learning-Based Hardware/Software Co-Optimization Compiler for Improved Performance in DNN Accelerator Design

Related papers

Multi-Agent Reinforcement Learning for Sample-Efficient Deep Neural Network Mapping [54.65536245955678]
We present a decentralized multi-agent reinforcement learning (MARL) framework designed to overcome the challenge of sample inefficiency.<n>We introduce an agent clustering algorithm that assigns similar mapping parameters to the same agents based on correlation analysis.<n> Experimental results show our MARL approach improves sample efficiency by 30-300x over standard single-agent RL.
arXiv Detail & Related papers (2025-07-22T05:51:07Z)
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development. We present Dynamic Action Re-Sampling (DARS), a novel inference time compute scaling approach for coding agents. We evaluate our approach on SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z)
MetaML-Pro: Cross-Stage Design Flow Automation for Efficient Deep Learning Acceleration [8.43012094714496]
This paper presents a unified framework for codifying and automating optimization strategies to deploy deep neural networks (DNNs) on resource-constrained hardware. Our novel approach addresses two key issues: cross-stage co-optimization and optimization search. Experimental results demonstrate up to a 92% DSP and 89% LUT usage reduction for select networks.
arXiv Detail & Related papers (2025-02-09T11:02:06Z)
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models. Our approach employs activation sparsity to extract experts. Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z)
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System [75.25394449773052]
Large Language Model (LLM) based multi-agent systems (MAS) show remarkable potential in collaborative problem-solving. Yet they still face critical challenges: low communication efficiency, poor scalability, and a lack of effective parameter-updating optimization methods. We present Optima, a novel framework that addresses these issues by significantly enhancing both communication efficiency and task effectiveness.
arXiv Detail & Related papers (2024-10-10T17:00:06Z)
Efficient Federated Learning Using Dynamic Update and Adaptive Pruning with Momentum on Shared Server Data [59.6985168241067]
Federated Learning (FL) encounters two important problems, i.e., low training efficiency and limited computational resources. We propose a new FL framework, FedDUMAP, to leverage the shared insensitive data on the server and the distributed data in edge devices. Our proposed FL model, FedDUMAP, combines the three original techniques and has a significantly better performance compared with baseline approaches.
arXiv Detail & Related papers (2024-08-11T02:59:11Z)
Combining Neural Architecture Search and Automatic Code Optimization: A Survey [0.8796261172196743]
Two notable techniques are Hardware-aware Neural Architecture Search (HW-NAS) and Automatic Code Optimization (ACO) HW-NAS automatically designs accurate yet hardware-friendly neural networks, while ACO involves searching for the best compiler optimizations to apply on neural networks. This survey explores recent works that combine these two techniques within a single framework.
arXiv Detail & Related papers (2024-08-07T22:40:05Z)
Multiobjective Vehicle Routing Optimization with Time Windows: A Hybrid Approach Using Deep Reinforcement Learning and NSGA-II [52.083337333478674]
This paper proposes a weight-aware deep reinforcement learning (WADRL) approach designed to address the multiobjective vehicle routing problem with time windows (MOVRPTW) The Non-dominated sorting genetic algorithm-II (NSGA-II) method is then employed to optimize the outcomes produced by the WADRL.
arXiv Detail & Related papers (2024-07-18T02:46:06Z)
Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization [81.88668100203913]
Large language models (LLMs) have demonstrated strong capabilities in solving a wide range of programming tasks. In this paper, we explore code optimization with a focus on performance enhancement, specifically aiming to optimize code for minimal execution time.
arXiv Detail & Related papers (2024-06-17T16:10:10Z)
Towards Hyperparameter-Agnostic DNN Training via Dynamical System Insights [4.513581513983453]
We present a first-order optimization method specialized for deep neural networks (DNNs), ECCO-DNN. This method models the optimization variable trajectory as a dynamical system and develops a discretization algorithm that adaptively selects step sizes based on the trajectory's shape.
arXiv Detail & Related papers (2023-10-21T03:45:13Z)
Federated Multi-Level Optimization over Decentralized Networks [55.776919718214224]
We study the problem of distributed multi-level optimization over a network, where agents can only communicate with their immediate neighbors. We propose a novel gossip-based distributed multi-level optimization algorithm that enables networked agents to solve optimization problems at different levels in a single timescale. Our algorithm achieves optimal sample complexity, scaling linearly with the network size, and demonstrates state-of-the-art performance on various applications.
arXiv Detail & Related papers (2023-10-10T00:21:10Z)
Characterizing Speed Performance of Multi-Agent Reinforcement Learning [5.313762764969945]
Multi-Agent Reinforcement Learning (MARL) has achieved significant success in large-scale AI systems and big-data applications such as smart grids, surveillance, etc. Existing advancements in MARL algorithms focus on improving the rewards obtained by introducing various mechanisms for inter-agent cooperation. We analyze the speed performance (i.e., latency-bounded throughput) as the key metric in MARL implementations.
arXiv Detail & Related papers (2023-09-13T17:26:36Z)
A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs) MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration [5.2487252195308844]
This paper introduces a novel optimization framework for deep neural network (DNN) hardware accelerators. We introduce novel optimization and transformation tasks for building design-flow architectures. Our results demonstrate considerable reductions of up to 92% in DSP usage and 89% in LUT usage for two networks.
arXiv Detail & Related papers (2023-06-14T21:06:07Z)
VeLO: Training Versatile Learned Optimizers by Scaling Up [67.90237498659397]
We leverage the same scaling approach behind the success of deep learning to learn versatiles. We train an ingest for deep learning which is itself a small neural network that ingests and outputs parameter updates. We open source our learned, meta-training code, the associated train test data, and an extensive benchmark suite with baselines at velo-code.io.
arXiv Detail & Related papers (2022-11-17T18:39:07Z)
Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles. We construct three propagative modules to effectively solve the optimization models with flexible combinations. Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)
Distributed Multi-agent Meta Learning for Trajectory Design in Wireless Drone Networks [151.27147513363502]
This paper studies the problem of the trajectory design for a group of energyconstrained drones operating in dynamic wireless network environments. A value based reinforcement learning (VDRL) solution and a metatraining mechanism is proposed.
arXiv Detail & Related papers (2020-12-06T01:30:12Z)
Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations [15.659251804042748]
Woodpecker-DL (WPK) is a hardware-aware deep learning framework. WPK uses graph optimization, automated searches, domain-specific language ( DSL) and system-level exploration to accelerate inference. We show that on a maximum P100 GPU, we can achieve the speedup of 5.40 over cuDNN and 1.63 over TVM on individual operators, and run up to 1.18 times faster than TeslaRT for end-to-end model inference.
arXiv Detail & Related papers (2020-08-11T07:50:34Z)
Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs [13.628734116014819]
Deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNN) There is a lack of research on cross-level optimisation as the space of approaches becomes too large to test and obtain a globally optimised solution. We present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms achieving up to 4x improvement in performance and over 2x reduction in memory.
arXiv Detail & Related papers (2020-06-09T11:00:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.