Related papers: To Theoretically Understand Transformer-Based In-Context Learning for Optimizing CSMA

To Theoretically Understand Transformer-Based In-Context Learning for Optimizing CSMA

URL: http://arxiv.org/abs/2508.09146v4
Date: Wed, 10 Sep 2025 23:13:54 GMT
Title: To Theoretically Understand Transformer-Based In-Context Learning for Optimizing CSMA
Authors: Shugang Hao, Hongbo Li, Lingjie Duan,
Abstract summary: The binary exponential backoff scheme is widely used in WiFi 7 and still incurs poor throughput performance under dynamic channel environments.<n>Recent model-based approaches simply optimize backoff strategies under a known and fixed node density.<n>This paper is the first to propose transformer-based in-context learning (ICL) theory for optimizing channel access.
Score: 26.87533852488578
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The binary exponential backoff scheme is widely used in WiFi 7 and still incurs poor throughput performance under dynamic channel environments. Recent model-based approaches (e.g., non-persistent and $p$-persistent CSMA) simply optimize backoff strategies under a known and fixed node density, still leading to a large throughput loss due to inaccurate node density estimation. This paper is the first to propose LLM transformer-based in-context learning (ICL) theory for optimizing channel access. We design a transformer-based ICL optimizer to pre-collect collision-threshold data examples and a query collision case. They are constructed as a prompt as the input for the transformer to learn the pattern, which then generates a predicted contention window threshold (CWT). To train the transformer for effective ICL, we develop an efficient algorithm and guarantee a near-optimal CWT prediction within limited training steps. As it may be hard to gather perfect data examples for ICL in practice, we further extend to allow erroneous data input in the prompt. We prove that our optimizer maintains minimal prediction and throughput deviations from the optimal values. Experimental results on NS-3 further demonstrate our approach's fast convergence and near-optimal throughput over existing model-based and DRL-based approaches under unknown node densities.

Related papers

LAPA: Log-Domain Prediction-Driven Dynamic Sparsity Accelerator for Transformer Model [14.53308613746613]
This paper proposes a log-domain attention prediction algorithm-architecture co-design, named LAPA.<n>Results show that LAPA achieves 3.52x, 3.24x and 2.79x higher energy efficiency than the state-of-the-art (SOTA) works Spatten, Sanger and FACT.
arXiv Detail & Related papers (2025-11-26T07:24:48Z)
NeuCLIP: Efficient Large-Scale CLIP Training with Neural Normalizer Optimization [42.298647858844895]
Accurately estimating the normalization term in the contrastive loss is a central challenge for Contrastive Language-Image Pre-training models.<n>We propose NeuCLIP, a novel and elegant optimization framework based on two key ideas.<n>Experiments on large-scale CLIP training, spanning datasets from millions to billions of samples, demonstrate that NeuCLIP outperforms previous methods.
arXiv Detail & Related papers (2025-11-11T16:27:51Z)
Towards Optimal Multi-draft Speculative Decoding [102.67837141152232]
Multi-Draft Speculative Decoding (MDSD) is a recent approach where, when generating each token, a small draft model generates multiple drafts.<n>This paper discusses the dual of the optimal transport problem, providing a way to efficiently compute the optimal acceptance rate.
arXiv Detail & Related papers (2025-02-26T03:22:44Z)
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.<n>We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning that learns the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.<n>We achieve this by learning an underlying Bernoulli distribution to sample binary pruning masks.<n>Experiments conducted on LLaMA, LLaMA-2, LLaMA-3, Vicuna, and Mistral models demonstrate the promising performance of our method in efficiency and effectiveness.
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching. Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
A Meta-Learning Based Precoder Optimization Framework for Rate-Splitting Multiple Access [53.191806757701215]
We propose the use of a meta-learning based precoder optimization framework to directly optimize the Rate-Splitting Multiple Access (RSMA) precoders with partial Channel State Information at the Transmitter (CSIT) By exploiting the overfitting of the compact neural network to maximize the explicit Average Sum-Rate (ASR) expression, we effectively bypass the need for any other training data while minimizing the total running time. Numerical results reveal that the meta-learning based solution achieves similar ASR performance to conventional precoder optimization in medium-scale scenarios, and significantly outperforms sub-optimal low complexity precoder algorithms in the large-scale
arXiv Detail & Related papers (2023-07-17T20:31:41Z)
Sample-Efficient and Surrogate-Based Design Optimization of Underwater Vehicle Hulls [0.4543820534430522]
We show that theBO-LCB algorithm is the most sample-efficient optimization framework and has the best convergence behavior of those considered. We also show that our DNN-based surrogate model predicts drag force on test data in tight agreement with CFD simulations, with a mean absolute percentage error (MAPE) of 1.85%. We demonstrate a two-orders-of-magnitude speedup for the design optimization process when the surrogate model is used.
arXiv Detail & Related papers (2023-04-24T19:52:42Z)
Feasible Low-thrust Trajectory Identification via a Deep Neural Network Classifier [1.5076964620370268]
This work proposes a deep neural network (DNN) to accurately identify feasible low thrust transfer prior to the optimization process. The DNN-classifier achieves an overall accuracy of 97.9%, which has the best performance among the tested algorithms.
arXiv Detail & Related papers (2022-02-10T11:34:37Z)
Latency Adjustable Transformer Encoder for Language Understanding [0.8287206589886879]
This paper proposes an efficient Transformer architecture that adjusts the inference computational cost adaptively with a desired inference latency speedup. The proposed method detects less important hidden sequence elements (word-vectors) and eliminates them in each encoder layer using a proposed Attention Context Contribution (ACC) metric. The proposed method mathematically and experimentally improves the inference latency of BERT_base and GPT-2 by up to 4.8 and 3.72 times with less than 0.75% accuracy drop and passable perplexity on average.
arXiv Detail & Related papers (2022-01-10T13:04:39Z)
Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC) We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer. Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z)
Exact Optimization of Conformal Predictors via Incremental and Decremental Learning [46.9970555048259]
Conformal Predictors (CP) are wrappers around ML methods, providing error guarantees under weak assumptions on the data distribution. They are suitable for a wide range of problems, from classification and regression to anomaly detection. We show that it is possible to speed up a CP classifier considerably, by studying it in conjunction with the underlying ML method, and by exploiting incremental&decremental learning.
arXiv Detail & Related papers (2021-02-05T15:31:37Z)
BAMSProd: A Step towards Generalizing the Adaptive Optimization Methods to Deep Binary Model [34.093978443640616]
Recent methods have significantly reduced the performance of Binary Neural Networks (BNNs) guaranteeing the effective and efficient training of BNNs is an unsolved problem. We propose a BAMSProd algorithm with a key observation that the convergence property of optimizing deep binary model is strongly related to the quantization errors.
arXiv Detail & Related papers (2020-09-29T06:12:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.