Efficiently Learning Branching Networks for Multitask Algorithmic Reasoning
- URL: http://arxiv.org/abs/2512.01113v1
- Date: Sun, 30 Nov 2025 22:19:55 GMT
- Title: Efficiently Learning Branching Networks for Multitask Algorithmic Reasoning
- Authors: Dongyue Li, Zhenshuo Zhang, Minxuan Duan, Edgar Dobriban, Hongyang R. Zhang
- Abstract summary: AutoBRANE is a principled architecture for multitask algorithmic reasoning. It clusters tasks using gradient-based affinity scores and can be used on top of any base model. It achieves a 28% accuracy gain over existing multitask and branching architectures.
- Score: 28.332657856837788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Algorithmic reasoning -- the ability to perform step-by-step logical inference -- has become a core benchmark for evaluating reasoning in graph neural networks (GNNs) and large language models (LLMs). Ideally, one would like to design a single model capable of performing well on multiple algorithmic reasoning tasks simultaneously. However, this is challenging when the execution steps of algorithms differ from one another, causing negative interference when they are trained together. We propose branching neural networks, a principled architecture for multitask algorithmic reasoning. Searching for the optimal $k$-ary tree with $L$ layers over $n$ algorithmic tasks is combinatorial, requiring exploration of up to $k^{nL}$ possible structures. We develop AutoBRANE, an efficient algorithm that reduces this search to $O(nL)$ time by solving a convex relaxation at each layer to approximate an optimal task partition. The method clusters tasks using gradient-based affinity scores and can be used on top of any base model, including GNNs and LLMs. We validate AutoBRANE on a broad suite of graph-algorithmic and text-based reasoning benchmarks. We show that gradient features estimate true task performance within 5% error across four GNNs and four LLMs (up to 34B parameters). On the CLRS benchmark, it outperforms the strongest single multitask GNN by 3.7% and the best baseline by 1.2%, while reducing runtime by 48% and memory usage by 26%. The learned branching structures reveal an intuitively reasonable hierarchical clustering of related algorithms. On three text-based graph reasoning benchmarks, AutoBRANE improves over the best non-branching multitask baseline by 3.2%. Finally, on a large graph dataset with 21M edges and 500 tasks, AutoBRANE achieves a 28% accuracy gain over existing multitask and branching architectures, along with a 4.5$\times$ reduction in runtime.
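The abstract describes the mechanism only at a high level, but the core idea of grouping tasks by gradient affinity can be sketched. In the sketch below, the cosine-similarity affinity and the use of off-the-shelf spectral clustering as a stand-in for the paper's per-layer convex relaxation are illustrative assumptions, not AutoBRANE's actual procedure.

```python
# Rough sketch of gradient-based task grouping at one shared layer.
# The affinity score and the clustering method are assumptions for
# illustration; they are not the paper's exact algorithm.
import numpy as np
from sklearn.cluster import SpectralClustering

def task_affinity(task_gradients: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between per-task gradient vectors.

    task_gradients: (n_tasks, n_params) array, one flattened gradient
    of the shared layer per task.
    """
    norms = np.linalg.norm(task_gradients, axis=1, keepdims=True)
    unit = task_gradients / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def partition_tasks(task_gradients: np.ndarray, k: int) -> np.ndarray:
    """Split n tasks into at most k branches at one layer."""
    affinity = task_affinity(task_gradients)
    # Shift to non-negative weights, as spectral clustering expects.
    affinity = (affinity + 1.0) / 2.0
    labels = SpectralClustering(
        n_clusters=k, affinity="precomputed", random_state=0
    ).fit_predict(affinity)
    return labels

# Toy usage: 6 tasks, 2 branches; tasks 0-2 and 3-5 share gradient structure.
rng = np.random.default_rng(0)
base_a, base_b = rng.normal(size=128), rng.normal(size=128)
grads = np.stack([base_a + 0.1 * rng.normal(size=128) for _ in range(3)]
                 + [base_b + 0.1 * rng.normal(size=128) for _ in range(3)])
print(partition_tasks(grads, k=2))  # e.g., [0 0 0 1 1 1]
```

Run once per layer on the tasks sharing that layer, this kind of clustering step replaces an exhaustive search over up to $k^{nL}$ tree structures with $O(nL)$ work, which is the efficiency argument the abstract makes.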
Related papers
- DeepPrune: Parallel Scaling without Inter-trace Redundancy [53.62015294143274]
Over 80% of parallel reasoning traces yield identical final answers, representing substantial wasted computation. We propose DeepPrune, a novel framework that enables efficient parallel scaling through dynamic pruning. Our work establishes a new standard for efficient parallel reasoning.
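To make the redundancy figure concrete, the snippet below measures answer duplication across completed traces post hoc. DeepPrune itself prunes traces dynamically during generation; this simplified stand-in only quantifies the waste it targets.

```python
# Illustration only: group completed traces by final answer to measure
# redundancy. This is NOT DeepPrune's method, which prunes during decoding.
from collections import Counter

def answer_redundancy(traces: list[tuple[str, str]]) -> float:
    """traces: (reasoning_text, final_answer) pairs from parallel sampling.
    Returns the fraction of traces whose answer duplicates another's."""
    counts = Counter(answer for _, answer in traces)
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / max(len(traces), 1)

traces = [("...steps...", "42")] * 8 + [("...steps...", "41")] * 2
print(answer_redundancy(traces))  # 0.8 -> 80% redundant, as in the abstract
```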
arXiv Detail & Related papers (2025-10-09T17:24:54Z) - Log-Time K-Means Clustering for 1D Data: Novel Approaches with Proof and Implementation [0.0]
This thesis bridges theory and practice for 1D $k$-means clustering, delivering efficient and sound algorithms. Benchmarks demonstrate over a 4500x speedup compared to scikit-learn for large datasets.
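The structure such 1D-specific algorithms exploit is that optimal clusters on a line are contiguous intervals of the sorted data, and the cost of any interval is $O(1)$ with prefix sums. The dynamic program below is the classic slow baseline, included only to make that structure concrete; the thesis's log-time methods are not reproduced here.

```python
# Classic O(k n^2) DP for 1D k-means over sorted data, using prefix sums
# for O(1) interval costs. Shown for intuition only; the thesis's
# algorithms are far faster.
import numpy as np

def kmeans_1d_dp(x: np.ndarray, k: int) -> float:
    x = np.sort(x)
    n = len(x)
    p1 = np.concatenate([[0.0], np.cumsum(x)])      # prefix sums
    p2 = np.concatenate([[0.0], np.cumsum(x * x)])  # prefix sums of squares

    def cost(i: int, j: int) -> float:
        """Within-cluster sum of squares of x[i:j], computed in O(1)."""
        s, s2, m = p1[j] - p1[i], p2[j] - p2[i], j - i
        return s2 - s * s / m

    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(1, n + 1):
            dp[c][j] = min(dp[c - 1][i] + cost(i, j) for i in range(j))
    return dp[k][n]

print(kmeans_1d_dp(np.array([1.0, 1.1, 0.9, 10.0, 10.2]), k=2))  # ~0.04
```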
arXiv Detail & Related papers (2024-12-19T09:03:39Z) - On Statistical Learning of Branch and Bound for Vehicle Routing Optimization [3.6922704509753084]
We train neural networks to emulate the decision-making process of the computationally expensive Strong Branching strategy.
We find that this approach can match or improve upon the performance of the branch and bound algorithm.
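A minimal sketch of the learning-to-branch idea this summary describes: collect expensive strong-branching scores offline as labels, fit a cheap model on per-variable features, and use it to rank branching candidates at solve time. The features and the gradient-boosting model below are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of emulating strong branching with a learned scorer.
# Features, labels, and model choice are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Per-candidate features: (LP fractionality, pseudo-cost, depth), synthetic.
X = rng.uniform(size=(500, 3))
# Synthetic stand-in for strong-branching scores collected offline.
y = 2.0 * X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=500)

model = GradientBoostingRegressor().fit(X, y)

def pick_branching_variable(candidate_features: np.ndarray) -> int:
    """At each B&B node, branch on the candidate the model scores highest."""
    return int(np.argmax(model.predict(candidate_features)))

print(pick_branching_variable(rng.uniform(size=(10, 3))))
```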
arXiv Detail & Related papers (2023-10-15T23:59:57Z) - Autonomous Tree-search Ability of Large Language Models [58.68735916408101]
Large Language Models have shown remarkable reasoning capabilities with advanced prompting techniques.
Recent works propose to utilize external programs to define search logic, such that LLMs can perform passive tree search to solve more challenging reasoning tasks.
We propose a new concept, the autonomous tree-search ability of LLMs, in which the model automatically generates a response containing search trajectories toward the correct answer.
arXiv Detail & Related papers (2023-10-14T14:14:38Z) - Breaking 3-Factor Approximation for Correlation Clustering in Polylogarithmic Rounds [0.23633885460047763]
We study parallel algorithms for the correlation clustering problem.
The goal is to partition the entities into clusters to minimize disagreements with labels.
Currently, all efficient parallel algorithms have an approximation ratio of at least 3.
We propose the first poly-logarithmic algorithm that achieves a better approximation ratio than 3.
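For context, the factor-3 barrier mentioned above is set by the classic randomized Pivot algorithm, sketched below: repeatedly pick a random pivot and cluster it with all its remaining "+" neighbors. This sequential baseline is what the paper's parallel polylogarithmic-round algorithm improves on, not the paper's method itself.

```python
# Classic randomized Pivot for correlation clustering (expected
# 3-approximation). Sequential baseline, shown for context only.
import random

def pivot_correlation_clustering(nodes, positive_edges):
    """positive_edges: set of frozensets {u, v} labeled 'similar'."""
    remaining, clusters = set(nodes), []
    while remaining:
        pivot = random.choice(sorted(remaining))
        cluster = {pivot} | {
            v for v in remaining
            if v != pivot and frozenset((pivot, v)) in positive_edges
        }
        clusters.append(cluster)
        remaining -= cluster
    return clusters

edges = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (3, 4)]}
print(pivot_correlation_clustering(range(5), edges))  # e.g., [{0,1,2}, {3,4}]
```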
arXiv Detail & Related papers (2023-07-13T12:32:49Z) - On the Sample Complexity of Representation Learning in Multi-task Bandits with Global and Local structure [77.60508571062958]
We investigate the sample complexity of learning the optimal arm for multi-task bandit problems.
Arms consist of two components: one that is shared across tasks (that we call representation) and one that is task-specific (that we call predictor)
We devise an algorithm, OSRL-SC, whose sample complexity approaches the lower bound and scales at most as $H(G\log(\delta_G) + X\log(\delta_H))$, with $X, G, H$ being, respectively, the number of tasks, representations, and predictors.
arXiv Detail & Related papers (2022-11-28T08:40:12Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k\log(d))$ at each iteration.
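The communication saving comes from compressing each $d$-dimensional gradient with a sketch before transmission. The toy count-sketch pair below is illustrative only; it omits the AMSGrad updates and error compensation a full SketchedAMSGrad-style implementation would need.

```python
# Toy count-sketch compression of a gradient: send k numbers instead of d.
# Illustration of the sketching idea only, not the paper's algorithm.
import numpy as np

def make_sketch_ops(d: int, k: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    bucket = rng.integers(0, k, size=d)        # hash each coord to a bucket
    sign = rng.choice([-1.0, 1.0], size=d)     # random sign per coordinate

    def sketch(g: np.ndarray) -> np.ndarray:   # d -> k, sent over network
        s = np.zeros(k)
        np.add.at(s, bucket, sign * g)
        return s

    def unsketch(s: np.ndarray) -> np.ndarray:  # unbiased estimate, k -> d
        return sign * s[bucket]

    return sketch, unsketch

sketch, unsketch = make_sketch_ops(d=10_000, k=1_000)
g = np.random.default_rng(1).normal(size=10_000)
g_hat = unsketch(sketch(g))
print(np.corrcoef(g, g_hat)[0, 1])  # noisy but positively correlated
```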
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - DOGE-Train: Discrete Optimization on GPU with End-to-end Training [28.795080637690095]
We present a fast, scalable, data-driven approach for solving relaxations of 0-1 integer linear programs.
We use a combination of graph neural networks (GNN) and the Lagrange decomposition based algorithm FastDOG.
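To make "relaxation of a 0-1 integer linear program" concrete: the binary constraint $x \in \{0,1\}$ is relaxed to $0 \le x \le 1$, yielding an LP whose optimum bounds the integer optimum. The plain solver call below shows only the relaxed problem being discussed; it is not the paper's GNN/FastDOG method.

```python
# LP relaxation of a toy 0-1 knapsack: maximize 3a + 4b + 2c
# subject to 2a + 3b + c <= 4, with x in {0,1}^3 relaxed to [0,1]^3.
from scipy.optimize import linprog

res = linprog(
    c=[-3, -4, -2],                 # linprog minimizes, so negate
    A_ub=[[2, 3, 1]], b_ub=[4],
    bounds=[(0, 1)] * 3,            # the 0-1 constraint, relaxed to [0,1]
)
print(res.x, -res.fun)  # fractional optimum upper-bounds the integer one
```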
arXiv Detail & Related papers (2022-05-23T21:09:41Z) - Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search [96.31315520244605]
Arch-Graph is a transferable NAS method that predicts task-specific optimal architectures.
We show Arch-Graph's transferability and high sample efficiency across numerous tasks.
It finds architectures in the top 0.16% and 0.29% on average across two search spaces, under a budget of only 50 models.
arXiv Detail & Related papers (2022-04-12T16:46:06Z) - Neural network relief: a pruning algorithm based on neural activity [47.57448823030151]
We propose a simple importance-score metric that deactivates unimportant connections.
We achieve comparable performance for LeNet architectures on MNIST.
The algorithm is not designed to minimize FLOPs when considering current hardware and software implementations.
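A minimal sketch in the spirit of this summary: score each connection by the typical magnitude of the signal it carries, then deactivate the lowest-scoring ones with a mask. The particular score below, the mean of $|w_{ij} x_j|$ over a batch, is an illustrative assumption rather than the paper's precise metric.

```python
# Sketch of activity-based pruning: zero out connections whose average
# contribution to the layer's pre-activation is smallest. The score is an
# illustrative assumption, not the paper's exact importance metric.
import numpy as np

def importance_scores(W: np.ndarray, activations: np.ndarray) -> np.ndarray:
    """W: (out, in) weights; activations: (batch, in) layer inputs.
    Scores w_ij by mean |w_ij * x_j| over the batch."""
    return np.abs(W) * np.mean(np.abs(activations), axis=0)

def prune(W: np.ndarray, activations: np.ndarray, keep: float) -> np.ndarray:
    scores = importance_scores(W, activations)
    threshold = np.quantile(scores, 1.0 - keep)
    return W * (scores >= threshold)   # deactivated connections become 0

rng = np.random.default_rng(0)
W, x = rng.normal(size=(16, 32)), rng.normal(size=(64, 32))
W_pruned = prune(W, x, keep=0.3)
print((W_pruned != 0).mean())  # ~0.3 of connections remain active
```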
arXiv Detail & Related papers (2021-09-22T15:33:49Z) - An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment [9.207173776826403]
We empirically compare 11 off-policy prediction learning algorithms with linear function approximation on two small tasks.
The algorithms' performance is highly affected by the variance induced by the importance sampling ratios.
Emphatic TD($\lambda$) tends to have lower error than other algorithms, but might learn more slowly in some cases.
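The variance discussed here enters through the per-step importance sampling ratio $\rho = \pi(a \mid s) / b(a \mid s)$ that off-policy TD methods multiply into their updates. The minimal off-policy TD(0) step below shows where $\rho$ appears; the toy two-action setup is illustrative, not the paper's Four Rooms tasks.

```python
# Minimal off-policy TD(0) with linear function approximation, showing the
# importance sampling ratio that drives the variance discussed above.
import numpy as np

def off_policy_td0_step(w, phi_s, phi_s_next, reward, a, pi, b,
                        alpha=0.1, gamma=0.99):
    rho = pi[a] / b[a]                     # importance sampling ratio
    td_error = reward + gamma * phi_s_next @ w - phi_s @ w
    return w + alpha * rho * td_error * phi_s

rng = np.random.default_rng(0)
w = np.zeros(4)
pi, b = np.array([0.9, 0.1]), np.array([0.5, 0.5])  # target vs. behavior
for _ in range(100):
    a = rng.choice(2, p=b)
    w = off_policy_td0_step(w, rng.normal(size=4), rng.normal(size=4),
                            reward=1.0, a=a, pi=pi, b=b)
print(w)  # ratios of 1.8 vs. 0.2 inflate the update variance
```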
arXiv Detail & Related papers (2021-09-10T21:15:41Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
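The skeleton underneath such methods is best-first tree search guided by a heuristic $h(n)$, which HATS would supply via a trained network. In the sketch below, $h$ is a hand-written stand-in on a toy problem, and the memory-bounded refinements of SMA* are omitted.

```python
# Best-first (A*-style) search skeleton; a learned model would replace h.
# Toy problem and heuristic are illustrative assumptions, not HATS itself.
import heapq

def a_star(start, goal, neighbors, h):
    frontier = [(h(start), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")

# Toy: walk the integer line from 0 to 5, unit step cost.
path, cost = a_star(
    0, 5,
    neighbors=lambda n: [(n - 1, 1.0), (n + 1, 1.0)],
    h=lambda n: abs(5 - n),     # a trained network would replace this
)
print(path, cost)  # [0, 1, 2, 3, 4, 5] 5.0
```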
arXiv Detail & Related papers (2021-01-07T08:00:02Z)