Leveraging Large Language Models for Solving Rare MIP Challenges
- URL: http://arxiv.org/abs/2409.04464v2
- Date: Wed, 18 Sep 2024 07:43:12 GMT
- Title: Leveraging Large Language Models for Solving Rare MIP Challenges
- Authors: Teng Wang, Wing-Yin Yu, Ruifeng She, Wenhan Yang, Taijie Chen, Jianping Zhang
- Abstract summary: Mixed Integer Programming (MIP) has been extensively applied in areas requiring mathematical solvers to address complex instances within tight time constraints.
The model-building cost for end-to-end models, such as large language models (LLMs), remains largely unaffected by problem scale due to their pattern recognition capabilities.
- Score: 35.38992171089948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixed Integer Programming (MIP) has been extensively applied in areas requiring mathematical solvers to address complex instances within tight time constraints. However, as the problem scale increases, the complexity of model formulation and finding feasible solutions escalates significantly. In contrast, the model-building cost for end-to-end models, such as large language models (LLMs), remains largely unaffected by problem scale due to their pattern recognition capabilities. While LLMs, like GPT-4, without fine-tuning, can handle some traditional medium-scale MIP problems, they struggle with uncommon or highly specialized MIP scenarios. Fine-tuning LLMs can yield some feasible solutions for medium-scale MIP instances, but these models typically fail to explore diverse solutions when constrained by a low and constant temperature, limiting their performance. In this paper, we propose and evaluate a recursively dynamic temperature method integrated with a chain-of-thought approach. Our findings show that starting with a high temperature and gradually lowering it leads to better feasible solutions compared to other dynamic temperature strategies. Additionally, by comparing results generated by the LLM with those from Gurobi, we demonstrate that the LLM can produce solutions that complement traditional solvers by accelerating the pruning process and improving overall efficiency.
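To make the sampling strategy concrete, the following is a minimal sketch of the high-to-low temperature schedule described in the abstract; it is not the authors' released implementation, and `query_llm`, `is_feasible`, and `objective` are hypothetical placeholders the caller would supply.

```python
# Minimal sketch of a decreasing-temperature sampling loop for LLM-based
# MIP solution generation. `query_llm`, `is_feasible`, and `objective` are
# hypothetical placeholders, not part of the paper's code.

def solve_with_decreasing_temperature(problem_prompt,
                                      query_llm,
                                      is_feasible,
                                      objective,
                                      t_start=1.2,
                                      t_end=0.2,
                                      steps=10):
    """Sample candidate solutions while annealing the temperature downward."""
    best_solution, best_value = None, float("inf")
    for k in range(steps):
        # Interpolate from a high starting temperature (diverse exploration)
        # down to a low final temperature (focused refinement).
        temperature = t_start + (t_end - t_start) * k / max(steps - 1, 1)
        # Ask the model to reason step by step (chain of thought) before
        # emitting a complete assignment of the decision variables.
        candidate = query_llm(
            prompt=problem_prompt
            + "\nThink step by step, then output a complete variable assignment.",
            temperature=temperature,
        )
        if is_feasible(candidate):
            value = objective(candidate)
            if value < best_value:
                best_solution, best_value = candidate, value
    return best_solution, best_value
```

A feasible solution found this way could, for example, be handed to Gurobi as a MIP start (by setting the variables' Start attributes), so branch-and-bound begins from a known incumbent, in line with the complementary role the abstract describes.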
Related papers
- Large Language Models as Particle Swarm Optimizers [0.0]
In the proposed LLM-driven particle swarm optimizer (LMPSO), the velocity of each particle is represented as a prompt that generates the next candidate solution.
The proposed LMPSO approach is evaluated across multiple problem domains, including the Traveling Salesman Problem (TSP).
Experimental results demonstrate that LMPSO is particularly effective for solving problems where solutions are represented as structured sequences.
arXiv Detail & Related papers (2025-04-12T15:04:13Z)
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Rational Tuning of LLM Cascades via Probabilistic Modeling [0.9208007322096532]
We present a probabilistic model for the joint performance distribution of a sequence of large language models (LLMs).
Compared to selecting confidence thresholds using grid search, our model significantly improves runtime scaling with respect to the length of the cascade and the desired resolution of the cost-error curve.
arXiv Detail & Related papers (2025-01-16T07:58:33Z)
- Fast and Interpretable Mixed-Integer Linear Program Solving by Learning Model Reduction [24.3088703166792]
This paper aims to learn a reduced and equivalent model of the original MILP as an intermediate step.
The reduced model often corresponds to interpretable operations and is much simpler, enabling us to solve large-scale MILP problems much faster than existing commercial solvers.
We introduce an attention mechanism to capture and represent preference information, which helps improve the performance of model reduction learning tasks.
arXiv Detail & Related papers (2024-12-31T06:50:42Z)
- Enhancing the Reasoning Capabilities of Small Language Models via Solution Guidance Fine-Tuning [14.857842644246634]
This paper introduces Solution Guidance (SG) and a plug-and-play training paradigm, Solution-Guidance Fine-Tuning (SGFT).
SG focuses on problem understanding and decomposition at the semantic and logical levels, rather than specific computations.
SGFT can fine-tune an SLM to produce accurate problem-solving guidance, which can then be fed to any SLM as a prompt.
arXiv Detail & Related papers (2024-12-13T06:45:26Z)
- Pushing the Limits of Large Language Model Quantization via the Linearity Theorem [71.3332971315821]
We present a "line theoremarity" establishing a direct relationship between the layer-wise $ell$ reconstruction error and the model perplexity increase due to quantization.
This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels.
arXiv Detail & Related papers (2024-11-26T15:35:44Z)
- Solving General Natural-Language-Description Optimization Problems with Large Language Models [34.50671063271608]
We propose a novel framework called OptLLM that augments LLMs with external solvers.
OptLLM accepts user queries in natural language, converts them into mathematical formulations and programming code, and calls solvers to compute the results.
Some features of the OptLLM framework have been available for trial since June 2023.
arXiv Detail & Related papers (2024-07-09T07:11:10Z)
- Delta-CoMe: Training-Free Delta-Compression with Mixed-Precision for Large Language Models [79.46938238953916]
Fine-tuning large language models (LLMs) to diverse applications is crucial to meet complex demands.
Recent studies suggest decomposing a fine-tuned LLM into a base model and corresponding delta weights, which are then compressed using low-rank or low-bit approaches to reduce costs.
In this work, we observe that existing low-rank and low-bit compression methods can significantly harm the model performance for task-specific fine-tuned LLMs.
arXiv Detail & Related papers (2024-06-13T07:57:27Z)
- SparseLLM: Towards Global Pruning for Pre-trained Language Models [12.057369029549534]
We propose SparseLLM, a novel framework that redefines the global pruning process into manageable, coordinated subproblems.
SparseLLM's approach conceptualizes LLMs as a chain of modular functions and leverages auxiliary variables for problem decomposition.
It demonstrates significant performance improvements, particularly in high-sparsity regimes.
arXiv Detail & Related papers (2024-02-28T00:09:07Z)
- Deep learning enhanced mixed integer optimization: Learning to reduce model dimensionality [0.0]
This work introduces a framework to address the computational complexity inherent in Mixed-Integer Programming.
By employing deep learning, we construct problem-specific models that identify and exploit common structures across MIP instances.
We present an algorithm for generating synthetic data that enhances the robustness and generalizability of our models.
arXiv Detail & Related papers (2024-01-17T19:15:13Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we introduce how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models [70.45441031021291]
Large Vision-Language Models (LVLMs) can understand the world comprehensively by integrating rich information from different modalities.
Deploying LVLMs is often problematic due to their massive computational/energy costs and carbon consumption.
We propose Efficient Coarse-to-Fine Layer-Wise Pruning (ECoFLaP), a two-stage coarse-to-fine weight pruning approach for LVLMs.
arXiv Detail & Related papers (2023-10-04T17:34:00Z)
- Optimization and Optimizers for Adversarial Robustness [10.279287131070157]
In this paper, we introduce PWCF, a novel framework that blends a general-purpose constrained-optimization solver with constraint folding.
Regarding reliability, PWCF provides solutions with stationarity measures and feasibility tests to assess the solution quality.
We further explore the distinct patterns in the solutions found for solving these problems using various combinations of losses, perturbation models, and optimization algorithms.
arXiv Detail & Related papers (2023-03-23T16:22:59Z)
- Minimizing Entropy to Discover Good Solutions to Recurrent Mixed Integer Programs [0.0]
Current solvers for mixed-integer programming (MIP) problems are designed to perform well on a wide range of problems.
Recent works have shown that machine learning (ML) can be integrated with an MIP solver to inject domain knowledge and efficiently close the optimality gap.
This paper proposes an online solver that uses the notion of entropy to efficiently build a model with minimal training data and tuning.
arXiv Detail & Related papers (2022-02-07T18:52:56Z)
- Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation.
The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
arXiv Detail & Related papers (2021-03-01T22:55:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.