Related papers: Large Language Model Agent as a Mechanical Designer

Related papers

Enhancing Large Language Models through Structured Reasoning [15.472375478049823]
We introduce a novel approach to enhance Large Language Models (LLMs) through explicit structured reasoning.<n>First, we convert unstructured data into structured formats by explicitly annotating reasoning steps.<n>We then employ this structured dataset to train LLMs through Supervised Fine-Tuning (SFT)
arXiv Detail & Related papers (2025-06-25T08:36:12Z)
Leveraging Importance Sampling to Detach Alignment Modules from Large Language Models [50.19188692497892]
Traditional alignment methods often require retraining large pretrained models.<n>We propose a novel textitResidual Alignment Model (textitRAM) that formalizes the alignment process as a type of importance sampling.<n>We develop a resampling algorithm with iterative token-level decoding to address the common first-token latency issue in comparable methods.
arXiv Detail & Related papers (2025-05-26T08:53:02Z)
CALM: Co-evolution of Algorithms and Language Model for Automatic Heuristic Design [11.639825726501659]
Large language models (LLMs) can autonomously discover high-performings at a fraction of the traditional cost.<n>We propose a hybrid framework that combines verbal and numerical guidance.<n>Our method outperforms state-of-the-art (SOTA) baselines across various optimization tasks.
arXiv Detail & Related papers (2025-05-18T07:48:47Z)
A Domain Adaptation of Large Language Models for Classifying Mechanical Assembly Components [0.9134277125744795]
Functional modeling enables designers to reason about product functions before specific structural details are determined.<n>The effectiveness of function-based design is often hindered by the lack of well-structured and comprehensive functional data.<n>This study proposes a novel LLM-based domain adaptation (DA) framework using fine-tuning for the automated classification of mechanical assembly parts' functions.
arXiv Detail & Related papers (2025-05-02T23:32:50Z)
LLM-Guided Evolution: An Autonomous Model Optimization for Object Detection [0.0]
In machine learning, Neural Architecture Search (NAS) requires domain knowledge of model design and a large amount of trial-and-error to achieve promising performance. The Large Language Model (LLM)-Guided Evolution (GE) framework transformed this approach by incorporating LLMs to directly modify model source code for image classification algorithms on CIFAR data. We show that LLM-GE produced variants with significant performance improvements, such as an increase in Mean Average Precision from 92.5% to 94.5%.
arXiv Detail & Related papers (2025-04-03T05:06:06Z)
Efficient Model Selection for Time Series Forecasting via LLMs [52.31535714387368]
We propose to leverage Large Language Models (LLMs) as a lightweight alternative for model selection. Our method eliminates the need for explicit performance matrices by utilizing the inherent knowledge and reasoning capabilities of LLMs.
arXiv Detail & Related papers (2025-04-02T20:33:27Z)
Beyond Standard MoE: Mixture of Latent Experts for Resource-Efficient Language Models [10.623996218106564]
We introduce a novel parameterization methodology that facilitates the mapping of specific experts into a shared latent space. All expert operations are systematically decomposed into two principal components: a shared projection into a lower-dimensional latent space, followed by expert-specific transformations. This factorized approach substantially diminishes parameter count and computational requirements.
arXiv Detail & Related papers (2025-03-29T14:35:34Z)
IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Agents [17.301758094000125]
Large language model (LLM) agents have emerged as a promising solution to automate the development of computer vision models. We introduce Iterative Refinement, a novel strategy for LLM-driven ML pipeline design. Iterative Refinement improves stability, interpretability, and overall model performance.
arXiv Detail & Related papers (2025-02-25T01:52:37Z)
Scalable Language Models with Posterior Inference of Latent Thought Vectors [52.63299874322121]
Latent-Thought Language Models (LTMs) incorporate explicit latent thought vectors that follow an explicit prior model in latent space. LTMs possess additional scaling dimensions beyond traditional LLMs, yielding a structured design space. LTMs significantly outperform conventional autoregressive models and discrete diffusion models in validation perplexity and zero-shot language modeling.
arXiv Detail & Related papers (2025-02-03T17:50:34Z)
A Layered Architecture for Developing and Enhancing Capabilities in Large Language Model-based Software Systems [18.615283725693494]
This paper introduces a layered architecture that organizes Large Language Models (LLMs) software system development into distinct layers. By aligning capabilities with these layers, the framework encourages the systematic implementation of capabilities in effective and efficient ways.
arXiv Detail & Related papers (2024-11-19T09:18:20Z)
On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks. We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly. In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z)
Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling. Our research explores task-specific model pruning to inform decisions about designing SMoE architectures. We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities [0.35998666903987897]
This report examines the fine-tuning of Large Language Models (LLMs) It outlines the historical evolution of LLMs from traditional Natural Language Processing (NLP) models to their pivotal role in AI. The report introduces a structured seven-stage pipeline for fine-tuning LLMs.
arXiv Detail & Related papers (2024-08-23T14:48:02Z)
Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making [51.737762570776006]
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making. Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations. Our experiments on novel Design for Manufacturing tasks show both improved task performance as well as improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z)
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts [49.950419707905944]
We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts. Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data. Our findings highlight the critical role of modularity, the applicability of Self-MoE to multiple base LLMs, and the potential of self-improvement in achieving efficient, scalable, and adaptable systems.
arXiv Detail & Related papers (2024-06-17T19:06:54Z)
Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs) MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task. We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z)
ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling [15.673219028826173]
We introduce a semi-automated data synthesis framework designed for optimization modeling issues, named OR-Instruct. We train various open-source LLMs with a capacity of 7 billion parameters (dubbed ORLMs) The resulting model demonstrates significantly enhanced optimization modeling capabilities, achieving state-of-the-art performance across the NL4OPT, MAMO, and IndustryOR benchmarks.
arXiv Detail & Related papers (2024-05-28T01:55:35Z)
CourseGPT-zh: an Educational Large Language Model Based on Knowledge Distillation Incorporating Prompt Optimization [22.080563239179618]
Large language models (LLMs) have demonstrated astonishing capabilities in natural language processing (NLP) tasks. We propose CourseGPT-zh, a course-oriented education LLM that supports customization and low-cost deployment.
arXiv Detail & Related papers (2024-05-08T03:11:12Z)
Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models [0.8133739801185272]
The alignments of reasoning abilities between smaller and larger Language Models are largely conducted via Supervised Fine-Tuning (SFT) We propose the Self-refine Instruction-tuning method that elicits Smaller Language Models to self-refine their abilities. Results obtained on commonsense and math reasoning tasks show that this approach significantly outperforms Instruction-tuning in both in-domain and out-domain scenarios.
arXiv Detail & Related papers (2024-05-01T09:10:27Z)
Towards Modeling Learner Performance with Large Language Models [7.002923425715133]
This paper investigates whether the pattern recognition and sequence modeling capabilities of LLMs can be extended to the domain of knowledge tracing. We compare two approaches to using LLMs for this task, zero-shot prompting and model fine-tuning, with existing, non-LLM approaches to knowledge tracing. While LLM-based approaches do not achieve state-of-the-art performance, fine-tuned LLMs surpass the performance of naive baseline models and perform on par with standard Bayesian Knowledge Tracing approaches.
arXiv Detail & Related papers (2024-02-29T14:06:34Z)
Are Large Language Models Good Prompt Optimizers? [65.48910201816223]
We conduct a study to uncover the actual mechanism of LLM-based Prompt Optimization. Our findings reveal that the LLMs struggle to identify the true causes of errors during reflection, tending to be biased by their own prior knowledge. We introduce a new "Automatic Behavior Optimization" paradigm, which directly optimize the target model's behavior in a more controllable manner.
arXiv Detail & Related papers (2024-02-03T09:48:54Z)
PerfRL: A Small Language Model Framework for Efficient Code Optimization [14.18092813639534]
In this paper, we introduce PerfRL, an innovative framework designed to tackle the problem of code optimization. Our framework leverages the capabilities of small language models (SLMs) and reinforcement learning (RL) Our approach achieves similar or better results compared to state-of-the-art models using shorter training times and smaller pre-trained models.
arXiv Detail & Related papers (2023-12-09T19:50:23Z)
Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE. This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures. We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z)
Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for Large Language Models [125.91897197446379]
We find that MoE models benefit more from instruction tuning than dense models. Our most powerful model, FLAN-MOE-32B, surpasses the performance of FLAN-PALM-62B on four benchmark tasks.
arXiv Detail & Related papers (2023-05-24T04:22:26Z)
Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles. We construct three propagative modules to effectively solve the optimization models with flexible combinations. Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.