Mockingbird: How does LLM perform in general machine learning tasks?
- URL: http://arxiv.org/abs/2508.04279v1
- Date: Wed, 06 Aug 2025 10:08:47 GMT
- Title: Mockingbird: How does LLM perform in general machine learning tasks?
- Authors: Haoyu Jia, Yoshiki Obinata, Kento Kawaharazuka, Kei Okada,
- Abstract summary: Large language models (LLMs) are now being used with increasing frequency as chat bots.<n>This work is conducted out of the curiosity about such potential.
- Score: 10.621603415982523
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are now being used with increasing frequency as chat bots, tasked with the summarizing information or generating text and code in accordance with user instructions. The rapid increase in reasoning capabilities and inference speed of LLMs has revealed their remarkable potential for applications extending beyond the domain of chat bots to general machine learning tasks. This work is conducted out of the curiosity about such potential. In this work, we propose a framework Mockingbird to adapt LLMs to general machine learning tasks and evaluate its performance and scalability on several general machine learning tasks. The core concept of this framework is instructing LLMs to role-play functions and reflect on its mistakes to improve itself. Our evaluation and analysis result shows that LLM-driven machine learning methods, such as Mockingbird, can achieve acceptable results on common machine learning tasks; however, solely reflecting on its own currently cannot outperform the effect of domain-specific documents and feedback from human experts.
Related papers
- Enhancing LLM's Ability to Generate More Repository-Aware Unit Tests Through Precise Contextual Information Injection [4.367526927436771]
Large Language Models (LLMs) guided by prompt engineering have gained attention for their ability to handle a broad range of tasks.<n>LLMs may exhibit hallucinations when generating unit tests for focal methods or functions due to their lack of awareness regarding the project's global context.<n>We propose RATester, which enhances the LLM's ability to generate more repository-aware unit tests.
arXiv Detail & Related papers (2025-01-13T15:43:36Z) - Scoring with Large Language Models: A Study on Measuring Empathy of Responses in Dialogues [3.2162648244439684]
We develop a framework for investigating how effective Large Language Models are at measuring and scoring empathy of responses in dialogues.<n>Our strategy is to approximate the performance of state-of-the-art and fine-tuned LLMs with explicit and explainable features.<n>Our results show that when only using embeddings, it is possible to achieve performance close to that of generic LLMs.
arXiv Detail & Related papers (2024-12-28T20:37:57Z) - Learning to Ask: When LLM Agents Meet Unclear Instruction [55.65312637965779]
Large language models (LLMs) can leverage external tools for addressing a range of tasks unattainable through language skills alone.<n>We evaluate the performance of LLMs tool-use under imperfect instructions, analyze the error patterns, and build a challenging tool-use benchmark called Noisy ToolBench.<n>We propose a novel framework, Ask-when-Needed (AwN), which prompts LLMs to ask questions to users whenever they encounter obstacles due to unclear instructions.
arXiv Detail & Related papers (2024-08-31T23:06:12Z) - Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents [56.822238860147024]
Augmenting large language models with external tools has emerged as a promising approach to extend their utility.<n>Previous methods manually parse tool documentation and create in-context demonstrations, transforming tools into structured formats for LLMs to use in their step-by-step reasoning.<n>We propose AutoTools, a framework that enables LLMs to automate the tool-use workflow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z) - "Which LLM should I use?": Evaluating LLMs for tasks performed by Undergraduate Computer Science Students [2.6043678412433713]
This study evaluates the effectiveness of large language models (LLMs) in performing tasks common among undergraduate computer science students.
Our research systematically assesses some of the publicly available LLMs such as Google Bard, ChatGPT(3.5), GitHub Copilot Chat, and Microsoft Copilot Chat.
arXiv Detail & Related papers (2024-01-22T15:11:36Z) - Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z) - INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [59.07490387145391]
Large language models (LLMs) have demonstrated impressive capabilities in various natural language processing tasks.
Their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.
We introduce a novel instruction tuning dataset, INTERS, encompassing 20 tasks across three fundamental IR categories.
arXiv Detail & Related papers (2024-01-12T12:10:28Z) - Automated Assessment of Students' Code Comprehension using LLMs [0.3293989832773954]
Large Language Models (LLMs) and encoder-based Semantic Textual Similarity (STS) models are assessed.
Our findings indicate that LLMs, when prompted in few-shot and chain-of-thought setting, perform comparable to fine-tuned encoder-based models in evaluating students' short answers in programming domain.
arXiv Detail & Related papers (2023-12-19T20:39:12Z) - Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z) - Check Your Facts and Try Again: Improving Large Language Models with
External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.