OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
- URL: http://arxiv.org/abs/2412.16849v1
- Date: Sun, 22 Dec 2024 04:21:30 GMT
- Title: OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
- Authors: Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Yuhang Wang, Jinlin Xiao, Jitao Sang,
- Abstract summary: OpenAI's recent introduction of Reinforcement Fine-Tuning (RFT) showcases the potential of reasoning foundation model.<n>This technical report presents emphOpenRFT, our attempt to fine-tune generalist reasoning models for domain-specific tasks.<n>The evaluation is conducted on SciKnowEval, where OpenRFT achieves notable performance gains with only $100$ domain-specific samples for each task.
- Score: 21.931666005603553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: OpenAI's recent introduction of Reinforcement Fine-Tuning (RFT) showcases the potential of reasoning foundation model and offers a new paradigm for fine-tuning beyond simple pattern imitation. This technical report presents \emph{OpenRFT}, our attempt to fine-tune generalist reasoning models for domain-specific tasks under the same settings as RFT. OpenRFT addresses two key challenges of lacking reasoning step data and the limited quantity of training samples, by leveraging the domain-specific samples in three ways: question augmentation, synthesizing reasoning-process data, and few-shot ICL. The evaluation is conducted on SciKnowEval, where OpenRFT achieves notable performance gains with only $100$ domain-specific samples for each task. More experimental results will be updated continuously in later versions. Source codes, datasets, and models are disclosed at: https://github.com/ADaM-BJTU/OpenRFT
Related papers
- A Llama walks into the 'Bar': Efficient Supervised Fine-Tuning for Legal Reasoning in the Multi-state Bar Exam [38.71998082580061]
Legal reasoning tasks present unique challenges for large language models (LLMs) due to the complexity of domain-specific knowledge and reasoning processes.
This paper investigates how effectively smaller language models (Llama 2 7B and Llama 3 8B) can be fine-tuned with a limited dataset of 1,514 Multi-state Bar Examination (MBE) questions.
arXiv Detail & Related papers (2025-04-07T11:31:22Z) - START: Self-taught Reasoner with Tools [51.38785489790888]
We introduce START (Self-Taught Reasoner with Tools), a tool-integrated long Chain-of-thought (CoT) reasoning LLM.
START is capable of performing complex computations, self-checking, exploring diverse methods, and self-ging.
It significantly outperforms the base QwQ-32B and achieves performance comparable to the state-of-the-art open-weight model R1-Distill-Qwen-32B.
arXiv Detail & Related papers (2025-03-06T17:11:51Z) - Visual-RFT: Visual Reinforcement Fine-Tuning [75.20572976629646]
Reinforcement Fine-Tuning (RFT) in Large Reasoning Models like OpenAI o1 learns from feedback on its answers.
Visual-RFT further extends the application areas of RFT on visual tasks.
arXiv Detail & Related papers (2025-03-03T18:16:32Z) - SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning [3.4023074295549014]
Subtask-oriented Reinforced Fine-Tuning (SoRFT) is a novel training approach to enhance the issue resolving capability of LLMs.
We evaluate the SoRFT-trained model on SWE-Bench Verified and SWE-Bench Lite, achieving state-of-the-art (SOTA) performance among open-source models.
arXiv Detail & Related papers (2025-02-27T14:19:45Z) - OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models [61.14336781917986]
We introduce OpenR, an open-source framework for enhancing the reasoning capabilities of large language models (LLMs)
OpenR unifies data acquisition, reinforcement learning training, and non-autoregressive decoding into a cohesive software platform.
Our work is the first to provide an open-source framework that explores the core techniques of OpenAI's o1 model with reinforcement learning.
arXiv Detail & Related papers (2024-10-12T23:42:16Z) - FsPONER: Few-shot Prompt Optimization for Named Entity Recognition in Domain-specific Scenarios [0.5106912532044251]
We introduce FsPONER, a novel approach for optimizing few-shot prompts, and evaluate its performance on domain-specific NER datasets.
FsPONER consists of three few-shot selection methods based on random sampling, TF-IDF, and a combination of both.
In the considered real-world scenarios with data scarcity, FsPONER with TF-IDF surpasses fine-tuned models by approximately 10% in F1 score.
arXiv Detail & Related papers (2024-07-10T20:32:50Z) - Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models [0.8399688944263842]
Large Language Models (LLMs) have the capability to understand and generate human-like text from input queries.
This study extends this concept to the integration of LLMs within Retrieval-Augmented Generation (RAG) pipelines.
We evaluate the impact of fine-tuning on the LLMs' capacity for data extraction and contextual understanding.
arXiv Detail & Related papers (2024-06-17T04:35:17Z) - Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting [0.0]
We try to push the understanding of different fine-tuning strategies for large language models (LLMs)
We compare state-of-the-art methods like vanilla fine-tuning and Pattern-Based Fine-Tuning (PBFT) on pre-trained models across two datasets, COLA and MNLI.
Our findings suggest that these alternative strategies can exhibit out-of-domain generalization comparable to that of vanilla FT and PBFT.
arXiv Detail & Related papers (2024-05-21T20:08:52Z) - Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph [28.13334909565348]
In this paper, we unveil that simple domain-specific graph methods outperform the model, using the intrinsic dependencies within the patent data.
We propose a novel Fine-grained cLAim depeNdency (FLAN) Graph through meticulous patent data analyses.
arXiv Detail & Related papers (2024-04-22T17:22:31Z) - RAFT: Adapting Language Model to Domain Specific RAG [75.63623523051491]
We present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in a "openbook" in-domain settings.
RAFT accomplishes this by citing the verbatim right sequence from the relevant document that would help answer the question.
RAFT consistently improves the model's performance across PubMed, HotpotQA, and Gorilla datasets.
arXiv Detail & Related papers (2024-03-15T09:26:02Z) - Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge [15.553942864736989]
Language Models (LMs) memorize a vast amount of factual knowledge, exhibiting strong performance across diverse tasks and domains.<n>The two prominent approaches to enhance the performance of LMs on low-frequent topics are: Retrieval Augmented Generation (RAG) and fine-tuning (FT) over synthetic data.<n>This paper explores and evaluates the impact of RAG and FT on customizing LMs in handling low-frequency entities on question answering tasks.
arXiv Detail & Related papers (2024-03-03T08:07:55Z) - FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the
Power of Heterogeneous Clients [50.13097183691517]
In real-world federated scenarios, there often exist a multitude of heterogeneous clients with varying computation and communication resources.
We propose a novel federated tuning algorithm, FedRA.
In each communication round, FedRA randomly generates an allocation matrix.
It reorganizes a small number of layers from the original model based on the allocation matrix and fine-tunes using adapters.
arXiv Detail & Related papers (2023-11-19T04:43:16Z) - MinPrompt: Graph-based Minimal Prompt Data Augmentation for Few-shot Question Answering [64.6741991162092]
We present MinPrompt, a minimal data augmentation framework for open-domain question answering.
We transform the raw text into a graph structure to build connections between different factual sentences.
We then apply graph algorithms to identify the minimal set of sentences needed to cover the most information in the raw text.
We generate QA pairs based on the identified sentence subset and train the model on the selected sentences to obtain the final model.
arXiv Detail & Related papers (2023-10-08T04:44:36Z) - Pushing the Limits of Simple Pipelines for Few-Shot Learning: External
Data and Fine-Tuning Make a Difference [74.80730361332711]
Few-shot learning is an important and topical problem in computer vision.
We show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-15T02:55:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.