Conversation Forests: The Key to Fine Tuning Large Language Models for Multi-Turn Medical Conversations is Branching
- URL: http://arxiv.org/abs/2507.04099v2
- Date: Tue, 15 Jul 2025 16:49:25 GMT
- Title: Conversation Forests: The Key to Fine Tuning Large Language Models for Multi-Turn Medical Conversations is Branching
- Authors: Thomas Savage
- Abstract summary: In medicine, a multi-turn perspective is critical for learning diagnostic schemas and better understanding conversation dynamics. I introduce Savage Conversation Forests (SCF), a reinforcement learning framework that leverages a branched conversation architecture to fine-tune LLMs for multi-turn dialogue. SCF generates multiple possible conversation continuations at each turn, enabling the model to learn how different early responses affect downstream interactions and diagnostic outcomes.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning methods such as Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO) have demonstrated success in training large language models (LLMs) for single-turn tasks. However, these methods fall short in multi-turn applications, such as diagnostic patient interviewing, where understanding how early conversational turns influence downstream completions and outcomes is essential. In medicine, a multi-turn perspective is critical for learning diagnostic schemas and better understanding conversation dynamics. To address this gap, I introduce Savage Conversation Forests (SCF), a reinforcement learning framework that leverages a branched conversation architecture to fine-tune LLMs for multi-turn dialogue. SCF generates multiple possible conversation continuations at each turn, enabling the model to learn how different early responses affect downstream interactions and diagnostic outcomes. In experiments simulating doctor-patient conversations, SCF with branching outperforms linear conversation architectures on diagnostic accuracy. I hypothesize that SCF's improvements stem from its ability to provide richer, interdependent training signals across conversation turns. These results suggest that a branched training architecture is an important strategy for fine-tuning LLMs in complex multi-turn conversational tasks.
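The branched architecture the abstract describes can be pictured as a tree expansion: at each turn, sample several candidate responses so that downstream outcomes can be credited back to early choices. The sketch below is illustrative only; `sample_continuations` is a placeholder standing in for the paper's actual LLM sampler, not its interface.

```python
# Minimal sketch of a branched conversation forest: each node holds a
# conversation history, and each turn fans out into several candidate
# continuations. The sampler is a hypothetical stand-in for an LLM call.

def sample_continuations(history, k):
    # Placeholder: a real system would query the model for k responses.
    return [f"candidate {i} at turn {len(history)}" for i in range(k)]

def build_forest(history, depth, branching):
    """Expand a conversation into a tree of possible continuations."""
    if depth == 0:
        return {"history": history, "children": []}
    children = [
        build_forest(history + [reply], depth - 1, branching)
        for reply in sample_continuations(history, branching)
    ]
    return {"history": history, "children": children}

def count_leaves(node):
    """Each leaf is one complete simulated conversation."""
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])

forest = build_forest(["patient: I have chest pain"], depth=3, branching=2)
print(count_leaves(forest))  # 2^3 = 8 leaf conversations
```

Branching factor and depth trade off coverage against cost: the number of simulated conversations grows as branching^depth, which is what supplies the "richer, interdependent training signals" across turns.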
Related papers
- DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue [5.0037050098387805]
Large language models (LLMs) have demonstrated excellent capabilities in the field of biomedical question answering, but their application in real-world clinical consultations still faces core challenges. We propose DoctorAgent-RL, a reinforcement learning (RL)-based multi-agent collaborative framework that models medical consultations as a dynamic decision-making process under uncertainty. Experiments demonstrate that DoctorAgent-RL outperforms existing models in both multi-turn reasoning capability and final diagnostic performance.
arXiv Detail & Related papers (2025-05-26T07:48:14Z)
- Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models [79.90523648823522]
Multi-stage continual learning can lead to catastrophic forgetting. This paper evaluates three mitigation strategies: model merging, discounting the LoRA scaling factor, and experience replay. Results show that experience replay is the most effective, with further gains achieved by combining it with other methods.
arXiv Detail & Related papers (2025-05-23T05:50:14Z)
- Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations [74.83732294523402]
We introduce a novel benchmark that simulates real-world diagnostic scenarios, integrating noise and difficulty levels aligned with USMLE standards. We also explore dialogue-based fine-tuning, which transforms static datasets into conversational formats to better capture iterative reasoning processes. Experiments show that dialogue-tuned models outperform traditional methods, with improvements of 9.64% in multi-round reasoning scenarios and 6.18% in accuracy in noisy environments.
arXiv Detail & Related papers (2025-01-29T18:58:48Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts [4.403408362362806]
We introduce the Chain-of-Interaction prompting method to contextualize large language models for psychiatric decision support by the dyadic interactions.
This approach enables large language models to leverage the coding scheme, patient state, and domain knowledge for patient behavioral coding.
arXiv Detail & Related papers (2024-03-20T17:47:49Z)
- Integrating Physician Diagnostic Logic into Large Language Models: Preference Learning from Process Feedback [19.564416963801268]
We propose an approach called preference learning from process feedback.
Preference Learning from Process Feedback (PLPF) integrates the doctor's diagnostic logic into LLMs.
We show that PLPF enhances the diagnostic accuracy of the baseline model in medical conversations by 17.6%.
arXiv Detail & Related papers (2024-01-11T06:42:45Z)
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
- Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations [70.7884839812069]
Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks.
However, many of the most important applications of language generation are interactive, where an agent has to talk to a person to reach a desired outcome.
In this work, we explore a new method for adapting LLMs with RL for such goal-directed dialogue.
arXiv Detail & Related papers (2023-11-09T18:45:16Z)
- Emotion Recognition in Conversation using Probabilistic Soft Logic [17.62924003652853]
Emotion recognition in conversation (ERC) is a sub-field of emotion recognition that focuses on conversations containing two or more utterances.
We implement our approach in a framework called Probabilistic Soft Logic (PSL), a declarative templating language.
PSL provides functionality for the incorporation of results from neural models into PSL models.
We compare our method with state-of-the-art purely neural ERC systems, and see almost a 20% improvement.
arXiv Detail & Related papers (2022-07-14T23:59:06Z)
- Learning to Ask Conversational Questions by Optimizing Levenshtein Distance [83.53855889592734]
We introduce a Reinforcement Iterative Sequence Editing (RISE) framework that optimizes the minimum Levenshtein distance (MLD) through explicit editing actions.
RISE is able to pay attention to tokens that are related to conversational characteristics.
Experimental results on two benchmark datasets show that RISE significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-06-30T08:44:19Z)
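The MLD objective in the RISE entry above is built on the standard Levenshtein edit distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is the textbook dynamic program for that distance, as a generic sketch; it is not RISE's editing framework, which operates through learned edit actions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum edits (insert, delete, substitute) to turn a into b,
    computed row by row over the classic DP table."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only the previous row makes the computation O(len(a) * len(b)) time with O(len(b)) memory, which is why the distance is practical as a per-step training signal.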
This list is automatically generated from the titles and abstracts of the papers on this site.