Don't Ignore Dual Logic Ability of LLMs while Privatizing: A
Data-Intensive Analysis in Medical Domain
- URL: http://arxiv.org/abs/2309.04198v3
- Date: Fri, 23 Feb 2024 11:58:10 GMT
- Title: Don't Ignore Dual Logic Ability of LLMs while Privatizing: A
Data-Intensive Analysis in Medical Domain
- Authors: Yanrui Du, Sendong Zhao, Muzhen Cai, Ming Ma, Danyang Zhao, Jiawei
Cao, Bing Qin
- Abstract summary: We study how the dual logic ability of LLMs is affected during the privatization process in the medical domain.
Our results indicate that incorporating general domain dual logic data into LLMs not only enhances LLMs' dual logic ability but also improves their accuracy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extensive studies have been devoted to privatizing general-domain Large Language Models (LLMs) into domain-specific LLMs by feeding them domain-specific data. However, these privatization efforts often ignore a critical aspect: dual logic ability, a core reasoning ability of LLMs. Dual logic ability ensures that a model maintains a consistent stance when confronted with both positive and negative statements about the same fact. Our study focuses on how the dual logic ability of LLMs is affected during privatization in the medical domain. We conduct several experiments that analyze dual logic ability by examining the consistency of stance in responses to paired questions about the same fact. Interestingly, we observe a significant decrease in the dual logic ability of existing LLMs after privatization. Moreover, our results indicate that incorporating general-domain dual logic data into LLMs not only enhances their dual logic ability but also further improves their accuracy. These findings underscore the importance of prioritizing dual logic ability during privatization. Our study establishes a benchmark for future research on LLMs' dual logic ability during privatization and offers valuable guidance for privatization efforts in real-world applications.
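As a concrete illustration of the paired-question consistency check described in the abstract, here is a minimal sketch. It is not the paper's implementation: `ask_llm` is a hypothetical stand-in for the model under test, and the example pair is illustrative rather than drawn from the paper's benchmark.

```python
# Minimal sketch of a dual-logic consistency check (illustrative only;
# not the paper's actual prompts, data, or scoring procedure).

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to the LLM under evaluation."""
    raise NotImplementedError

# Paired yes/no questions about the same fact: one affirms it, one negates it.
PAIRS = [
    ("Is aspirin an anti-inflammatory drug? Answer yes or no.",
     "Is it true that aspirin is not an anti-inflammatory drug? Answer yes or no."),
    # ... more fact pairs ...
]

def dual_logic_consistency(pairs) -> float:
    """Fraction of pairs on which the model keeps a consistent stance.

    A stance is consistent when the answers to the positive and negated
    forms are opposite (yes/no or no/yes), regardless of whether that
    stance on the underlying fact is actually correct.
    """
    consistent = 0
    for pos_q, neg_q in pairs:
        pos = ask_llm(pos_q).strip().lower()
        neg = ask_llm(neg_q).strip().lower()
        if {pos, neg} == {"yes", "no"}:
            consistent += 1
    return consistent / len(pairs)
```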
Related papers
- Bridging the LLM Accessibility Divide? Performance, Fairness, and Cost of Closed versus Open LLMs for Automated Essay Scoring [18.33969226071914]
We compare nine leading large language models (LLMs) across text assessment and generation tasks related to automated essay scoring.
Our findings reveal that for few-shot learning-based assessment of human-generated essays, open LLMs such as Llama 3 and Qwen2.5 perform comparably to GPT-4 in terms of predictive performance.
For generative tasks, we find that essays generated by top open LLMs are comparable to closed LLMs in terms of their semantic composition/embeddings and ML assessed scores.
arXiv Detail & Related papers (2025-03-14T19:34:40Z) - Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.
Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.
We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z) - RuAG: Learned-rule-augmented Generation for Large Language Models [62.64389390179651]
We propose a novel framework, RuAG, to automatically distill large volumes of offline data into interpretable first-order logic rules.
We evaluate our framework on public and private industrial tasks, spanning natural language processing, time-series, and decision-making.
arXiv Detail & Related papers (2024-11-04T00:01:34Z) - Aligning with Logic: Measuring, Evaluating and Improving Logical Consistency in Large Language Models [31.558429029429863]
We study logical consistency of Large Language Models (LLMs) as a prerequisite for more reliable and trustworthy systems.
We first propose a universal framework to quantify logical consistency via three fundamental proxies: transitivity, commutativity, and negation invariance (a minimal transitivity sketch follows this list).
We then evaluate the logical consistency of a wide range of LLMs using the defined measures, demonstrating that it can serve as a strong proxy for overall robustness.
arXiv Detail & Related papers (2024-10-03T04:34:04Z) - Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation [15.184067502284007]
Even the most advanced LLMs experience uncertainty in their outputs, often producing varied results on different runs or when faced with minor changes in input.
We propose and analyze three discriminative prompts: direct, inverse, and hybrid.
Our insights reveal which discriminative prompt is most promising and when to use it.
arXiv Detail & Related papers (2024-06-27T02:26:47Z) - Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL [78.80673954827773]
Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias.
We propose using Semantic Role Labeling (SRL) as a fundamental task to explore LLMs' ability to extract structured semantics.
We find interesting potential: LLMs can indeed capture semantic structures, yet scaling up does not always mirror this potential.
Surprisingly, LLMs and untrained humans make largely overlapping errors, which account for almost 30% of all errors.
arXiv Detail & Related papers (2024-05-10T11:44:05Z) - GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations [87.99872683336395]
Large Language Models (LLMs) are integrated into critical real-world applications.
This paper evaluates LLMs' reasoning abilities in competitive environments.
We first propose GTBench, a language-driven environment comprising 10 widely recognized tasks.
arXiv Detail & Related papers (2024-02-19T18:23:36Z) - Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs [52.42505579545893]
Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought explanations alongside answers.
We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT.
arXiv Detail & Related papers (2024-02-17T05:22:56Z) - LLMs for Relational Reasoning: How Far are We? [8.840750655261251]
Large language models (LLMs) have revolutionized many areas by achieving state-of-the-art performance on downstream tasks.
Recent efforts have demonstrated that LLMs are poor at solving sequential decision-making problems.
arXiv Detail & Related papers (2024-01-17T08:22:52Z) - Survey on Factuality in Large Language Models: Knowledge, Retrieval and
Domain-Specificity [61.54815512469125]
This survey addresses the crucial issue of factuality in Large Language Models (LLMs).
As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.
arXiv Detail & Related papers (2023-10-11T14:18:03Z) - Exploring Self-supervised Logic-enhanced Training for Large Language Models [59.227222647741094]
In this paper, we make the first attempt to investigate the feasibility of incorporating logical knowledge into LLMs through self-supervised post-training, an approach termed LogicLLM.
We devise an auto-regressive objective variant of MERIt and integrate it with two LLM series, FLAN-T5 and LLaMA, with parameter sizes ranging from 3 billion to 13 billion.
The results on two challenging logical reasoning benchmarks demonstrate the effectiveness of LogicLLM.
arXiv Detail & Related papers (2023-05-23T06:13:10Z)
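As referenced in the "Aligning with Logic" entry above, here is a minimal sketch of its transitivity proxy for logical consistency. This is an assumption-laden illustration, not the cited paper's implementation: `prefer` is a hypothetical interface to the model whose pairwise preferences are being measured.

```python
from itertools import combinations

# Minimal sketch of the transitivity proxy for logical consistency
# (hypothetical interface; not the cited paper's implementation).

def prefer(a: str, b: str) -> str:
    """Hypothetical stand-in: ask the LLM which of a or b it prefers."""
    raise NotImplementedError

def transitivity_rate(items) -> float:
    """Fraction of item triples whose elicited preferences are transitive.

    A triple (a, b, c) is intransitive if its pairwise preferences form
    a cycle, e.g. a > b, b > c, but c > a.
    """
    wins = {(a, b): prefer(a, b) for a, b in combinations(items, 2)}

    def beats(x, y):
        # Look up the elicited preference regardless of pair ordering.
        return wins[(x, y)] == x if (x, y) in wins else wins[(y, x)] == x

    triples = list(combinations(items, 3))
    if not triples:
        raise ValueError("need at least three items")
    ok = sum(
        1
        for a, b, c in triples
        # Transitive iff the three pairwise outcomes form no cycle
        # in either direction.
        if not (beats(a, b) and beats(b, c) and beats(c, a))
        and not (beats(b, a) and beats(c, b) and beats(a, c))
    )
    return ok / len(triples)
```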
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.