Related papers: Soft Inductive Bias Approach via Explicit Reasoning Perspectives in Inappropriate Utterance Detection Using Large Language Models

URL: http://arxiv.org/abs/2512.08480v1
Date: Tue, 09 Dec 2025 10:55:33 GMT
Title: Soft Inductive Bias Approach via Explicit Reasoning Perspectives in Inappropriate Utterance Detection Using Large Language Models
Authors: Ju-Young Kim, Ji-Hong Park, Se-Yeon Lee, Sujin Park, Gun-Woo Kim
Abstract summary: We propose a soft inductive bias approach that explicitly defines reasoning perspectives to guide the inference process. We fine-tune a Korean large language model using the proposed method and conduct both quantitative performance comparisons and qualitative evaluations. Experimental results show that the Kanana-1.5 model achieves an average accuracy of 87.0046, improving by approximately 3.89 percent over standard supervised learning.
Score: 7.271743970152478
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent incidents in certain online games and communities, where anonymity is guaranteed, show that unchecked inappropriate remarks frequently escalate into verbal abuse and even criminal behavior, raising significant social concerns. Consequently, there is a growing need for research on techniques that can detect inappropriate utterances within conversational texts to help build a safer communication environment. Although large-scale language models trained on Korean corpora and chain-of-thought reasoning have recently gained attention, research applying these approaches to inappropriate utterance detection remains limited. In this study, we propose a soft inductive bias approach that explicitly defines reasoning perspectives to guide the inference process, thereby promoting rational decision-making and preventing errors that may arise during reasoning. We fine-tune a Korean large language model using the proposed method and conduct both quantitative performance comparisons and qualitative evaluations across different training strategies. Experimental results show that the Kanana-1.5 model achieves an average accuracy of 87.0046, improving by approximately 3.89 percent over standard supervised learning. These findings indicate that the proposed method goes beyond simple knowledge imitation by large language models and enables more precise and consistent judgments through constrained reasoning perspectives, demonstrating its effectiveness for inappropriate utterance detection.

Related papers

A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models [58.32070787537946]
Chain-of-thought (CoT) reasoning enhances performance of large language models. We present the first comprehensive study of CoT faithfulness in large vision-language models.
arXiv Detail & Related papers (2025-05-29T18:55:05Z)
Selective Demonstration Retrieval for Improved Implicit Hate Speech Detection [4.438698005789677]
Hate speech detection is a crucial area of research in natural language processing, essential for ensuring online community safety. Unlike explicit hate speech, implicit expressions often depend on context, cultural subtleties, and hidden biases. Large Language Models often show heightened sensitivity to toxic language and references to vulnerable groups, which can lead to misclassifications. We propose a novel method, which utilizes in-context learning without requiring model fine-tuning.
arXiv Detail & Related papers (2025-04-16T13:43:23Z)
Reasoner Outperforms: Generative Stance Detection with Rationalization for Social Media [12.479554210753664]
This study adopts a generative approach, where stance predictions include explicit, interpretable rationales. We find that incorporating reasoning into stance detection enables the smaller model (FlanT5) to outperform GPT-3.5's zero-shot performance.
arXiv Detail & Related papers (2024-12-13T16:34:39Z)
On Uncertainty In Natural Language Processing [2.5076643086429993]
This thesis studies how uncertainty in natural language processing can be characterized from a linguistic, statistical and neural perspective. We propose a method for calibrated sampling in natural language generation based on non-exchangeable conformal prediction. Lastly, we develop an approach to quantify confidence in large black-box language models using auxiliary predictors.
arXiv Detail & Related papers (2024-10-04T14:08:02Z)
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation [12.921225188504643]
We propose a novel Uncertainty-aware Reward Model (URM) that introduces a robust uncertainty estimation for the quality of paired responses. Empirical results demonstrate significant benefits of incorporating the proposed proxy into language model training.
arXiv Detail & Related papers (2024-05-10T12:14:11Z)
The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance. Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes. We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
Conceptual and Unbiased Reasoning in Language Models [98.90677711523645]
We propose a novel conceptualization framework that forces models to perform conceptual reasoning on abstract questions. We show that existing large language models fall short on conceptual reasoning, dropping 9% to 28% on various benchmarks. We then discuss how models can improve since high-level abstract reasoning is key to unbiased and generalizable decision-making.
arXiv Detail & Related papers (2024-03-30T00:53:53Z)
How to Determine the Most Powerful Pre-trained Language Model without Brute Force Fine-tuning? An Empirical Survey [23.757740341834126]
We show that H-Score generally performs well with superiorities in effectiveness and efficiency. We also outline the difficulties of consideration of training details, applicability to text generation, and consistency to certain metrics which shed light on future directions.
arXiv Detail & Related papers (2023-12-08T01:17:28Z)
Contrastive Chain-of-Thought Prompting [74.10511560147293]
We propose contrastive chain of thought to enhance language model reasoning. Compared to the conventional chain of thought, our approach provides both valid and invalid reasoning demonstrations. Our experiments on reasoning benchmarks demonstrate that contrastive chain of thought can serve as a general enhancement of chain-of-thought prompting.
arXiv Detail & Related papers (2023-11-15T18:54:01Z)
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability to model unusual, unexpected, and unlikely situations. Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation. We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z)
Contrastive Learning for Inference in Dialogue [56.20733835058695]
Inferences, especially those derived from inductive processes, are a crucial component of our conversations. Recent large language models show remarkable advances in inference tasks, but their performance in inductive reasoning, where not all information is present in the context, is far behind that in deductive reasoning.
arXiv Detail & Related papers (2023-10-19T04:49:36Z)
Probing Task-Oriented Dialogue Representation from Language Models [106.02947285212132]
This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks. We fine-tune a feed-forward layer as the classifier probe on top of a fixed pre-trained language model with annotated labels in a supervised way.
arXiv Detail & Related papers (2020-10-26T21:34:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.