Related papers: Safe and Responsible Large Language Model : Can We Balance Bias Reduction and Language Understanding in Large Language Models?

URL: http://arxiv.org/abs/2404.01399v3
Date: Mon, 1 Jul 2024 17:40:13 GMT
Title: Safe and Responsible Large Language Model : Can We Balance Bias Reduction and Language Understanding in Large Language Models?
Authors: Shaina Raza, Oluwanifemi Bamgbose, Shardul Ghuge, Fatemeh Tavakol, Deepak John Reji, Syed Raza Bashir
Abstract summary: Current approaches to produce unbiased outputs from Large Language Models can reduce biases but at the expense of knowledge retention. We develop the Safety and Responsible Large Language Model (\textbf{SR}$_{\text{LLM}}$) to diminish biases in generated text. The results confirm that \textbf{SR}$_{\text{LLM}}$ outperforms traditional fine-tuning and prompting methods in both reducing biases and preserving the integrity of language knowledge.
Score: 2.089112028396727
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large Language Models (LLMs) have significantly advanced various NLP tasks. However, these models often risk generating unsafe text that perpetuates biases. Current approaches to produce unbiased outputs from LLMs can reduce biases but at the expense of knowledge retention. In this research, we address the question of whether producing safe (unbiased) outputs through LLMs can retain knowledge and language understanding. In response, we developed the Safety and Responsible Large Language Model (\textbf{SR}$_{\text{LLM}}$), an LLM that has been instruction fine-tuned on top of already safe LLMs (e.g., Llama2 or related) to diminish biases in generated text. To achieve our goals, we compiled a specialized dataset designed to train our model in identifying and correcting biased text. We conduct experiments, both on this custom data and out-of-distribution test sets, to show the bias reduction and knowledge retention. The results confirm that \textbf{SR}$_{\text{LLM}}$ outperforms traditional fine-tuning and prompting methods in both reducing biases and preserving the integrity of language knowledge. The significance of our findings lies in demonstrating that instruction fine-tuning can provide a more robust solution for bias reduction in LLMs. We have made our code and data available at \href{https://github.com/shainarazavi/Safe-Responsible-LLM}{Safe-LLM}.

Related papers

Language Bottleneck Models: A Framework for Interpretable Knowledge Tracing and Beyond [55.984684518346924]
We recast Knowledge Tracing as an inverse problem: learning the minimum natural-language summary that makes past answers explainable and future answers predictable. Our Language Bottleneck Model (LBM) consists of an encoder LLM that writes an interpretable knowledge summary and a frozen decoder LLM that must reconstruct and predict student responses using only that summary text. Experiments on synthetic arithmetic benchmarks and the large-scale Eedi dataset show that LBMs rival the accuracy of state-of-the-art KT and direct LLM methods while requiring orders-of-magnitude fewer student trajectories.
arXiv Detail & Related papers (2025-06-20T13:21:14Z)
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models? [83.53005932513155]
Multi-modal large language models (MLLMs) have made significant progress, yet their safety alignment remains limited. We propose finetuning MLLMs on a small set of benign instruct-following data with responses replaced by simple, clear rejection sentences.
arXiv Detail & Related papers (2025-04-14T09:03:51Z)
Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu [53.437954702561065]
In-context machine translation (MT) with large language models (LLMs) is a promising approach for low-resource MT. This study systematically investigates how each resource and its quality affects the translation performance, with the Manchu language. Our results indicate that high-quality dictionaries and good parallel examples are very helpful, while grammars hardly help.
arXiv Detail & Related papers (2025-02-17T14:53:49Z)
Enhancing Code Generation for Low-Resource Languages: No Silver Bullet [55.39571645315926]
Large Language Models (LLMs) rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages. For low-resource languages, the limited availability of such data hampers the models' ability to generalize effectively. We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages.
arXiv Detail & Related papers (2025-01-31T12:23:28Z)
Extracting Memorized Training Data via Decomposition [24.198975804570072]
We demonstrate a simple, query-based decompositional method to extract news articles from two frontier Large Language Models. We extract at least one sentence from 73 articles, and over 20% of verbatim sentences from 6 articles. If replicable at scale, this training data extraction methodology could expose new LLM security and safety vulnerabilities.
arXiv Detail & Related papers (2024-09-18T23:59:32Z)
Course-Correction: Safety Alignment Using Synthetic Preferences [17.897817682322053]
We introduce the \textsc{C$^2$-Eval} benchmark for quantitative assessment and analyze 10 popular language models. Using an automated pipeline, we create \textsc{C$^2$-Syn}, a synthetic dataset with 750K pairwise preferences. Experiments on 2 LLMs, \textsc{Llama2-Chat 7B} and \textsc{Qwen2 7B}, show that our method effectively enhances course-correction skills without affecting general performance.
arXiv Detail & Related papers (2024-07-23T16:54:28Z)
Robustness of LLMs to Perturbations in Text [2.0670689746336]
Large language models (LLMs) have shown impressive performance, but can they handle the inevitable noise in real-world data? This work tackles this critical question by investigating LLMs' resilience against morphological variations in text. Our findings show that, contrary to popular belief, generative LLMs are quite robust to noisy perturbations in text.
arXiv Detail & Related papers (2024-07-12T04:50:17Z)
$\forall$uto$\exists$val: Autonomous Assessment of LLMs in Formal Synthesis and Interpretation Tasks [21.12437562185667]
This paper presents a new approach for scaling LLM assessment in translating formal syntax to natural language. We use context-free grammars (CFGs) to generate out-of-distribution datasets on the fly. We also conduct an assessment of several SOTA closed and open-source LLMs to showcase the feasibility and scalability of this paradigm.
arXiv Detail & Related papers (2024-03-27T08:08:00Z)
Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing. Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z)
TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement [26.26493253161022]
Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT). We introduce a systematic LLM-based self-refinement translation framework, named \textbf{TEaR}.
arXiv Detail & Related papers (2024-02-26T07:58:12Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
Tuna: Instruction Tuning using Feedback from Large Language Models [74.04950416204551]
We propose finetuning an instruction-tuned large language model using our novel \textit{probabilistic ranking} and \textit{contextual ranking} approaches. Probabilistic ranking enables the instruction-tuned model to inherit the relative rankings of high-quality and low-quality responses from the teacher LLM. On the other hand, learning with contextual ranking allows the model to refine its own response distribution using the contextual understanding ability of stronger LLMs.
arXiv Detail & Related papers (2023-10-20T09:55:06Z)
On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused? [49.99955642001019]
We show that open-sourced, aligned large language models could be easily misguided to generate undesired content. Our key idea is to directly manipulate the generation process of open-sourced LLMs to misguide it to generate undesired content.
arXiv Detail & Related papers (2023-10-02T19:22:01Z)
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs [59.596335292426105]
This paper collects the first open-source dataset to evaluate safeguards in large language models. We train several BERT-like classifiers to achieve results comparable with GPT-4 on automatic safety evaluation.
arXiv Detail & Related papers (2023-08-25T14:02:12Z)
TIM: Teaching Large Language Models to Translate with Comparison [78.66926087162672]
We propose a novel framework using examples in comparison to teach LLMs to learn translation. Our approach involves presenting the model with examples of correct and incorrect translations and using a preference loss to guide the model's learning. Our findings offer a new perspective on fine-tuning LLMs for translation tasks and provide a promising solution for generating high-quality translations.
arXiv Detail & Related papers (2023-07-10T08:15:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.