Standardize: Aligning Language Models with Expert-Defined Standards for
Content Generation
- URL: http://arxiv.org/abs/2402.12593v1
- Date: Mon, 19 Feb 2024 23:18:18 GMT
- Title: Standardize: Aligning Language Models with Expert-Defined Standards for
Content Generation
- Authors: Joseph Marvin Imperial, Gail Forey, Harish Tayyar Madabushi
- Abstract summary: We introduce Standardize, a retrieval-style in-context learning-based framework to guide large language models to align with expert-defined standards.
Our findings show that models can gain a 40% (Llama2) to 100% (GPT-4) increase in precise accuracy.
- Score: 4.1205832766381985
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Domain experts across engineering, healthcare, and education follow strict
standards for producing quality content such as technical manuals, medication
instructions, and children's reading materials. However, current works in
controllable text generation have yet to explore using these standards as
references for control. Towards this end, we introduce Standardize, a
retrieval-style in-context learning-based framework to guide large language
models to align with expert-defined standards. Focusing on English language
standards in the education domain as a use case, we consider the Common
European Framework of Reference for Languages (CEFR) and Common Core Standards
(CCS) for the task of open-ended content generation. Our findings show that
models can gain a 40% (Llama2) to 100% (GPT-4) increase in precise accuracy,
demonstrating that extracting knowledge artifacts from standards and
integrating them into the generation process can effectively guide models to
produce better standard-aligned content.
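The paper's exact pipeline is not reproduced in this listing; as a minimal sketch of the general idea — retrieving expert-standard descriptors (the "knowledge artifacts", e.g. CEFR level specifications) and injecting them into the generation prompt — one might write something like the following. All descriptor texts and function names here are illustrative assumptions, not taken from the paper:

```python
# Sketch of retrieval-style in-context learning against a standard.
# The descriptors below are toy stand-ins for real CEFR specifications.

CEFR_DESCRIPTORS = {
    "A2": "Uses short, simple sentences and high-frequency everyday vocabulary.",
    "B1": "Uses connected sentences on familiar topics with common linking words.",
    "C1": "Uses complex sentence structures and precise, low-frequency vocabulary.",
}

def retrieve_artifacts(target_level: str) -> str:
    """Retrieve the descriptor matching the requested standard level."""
    return CEFR_DESCRIPTORS[target_level]

def build_prompt(task: str, target_level: str) -> str:
    """Inject the retrieved standard descriptor into the generation prompt."""
    artifact = retrieve_artifacts(target_level)
    return (
        f"Target standard: CEFR {target_level}\n"
        f"Level description: {artifact}\n"
        f"Task: {task}\n"
        "Write the text so that it conforms to the level description above."
    )

prompt = build_prompt("Write a short story about a lost dog.", "A2")
print(prompt)
```

The prompt would then be passed to the language model of choice; the key point is that the standard's text, not just its label, conditions the generation.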
Related papers
- An NLP Crosswalk Between the Common Core State Standards and NAEP Item Specifications [0.0]
I describe an NLP-based procedure that can be used to support subject matter experts in establishing a crosswalk between item specifications and content standards.
The procedure is used to evaluate the match of the Common Core State Standards for mathematics at grade 4 to the corresponding item specifications for the 2026 National Assessment of Educational Progress.
arXiv Detail & Related papers (2024-05-27T15:47:46Z)
- Flesch or Fumble? Evaluating Readability Standard Alignment of Instruction-Tuned Language Models [4.867923281108005]
We select a diverse set of open and closed-source instruction-tuned language models and investigate their performances in writing story completions and simplifying narratives.
Our findings provide empirical proof of how globally recognized models like ChatGPT may be considered less effective and may require more refined prompts for these generative tasks.
arXiv Detail & Related papers (2023-09-11T13:50:38Z)
- Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
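The objectives themselves are only named in this summary; as a rough illustration of what a commonsense mask-infilling training example could look like, one might construct input/target pairs as below (the helper and mask token are illustrative, not the paper's implementation):

```python
import random

def commonsense_mask_infilling(sentence: str, mask_token: str = "<mask>"):
    """Toy mask-infilling example builder: hide one word of a commonsense
    statement and keep it as the target the model must reconstruct."""
    words = sentence.split()
    i = random.randrange(len(words))       # pick one position to mask
    target = words[i]
    masked = words.copy()
    masked[i] = mask_token
    return " ".join(masked), target

sentence = "Birds can fly because they have wings"
masked, target = commonsense_mask_infilling(sentence)
print(masked, "->", target)
```

A real setup would mask spans drawn from a commonsense knowledge model's outputs rather than random words, but the input/target shape is the same.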
arXiv Detail & Related papers (2023-06-04T15:44:51Z)
- CPL-NoViD: Context-Aware Prompt-based Learning for Norm Violation Detection in Online Communities [28.576099654579437]
We introduce Context-aware Prompt-based Learning for Norm Violation Detection (CPL-NoViD)
CPL-NoViD outperforms the baseline by incorporating context through natural language prompts.
It establishes a new state-of-the-art in norm violation detection, surpassing existing benchmarks.
arXiv Detail & Related papers (2023-05-16T23:27:59Z)
- Pre-Training to Learn in Context [138.0745138788142]
The ability to learn in context is not fully exploited because language models are not explicitly trained to do so.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x the parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- Controllable Text Generation with Language Constraints [39.741059642044874]
We consider the task of text generation in language models with constraints specified in natural language.
Our benchmark contains knowledge-intensive constraints sourced from databases like Wordnet and Wikidata.
We propose a solution to leverage a language model's own internal knowledge to guide generation.
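The paper's actual guidance mechanism is not reproduced in this summary; as a loose illustration of stating a knowledge-intensive constraint in natural language and prompting the model to draw on its own internal knowledge before generating, a prompt could be assembled like this (the constraint text and two-step format are illustrative assumptions, not the paper's method):

```python
def constrained_prompt(topic: str, constraint: str) -> str:
    """Two-step 'self-guidance' style prompt: ask the model to first
    recall relevant knowledge, then generate under the constraint."""
    return (
        f"Constraint: {constraint}\n"
        "Step 1: List facts you know that satisfy the constraint.\n"
        f"Step 2: Write a paragraph about {topic} using only items from Step 1."
    )

demo = constrained_prompt(
    "a fruit salad recipe",
    "use only fruits that WordNet classifies as citrus",
)
print(demo)
```

Separating recall from generation is one simple way to surface the model's internal knowledge so it can constrain the final output.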
arXiv Detail & Related papers (2022-12-20T17:39:21Z)
- Stylized Knowledge-Grounded Dialogue Generation via Disentangled Template Rewriting [55.10977824136768]
We study a new problem: Stylized Knowledge-Grounded Dialogue Generation.
A key challenge is how to train an SKDG model when no <context, knowledge, stylized response> triples are available.
We propose a novel disentangled template rewriting (DTR) method which generates responses by combining disentangled style templates and content templates.
arXiv Detail & Related papers (2022-04-12T08:17:21Z)
- CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark [144.05723617401674]
General-purpose language intelligence evaluation has been a longstanding goal for natural language processing.
We argue that for general-purpose language intelligence evaluation, the benchmark itself needs to be comprehensive and systematic.
We propose CUGE, a Chinese Language Understanding and Generation Evaluation benchmark with the following features.
arXiv Detail & Related papers (2021-12-27T11:08:58Z)
- Unveiling Relations in the Industry 4.0 Standards Landscape based on Knowledge Graph Embeddings [10.098126048053384]
Industry 4.0 (I4.0) standards and standardization frameworks have been proposed with the goal of empowering interoperability in smart factories.
We study the relatedness among standards and frameworks based on community analysis to discover knowledge that helps to cope with interoperability conflicts between standards.
arXiv Detail & Related papers (2020-06-03T17:37:08Z)
- SPECTER: Document-level Representation Learning using Citation-informed Transformers [51.048515757909215]
SPECTER generates document-level embedding of scientific documents based on pretraining a Transformer language model.
We introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation.
arXiv Detail & Related papers (2020-04-15T16:05:51Z)
- Towards Making the Most of Context in Neural Machine Translation [112.9845226123306]
We argue that previous research did not make a clear use of the global context.
We propose a new document-level NMT framework that deliberately models the local context of each sentence.
arXiv Detail & Related papers (2020-02-19T03:30:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.