Do LLMs Implicitly Determine the Suitable Text Difficulty for Users?
- URL: http://arxiv.org/abs/2402.14453v1
- Date: Thu, 22 Feb 2024 11:16:23 GMT
- Title: Do LLMs Implicitly Determine the Suitable Text Difficulty for Users?
- Authors: Seiji Gobara, Hidetaka Kamigaito and Taro Watanabe
- Abstract summary: We show that large language models (LLMs) can implicitly adjust text difficulty between user input and generated text.
Some LLMs can even surpass humans in handling text difficulty, which highlights the importance of instruction-tuning.
- Score: 29.6000895693808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Education that suits the individual learning level is necessary to improve
students' understanding. The first step in achieving this purpose by using
large language models (LLMs) is to adjust the textual difficulty of the
response to students. This work analyzes how LLMs can implicitly adjust text
difficulty between user input and its generated text. To conduct the
experiments, we created a new dataset from Stack-Overflow to explore the
performance of question-answering-based conversation. Experimental results on
the Stack-Overflow dataset and the TSCC dataset, which includes multi-turn
conversations, show that LLMs can implicitly handle text difficulty between user
input and the generated response. We also observed that some LLMs can surpass
humans in handling text difficulty, underscoring the importance of
instruction-tuning.
Related papers
- Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts [20.933548500888595]
Using large language models (LLMs) for educational applications like dialogue-based teaching is a hot topic.
Current static metrics for text difficulty, such as the Flesch-Kincaid Reading Ease score, are known to be crude and brittle.
We introduce and evaluate a new set of Prompt-based metrics for text difficulty.
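For context on what a "static" metric looks like, the Flesch Reading Ease score reduces difficulty to sentence length and syllable density. A minimal sketch follows; the formula's coefficients are standard, but the syllable counter is a naive vowel-group heuristic assumed here for illustration, not an official algorithm:

```python
def count_syllables(word: str) -> int:
    """Very rough syllable estimate: count groups of consecutive vowels."""
    vowels = "aeiouy"
    groups = 0
    prev_was_vowel = False
    for ch in word.lower():
        is_vowel = ch in vowels
        if is_vowel and not prev_was_vowel:
            groups += 1
        prev_was_vowel = is_vowel
    return max(groups, 1)  # every word has at least one syllable

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    return (206.835
            - 1.015 * (words / sentences)
            - 84.6 * (syllables / words))
```

For example, a 100-word passage with 5 sentences and 130 syllables scores roughly 76.6 ("fairly easy"). Because the score sees only lengths and syllable counts, two passages with identical scores can differ wildly in conceptual difficulty, which is the brittleness this line of work criticizes.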
arXiv Detail & Related papers (2024-05-15T16:22:16Z)
- User-LLM: Efficient LLM Contextualization with User Embeddings [24.099604517203606]
We propose User-LLM, a novel framework that leverages user embeddings to contextualize large language models (LLMs).
Our experiments on MovieLens, Amazon Review, and Google Local Review datasets demonstrate significant performance gains across various tasks.
arXiv Detail & Related papers (2024-02-21T08:03:27Z)
- When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts that FLUB focuses on mainly consist of the tricky, humorous, and misleading texts collected from the real internet environment.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z)
- INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [59.07490387145391]
Large language models (LLMs) have demonstrated impressive capabilities in various natural language processing tasks.
Their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.
We introduce a novel instruction tuning dataset, INTERS, encompassing 20 tasks across three fundamental IR categories.
arXiv Detail & Related papers (2024-01-12T12:10:28Z)
- Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions [19.365615476223635]
Conversational question-answering systems aim to create interactive search systems that retrieve information by interacting with users.
Existing work uses human annotators to play the roles of the questioner (student) and the answerer (teacher).
We propose a simulation framework that employs zero-shot learner LLMs for simulating teacher-student interactions.
arXiv Detail & Related papers (2023-12-05T17:38:02Z)
- How You Prompt Matters! Even Task-Oriented Constraints in Instructions Affect LLM-Generated Text Detection [39.254432080406346]
Even task-oriented constraints -- constraints that would naturally be included in an instruction and are not related to detection-evasion -- cause existing powerful detectors to have a large variance in detection performance.
Our experiments show that the standard deviation (SD) of current detector performance on texts generated by an instruction with such a constraint is significantly larger (up to an SD of 14.4 F1-score) than the variance caused by generating texts multiple times or by paraphrasing the instruction.
arXiv Detail & Related papers (2023-11-14T18:32:52Z)
- MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative.
This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm.
Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z)
- SeqXGPT: Sentence-Level AI-Generated Text Detection [62.3792779440284]
We introduce a sentence-level detection challenge by synthesizing documents polished with large language models (LLMs).
We then propose SeqXGPT (Sequence X (Check) GPT), a novel method that utilizes log-probability lists from white-box LLMs as features for sentence-level AIGT detection.
arXiv Detail & Related papers (2023-10-13T07:18:53Z)
- Enhancing In-Context Learning with Answer Feedback for Multi-Span Question Answering [9.158919909909146]
In this paper, we propose a novel way of employing labeled data such that it informs the LLM of some undesired outputs.
Experiments on three multi-span question answering datasets and a keyphrase extraction dataset show that our new prompting strategy consistently improves LLM's in-context learning performance.
arXiv Detail & Related papers (2023-06-07T15:20:24Z)
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [59.74002011562726]
We propose a novel linguistic cue-based chain-of-thought method (Cue-CoT) to provide more personalized and engaging responses.
We build a benchmark with in-depth dialogue questions, consisting of 6 datasets in both Chinese and English.
Empirical results demonstrate that our proposed Cue-CoT method outperforms standard prompting methods in terms of both helpfulness and acceptability on all datasets.
arXiv Detail & Related papers (2023-05-19T16:27:43Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.