Can Large Language Models Understand Real-World Complex Instructions?
- URL: http://arxiv.org/abs/2309.09150v2
- Date: Mon, 8 Jan 2024 07:49:23 GMT
- Title: Can Large Language Models Understand Real-World Complex Instructions?
- Authors: Qianyu He, Jie Zeng, Wenhao Huang, Lina Chen, Jin Xiao, Qianxi He,
Xunzhe Zhou, Lida Chen, Xintao Wang, Yuncheng Huang, Haoning Ye, Zihan Li,
Shisong Chen, Yikai Zhang, Zhouhong Gu, Jiaqing Liang, Yanghua Xiao
- Abstract summary: Large language models (LLMs) can understand human instructions, but struggle with complex instructions.
Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions.
We propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically.
- Score: 54.86632921036983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) can understand human instructions, showing their
potential for pragmatic applications beyond traditional NLP tasks. However,
they still struggle with complex instructions, which can be either complex task
descriptions that require multiple tasks and constraints, or complex input that
contains long context, noise, heterogeneous information and multi-turn format.
Due to these features, LLMs often ignore semantic constraints from task
descriptions, generate incorrect formats, violate length or sample count
constraints, and be unfaithful to the input text. Existing benchmarks are
insufficient to assess LLMs' ability to understand complex instructions, as
they are close-ended and simple. To bridge this gap, we propose CELLO, a
benchmark for evaluating LLMs' ability to follow complex instructions
systematically. We design eight features for complex instructions and construct
a comprehensive evaluation dataset from real-world scenarios. We also establish
four criteria and develop corresponding metrics, as current ones are
inadequate, biased or too strict and coarse-grained. We compare the performance
of representative Chinese-oriented and English-oriented models in following
complex instructions through extensive experiments. Resources of CELLO are
publicly available at https://github.com/Abbey4799/CELLO.
Related papers
- Constraint Back-translation Improves Complex Instruction Following of Large Language Models [55.60192044049083]
Large language models (LLMs) struggle to follow instructions with complex constraints in format, length, etc.
Previous works conduct post-training on complex instruction-response pairs generated by feeding complex instructions to advanced LLMs.
We propose a novel data generation technique, constraint back-translation.
arXiv Detail & Related papers (2024-10-31T17:42:26Z) - Large Language Models are Good Multi-lingual Learners : When LLMs Meet Cross-lingual Prompts [5.520335305387487]
We propose a novel prompting strategy Multi-Lingual Prompt, namely MLPrompt.
MLPrompt translates the error-prone rule that an LLM struggles to follow into another language, thus drawing greater attention to it.
We introduce a framework integrating MLPrompt with an auto-checking mechanism for structured data generation, with a specific case study in text-to-MIP instances.
arXiv Detail & Related papers (2024-09-17T10:33:27Z) - Enhancing LLM's Cognition via Structurization [41.13997892843677]
Large language models (LLMs) process input contexts through a causal and sequential perspective.
This paper presents a novel concept of context structurization.
Specifically, we transform the plain, unordered contextual sentences into well-ordered and hierarchically structurized elements.
arXiv Detail & Related papers (2024-07-23T12:33:58Z) - Benchmarking Complex Instruction-Following with Multiple Constraints Composition [72.82640456309821]
How to evaluate the ability of complex instruction-following of large language models (LLMs) has become a critical research problem.
Existing benchmarks mainly focus on modeling different types of constraints in human instructions while neglecting the composition of different constraints.
We propose ComplexBench, a benchmark for comprehensively evaluating the ability of LLMs to follow complex instructions composed of multiple constraints.
arXiv Detail & Related papers (2024-07-04T14:50:45Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [59.07490387145391]
Large language models (LLMs) have demonstrated impressive capabilities in various natural language processing tasks.
Their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language.
We introduce a novel instruction tuning dataset, INTERS, encompassing 20 tasks across three fundamental IR categories.
arXiv Detail & Related papers (2024-01-12T12:10:28Z) - kNN-ICL: Compositional Task-Oriented Parsing Generalization with Nearest
Neighbor In-Context Learning [50.40636157214161]
Task-Oriented Parsing (TOP) enables conversational assistants to interpret user commands expressed in natural language.
LLMs have achieved impressive performance in computer programs based on a natural language prompt.
This paper focuses on harnessing the capabilities of LLMs for semantic parsing tasks.
arXiv Detail & Related papers (2023-12-17T17:26:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.