ChatCite: LLM Agent with Human Workflow Guidance for Comparative
Literature Summary
- URL: http://arxiv.org/abs/2403.02574v1
- Date: Tue, 5 Mar 2024 01:13:56 GMT
- Title: ChatCite: LLM Agent with Human Workflow Guidance for Comparative
Literature Summary
- Authors: Yutong Li, Lu Chen, Aiwei Liu, Kai Yu, Lijie Wen
- Abstract summary: ChatCite is an LLM agent with human workflow guidance for comparative literature summary.
The ChatCite agent outperformed other models in various dimensions in the experiments.
The literature summaries generated by ChatCite can also be directly used for drafting literature reviews.
- Score: 30.409552944905915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The literature review is an indispensable step in the research process. It
provides the benefit of comprehending the research problem and understanding
the current research situation while conducting a comparative analysis of prior
works. However, writing a literature summary is challenging and time-consuming. Previous
LLM-based studies on literature review have mainly focused on the complete
process, including literature retrieval, screening, and summarization. However,
for the summarization step, a simple Chain-of-Thought (CoT) approach often lacks the
ability to provide an extensive comparative summary. In this work, we first focus on the
independent literature summarization step and introduce ChatCite, an LLM agent
with human workflow guidance for comparative literature summary. By
mimicking the human workflow, this agent first extracts key elements from relevant
literature and then generates summaries using a Reflective Incremental
Mechanism. To better evaluate the quality of the generated summaries,
we devised an LLM-based automatic evaluation metric, G-Score, with reference to
human evaluation criteria. The ChatCite agent outperformed other models across
various dimensions in our experiments. The literature summaries generated by
ChatCite can also be used directly for drafting literature reviews.
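To make the described workflow concrete, below is a minimal sketch of a ChatCite-style pipeline: key-element extraction, reflective incremental comparative summarization, and a G-Score-like LLM judge. The function names, prompt wording, evaluation criteria, and the generic `llm` callable are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of a ChatCite-style pipeline. All function names, prompts, and the
# generic `llm` callable are hypothetical placeholders, not the authors' code.
from typing import Callable, List

def extract_key_elements(llm: Callable[[str], str], abstract: str) -> str:
    """Step 1: mimic the human workflow by extracting the key elements
    (problem, method, results) of one related paper."""
    return llm(
        "Extract the research problem, method, and main results from this abstract:\n"
        + abstract
    )

def reflective_incremental_summary(
    llm: Callable[[str], str], proposed_work: str, related_elements: List[str]
) -> str:
    """Step 2: fold each related paper into the draft one at a time, asking the
    model to reflect on and revise the previous draft (a rough analogue of the
    Reflective Incremental Mechanism)."""
    draft = ""
    for elements in related_elements:
        draft = llm(
            f"Proposed work: {proposed_work}\n"
            f"Current draft summary: {draft or '(empty)'}\n"
            f"Key elements of a related paper: {elements}\n"
            "Reflect on the draft, then rewrite it to also compare the proposed "
            "work against this related paper."
        )
    return draft

def judge_summary(llm: Callable[[str], str], summary: str) -> str:
    """Step 3: a G-Score-like judge that asks an LLM to grade the summary;
    the criteria listed here are assumptions, not the paper's exact rubric."""
    return llm(
        "Rate this comparative literature summary from 1-5 on consistency, "
        "coherence, comparative insight, and fluency, and explain briefly:\n" + summary
    )

if __name__ == "__main__":
    # Trivial stand-in for a real LLM API call, so the sketch runs as-is.
    fake_llm = lambda prompt: f"[model output for: {prompt[:40]}...]"
    elements = [extract_key_elements(fake_llm, a) for a in ("Abstract A", "Abstract B")]
    summary = reflective_incremental_summary(fake_llm, "Our proposed method ...", elements)
    print(judge_summary(fake_llm, summary))
```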
Related papers
- Are LLMs Good Literature Review Writers? Evaluating the Literature Review Writing Ability of Large Language Models [2.048226951354646]
We propose a framework to assess the literature review writing ability of large language models automatically.
We evaluate the performance of LLMs across three tasks: generating references, writing abstracts, and writing literature reviews.
arXiv Detail & Related papers (2024-12-18T08:42:25Z)
- LLMs for Literature Review: Are we there yet? [15.785989492351684]
This paper explores the zero-shot abilities of recent Large Language Models in assisting with the writing of literature reviews based on an abstract.
For retrieval, we introduce a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper.
In the generation phase, we propose a two-step approach that first outlines a plan for the review and then executes steps in the plan to generate the actual review (a rough sketch of this two-step idea appears after this list).
arXiv Detail & Related papers (2024-12-15T01:12:26Z)
- Leveraging Large Language Models for Comparative Literature Summarization with Reflective Incremental Mechanisms [44.99833362998488]
ChatCite is a novel method leveraging large language models (LLMs) for generating comparative literature summaries.
We evaluate ChatCite on a custom dataset, CompLit-LongContext, consisting of 1000 research papers with annotated comparative summaries.
arXiv Detail & Related papers (2024-12-03T04:09:36Z)
- LLAssist: Simple Tools for Automating Literature Review Using Large Language Models [0.0]
LLAssist is an open-source tool designed to streamline literature reviews in academic research.
It uses Large Language Models (LLMs) and Natural Language Processing (NLP) techniques to automate key aspects of the review process.
arXiv Detail & Related papers (2024-07-19T02:48:54Z)
- A Comparative Study of Quality Evaluation Methods for Text Summarization [0.5512295869673147]
This paper proposes a novel method based on large language models (LLMs) for evaluating text summarization.
Our results show that LLM-based evaluation aligns closely with human evaluation, while widely-used automatic metrics such as ROUGE-2, BERTScore, and SummaC do not and also lack consistency.
arXiv Detail & Related papers (2024-06-30T16:12:37Z)
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work.
ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them.
We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
- Information-Theoretic Distillation for Reference-less Summarization [67.51150817011617]
We present a novel framework to distill a powerful summarizer based on the information-theoretic objective for summarization.
We start off from Pythia-2.8B as the teacher model, which is not yet capable of summarization.
We arrive at a compact but powerful summarizer with only 568M parameters that performs competitively against ChatGPT.
arXiv Detail & Related papers (2024-03-20T17:42:08Z)
- Summarization is (Almost) Dead [49.360752383801305]
We develop new datasets and conduct human evaluation experiments to evaluate the zero-shot generation capability of large language models (LLMs).
Our findings indicate a clear preference among human evaluators for LLM-generated summaries over human-written summaries and summaries generated by fine-tuned models.
arXiv Detail & Related papers (2023-09-18T08:13:01Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Benchmarking Large Language Models for News Summarization [79.37850439866938]
Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood.
We find that instruction tuning, not model size, is the key to LLMs' zero-shot summarization capability.
arXiv Detail & Related papers (2023-01-31T18:46:19Z)
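Several of the entries above describe similar LLM-assisted review pipelines. In particular, the two-step retrieval and plan-then-execute generation described under "LLMs for Literature Review: Are we there yet?" can be sketched as follows; the `llm` and `search` callables, prompts, and function names are hypothetical placeholders under stated assumptions, not the cited paper's code.

```python
# Minimal sketch of the two-step idea referenced above: extract keywords with an LLM,
# retrieve candidate papers, then plan and write the review. `llm` and `search` are
# hypothetical placeholders, not the cited paper's implementation.
from typing import Callable, List

def retrieve_candidates(
    llm: Callable[[str], str],
    search: Callable[[str], List[str]],
    abstract: str,
) -> List[str]:
    """Step 1: ask an LLM for query keywords, then use them to search for papers."""
    keywords = llm("List search keywords (comma-separated) for this abstract:\n" + abstract)
    return search(keywords)

def plan_then_write_review(llm: Callable[[str], str], abstract: str, papers: List[str]) -> str:
    """Step 2: outline a plan for the review, then execute each plan step in turn."""
    plan = llm(
        "Outline, as numbered steps, a plan for a literature review of these papers "
        f"relative to the abstract:\nAbstract: {abstract}\nPapers: {papers}"
    )
    review = ""
    for step in [s for s in plan.splitlines() if s.strip()]:
        review += llm(
            f"Plan step: {step}\nReview so far: {review or '(empty)'}\n"
            f"Papers: {papers}\nWrite the next part of the review for this step."
        ) + "\n"
    return review

if __name__ == "__main__":
    # Trivial stand-ins for a real LLM call and a real search backend.
    fake_llm = lambda prompt: f"[model output for: {prompt[:30]}...]"
    fake_search = lambda query: ["Paper A", "Paper B"]
    papers = retrieve_candidates(fake_llm, fake_search, "Our abstract ...")
    print(plan_then_write_review(fake_llm, "Our abstract ...", papers))
```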
This list is automatically generated from the titles and abstracts of the papers on this site.