ChatCite: LLM Agent with Human Workflow Guidance for Comparative
Literature Summary
- URL: http://arxiv.org/abs/2403.02574v1
- Date: Tue, 5 Mar 2024 01:13:56 GMT
- Title: ChatCite: LLM Agent with Human Workflow Guidance for Comparative
Literature Summary
- Authors: Yutong Li, Lu Chen, Aiwei Liu, Kai Yu, Lijie Wen
- Abstract summary: ChatCite is an LLM agent with human workflow guidance for comparative literature summary.
The ChatCite agent outperformed other models in various dimensions in the experiments.
The literature summaries generated by ChatCite can also be directly used for drafting literature reviews.
- Score: 30.409552944905915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The literature review is an indispensable step in the research process. It
helps researchers comprehend the research problem and understand the current state of
the field while conducting a comparative analysis of prior work. However, literature
summarization is challenging and time-consuming. Previous LLM-based studies on
literature review have mainly addressed the complete process, including literature
retrieval, screening, and summarization. For the summarization step, however, a simple
chain-of-thought (CoT) prompting method often cannot produce an extensive comparative
summary. In this work, we focus first on the independent literature summarization step
and introduce ChatCite, an LLM agent with human workflow guidance for comparative
literature summary. By mimicking the human workflow, the agent first extracts key
elements from the relevant literature and then generates summaries using a Reflective
Incremental Mechanism. To better evaluate the quality of the generated summaries, we
devised an LLM-based automatic evaluation metric, G-Score, designed with reference to
human evaluation criteria. In our experiments, the ChatCite agent outperformed other
models across multiple dimensions. The literature summaries generated by ChatCite can
also be used directly for drafting literature reviews.
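The abstract describes a two-stage workflow (key-element extraction followed by a Reflective Incremental Mechanism, with an LLM-based G-Score for evaluation) but gives no implementation details. The Python sketch below illustrates one way such an agent could be wired up; the chat-completion callable `complete`, the prompts, and the scoring rubric are assumptions made for illustration, not the paper's actual code.

from typing import Callable, List

def extract_key_elements(complete: Callable[[str], str], paper_abstract: str) -> str:
    # Stage 1: mimic the human workflow by pulling out the key elements
    # (problem, method, main results) of one related paper.
    prompt = (
        "Extract the key elements (research problem, method, main results) "
        "from the following abstract:\n\n" + paper_abstract
    )
    return complete(prompt)

def reflective_incremental_summary(
    complete: Callable[[str], str],
    proposed_work: str,
    key_elements_per_paper: List[str],
) -> str:
    # Stage 2: a guess at the Reflective Incremental Mechanism -- fold one
    # paper at a time into the comparative summary, then have the model
    # reflect on and revise the draft before moving to the next paper.
    summary = ""
    for elements in key_elements_per_paper:
        draft = complete(
            "Proposed work:\n" + proposed_work
            + "\n\nCurrent comparative summary:\n" + summary
            + "\n\nKey elements of the next related paper:\n" + elements
            + "\n\nExtend the summary with a comparison to this paper."
        )
        summary = complete(
            "Critique the draft comparative summary below for accuracy and "
            "coverage, then output only a revised version:\n\n" + draft
        )
    return summary

def llm_judge_score(complete: Callable[[str], str], summary: str) -> str:
    # A G-Score-style LLM judge: rate the summary against human-style
    # criteria. The rubric dimensions here are invented placeholders.
    return complete(
        "Rate the following comparative summary from 1 to 5 on consistency, "
        "coherence, comparativeness, and fluency, with brief justifications:\n\n"
        + summary
    )

In this reading, the incremental loop keeps only the running summary and one paper's key elements in context, and the reflection call plays the role of a human revising a draft; both choices are inferred from the abstract rather than taken from the paper.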
Related papers
- LLAssist: Simple Tools for Automating Literature Review Using Large Language Models [0.0]
LLAssist is an open-source tool designed to streamline literature reviews in academic research.
It uses Large Language Models (LLMs) and Natural Language Processing (NLP) techniques to automate key aspects of the review process.
arXiv Detail & Related papers (2024-07-19T02:48:54Z)
- Cutting Through the Clutter: The Potential of LLMs for Efficient Filtration in Systematic Literature Reviews [7.355182982314533]
Large Language Models (LLMs) can be used to enhance the efficiency, speed, and precision of literature review filtering.
We show that using advanced LLMs with simple prompting can significantly reduce the time required for literature filtering.
We also show that false negatives can indeed be controlled through a consensus scheme, achieving recalls >98.8% at or even beyond the typical human error threshold.
arXiv Detail & Related papers (2024-07-15T12:13:53Z)
- A Comparative Study of Quality Evaluation Methods for Text Summarization [0.5512295869673147]
This paper proposes a novel method based on large language models (LLMs) for evaluating text summarization.
Our results show that LLM evaluation aligns closely with human evaluation, while widely used automatic metrics such as ROUGE-2, BERTScore, and SummaC do not and also lack consistency.
arXiv Detail & Related papers (2024-06-30T16:12:37Z)
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- RepEval: Effective Text Evaluation with LLM Representation [55.26340302485898]
RepEval is a metric that leverages the projection of Large Language Models (LLMs) representations for evaluation.
Our work underscores the richness of information regarding text quality embedded within LLM representations, offering insights for the development of new metrics.
arXiv Detail & Related papers (2024-04-30T13:50:55Z)
- Information-Theoretic Distillation for Reference-less Summarization [67.51150817011617]
We present a novel framework to distill a powerful summarizer based on the information-theoretic objective for summarization.
We start off from Pythia-2.8B as the teacher model, which is not yet capable of summarization.
We arrive at a compact but powerful summarizer with only 568M parameters that performs competitively against ChatGPT.
arXiv Detail & Related papers (2024-03-20T17:42:08Z)
- Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers [25.268709339109893]
We evaluate recent Large Language Models (LLMs) on the challenging task of summarizing short stories.
We work directly with authors to ensure that the stories have not been shared online (and are therefore unseen by the models).
We compare GPT-4, Claude-2.1, and LLama-2-70B and find that all three models make faithfulness mistakes in over 50% of summaries.
arXiv Detail & Related papers (2024-03-02T01:52:14Z)
- Summarization is (Almost) Dead [49.360752383801305]
We develop new datasets and conduct human evaluation experiments to evaluate the zero-shot generation capability of large language models (LLMs).
Our findings indicate a clear preference among human evaluators for LLM-generated summaries over human-written summaries and summaries generated by fine-tuned models.
arXiv Detail & Related papers (2023-09-18T08:13:01Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Benchmarking Large Language Models for News Summarization [79.37850439866938]
Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood.
We find that instruction tuning, not model size, is the key to LLMs' zero-shot summarization capability.
arXiv Detail & Related papers (2023-01-31T18:46:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.