HiReview: Hierarchical Taxonomy-Driven Automatic Literature Review Generation
- URL: http://arxiv.org/abs/2410.03761v1
- Date: Wed, 2 Oct 2024 13:02:03 GMT
- Title: HiReview: Hierarchical Taxonomy-Driven Automatic Literature Review Generation
- Authors: Yuntong Hu, Zhuofeng Li, Zheng Zhang, Chen Ling, Raasikh Kanjiani, Boxin Zhao, Liang Zhao
- Abstract summary: HiReview is a novel framework for hierarchical taxonomy-driven automatic literature review generation.
Extensive experiments demonstrate that HiReview significantly outperforms state-of-the-art methods.
- Score: 15.188580557890942
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we present HiReview, a novel framework for hierarchical taxonomy-driven automatic literature review generation. With the exponential growth of academic documents, manual literature reviews have become increasingly labor-intensive and time-consuming, while traditional summarization models struggle to generate comprehensive document reviews effectively. Large language models (LLMs), with their powerful text processing capabilities, offer a potential solution; however, research on incorporating LLMs for automatic document generation remains limited. To address key challenges in large-scale automatic literature review generation (LRG), we propose a two-stage taxonomy-then-generation approach that combines graph-based hierarchical clustering with retrieval-augmented LLMs. First, we retrieve the most relevant sub-community within the citation network, then generate a hierarchical taxonomy tree by clustering papers based on both textual content and citation relationships. In the second stage, an LLM generates coherent and contextually accurate summaries for clusters or topics at each hierarchical level, ensuring comprehensive coverage and logical organization of the literature. Extensive experiments demonstrate that HiReview significantly outperforms state-of-the-art methods, achieving superior hierarchical organization, content relevance, and factual accuracy in automatic literature review generation tasks.
Related papers
- Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark [63.97125827026949]
This paper explores the feasibility of using text-to-image models in a zero-shot setup to generate images for taxonomy concepts.
A benchmark is proposed that assesses models' abilities to understand taxonomy concepts and generate relevant, high-quality images.
The 12 models are evaluated using 9 novel taxonomy-related text-to-image metrics and human feedback.
arXiv Detail & Related papers (2025-03-13T13:37:54Z) - Integrating Planning into Single-Turn Long-Form Text Generation [66.08871753377055]
We propose to use planning to generate long-form content.
Our main novelty lies in a single auxiliary task that does not require multiple rounds of prompting or planning.
Our experiments on two datasets from different domains demonstrate that LLMs fine-tuned with the auxiliary task generate higher-quality documents.
arXiv Detail & Related papers (2024-10-08T17:02:40Z) - Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z) - Refining Wikidata Taxonomy using Large Language Models [2.392329079182226]
We present WiKC, a new version of Wikidata taxonomy cleaned automatically using a combination of Large Language Models (LLMs) and graph mining techniques.
Operations on the taxonomy, such as cutting links or merging classes, are performed with the help of zero-shot prompting on an open-source LLM.
arXiv Detail & Related papers (2024-09-06T06:53:45Z) - Meta Knowledge for Retrieval Augmented Large Language Models [0.0]
We introduce a novel data-centric RAG workflow for Large Language Models (LLMs).
Our methodology relies on generating metadata and synthetic Questions and Answers (QA) for each document.
We demonstrate that using augmented queries with synthetic question matching significantly outperforms traditional RAG pipelines.
arXiv Detail & Related papers (2024-08-16T20:55:21Z) - CHIME: LLM-Assisted Hierarchical Organization of Scientific Studies for Literature Review Support [31.327873791724326]
Literature review requires researchers to synthesize a large amount of information and is increasingly challenging as the scientific literature expands.
In this work, we investigate the potential of LLMs for producing hierarchical organizations of scientific studies to assist researchers with literature review.
arXiv Detail & Related papers (2024-07-23T03:18:00Z) - Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation [51.8188846284153]
RAG has been widely adopted to enhance Large Language Models (LLMs).
Attributed Text Generation (ATG), which provides citations to support the model's responses in RAG, has attracted growing attention.
This paper proposes a fine-grained ATG method called ReClaim (Refer & Claim), which alternates the generation of references and answers step by step.
arXiv Detail & Related papers (2024-07-01T20:47:47Z) - Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling [0.9095496510579351]
We investigate the untapped potential of large language models (LLMs) as an alternative for uncovering the underlying topics within extensive text corpora.
Our findings indicate that LLMs with appropriate prompts can stand out as a viable alternative, capable of generating relevant topic titles and adhering to human guidelines to refine and merge topics.
arXiv Detail & Related papers (2024-03-24T17:39:51Z) - Incremental hierarchical text clustering methods: a review [49.32130498861987]
This study aims to analyze various hierarchical and incremental clustering techniques.
The main contribution of this research is the organization and comparison of the techniques used by studies published between 2010 and 2018 that addressed text document clustering.
arXiv Detail & Related papers (2023-12-12T22:27:29Z) - Towards Visual Taxonomy Expansion [50.462998483087915]
We propose Visual Taxonomy Expansion (VTE), introducing visual features into the taxonomy expansion task.
We propose a textual hypernymy learning task and a visual prototype learning task to cluster textual and visual semantics.
Our method is evaluated on two datasets, where we obtain compelling results.
arXiv Detail & Related papers (2023-09-12T10:17:28Z) - Hierarchical Catalogue Generation for Literature Review: A Benchmark [36.22298354302282]
We construct a novel English Hierarchical Catalogues of Literature Reviews dataset with 7.6k literature review catalogues and 389k reference papers.
To accurately assess the model performance, we design two evaluation metrics for informativeness and similarity to ground truth from semantics and structure.
arXiv Detail & Related papers (2023-04-07T07:13:35Z) - Bringing motion taxonomies to continuous domains via GPLVM on hyperbolic manifolds [8.385386712928785]
Human motion taxonomies serve as high-level hierarchical abstractions that classify how humans move and interact with their environment.
We propose to model taxonomy data via hyperbolic embeddings that capture the associated hierarchical structure.
We show that our model properly encodes unseen data from existing or new taxonomy categories, and outperforms its Euclidean and VAE-based counterparts.
arXiv Detail & Related papers (2022-10-04T15:19:24Z) - TaxoEnrich: Self-Supervised Taxonomy Completion via Structure-Semantic Representations [28.65753036636082]
We propose a new taxonomy completion framework, which effectively leverages both semantic features and structural information in the existing taxonomy.
TaxoEnrich consists of four components, including (1) a taxonomy-contextualized embedding that incorporates both the semantic meanings of concepts and taxonomic relations based on powerful pretrained language models, and (2) a taxonomy-aware sequential encoder that learns candidate position representations by encoding the structural information of the taxonomy.
Experiments on four large real-world datasets from different domains show that TaxoEnrich achieves the best performance among all evaluation metrics and outperforms previous state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-02-10T08:10:43Z) - Taxonomy Enrichment with Text and Graph Vector Representations [61.814256012166794]
We address the problem of taxonomy enrichment which aims at adding new words to the existing taxonomy.
We present a new method that achieves strong results on this task with little effort.
We achieve state-of-the-art results across different datasets and provide an in-depth error analysis.
arXiv Detail & Related papers (2022-01-21T09:01:12Z) - TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z) - Path Based Hierarchical Clustering on Knowledge Graphs [1.713291434132985]
We present a novel approach for inducing a hierarchy of subject clusters.
Our method first constructs a tag hierarchy before assigning subjects to clusters on this hierarchy.
We quantitatively demonstrate our method's ability to induce a coherent cluster hierarchy on three real-world datasets.
arXiv Detail & Related papers (2021-09-27T16:42:43Z) - Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z) - Octet: Online Catalog Taxonomy Enrichment with Self-Supervision [67.26804972901952]
We present a self-supervised end-to-end framework, Octet, for Online Catalog Taxonomy EnrichmenT.
We propose to train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure.
In the open-world evaluation, Octet enriches a production online catalog to twice its original size.
arXiv Detail & Related papers (2020-06-18T04:53:07Z) - TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network [62.12557274257303]
Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications.
We propose a novel self-supervised framework, named TaxoExpan, which automatically generates a set of <query concept, anchor concept> pairs from the existing taxonomy as training data.
We develop two innovative techniques in TaxoExpan: (1) a position-enhanced graph neural network that encodes the local structure of an anchor concept in the existing taxonomy, and (2) a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data.
arXiv Detail & Related papers (2020-01-26T21:30:21Z)