SciLit: A Platform for Joint Scientific Literature Discovery,
Summarization and Citation Generation
- URL: http://arxiv.org/abs/2306.03535v2
- Date: Mon, 6 Nov 2023 15:53:23 GMT
- Title: SciLit: A Platform for Joint Scientific Literature Discovery,
Summarization and Citation Generation
- Authors: Nianlong Gu, Richard H.R. Hahnloser
- Abstract summary: We propose SciLit, a pipeline that automatically recommends relevant papers, extracts highlights, and suggests a reference sentence as a citation of a paper.
SciLit efficiently recommends papers from large databases of hundreds of millions of papers using a two-stage pre-fetching and re-ranking literature search system.
- Score: 11.186252009101077
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scientific writing involves retrieving, summarizing, and citing relevant
papers, which can be time-consuming processes in large and rapidly evolving
fields. By making these processes inter-operable, natural language processing
(NLP) provides opportunities for creating end-to-end assistive writing tools.
We propose SciLit, a pipeline that automatically recommends relevant papers,
extracts highlights, and suggests a reference sentence as a citation of a
paper, taking into consideration the user-provided context and keywords. SciLit
efficiently recommends papers from large databases of hundreds of millions of
papers using a two-stage pre-fetching and re-ranking literature search system
that flexibly deals with addition and removal of a paper database. We provide a
convenient user interface that displays the recommended papers as extractive
summaries and that offers abstractively-generated citing sentences which are
aligned with the provided context and which mention the chosen keyword(s). Our
assistive tool for literature discovery and scientific writing is available at
https://scilit.vercel.app
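Below is a minimal, self-contained sketch of the two-stage "pre-fetch then re-rank" search pattern the abstract describes. The toy corpus, lexical scoring, and cutoff are invented placeholders for illustration only, not SciLit's actual components, which operate over large paper databases with neural re-ranking.

```python
# Toy sketch of a two-stage "pre-fetch then re-rank" literature search.
# Corpus, scoring functions, and cutoffs are invented placeholders, not SciLit's code.
from collections import Counter

CORPUS = {
    "p1": "extractive summarization highlights from scientific papers",
    "p2": "citation sentence generation conditioned on context and keywords",
    "p3": "image segmentation with convolutional neural networks",
}

def prefetch(query: str, corpus: dict, k: int = 2) -> list:
    """Stage 1: cheap lexical overlap over the whole corpus, keep the top-k ids."""
    q = Counter(query.lower().split())
    overlap = {pid: sum((q & Counter(text.lower().split())).values())
               for pid, text in corpus.items()}
    return sorted(overlap, key=overlap.get, reverse=True)[:k]

def rerank(query: str, candidates: list, corpus: dict) -> list:
    """Stage 2: a finer-grained score over the small candidate set.
    A real system would apply a neural re-ranker here."""
    def score(pid):
        text = corpus[pid].lower()
        hits = sum(text.count(tok) for tok in set(query.lower().split()))
        return hits / (len(text.split()) + 1)
    return sorted(candidates, key=score, reverse=True)

query = "generate a citation sentence from context and keywords"
print(rerank(query, prefetch(query, CORPUS), CORPUS))
```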
Related papers
- Context-Enhanced Language Models for Generating Multi-Paper Citations [35.80247519023821]
We propose a method that leverages Large Language Models (LLMs) to generate multi-citation sentences.
Our approach takes a single source paper and a collection of target papers as input and produces a coherent paragraph of multi-sentence citation text.
arXiv Detail & Related papers (2024-04-22T04:30:36Z)
- Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System [47.13932723910289]
We introduce an open-source multi-modal automated academic paper interpretation system (MMAPIS) with a three-stage process.
It employs a hybrid modality preprocessing and alignment module to extract plain text and tables or figures from documents separately.
It then aligns this information based on the section names they belong to, ensuring that data with identical section names are categorized under the same section.
It utilizes the extracted section names to divide the article into shorter text segments, enabling targeted summarization both within and across sections via LLMs.
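As a rough illustration of this section-based splitting step, the snippet below cuts a made-up paper body at its section headings so each section can be summarized separately; the heading list and text are invented, and MMAPIS itself works on richer multi-modal inputs.

```python
# Rough sketch: split a paper into per-section chunks by section name so each
# chunk can be summarized on its own. Headings and text are invented examples.
import re

PAPER = """Introduction
We study automated interpretation of academic papers.
Methods
Text, tables, and figures are aligned by the section they belong to.
Results
The system produces section-level and cross-section summaries."""

SECTION_NAMES = ["Introduction", "Methods", "Results"]

def split_by_sections(text, section_names):
    """Return {section_name: body} by cutting the text at each heading line."""
    heading = "|".join(re.escape(name) for name in section_names)
    parts = re.split(rf"^({heading})$", text, flags=re.MULTILINE)
    sections, current = {}, None
    for part in parts:
        if part.strip() in section_names:
            current = part.strip()
            sections[current] = ""
        elif current:
            sections[current] += part.strip()
    return sections

for name, body in split_by_sections(PAPER, SECTION_NAMES).items():
    print(f"[{name}] -> to summarize: {body}")
```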
arXiv Detail & Related papers (2024-01-17T11:50:53Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
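A toy contrast of retrieval granularities, using sentences as a crude stand-in for propositions (the paper extracts atomic propositions with a learned decomposition model, which is not reproduced here):

```python
# Toy comparison of passage-level vs. proposition-level retrieval units.
# Sentences stand in for propositions; the paper uses a learned decomposer.
from collections import Counter

PASSAGE = ("SciLit recommends papers with a two-stage search. "
           "It also generates citation sentences conditioned on user keywords.")

def to_units(text, granularity):
    """Return the passage as one unit, or split it into sentence-like units."""
    if granularity == "passage":
        return [text]
    return [s.strip() + "." for s in text.split(".") if s.strip()]

def overlap(query, unit):
    q, u = Counter(query.lower().split()), Counter(unit.lower().split())
    return sum((q & u).values())

query = "citation sentence generation with keywords"
for granularity in ("passage", "proposition"):
    units = to_units(PASSAGE, granularity)
    best = max(units, key=lambda u: overlap(query, u))
    print(f"{granularity}: best unit -> {best!r}")
```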
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- Non-Parametric Memory Guidance for Multi-Document Summarization [0.0]
We propose a retriever-guided model combined with non-parametric memory for summary generation.
This model retrieves relevant candidates from a database and then generates the summary considering the candidates with a copy mechanism and the source documents.
Our method is evaluated on the MultiXScience dataset, which consists of scientific articles.
arXiv Detail & Related papers (2023-11-14T07:41:48Z)
- QuOTeS: Query-Oriented Technical Summarization [0.2936007114555107]
We propose QuOTeS, an interactive system designed to retrieve sentences related to a summary of the research from a collection of potential references.
QuOTeS integrates techniques from Query-Focused Extractive Summarization and High-Recall Information Retrieval to provide Interactive Query-Focused Summarization of scientific documents.
The results show that QuOTeS provides a positive user experience and consistently provides query-focused summaries that are relevant, concise, and complete.
arXiv Detail & Related papers (2023-06-20T18:43:24Z)
- Generate rather than Retrieve: Large Language Models are Strong Context Generators [74.87021992611672]
We present a novel perspective for solving knowledge-intensive tasks by replacing document retrievers with large language model generators.
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
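A bare-bones skeleton of the generate-then-read flow, with a stubbed-out llm function standing in for a real language-model call; the stub's prompts and hard-coded outputs are assumptions for illustration, not GenRead's actual prompts.

```python
# Skeleton of generate-then-read: (1) ask a language model to write a context
# document for the question, (2) read that document to produce the answer.
# `llm` is a hard-coded stub standing in for a real LLM API call.
def llm(prompt: str) -> str:
    if prompt.startswith("Write a short background passage"):
        return ("SciLit, proposed by Gu and Hahnloser, recommends papers and "
                "generates citation sentences.")
    return "Gu and Hahnloser"

def generate_then_read(question: str) -> str:
    context = llm(f"Write a short background passage about: {question}")
    return llm(f"Context: {context}\nQuestion: {question}\nAnswer briefly:")

print(generate_then_read("Who proposed SciLit?"))
```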
arXiv Detail & Related papers (2022-09-21T01:30:59Z)
- iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration [63.272359227081836]
iFacetSum integrates interactive summarization with faceted search.
Fine-grained facets are automatically produced based on cross-document coreference pipelines.
arXiv Detail & Related papers (2021-09-23T20:01:11Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
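A minimal sketch of the citation-graph idea: build a citing-to-cited edge list and derive a simple graph feature (here, in-degree) that an information-extraction model could consume. The edge list is invented, and the actual system uses far richer graph signals than this.

```python
# Minimal sketch: build a citation graph from (citing, cited) pairs and derive
# a simple graph feature (in-degree / citation count). Edges are invented.
from collections import defaultdict

edges = [("paperA", "paperC"), ("paperB", "paperC"), ("paperB", "paperD")]

cited_by = defaultdict(set)            # cited paper -> set of citing papers
for citing, cited in edges:
    cited_by[cited].add(citing)

citation_count = {paper: len(citers) for paper, citers in cited_by.items()}
print(citation_count)                  # {'paperC': 2, 'paperD': 1}
```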
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimizes a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z)
- Interactive Extractive Search over Biomedical Corpora [41.72755714431404]
We present a system that allows life-science researchers to search a linguistically annotated corpus of texts.
We introduce a light-weight query language that does not require the user to know the details of the underlying linguistic representations.
Search is performed at interactive speed thanks to an efficient linguistic graph-indexing and retrieval engine.
arXiv Detail & Related papers (2020-06-07T13:26:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.