De-jargonizing Science for Journalists with GPT-4: A Pilot Study
- URL: http://arxiv.org/abs/2410.12069v1
- Date: Tue, 15 Oct 2024 21:10:01 GMT
- Title: De-jargonizing Science for Journalists with GPT-4: A Pilot Study
- Authors: Sachita Nishal, Eric Lee, Nicholas Diakopoulos
- Abstract summary: The system achieves fairly high recall in identifying jargon and preserves relative differences in readers' jargon identification.
The findings highlight the potential of generative AI for assisting science reporters, and can inform future work on developing tools to simplify dense documents.
- Score: 3.730699089967391
- Abstract: This study offers an initial evaluation of a human-in-the-loop system that uses GPT-4, a large language model (LLM), together with Retrieval-Augmented Generation (RAG) to identify and define jargon terms in scientific abstracts, based on readers' self-reported knowledge. The system achieves fairly high recall in identifying jargon and preserves relative differences in readers' jargon identification, suggesting personalization as a feasible use case for LLMs to support sense-making of complex information. Surprisingly, using only abstracts as context to generate definitions yields slightly more accurate and higher-quality definitions than using RAG-based context from the full text of an article. The findings highlight the potential of generative AI for assisting science reporters, and can inform future work on developing tools to simplify dense documents.
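The recall figure mentioned in the abstract can be illustrated with a minimal sketch. The function name `jargon_recall`, the case-insensitive exact matching, and the sample terms are assumptions made for illustration only; the abstract does not describe how the authors matched terms or computed the metric.

```python
def jargon_recall(predicted, annotated):
    """Share of reader-annotated jargon terms that the system also flagged.

    Matching is case-insensitive exact match, which is an assumption here.
    Returns None when the reader annotated no terms (recall is undefined).
    """
    predicted_norm = {term.strip().lower() for term in predicted}
    annotated_norm = {term.strip().lower() for term in annotated}
    if not annotated_norm:
        return None
    return len(predicted_norm & annotated_norm) / len(annotated_norm)


# Hypothetical example data, not taken from the study.
reader_terms = ["retrieval-augmented generation", "recall", "LLM"]
system_terms = ["LLM", "Retrieval-Augmented Generation", "abstract"]

print(jargon_recall(system_terms, reader_terms))
```

Computing this per reader, rather than over a pooled term list, is one way to check whether a system tracks differences between readers, which is the personalization aspect the abstract points to.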
Related papers
- Evaluating LLMs for Targeted Concept Simplification for Domain-Specific Texts [53.421616210871704]
Lack of context and unfamiliarity with difficult concepts are major reasons for adult readers' difficulty with domain-specific text.
We introduce "targeted concept simplification," a simplification task for rewriting text to help readers comprehend text containing unfamiliar concepts.
We benchmark the performance of open-source and commercial LLMs and a simple dictionary baseline on this task.
arXiv Detail & Related papers (2024-10-28T05:56:51Z)
- Automating Knowledge Discovery from Scientific Literature via LLMs: A Dual-Agent Approach with Progressive Ontology Prompting [59.97247234955861]
We introduce a novel framework based on large language models (LLMs) that combines a progressive prompting algorithm with a dual-agent system, named LLM-Duo.
Our method identifies 2,421 interventions from 64,177 research articles in the speech-language therapy domain.
arXiv Detail & Related papers (2024-08-20T16:42:23Z)
- Large Language Models for Scientific Information Extraction: An Empirical Study for Virology [0.0]
We champion the use of structured and semantic content representation of discourse-based scholarly communication.
Inspired by tools like Wikipedia infoboxes or structured Amazon product descriptions, we develop an automated approach to produce structured scholarly contribution summaries.
Our results show that finetuned FLAN-T5 with 1000x fewer parameters than the state-of-the-art GPT-davinci is competitive for the task.
arXiv Detail & Related papers (2024-01-18T15:04:55Z)
- Knowledge Graphs and Pre-trained Language Models enhanced Representation Learning for Conversational Recommender Systems [58.561904356651276]
We introduce the Knowledge-Enhanced Entity Representation Learning (KERL) framework to improve the semantic understanding of entities for conversational recommender systems.
KERL uses a knowledge graph and a pre-trained language model to improve the semantic understanding of entities.
KERL achieves state-of-the-art results in both recommendation and response generation tasks.
arXiv Detail & Related papers (2023-12-18T06:41:23Z)
- Personalized Jargon Identification for Enhanced Interdisciplinary Communication [22.999616448996303]
Current methods of jargon identification mainly use corpus-level familiarity indicators.
We collect a dataset of over 10K term familiarity annotations from 11 computer science researchers.
We investigate features representing individual, sub-domain, and domain knowledge to predict individual jargon familiarity.
arXiv Detail & Related papers (2023-11-16T00:51:25Z)
- Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z)
- Text Simplification of Scientific Texts for Non-Expert Readers [3.4761212729163318]
Simplification of scientific abstracts helps non-experts to access the core information.
This is especially relevant for, e.g., cancer patients reading about novel treatment options.
arXiv Detail & Related papers (2023-07-07T13:05:11Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- Large-Scale Text Analysis Using Generative Language Models: A Case Study in Discovering Public Value Expressions in AI Patents [2.246222223318928]
This paper employs a novel approach using a generative language model (GPT-4) to produce labels and rationales for large-scale text analysis.
We collect a database comprising 154,934 patent documents using an advanced Boolean query submitted to InnovationQ+.
We design a framework for identifying and labeling public value expressions in these AI patent sentences.
arXiv Detail & Related papers (2023-05-17T17:18:26Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature [0.0]
We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles.
We develop the annotation task based on a pilot exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks.
We envision that the NLPContributions methodology engenders a wider discussion on the topic toward its further refinement and development.
arXiv Detail & Related papers (2020-06-23T10:04:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.