LLM-TAKE: Theme Aware Keyword Extraction Using Large Language Models
- URL: http://arxiv.org/abs/2312.00909v1
- Date: Fri, 1 Dec 2023 20:13:08 GMT
- Title: LLM-TAKE: Theme Aware Keyword Extraction Using Large Language Models
- Authors: Reza Yousefi Maragheh, Chenhao Fang, Charan Chand Irugu, Parth Parikh,
Jason Cho, Jianpeng Xu, Saranyan Sukumar, Malay Patel, Evren Korpeoglu,
Sushant Kumar and Kannan Achan
- Abstract summary: We explore using Large Language Models (LLMs) to generate keywords for items, inferred from the items' textual metadata.
Our modeling framework includes several stages that refine the results by avoiding keywords that are non-informative or sensitive.
We propose two variations of the framework for generating extractive and abstractive themes for products in an e-commerce setting.
- Score: 10.640773460677542
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Keyword extraction is one of the core tasks in natural language processing.
Classic extraction models are notorious for having a short attention span, which
makes it hard for them to infer relational connections among words and
sentences that are far from each other. This, in turn, makes their usage
prohibitive for generating keywords that are inferred from the context of the
whole text. In this paper, we explore using Large Language Models (LLMs) to
generate keywords for items, inferred from the items' textual metadata. Our
modeling framework includes several stages that refine the results by avoiding
keywords that are non-informative or sensitive and by reducing hallucinations
common in LLMs. We call our LLM-based framework Theme-Aware Keyword Extraction
(LLM-TAKE). We propose two variations of the framework for generating
extractive and abstractive themes for products in an e-commerce setting. We
perform an extensive set of experiments on three real data sets and show that
our modeling framework can enhance accuracy-based and diversity-based metrics
when compared with benchmark models.
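The staged pipeline the abstract describes (LLM generation followed by filtering of non-informative or sensitive keywords and rejection of hallucinated terms) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the `call_llm` stub, the prompt wording, and the block lists are all hypothetical stand-ins.

```python
# Hypothetical sketch of an LLM-TAKE-style pipeline: generate candidate
# keywords from item metadata, then filter non-informative and sensitive
# terms; for the extractive variant, also drop terms absent from the
# metadata (a simple hallucination check). All names and lists here are
# illustrative assumptions, not taken from the paper.

NON_INFORMATIVE = {"item", "product", "great", "new"}
SENSITIVE = {"religion", "politics"}

def call_llm(prompt: str) -> list[str]:
    """Stand-in for a real LLM call; returns mock candidate keywords."""
    return ["wireless earbuds", "item", "bluetooth", "noise cancelling", "politics"]

def extract_theme_keywords(metadata: str, extractive: bool = True) -> list[str]:
    prompt = f"List theme keywords for this product:\n{metadata}"
    candidates = call_llm(prompt)
    kept = []
    for kw in candidates:
        kw = kw.lower().strip()
        if kw in NON_INFORMATIVE or kw in SENSITIVE:
            continue  # filtering stage: drop non-informative/sensitive terms
        if extractive and kw not in metadata.lower():
            continue  # extractive variant: reject terms not grounded in the text
        kept.append(kw)
    return kept

meta = "Wireless earbuds with Bluetooth 5.0 and active noise cancelling."
print(extract_theme_keywords(meta))
```

In the abstractive variant the grounding check would be relaxed, since abstractive themes are by definition allowed to go beyond the literal metadata text.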
Related papers
- Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization [0.27624021966289597]
This paper introduces EYEGLAXS, a framework that leverages Large Language Models (LLMs) for extractive summarization.
EYEGLAXS focuses on extractive summarization to ensure factual and grammatical integrity.
The system sets new performance benchmarks on well-known datasets like PubMed and ArXiv.
arXiv Detail & Related papers (2024-08-28T13:52:19Z)
- Analyzing the Role of Semantic Representations in the Era of Large Language Models [104.18157036880287]
We investigate the role of semantic representations in the era of large language models (LLMs)
We propose an AMR-driven chain-of-thought prompting method, which we call AMRCoT.
We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions.
arXiv Detail & Related papers (2024-05-02T17:32:59Z)
- Comparative Study of Domain Driven Terms Extraction Using Large Language Models [0.0]
Keywords play a crucial role in bridging the gap between human understanding and machine processing of textual data.
This review focuses on keyword extraction methods, emphasizing the use of three major Large Language Models (LLMs): Llama2-7B, GPT-3.5, and Falcon-7B.
arXiv Detail & Related papers (2024-04-02T22:04:51Z)
- Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception [63.03288425612792]
We propose AnyRef, a general MLLM model that can generate pixel-wise object perceptions and natural language descriptions from multi-modality references.
Our model achieves state-of-the-art results across multiple benchmarks, including diverse modality referring segmentation and region-level referring expression generation.
arXiv Detail & Related papers (2024-03-05T13:45:46Z)
- Meta-Task Prompting Elicits Embeddings from Large Language Models [54.757445048329735]
We introduce a new unsupervised text embedding method, Meta-Task Prompting with Explicit One-Word Limitation.
We generate high-quality sentence embeddings from Large Language Models without the need for model fine-tuning.
Our findings suggest a new scaling law, offering a versatile and resource-efficient approach for embedding generation across diverse scenarios.
arXiv Detail & Related papers (2024-02-28T16:35:52Z)
- CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models [58.95889895912716]
We introduce a new benchmark, named as CODIS, designed to assess the ability of models to use context provided in free-form text to enhance visual comprehension.
Our findings indicate that MLLMs consistently fall short of human performance on this benchmark.
This underscores the pressing need to enhance the ability of MLLMs to comprehend visuals in a context-dependent manner.
arXiv Detail & Related papers (2024-02-21T08:21:12Z)
- GIELLM: Japanese General Information Extraction Large Language Model Utilizing Mutual Reinforcement Effect [0.0]
We introduce the General Information Extraction Large Language Model (GIELLM).
It integrates text Classification, Sentiment Analysis, Named Entity Recognition, Relation Extraction, and Event Extraction using a uniform input-output schema.
This innovation marks the first instance of a model simultaneously handling such a diverse array of IE subtasks.
arXiv Detail & Related papers (2023-11-12T13:30:38Z)
- CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models [77.45934004406283]
We systematically study decompounding, the task of splitting compound words into their constituents.
We introduce a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary.
We introduce a novel methodology to train dedicated models for decompounding.
arXiv Detail & Related papers (2023-05-23T16:32:27Z)
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing [139.77117915309023]
CRITIC allows large language models to validate and amend their own outputs in a manner similar to human interaction with tools.
Comprehensive evaluations involving free-form question answering, mathematical program synthesis, and toxicity reduction demonstrate that CRITIC consistently enhances the performance of LLMs.
arXiv Detail & Related papers (2023-05-19T15:19:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.