Fine-grainedly Synthesize Streaming Data Based On Large Language Models
With Graph Structure Understanding For Data Sparsity
- URL: http://arxiv.org/abs/2403.06139v1
- Date: Sun, 10 Mar 2024 08:59:04 GMT
- Title: Fine-grainedly Synthesize Streaming Data Based On Large Language Models
With Graph Structure Understanding For Data Sparsity
- Authors: Xin Zhang, Linhai Zhang, Deyu Zhou, Guoqiang Xu
- Abstract summary: Due to the sparsity of user data, sentiment analysis on user reviews in e-commerce platforms often suffers from poor performance.
We propose a fine-grained streaming data synthesis framework that categorizes sparse users into three categories: Mid-tail, Long-tail, and Extreme.
- Score: 24.995442293434643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the sparsity of user data, sentiment analysis on user reviews in
e-commerce platforms often suffers from poor performance, especially when faced
with extremely sparse user data or long-tail labels. Recently, the emergence of
LLMs has introduced new solutions to such problems by leveraging graph
structures to generate supplementary user profiles. However, previous
approaches have not fully utilized the graph understanding capabilities of LLMs
and have struggled to adapt to complex streaming data environments. In this
work, we propose a fine-grained streaming data synthesis framework that
categorizes sparse users into three categories: Mid-tail, Long-tail, and
Extreme. Specifically, we design LLMs to comprehensively understand three key
graph elements in streaming data, including Local-global Graph Understanding,
Second-Order Relationship Extraction, and Product Attribute Understanding,
which enables the generation of high-quality synthetic data to effectively
address sparsity across different categories. Experimental results on three
real datasets demonstrate significant performance improvements, with
synthesized data contributing to MSE reductions of 45.85%, 3.16%, and 62.21%,
respectively.
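The user-bucketing step described above can be sketched in a few lines. Note that the paper does not publish its thresholds; the cutoffs, function name, and parameter names below are illustrative assumptions only, a minimal sketch of the idea of routing sparse users into Mid-tail, Long-tail, and Extreme categories before generating synthetic data.

```python
def categorize_sparse_user(num_reviews, mid_tail_min=5, long_tail_min=2):
    """Bucket a sparse user by interaction count (thresholds are assumed).

    Extreme:   almost no history (fewer than `long_tail_min` reviews)
    Long-tail: very few reviews
    Mid-tail:  moderately sparse history
    Returns None for users with enough data to need no synthesis.
    """
    if num_reviews >= mid_tail_min * 2:
        return None  # dense enough; no synthetic data needed
    if num_reviews >= mid_tail_min:
        return "Mid-tail"
    if num_reviews >= long_tail_min:
        return "Long-tail"
    return "Extreme"

# Example: route a small batch of users by their review counts.
users = {"u1": 12, "u2": 6, "u3": 3, "u4": 0}
buckets = {uid: categorize_sparse_user(n) for uid, n in users.items()}
```

In the framework, each non-`None` bucket would then be handled by a different synthesis strategy (Local-global Graph Understanding, Second-Order Relationship Extraction, Product Attribute Understanding), with sparser users relying more heavily on generated profiles.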
Related papers
- DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models [38.59653405736706]
We introduce DiffLM, a controllable data synthesis framework based on a variational autoencoder (VAE).
We show that DiffLM generates high-quality data, with performance on downstream tasks surpassing that of real data by 2-7 percent in certain cases.
arXiv Detail & Related papers (2024-11-05T16:47:53Z)
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, imposing a negative impact on training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z)
- Let's Ask GNN: Empowering Large Language Model for Graph In-Context Learning [28.660326096652437]
We introduce AskGNN, a novel approach that bridges the gap between sequential text processing and graph-structured data.
AskGNN employs a Graph Neural Network (GNN)-powered structure-enhanced retriever to select labeled nodes across graphs.
Experiments across three tasks and seven LLMs demonstrate AskGNN's superior effectiveness in graph task performance.
arXiv Detail & Related papers (2024-10-09T17:19:12Z)
- AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z)
- DynLLM: When Large Language Models Meet Dynamic Graph Recommendation [43.05028974086236]
We propose a novel framework, called DynLLM, to deal with the dynamic graph recommendation task with Large Language Models.
Specifically, DynLLM harnesses the power of LLMs to generate multi-faceted user profiles based on the rich textual features of historical purchase records.
Experiments on two real e-commerce datasets have validated the superior improvements of DynLLM over a wide range of state-of-the-art baseline methods.
arXiv Detail & Related papers (2024-05-13T09:36:17Z)
- Addressing Shortcomings in Fair Graph Learning Datasets: Towards a New Benchmark [26.233696733521757]
We develop and introduce a collection of synthetic, semi-synthetic, and real-world datasets that fulfill a broad spectrum of requirements.
These datasets are thoughtfully designed to include relevant graph structures and bias information crucial for the fair evaluation of models.
Our extensive experimental results with fair graph learning methods across our datasets demonstrate their effectiveness in benchmarking the performance of these methods.
arXiv Detail & Related papers (2024-03-09T21:33:26Z)
- LLaGA: Large Language and Graph Assistant [73.71990472543027]
Large Language and Graph Assistant (LLaGA) is an innovative model to handle the complexities of graph-structured data.
LLaGA excels in versatility, generalizability and interpretability, allowing it to perform consistently well across different datasets and tasks.
Our experiments show that LLaGA delivers outstanding performance across four datasets and three tasks using one single model.
arXiv Detail & Related papers (2024-02-13T02:03:26Z)
- Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present a code that successfully replicates results from six popular and recent graph recommendation models.
We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations.
By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Efficient and Scalable Recommendation via Item-Item Graph Partitioning [10.390315462253726]
Collaborative filtering (CF) is a widely studied problem in recommender systems.
We propose an efficient and scalable recommendation method via item-item graph partitioning (ERGP).
arXiv Detail & Related papers (2022-07-13T04:37:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.