A Unified Framework for Generative Data Augmentation: A Comprehensive Survey
- URL: http://arxiv.org/abs/2310.00277v2
- Date: Sun, 21 Apr 2024 08:45:41 GMT
- Title: A Unified Framework for Generative Data Augmentation: A Comprehensive Survey
- Authors: Yunhao Chen, Zihui Yan, Yunjie Zhu,
- Abstract summary: Generative data augmentation (GDA) has emerged as a promising technique to alleviate data scarcity in machine learning applications.
This thesis presents a comprehensive survey and unified framework of the GDA landscape.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative data augmentation (GDA) has emerged as a promising technique to alleviate data scarcity in machine learning applications. This thesis presents a comprehensive survey and unified framework of the GDA landscape. We first provide an overview of GDA, discussing its motivation, taxonomy, and key distinctions from synthetic data generation. We then systematically analyze the critical aspects of GDA - selection of generative models, techniques to utilize them, data selection methodologies, validation approaches, and diverse applications. Our proposed unified framework categorizes the extensive GDA literature, revealing gaps such as the lack of universal benchmarks. The thesis summarises promising research directions, including , effective data selection, theoretical development for large-scale models' application in GDA and establishing a benchmark for GDA. By laying a structured foundation, this thesis aims to nurture more cohesive development and accelerate progress in the vital arena of generative data augmentation.
Related papers
- A Survey on Data Synthesis and Augmentation for Large Language Models [35.59526251210408]
This paper reviews and summarizes data generation techniques throughout the lifecycle of Large Language Models.
We discuss the current constraints faced by these methods and investigate potential pathways for future development and research.
arXiv Detail & Related papers (2024-10-16T16:12:39Z) - A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions [0.0]
RAG combines retrieval mechanisms with generative language models to enhance the accuracy of outputs.
Recent research breakthroughs are discussed, highlighting novel methods for improving retrieval efficiency.
Future research directions are proposed, focusing on improving the robustness of RAG models.
arXiv Detail & Related papers (2024-10-03T22:29:47Z) - Deep Graph Anomaly Detection: A Survey and New Perspectives [86.84201183954016]
Graph anomaly detection (GAD) aims to identify unusual graph instances (nodes, edges, subgraphs, or graphs)
Deep learning approaches, graph neural networks (GNNs) in particular, have been emerging as a promising paradigm for GAD.
arXiv Detail & Related papers (2024-09-16T03:05:11Z) - Wiping out the limitations of Large Language Models -- A Taxonomy for Retrieval Augmented Generation [0.46498278084317696]
This research aims to create a taxonomy to conceptualize a comprehensive overview of Retrieval-Augmented Generation (RAG) applications.
To the best of our knowledge, no RAG application have been developed so far.
arXiv Detail & Related papers (2024-08-05T22:34:28Z) - A Survey on Retrieval-Augmented Text Generation for Large Language Models [1.4579344926652844]
Retrieval-Augmented Generation (RAG) merges retrieval methods with deep learning advancements.
This paper organizes the RAG paradigm into four categories: pre-retrieval, retrieval, post-retrieval, and generation.
It outlines RAG's evolution and discusses the field's progression through the analysis of significant studies.
arXiv Detail & Related papers (2024-04-17T01:27:42Z) - Data-Centric Long-Tailed Image Recognition [49.90107582624604]
Long-tail models exhibit a strong demand for high-quality data.
Data-centric approaches aim to enhance both the quantity and quality of data to improve model performance.
There is currently a lack of research into the underlying mechanisms explaining the effectiveness of information augmentation.
arXiv Detail & Related papers (2023-11-03T06:34:37Z) - Geometric Deep Learning for Structure-Based Drug Design: A Survey [83.87489798671155]
Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates.
Recent advancements in geometric deep learning, which effectively integrate and process 3D geometric data, have significantly propelled the field forward.
arXiv Detail & Related papers (2023-06-20T14:21:58Z) - A Comprehensive Survey on Source-free Domain Adaptation [69.17622123344327]
The research of Source-Free Domain Adaptation (SFDA) has drawn growing attention in recent years.
We provide a comprehensive survey of recent advances in SFDA and organize them into a unified categorization scheme.
We compare the results of more than 30 representative SFDA methods on three popular classification benchmarks.
arXiv Detail & Related papers (2023-02-23T06:32:09Z) - A Survey on Heterogeneous Graph Embedding: Methods, Techniques,
Applications and Sources [79.48829365560788]
Heterogeneous graphs (HGs) also known as heterogeneous information networks have become ubiquitous in real-world scenarios.
HG embedding aims to learn representations in a lower-dimension space while preserving the heterogeneous structures and semantics for downstream tasks.
arXiv Detail & Related papers (2020-11-30T15:03:47Z) - Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.