Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey
- URL: http://arxiv.org/abs/2402.17944v4
- Date: Fri, 21 Jun 2024 19:59:54 GMT
- Title: Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey
- Authors: Xi Fang, Weijie Xu, Fiona Anting Tan, Jiani Zhang, Ziqing Hu, Yanjun Qi, Scott Nickleach, Diego Socolinsky, Srinivasan Sengamedu, Christos Faloutsos
- Abstract summary: There is currently no comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain.
This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized.
It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing some insights for future research directions in this vital and rapidly evolving field.
- Score: 17.19337964440007
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent breakthroughs in large language models (LLMs) have facilitated rigorous exploration of their application in diverse tasks related to tabular data modeling, such as prediction, tabular data synthesis, question answering, and table understanding. Each task presents unique challenges and opportunities. However, there is currently no comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain. This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized. It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing some insights for future research directions in this vital and rapidly evolving field. It also provides references to relevant code and datasets. Through this comprehensive review, we hope to provide interested readers with pertinent references and insightful perspectives, empowering them with the necessary tools and knowledge to effectively navigate and address the prevailing challenges in the field.
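To make the surveyed setting concrete, the sketch below illustrates one commonly discussed technique in this line of work: serializing a tabular row into natural-language text so an LLM can perform few-shot prediction on it. This is a minimal, hypothetical example; the column names, toy data, and the `serialize_row` / `build_prompt` helpers are assumptions for illustration, not the survey's own method or code.

```python
# Minimal sketch (hypothetical example, not from the survey): turning a tabular
# record into text so a large language model can classify it in-context.

def serialize_row(row: dict) -> str:
    """Render one table row as a 'column is value' sentence."""
    return ". ".join(f"{col} is {val}" for col, val in row.items()) + "."

def build_prompt(examples: list[tuple[dict, str]], query: dict, label_name: str) -> str:
    """Assemble a few-shot prompt from labeled rows plus an unlabeled query row."""
    parts = []
    for row, label in examples:
        parts.append(f"{serialize_row(row)}\n{label_name}: {label}")
    parts.append(f"{serialize_row(query)}\n{label_name}:")
    return "\n\n".join(parts)

if __name__ == "__main__":
    # Hypothetical toy table: predicting a loan decision from two features.
    train = [
        ({"Age": 42, "Income": "85k"}, "approved"),
        ({"Age": 23, "Income": "18k"}, "denied"),
    ]
    query = {"Age": 35, "Income": "60k"}
    prompt = build_prompt(train, query, label_name="Decision")
    print(prompt)  # This text would be sent to an LLM, whose completion is the prediction.
```

In practice, work in this area varies the serialization template, the choice and ordering of few-shot examples, and whether the model is frozen or fine-tuned on serialized rows.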
Related papers
- Abstractive Text Summarization: State of the Art, Challenges, and Improvements [6.349503549199403]
This review takes a comprehensive approach encompassing state-of-the-art methods, challenges, solutions, comparisons, limitations and charts out future improvements.
The paper highlights challenges such as inadequate meaning representation, factual consistency, controllable text summarization, cross-lingual summarization, and evaluation metrics.
arXiv Detail & Related papers (2024-09-04T03:39:23Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- Deep Learning-Based Object Pose Estimation: A Comprehensive Survey [73.74933379151419]
We discuss the recent advances in deep learning-based object pose estimation.
Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks.
arXiv Detail & Related papers (2024-05-13T14:44:22Z)
- From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models [98.41645229835493]
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making.
Large foundation models, such as large language models, have revolutionized various natural language processing tasks.
This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis.
arXiv Detail & Related papers (2024-03-18T17:57:09Z)
- A Systematic Review of Data-to-Text NLG [2.4769539696439677]
Methods for producing high-quality text are explored, addressing the challenge of hallucinations in data-to-text generation.
Despite advancements in text quality, the review emphasizes the importance of research in low-resourced languages.
arXiv Detail & Related papers (2024-02-13T14:51:45Z)
- Capture the Flag: Uncovering Data Insights with Large Language Models [90.47038584812925]
This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data.
We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset.
arXiv Detail & Related papers (2023-12-21T14:20:06Z)
- Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries [67.0083902913112]
We develop the Text2Analysis benchmark, incorporating advanced analysis tasks.
We also develop five innovative and effective annotation methods.
We evaluate five state-of-the-art models using three different metrics.
arXiv Detail & Related papers (2023-12-21T08:50:41Z)
- Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications [41.24492058141363]
Large language models (LLMs) exhibit superior performance on various natural language tasks, but they are susceptible to issues stemming from outdated data and domain-specific limitations.
We present a review of trends in integrating knowledge with large language models, including a taxonomy of methods, benchmarks, and applications.
arXiv Detail & Related papers (2023-11-10T05:24:04Z)
- Deep Learning Schema-based Event Extraction: Literature Review and Current Trends [60.29289298349322]
Event extraction technology based on deep learning has become a research hotspot.
This paper fills the gap by reviewing the state-of-the-art approaches, focusing on deep learning-based models.
arXiv Detail & Related papers (2021-07-05T16:32:45Z)
- Data and its (dis)contents: A survey of dataset development and use in machine learning research [11.042648980854487]
We survey the many concerns raised about the way we collect and use data in machine learning.
We advocate that a more cautious and thorough understanding of data is necessary to address several of the practical and ethical issues of the field.
arXiv Detail & Related papers (2020-12-09T22:13:13Z)
- Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models [6.998536937701312]
Recent years have seen a growing number of publications that analyse Natural Language Inference (NLI) datasets for superficial cues.
This structured survey provides an overview of the evolving research area by categorising reported weaknesses in models and datasets.
arXiv Detail & Related papers (2020-05-29T17:55:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.