Flowco: Rethinking Data Analysis in the Age of LLMs
- URL: http://arxiv.org/abs/2504.14038v1
- Date: Fri, 18 Apr 2025 19:01:27 GMT
- Title: Flowco: Rethinking Data Analysis in the Age of LLMs
- Authors: Stephen N. Freund, Brooke Simon, Emery D. Berger, Eunice Jun
- Abstract summary: Large language models (LLMs) are now capable of generating code for simple, routine data analyses. LLMs promise to democratize data science by enabling those with limited programming expertise to conduct data analyses. Analysts in many real-world settings must often exercise fine-grained control over specific analysis steps. This paper introduces Flowco, a new mixed-initiative system to address these challenges.
- Score: 2.1874189959020427
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conducting data analysis typically involves authoring code to transform, visualize, analyze, and interpret data. Large language models (LLMs) are now capable of generating such code for simple, routine analyses. LLMs promise to democratize data science by enabling those with limited programming expertise to conduct data analyses, including in scientific research, business, and policymaking. However, analysts in many real-world settings must often exercise fine-grained control over specific analysis steps, verify intermediate results explicitly, and iteratively refine their analytical approaches. Such tasks present barriers to building robust and reproducible analyses using LLMs alone or even in conjunction with existing authoring tools (e.g., computational notebooks). This paper introduces Flowco, a new mixed-initiative system to address these challenges. Flowco leverages a visual dataflow programming model and integrates LLMs into every phase of the authoring process. A user study suggests that Flowco supports analysts, particularly those with less programming experience, in quickly authoring, debugging, and refining data analyses.
Related papers
- DataMosaic: Explainable and Verifiable Multi-Modal Data Analytics through Extract-Reason-Verify [11.10351765834947]
Large Language Models (LLMs) are transforming data analytics, but their widespread adoption is hindered by two critical limitations. They are not explainable (opaque reasoning processes) and not verifiable (prone to hallucinations and unchecked errors). We propose DataMosaic, a framework designed to make LLM-powered analytics both explainable and verifiable.
arXiv Detail & Related papers (2025-04-14T09:38:23Z)
- SoK: LLM-based Log Parsing [2.2779174914142346]
This paper systematically reviews 29 large language model (LLM)-based log parsing methods. We analyze the learning and prompt-engineering paradigms employed, efficiency- and effectiveness-enhancing techniques, and the role of LLMs in the parsing process.
arXiv Detail & Related papers (2025-04-07T09:41:04Z)
- Large Language Models (LLMs) for Source Code Analysis: applications, models and datasets [3.8740749765622167]
Large language models (LLMs) and transformer-based architectures are increasingly utilized for source code analysis. This paper explores the role of LLMs for different code analysis tasks, focusing on three key aspects.
arXiv Detail & Related papers (2025-03-21T19:29:50Z)
- Learning on LLM Output Signatures for gray-box LLM Behavior Analysis [52.81120759532526]
Large Language Models (LLMs) have achieved widespread adoption, yet our understanding of their behavior remains limited.
We develop a transformer-based approach to process that theoretically guarantees approximation of existing techniques.
Our approach achieves superior performance on hallucination and data contamination detection in gray-box settings.
arXiv Detail & Related papers (2025-03-18T09:04:37Z)
- Can LLMs Help Uncover Insights about LLMs? A Large-Scale, Evolving Literature Analysis of Frontier LLMs [32.48924329288906]
This study presents a semi-automated approach for literature analysis that accelerates data extraction using LLMs. It automatically identifies relevant arXiv papers, extracts experimental results and related attributes, and organizes them into a structured dataset, LLMEvalDB. We then conduct an automated literature analysis of frontier LLMs, reducing the effort of paper surveying and data extraction by more than 93% compared to manual approaches.
arXiv Detail & Related papers (2025-02-26T03:56:34Z)
- Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges.
We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow.
We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
- The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks [3.848607479075651]
We investigate the role that current Large Language Models (LLMs) can play in improving callgraph analysis and type inference for Python programs.
Our study reveals that LLMs show promising results in type inference, demonstrating higher accuracy than traditional methods, yet they exhibit limitations in callgraph analysis.
arXiv Detail & Related papers (2024-02-27T16:53:53Z)
- LLM Inference Unveiled: Survey and Roofline Model Insights [62.92811060490876]
Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges.
Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also introducing a framework based on the roofline model.
This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems.
arXiv Detail & Related papers (2024-02-26T07:33:05Z)
- Large Language Models for Data Annotation and Synthesis: A Survey [49.8318827245266]
This survey focuses on the utility of Large Language Models for data annotation and synthesis.
It includes an in-depth taxonomy of data types that LLMs can annotate, a review of learning strategies for models utilizing LLM-generated annotations, and a detailed discussion of the primary challenges and limitations associated with using LLMs for data annotation and synthesis.
arXiv Detail & Related papers (2024-02-21T00:44:04Z)
- Sentiment Analysis in the Era of Large Language Models: A Reality Check [69.97942065617664]
This paper investigates the capabilities of large language models (LLMs) in performing various sentiment analysis tasks.
We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets.
arXiv Detail & Related papers (2023-05-24T10:45:25Z)
- Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System [48.62158108517576]
We introduce InsightPilot, an automated data exploration system designed to simplify the data exploration process.
InsightPilot automatically selects appropriate analysis intents, such as understanding, summarizing, and explaining.
In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts.
arXiv Detail & Related papers (2023-04-02T07:27:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.