Related papers: Exploring Data Management Challenges and Solutions in Agile Software Development: A Literature Review and Practitioner Survey

URL: http://arxiv.org/abs/2402.00462v2
Date: Fri, 12 Jul 2024 15:33:59 GMT
Title: Exploring Data Management Challenges and Solutions in Agile Software Development: A Literature Review and Practitioner Survey
Authors: Ahmed Fawzy, Amjed Tahir, Matthias Galster, Peng Liang
Abstract summary: Managing data related to a software product and its development poses significant challenges for software projects and agile development teams. Challenges include integrating data from diverse sources and ensuring data quality in light of continuous change and adaptation.
Score: 4.45543024542181
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Managing data related to a software product and its development poses significant challenges for software projects and agile development teams. Challenges include integrating data from diverse sources and ensuring data quality in light of continuous change and adaptation. To this end, we aimed to systematically explore data management challenges and potential solutions in agile projects. We employed a mixed-methods approach, utilizing a systematic literature review (SLR) to understand the state-of-research followed by a survey with practitioners to reflect on the state-of-practice. In the SLR, we reviewed 45 studies in which we identified and categorized data management aspects and the associated challenges and solutions. In the practitioner survey, we captured practical experiences and solutions from 32 industry experts to complement the findings from the SLR. Our findings reveal major data management challenges reported in both the SLR and practitioner survey, such as managing data integration processes, capturing diverse data, automating data collection, and meeting real-time analysis requirements. Based on our findings, we present implications for practitioners and researchers, which include the necessity of developing clear data management policies, training on data management tools, and adopting new data management strategies that enhance agility, improve product quality, and facilitate better project outcomes.

Related papers

Data Science and Technology Towards AGI Part I: Tiered Data Management [53.64581824953229]
We argue that the development of artificial intelligence is entering a new phase of data-model co-evolution. We introduce an L0-L4 tiered data management framework, ranging from raw uncurated resources to organized and verifiable knowledge. We validate the effectiveness of the proposed framework through empirical studies.
arXiv Detail & Related papers (2026-02-09T18:47:51Z)
Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey [59.3507264893654]
Issue resolution is a complex Software Engineering task integral to real-world development. Benchmarks like SWE-bench revealed this task as profoundly difficult for large language models. This paper presents a systematic survey of this emerging domain.
arXiv Detail & Related papers (2026-01-15T18:55:03Z)
ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research [53.736407871322314]
We introduce ORMind, a cognitive-inspired framework that enhances optimization through counterfactual reasoning. Our approach emulates human cognition, implementing an end-to-end workflow that transforms requirements into mathematical models and executable code. It is currently being tested internally in Lenovo's AI Assistant, with plans to enhance optimization capabilities for both business and consumer customers.
arXiv Detail & Related papers (2025-06-02T05:11:21Z)
DSMentor: Enhancing Data Science Agents with Curriculum Learning and Online Knowledge Accumulation [59.79833777420334]
Large language model (LLM) agents have shown promising performance in generating code for solving complex data science problems. We develop a novel inference-time optimization framework, referred to as DSMentor, to enhance LLM agent performance. Our work underscores the importance of developing effective strategies for accumulating and utilizing knowledge during inference.
arXiv Detail & Related papers (2025-05-20T10:16:21Z)
LLM-Powered Knowledge Graphs for Enterprise Intelligence and Analytics [4.968761545765129]
This paper introduces a framework that uses large language models (LLMs) to unify various data sources into a comprehensive, activity-centric knowledge graph. The framework automates tasks such as entity extraction, relationship inference, and semantic enrichment. It supports applications such as contextual search, task prioritization, expertise discovery, personalized recommendations, and advanced analytics.
arXiv Detail & Related papers (2025-03-11T02:50:45Z)
A Comprehensive Survey on Imbalanced Data Learning [56.65067795190842]
Imbalanced data is prevalent in various types of raw data and hinders the performance of machine learning. This survey systematically analyzes various real-world data formats. It groups existing research on different data formats into four categories: data re-balancing, feature representation, training strategy, and ensemble learning.
arXiv Detail & Related papers (2025-02-13T04:53:17Z)
Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework [81.29965270493238]
We develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) for wireless communication applications. The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard. We introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data.
arXiv Detail & Related papers (2025-01-16T16:19:53Z)
Sustainable Digitalization of Business with Multi-Agent RAG and LLM [1.6385815610837167]
This research aims to explore the integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG). We propose a sustainable business solution using pre-existing LLMs that can work with diverse datasets.
arXiv Detail & Related papers (2025-01-06T08:14:23Z)
Deep Learning, Machine Learning, Advancing Big Data Analytics and Management [26.911181864764117]
Advances in artificial intelligence, machine learning, and deep learning have catalyzed the transformation of big data analytics and management. This work explores the theoretical foundations, methodological advancements, and practical implementations of these technologies. It equips researchers, practitioners, and data enthusiasts with the tools to navigate the complexities of modern data analytics.
arXiv Detail & Related papers (2024-12-03T05:59:34Z)
Deploying Large Language Models With Retrieval Augmented Generation [0.21485350418225244]
Retrieval Augmented Generation has emerged as a key approach for integrating knowledge from data sources outside of the large language model's training set. We present insights from the development and field-testing of a pilot project that integrates LLMs with RAG for information retrieval.
arXiv Detail & Related papers (2024-11-07T22:11:51Z)
A Systematic Review of NeurIPS Dataset Management Practices [7.974245534539289]
We present a systematic review of datasets published at the NeurIPS track, focusing on four key aspects: provenance, distribution, ethical disclosure, and licensing. Our findings reveal that dataset provenance is often unclear due to ambiguous filtering and curation processes. These inconsistencies underscore the urgent need for standardized data infrastructures for the publication and management of datasets.
arXiv Detail & Related papers (2024-10-31T23:55:41Z)
Data Analysis in the Era of Generative AI [56.44807642944589]
This paper explores the potential of AI-powered tools to reshape data analysis, focusing on design considerations and challenges. We explore how the emergence of large language and multimodal models offers new opportunities to enhance various stages of data analysis workflow. We then examine human-centered design principles that facilitate intuitive interactions, build user trust, and streamline the AI-assisted analysis workflow across multiple apps.
arXiv Detail & Related papers (2024-09-27T06:31:03Z)
Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs. We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
AIOps Solutions for Incident Management: Technical Guidelines and A Comprehensive Literature Review [0.29998889086656577]
This study proposes an AIOps terminology and taxonomy, establishing a structured incident management procedure and providing guidelines for constructing an AIOps framework. The goal is to provide a comprehensive review of technical and research aspects in AIOps for incident management, aiming to structure knowledge, identify gaps, and establish a foundation for future developments in the field.
arXiv Detail & Related papers (2024-04-01T17:32:22Z)
An Empirical Study of Challenges in Machine Learning Asset Management [15.07444988262748]
Despite existing research, a significant knowledge gap remains in operational challenges like model versioning, data traceability, and collaboration. Our study aims to address this gap by analyzing 15,065 posts from developer forums and platforms. We uncover 133 topics related to asset management challenges, grouped into 16 macro-topics, with software dependency, model deployment, and model training being the most discussed.
arXiv Detail & Related papers (2024-02-25T05:05:52Z)
Data Management For Training Large Language Models: A Survey [64.18200694790787]
Data plays a fundamental role in training Large Language Models (LLMs). This survey aims to provide a comprehensive overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs.
arXiv Detail & Related papers (2023-12-04T07:42:16Z)
Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing a lack of platforms offering detailed information about datasets. We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers. Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z)
Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive supervision system that improves the overall productivity of the weak supervision (WS) learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
A Field Guide to Federated Optimization [161.3779046812383]
Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data. This paper provides recommendations and guidelines on formulating, designing, evaluating and analyzing federated optimization algorithms.
arXiv Detail & Related papers (2021-07-14T18:09:08Z)
Scaling up Search Engine Audits: Practical Insights for Algorithm Auditing [68.8204255655161]
We set up experiments for eight search engines with hundreds of virtual agents placed in different regions. We demonstrate the successful performance of our research infrastructure across multiple data collections. We conclude that virtual agents are a promising avenue for monitoring the performance of algorithms across long periods of time.
arXiv Detail & Related papers (2021-06-10T15:49:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.