Multilingual Crowd-Based Requirements Engineering Using Large Language Models
- URL: http://arxiv.org/abs/2408.06505v1
- Date: Mon, 12 Aug 2024 21:40:39 GMT
- Title: Multilingual Crowd-Based Requirements Engineering Using Large Language Models
- Authors: Arthur Pilone, Paulo Meirelles, Fabio Kon, Walid Maalej
- Abstract summary: We present an LLM-powered approach that helps agile teams use crowd-based requirements engineering (CrowdRE) in their issue and task management.
We are currently implementing a command-line tool that enables developers to match issues with relevant user reviews.
- Score: 9.93427497289912
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A central challenge for ensuring the success of software projects is to assure the convergence of developers' and users' views. While the availability of large amounts of user data from social media, app store reviews, and support channels bears many benefits, it still remains unclear how software development teams can effectively use this data. We present an LLM-powered approach called DeeperMatcher that helps agile teams use crowd-based requirements engineering (CrowdRE) in their issue and task management. We are currently implementing a command-line tool that enables developers to match issues with relevant user reviews. We validated our approach on an existing English dataset from a well-known open-source project. Additionally, to check how well DeeperMatcher works for other languages, we conducted a single-case mechanism experiment alongside developers of a local project that has issues and user feedback in Brazilian Portuguese. Our preliminary analysis indicates that the accuracy of our approach is highly dependent on the text embedding method used. We discuss further refinements needed for reliable crowd-based requirements engineering with multilingual support.
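The abstract notes that matching accuracy depends heavily on the text embedding method, but does not describe DeeperMatcher's internals. As a minimal sketch of the general idea (matching issues to user reviews by vector similarity), the toy example below uses a bag-of-words "embedding" and cosine similarity; the `embed`, `cosine`, and `match` helpers are hypothetical illustrations, and a real system would substitute a multilingual sentence-embedding model.

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline would use a
    # multilingual sentence-embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def match(issue, reviews):
    # Return the review most similar to the given issue text.
    issue_vec = embed(issue)
    scored = [(cosine(issue_vec, embed(r)), r) for r in reviews]
    return max(scored)[1]

reviews = [
    "the app crashes when I upload a photo",
    "love the new dark mode theme",
]
print(match("crash on photo upload", reviews))
# -> the app crashes when I upload a photo
```

Because word-overlap embeddings fail across languages (an English issue shares no tokens with a Portuguese review), the choice of a genuinely multilingual embedding model is exactly the sensitivity the paper reports.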
Related papers
- The AI Language Proficiency Monitor -- Tracking the Progress of LLMs on Multilingual Benchmarks [0.0]
We introduce the AI Language Monitor, a comprehensive benchmark that assesses large language model (LLM) performance across up to 200 languages. Our benchmark aggregates diverse tasks including translation, question answering, math, and reasoning, using datasets such as FLORES+, MMLU, GSM8K, TruthfulQA, and ARC. We provide an open-source, auto-updating leaderboard and dashboard that supports researchers, developers, and policymakers in identifying strengths and gaps in model performance.
arXiv Detail & Related papers (2025-07-11T12:38:02Z)
- What Challenges Do Developers Face When Using Verification-Aware Programming Languages? [45.44831696628473]
In software development, increasing software reliability often involves testing. For complex and critical systems, developers can use Design by Contract (DbC) methods to define precise specifications that software components must satisfy. Verification-Aware (VA) programming languages support DbC and formal verification at compile-time or run-time, offering stronger correctness guarantees than traditional testing.
arXiv Detail & Related papers (2025-06-30T10:17:39Z)
- Language Models in Software Development Tasks: An Experimental Analysis of Energy and Accuracy [40.793232371852795]
We investigate the trade-off between model accuracy and energy consumption when deploying language models locally.
Our findings reveal that employing a big LLM with a higher energy budget does not always translate to significantly improved accuracy.
Quantized versions of large models generally offer better efficiency and accuracy compared to full-precision versions of medium-sized ones.
arXiv Detail & Related papers (2024-11-30T03:02:50Z)
- Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application [54.984348122105516]
Large Language Models (LLMs) pretrained on massive text corpora present a promising avenue for enhancing recommender systems.
We propose an Llm-driven knowlEdge Adaptive RecommeNdation (LEARN) framework that synergizes open-world knowledge with collaborative knowledge.
arXiv Detail & Related papers (2024-05-07T04:00:30Z)
- Effort and Size Estimation in Software Projects with Large Language Model-based Intelligent Interfaces [0.4043859792291222]
We propose a new way to enhance specifications of natural language-based questions that allows for the estimation of development effort, and we provide a comparison against traditional methods.
arXiv Detail & Related papers (2024-02-11T11:03:08Z)
- DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models [76.79929883963275]
DIALIGHT is a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems.
It features a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level.
Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses.
arXiv Detail & Related papers (2024-01-04T11:27:48Z)
- ChatDev: Communicative Agents for Software Development [84.90400377131962]
ChatDev is a chat-powered software development framework in which specialized agents are guided in what to communicate.
These agents actively contribute to the design, coding, and testing phases through unified language-based communication.
arXiv Detail & Related papers (2023-07-16T02:11:34Z)
- XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages [105.54207724678767]
Data scarcity is a crucial issue for the development of highly multilingual NLP systems.
We propose XTREME-UP, a benchmark defined by its focus on the scarce-data scenario rather than zero-shot.
XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies.
arXiv Detail & Related papers (2023-05-19T18:00:03Z)
- Study of Encoder-Decoder Architectures for Code-Mix Search Query Translation [0.0]
Many of the queries we receive are code-mix, specifically Hinglish, i.e., queries with one or more Hindi words written in English (Latin) script.
We propose a transformer-based approach for code-mix query translation to enable users to search with these queries.
The model is currently live on app and website, serving millions of queries.
arXiv Detail & Related papers (2022-08-07T12:59:50Z)
- Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog [67.20796950016735]
The Multi2WOZ dataset spans four typologically diverse languages: Chinese, German, Arabic, and Russian.
We introduce a new framework for multilingual conversational specialization of pretrained language models (PrLMs) that aims to facilitate cross-lingual transfer for arbitrary downstream TOD tasks.
Our experiments show that, in most setups, the best performance entails the combination of (i) conversational specialization in the target language and (ii) few-shot transfer for the concrete TOD task.
arXiv Detail & Related papers (2022-05-20T18:35:38Z)
- Towards Best Practices for Training Multilingual Dense Retrieval Models [54.91016739123398]
We focus on the task of monolingual retrieval in a variety of typologically diverse languages using one such design.
Our study is organized as a "best practices" guide for training multilingual dense retrieval models.
arXiv Detail & Related papers (2022-04-05T17:12:53Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from rich-resource languages to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- Towards More Equitable Question Answering Systems: How Much More Data Do You Need? [15.401330338654203]
We take a step back and study which approaches allow us to take the most advantage of existing resources in order to produce QA systems in many languages.
Specifically, we perform extensive analysis to measure the efficacy of few-shot approaches augmented with automatic translations and permutations of context-question-answer pairs.
We make suggestions for future dataset development efforts that make better use of a fixed annotation budget, with a goal of increasing the language coverage of QA datasets and systems.
arXiv Detail & Related papers (2021-05-28T21:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.