Related papers: An efficient approach to represent enterprise web application structure using Large Language Model in the service of Intelligent Quality Engineering

URL: http://arxiv.org/abs/2501.06837v1
Date: Sun, 12 Jan 2025 15:10:57 GMT
Title: An efficient approach to represent enterprise web application structure using Large Language Model in the service of Intelligent Quality Engineering
Authors: Zaber Al Hassan Ayon, Gulam Husain, Roshankumar Bisoi, Waliur Rahman, Dr Tom Osborn
Abstract summary: This paper presents a novel approach to represent enterprise web application structures using Large Language Models (LLMs). We introduce a hierarchical representation methodology that optimizes the few-shot learning capabilities of LLMs. Our methodology addresses existing challenges around the use of Generative AI techniques in automated software testing.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper presents a novel approach to represent enterprise web application structures using Large Language Models (LLMs) to enable intelligent quality engineering at scale. We introduce a hierarchical representation methodology that optimizes the few-shot learning capabilities of LLMs while preserving the complex relationships and interactions within web applications. The approach encompasses five key phases: comprehensive DOM analysis, multi-page synthesis, test suite generation, execution, and result analysis. Our methodology addresses existing challenges around the use of Generative AI techniques in automated software testing by developing a structured format that enables LLMs to understand web application architecture through in-context learning. We evaluated our approach using two distinct web applications: an e-commerce platform (Swag Labs) and a healthcare application (MediBox), which is deployed within the Atalgo engineering environment. The results demonstrate success rates of 90% and 70%, respectively, in achieving automated testing, with high relevance scores for test cases across multiple evaluation criteria. The findings suggest that our representation approach significantly enhances LLMs' ability to generate contextually relevant test cases and provide better quality assurance overall, while reducing the time and effort required for testing.
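Illustration: the first two phases named in the abstract (DOM analysis and multi-page synthesis) could be sketched roughly as follows. This is not the authors' format, which the abstract does not specify. The names `INTERACTIVE_TAGS`, `summarize_page`, and `synthesize_app`, the choice of tags, and the JSON layout are all assumptions made for this example; it only shows one plausible way to turn page markup into a compact hierarchical structure that could be placed in an LLM prompt.

```python
from html.parser import HTMLParser
import json

# Assumption: these tags are treated as "interactive"; the paper may use a different set.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea", "form"}


class InteractiveElementCollector(HTMLParser):
    """Collects the interactive elements of one page in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            attr_map = dict(attrs)
            self.elements.append({
                "tag": tag,
                "id": attr_map.get("id"),
                "name": attr_map.get("name"),
                "type": attr_map.get("type"),
                "href": attr_map.get("href"),
            })


def summarize_page(name, html):
    """DOM analysis (hypothetical): reduce one page to its interactive elements."""
    collector = InteractiveElementCollector()
    collector.feed(html)
    # Drop missing attributes so the serialized form stays short.
    elements = [{k: v for k, v in e.items() if v is not None}
                for e in collector.elements]
    return {"page": name, "elements": elements}


def synthesize_app(pages):
    """Multi-page synthesis (hypothetical): combine page summaries and record
    links whose href points at another known page."""
    summaries = [summarize_page(name, html) for name, html in pages.items()]
    known = set(pages)
    navigation = []
    for summary in summaries:
        for element in summary["elements"]:
            target = element.get("href")
            if target in known:
                navigation.append({"from": summary["page"], "to": target})
    return {"pages": summaries, "navigation": navigation}


if __name__ == "__main__":
    # Toy pages invented for this example.
    demo_pages = {
        "login.html": '<form id="login"><input id="user" type="text">'
                      '<button id="go">Login</button>'
                      '<a href="inventory.html">skip</a></form>',
        "inventory.html": '<a href="cart.html">Cart</a>',
    }
    print(json.dumps(synthesize_app(demo_pages), indent=2))
```

The resulting JSON could be embedded in a few-shot prompt alongside example test cases; how the paper actually structures that prompt is not stated in the abstract.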

Related papers

Towards Automated Page Object Generation for Web Testing using Large Language Models [2.451367554740889]
This paper presents an empirical study on the feasibility of using Large Language Models (LLMs) to automatically generate Page Objects (POs) for web testing. Our results show that LLMs can generate syntactically correct and functionally useful POs with accuracy values ranging from 32.6% to 54.0% and element recognition rate exceeding 70% in most cases.
arXiv Detail & Related papers (2026-02-22T18:06:57Z)
Finetuning LLMs for Automatic Form Interaction on Web-Browser in Selenium Testing Framework [4.53273595732354]
This paper introduces a novel method for training large language models (LLMs) to generate high-quality test cases in Selenium. We curate both synthetic and human-annotated datasets for training and evaluation, covering diverse real-world forms and testing scenarios. Our approach significantly outperforms strong baselines, including GPT-4o and other popular LLMs, across all evaluation metrics.
arXiv Detail & Related papers (2025-11-19T06:43:21Z)
A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System [56.40989626804489]
This survey provides the first holistic analysis of Large Language Model-powered software engineering. We review over 150 recent papers and propose a taxonomy along two key dimensions: (1) Solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) Benchmarks, including tasks such as code generation, translation, and repair.
arXiv Detail & Related papers (2025-10-10T06:56:50Z)
Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.83217247686402]
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions. Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance. This paper decomposes LLM applications into a three-layer architecture: System Shell Layer, Prompt Orchestration Layer, and LLM Inference Core.
arXiv Detail & Related papers (2025-08-28T13:00:28Z)
Using LLMs and Essence to Support Software Practice Adoption [0.3609538870261841]
This study explores the integration of Essence, a standard and thinking framework for managing software engineering practices, with large language models (LLMs). The proposed system consistently outperforms its baseline counterpart in domain-specific tasks.
arXiv Detail & Related papers (2025-08-22T14:59:35Z)
WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code [57.45181837786448]
Multimodal Large Language Models (MLLMs) have the potential to act as AI software engineers capable of executing complex web application development. Existing benchmarks usually fail to provide an assessment of sub-capabilities and focus solely on webpage generation outcomes. We propose WebUIBench, a benchmark systematically designed to evaluate MLLMs in four key areas: WebUI Perception, HTML Programming, WebUI-HTML Understanding, and WebUI-to-Code.
arXiv Detail & Related papers (2025-06-09T14:46:02Z)
From Prompts to Templates: A Systematic Prompt Template Analysis for Real-world LLMapps [20.549178260624043]
Large Language Models (LLMs) have revolutionized human-AI interaction by enabling intuitive task execution through natural language prompts. Small variations in structure or wording can result in substantial differences in output. This paper presents a comprehensive analysis of prompt templates in practical LLMapps.
arXiv Detail & Related papers (2025-04-02T18:20:06Z)
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing [43.75154489681047]
We propose a novel framework leveraging test-time scaling for Multi-Document Summarization (MDS). Our approach employs prompt ensemble techniques to generate multiple candidate summaries using various prompts, then combines them with an aggregator to produce a refined summary. To evaluate our method effectively, we also introduce two new LLM-based metrics: the Consistency-Aware Preference (CAP) score and LLM Atom-Content-Unit (LLM-ACU) score.
arXiv Detail & Related papers (2025-02-27T23:34:47Z)
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models [58.45517851437422]
Visually-situated text parsing (VsTP) has recently seen notable advancements, driven by the growing demand for automated document understanding. Existing solutions often rely on task-specific architectures and objectives for individual tasks. In this paper, we introduce OmniParser V2, a universal model that unifies typical VsTP tasks, including text spotting, key information extraction, table recognition, and layout analysis.
arXiv Detail & Related papers (2025-02-22T09:32:01Z)
Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework [81.29965270493238]
We develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) for wireless communication applications. The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning varying difficulty levels from easy to hard. We introduce a Pointwise V-Information (PVI) based fine-tuning method, providing a detailed theoretical analysis and justification for its use in quantifying the information content of training data.
arXiv Detail & Related papers (2025-01-16T16:19:53Z)
The BrowserGym Ecosystem for Web Agent Research [151.90034093362343]
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents. We conduct the first large-scale, multi-benchmark web agent experiment. Results highlight a large discrepancy between OpenAI's and Anthropic's latest models.
arXiv Detail & Related papers (2024-12-06T23:43:59Z)
A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and their ability to perform various language tasks with minimal computational resources. We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z)
Automating Pharmacovigilance Evidence Generation: Using Large Language Models to Produce Context-Aware SQL [0.0]
We utilize OpenAI's GPT-4 model within a retrieval-augmented generation (RAG) framework, enriched with a business context document, to transform natural language queries (NLQs) into Structured Query Language (SQL) queries. Performance reached a maximum of 85% when high-complexity queries were excluded.
arXiv Detail & Related papers (2024-06-15T17:07:31Z)
Large Language Models for Automated Web-Form-Test Generation: An Empirical Study [8.32635005234879]
Large Language Models (LLMs) have shown great potential for contextual text generation. No comparative study examining different LLMs has yet been reported for web-form-test generation. We propose three HTML-structure-pruning methods to extract key contextual information.
arXiv Detail & Related papers (2024-05-16T10:21:03Z)
Automating REST API Postman Test Cases Using LLM [0.0]
This research paper is dedicated to the exploration and implementation of an automated approach to generate test cases using Large Language Models. The methodology integrates the use of OpenAI to enhance the efficiency and effectiveness of test case generation. The model developed during the research is trained using manually collected Postman test cases or instances for various REST APIs.
arXiv Detail & Related papers (2024-04-16T15:53:41Z)
RITFIS: Robust input testing framework for LLMs-based intelligent software [6.439196068684973]
RITFIS is the first framework designed to assess the robustness of intelligent software against natural language inputs. RITFIS adapts 17 automated testing methods, originally designed for Deep Neural Network (DNN)-based intelligent software. It demonstrates the effectiveness of RITFIS in evaluating LLM-based intelligent software through empirical validation.
arXiv Detail & Related papers (2024-02-21T04:00:54Z)
A Case Study on Test Case Construction with Large Language Models: Unveiling Practical Insights and Challenges [2.7029792239733914]
This paper examines the application of Large Language Models in the construction of test cases within the context of software engineering. Through a blend of qualitative and quantitative analyses, this study assesses the impact of LLMs on test case comprehensiveness, accuracy, and efficiency.
arXiv Detail & Related papers (2023-12-19T20:59:02Z)
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark [81.42376626294812]
We present LAMM, a Language-Assisted Multi-Modal instruction-tuning dataset, framework, and benchmark. Our aim is to establish LAMM as a growing ecosystem for training and evaluating MLLMs. We present a comprehensive dataset and benchmark, which cover a wide range of vision tasks for 2D and 3D vision.
arXiv Detail & Related papers (2023-06-11T14:01:17Z)
OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs. Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models [74.22729793816451]
Large Language Models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability. We propose CREATOR, a novel framework that enables LLMs to create their own tools using documentation and code realization. We evaluate CREATOR on MATH and TabMWP benchmarks, respectively consisting of challenging math competition problems.
arXiv Detail & Related papers (2023-05-23T17:51:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.