Related papers: Bridging Subjective and Objective QoE: Operator-Level Aggregation Using LLM-Based Comment Analysis and Network MOS Comparison

Bridging Subjective and Objective QoE: Operator-Level Aggregation Using LLM-Based Comment Analysis and Network MOS Comparison

URL: http://arxiv.org/abs/2506.00924v2
Date: Mon, 30 Jun 2025 05:21:31 GMT
Title: Bridging Subjective and Objective QoE: Operator-Level Aggregation Using LLM-Based Comment Analysis and Network MOS Comparison
Authors: Parsa Hassani Shariat Panahi, Amir Hossein Jalilvand, M. Hassan Najafi,
Abstract summary: This paper introduces a dual-layer framework for network operator-side quality of experience (QoE) assessment.<n>On the objective side, we develop a machine learning model trained on mean opinion scores (MOS) computed via the ITU-T P.1203 reference implementation.<n>On the subjective side, we present a semantic filtering and scoring pipeline that processes user comments from live streams to extract performance-related feedback.
Score: 1.5817271400571666
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper introduces a dual-layer framework for network operator-side quality of experience (QoE) assessment that integrates both objective network modeling and subjective user perception extracted from live-streaming platforms. On the objective side, we develop a machine learning model trained on mean opinion scores (MOS) computed via the ITU-T P.1203 reference implementation, allowing accurate prediction of user-perceived video quality using only network parameters such as packet loss, delay, jitter, and throughput without reliance on video content or client-side instrumentation. On the subjective side, we present a semantic filtering and scoring pipeline that processes user comments from live streams to extract performance-related feedback. A large language model is used to assign scalar MOS scores to filtered comments in a deterministic and reproducible manner. To support scalable and interpretable analysis, we construct a labeled dataset of 47,894 live-stream comments, of which about 34,000 are identified as QoE-relevant through multi-layer semantic filtering. Each comment is enriched with simulated Internet Service Provider attribution and temporally aligned using synthetic timestamps in 5-min intervals. The resulting dataset enables operator-level aggregation and time-series analysis of user-perceived quality. A delta MOS metric is proposed to measure each Internet service provider's deviation from platform-wide sentiment, allowing detection of localized degradations even in the absence of direct network telemetry. A controlled outage simulation confirms the framework's effectiveness in identifying service disruptions through comment-based trends alone. The system provides each operator with its own subjective MOS and the global platform average per interval, enabling real-time interpretation of performance deviations and comparison with objective network-based QoE estimates.

Related papers

ReGAIN: Retrieval-Grounded AI Framework for Network Traffic Analysis [5.887997322139195]
ReGAIN is a framework that combines traffic summarization, retrieval-augmented generation (RAG), and Large Language Model (LLM) reasoning for transparent and accurate network traffic analysis.<n> evaluated on ICMP ping flood and TCP SYN flood traces from the real-world traffic dataset.
arXiv Detail & Related papers (2025-12-23T00:16:14Z)
MAWIFlow Benchmark: Realistic Flow-Based Evaluation for Network Intrusion Detection [47.86433139298671]
This paper introduces MAWIFlow, a flow-based benchmark derived from the MAWILAB v1.1 dataset.<n>The resulting datasets comprise temporally distinct samples from January 2011, 2016, and 2021, drawn from trans-Pacific backbone traffic.<n>Traditional machine learning methods, including Decision Trees, Random Forests, XGBoost, and Logistic Regression, are compared to a deep learning model based on a CNN-BiLSTM architecture.
arXiv Detail & Related papers (2025-06-20T14:51:35Z)
Debugging and Runtime Analysis of Neural Networks with VLMs (A Case Study) [20.420310876464924]
We show the utility of semantic heatmaps for fault localization in vision models.<n>We propose a lightweight runtime analysis to detect and filter-out defects at runtime.
arXiv Detail & Related papers (2025-03-21T01:12:57Z)
Joint-Optimized Unsupervised Adversarial Domain Adaptation in Remote Sensing Segmentation with Prompted Foundation Model [32.03242732902217]
This paper addresses the challenge of adapting a model trained on source domain data to target domain samples. We propose a joint-optimized adversarial network incorporating the "Segment Anything Model (SAM) (SAM-JOANet)"
arXiv Detail & Related papers (2024-11-08T02:15:20Z)
Satellite Streaming Video QoE Prediction: A Real-World Subjective Database and Network-Level Prediction Models [59.061552498630874]
We introduce the LIVE-Viasat Real-World Satellite QoE Database. This database consists of 179 videos recorded from real-world streaming services affected by various authentic distortion patterns. We demonstrate the usefulness of this unique new resource by evaluating the efficacy of QoE-prediction models on it. We also created a new model that maps the network parameters to predicted human perception scores, which can be used by ISPs to optimize the video streaming quality of their networks.
arXiv Detail & Related papers (2024-10-17T18:22:50Z)
End-to-end Evaluation of Practical Video Analytics Systems for Face Detection and Recognition [9.942007083253479]
Video analytics systems are deployed in bandwidth constrained environments like autonomous vehicles. In an end-to-end face analytics system, inputs are first compressed using popular video codecs like HEVC. We demonstrate how independent task evaluations, dataset imbalances, and inconsistent annotations can lead to incorrect system performance estimates.
arXiv Detail & Related papers (2023-10-10T19:06:10Z)
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs) We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence. We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
Concurrent Misclassification and Out-of-Distribution Detection for Semantic Segmentation via Energy-Based Normalizing Flow [0.0]
Recent semantic segmentation models accurately classify test-time examples that are similar to a training dataset distribution. We propose a generative model for concurrent in-distribution misclassification (IDM) and OOD detection that relies on a normalizing flow framework. FlowEneDet achieves promising results on Cityscapes, Cityscapes-C, FishyScapes and SegmentMeIfYouCan benchmarks in IDM/OOD detection when applied to pretrained DeepLabV3+ and SegFormer semantic segmentation models.
arXiv Detail & Related papers (2023-05-16T17:02:57Z)
Revisiting the Evaluation of Image Synthesis with GANs [55.72247435112475]
This study presents an empirical investigation into the evaluation of synthesis performance, with generative adversarial networks (GANs) as a representative of generative models. In particular, we make in-depth analyses of various factors, including how to represent a data point in the representation space, how to calculate a fair distance using selected samples, and how many instances to use from each set.
arXiv Detail & Related papers (2023-04-04T17:54:32Z)
CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z)
Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features [31.59528815233441]
We propose a cross-domain multi-objective speech assessment model called MOSA-Net, which can estimate multiple speech assessment metrics simultaneously.<n> Experimental results show that MOSA-Net can improve the linear correlation coefficient (LCC) by 0.026 (0.990 vs 0.964 in seen noise environments) and 0.012 (0.969 vs 0.957 in unseen noise environments) in perceptual evaluation of speech quality (PESQ) prediction.
arXiv Detail & Related papers (2021-11-03T17:30:43Z)
LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction. We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.