Related papers: Diagnosing Robotics Systems Issues with Large Language Models

Diagnosing Robotics Systems Issues with Large Language Models

URL: http://arxiv.org/abs/2410.09084v1
Date: Sun, 6 Oct 2024 11:58:12 GMT
Title: Diagnosing Robotics Systems Issues with Large Language Models
Authors: Jordis Emilia Herrmann, Aswath Mandakath Gopinath, Mikael Norrlof, Mark Niklas Müller,
Abstract summary: Large language models (LLMs) excel at analyzing large amounts of data. Here, we extend this work to the challenging and largely unexplored domain of robotics systems.
Score: 5.30112395683561
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Quickly resolving issues reported in industrial applications is crucial to minimize economic impact. However, the required data analysis makes diagnosing the underlying root causes a challenging and time-consuming task, even for experts. In contrast, large language models (LLMs) excel at analyzing large amounts of data. Indeed, prior work in AI-Ops demonstrates their effectiveness in analyzing IT systems. Here, we extend this work to the challenging and largely unexplored domain of robotics systems. To this end, we create SYSDIAGBENCH, a proprietary system diagnostics benchmark for robotics, containing over 2500 reported issues. We leverage SYSDIAGBENCH to investigate the performance of LLMs for root cause analysis, considering a range of model sizes and adaptation techniques. Our results show that QLoRA finetuning can be sufficient to let a 7B-parameter model outperform GPT-4 in terms of diagnostic accuracy while being significantly more cost-effective. We validate our LLM-as-a-judge results with a human expert study and find that our best model achieves similar approval ratings as our reference labels.

Related papers

VACT: A Video Automatic Causal Testing System and a Benchmark [55.53300306960048]
VACT is an **automated** framework for modeling, evaluating, and measuring the causal understanding of VGMs in real-world scenarios. We introduce multi-level causal evaluation metrics to provide a detailed analysis of the causal performance of VGMs.
arXiv Detail & Related papers (2025-03-08T10:54:42Z)
Can Large Language Models Help Experimental Design for Causal Discovery? [94.66802142727883]
Large Language Model Guided Intervention Targeting (LeGIT) is a robust framework that effectively incorporates LLMs to augment existing numerical approaches for the intervention targeting in causal discovery. LeGIT demonstrates significant improvements and robustness over existing methods and even surpasses humans.
arXiv Detail & Related papers (2025-03-03T03:43:05Z)
Are Large Language Models Useful for Time Series Data Analysis? [3.44393516559102]
Time series data plays a critical role across diverse domains such as healthcare, energy, and finance. This study investigates whether large language models (LLMs) are effective for time series data analysis.
arXiv Detail & Related papers (2024-12-16T02:47:44Z)
AD-LLM: Benchmarking Large Language Models for Anomaly Detection [50.57641458208208]
This paper introduces AD-LLM, the first benchmark that evaluates how large language models can help with anomaly detection. We examine three key tasks: zero-shot detection, using LLMs' pre-trained knowledge to perform AD without tasks-specific training; data augmentation, generating synthetic data and category descriptions to improve AD models; and model selection, using LLMs to suggest unsupervised AD models.
arXiv Detail & Related papers (2024-12-15T10:22:14Z)
Linear Discriminant Analysis in Credit Scoring: A Transparent Hybrid Model Approach [9.88281854509076]
We implement Linear Discriminant Analysis (LDA) as a feature reduction technique, which reduces the burden of the models complexity. Our hybrid model, XG-DNN, outperformed other models with the highest accuracy of 99.45% and a 99% F1 score with LDA. To interpret model decisions, we have applied 2 different explainable AI techniques named LIME (local) and Morris Sensitivity Analysis (global)
arXiv Detail & Related papers (2024-12-05T14:21:18Z)
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability [53.51560766150442]
Critical tokens are elements within reasoning trajectories that significantly influence incorrect outcomes. We present a novel framework for identifying these tokens through rollout sampling. We show that identifying and replacing critical tokens significantly improves model accuracy.
arXiv Detail & Related papers (2024-11-29T18:58:22Z)
ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models [43.895478182631116]
Tool-augmented large language models (LLMs) are rapidly being integrated into real-world applications. To address this challenge, we introduce a comprehensive diagnostic benchmark, ToolBH. For breadth, we consider three scenarios based on the characteristics of the toolset: missing necessary tools, potential tools, and limited functionality tools. The results show the significant challenges presented by the ToolBH benchmark.
arXiv Detail & Related papers (2024-06-28T16:03:30Z)
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present a process-based benchmark MR-Ben that demands a meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
Explainable AI for Comparative Analysis of Intrusion Detection Models [20.683181384051395]
This research analyzes various machine learning models to the tasks of binary and multi-class classification for intrusion detection from network traffic. We trained all models to the accuracy of 90% on the UNSW-NB15 dataset. We also discover that Random Forest provides the best performance in terms of accuracy, time efficiency and robustness.
arXiv Detail & Related papers (2024-06-14T03:11:01Z)
Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology. For training, we assemble a large dataset of over 697 thousand radiology image-text pairs. For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation. The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z)
DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights. We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs. Our DACO-RL algorithm is evaluated by human annotators to produce more helpful answers than SFT model in 57.72% cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z)
Progressing from Anomaly Detection to Automated Log Labeling and Pioneering Root Cause Analysis [53.24804865821692]
This study introduces a taxonomy for log anomalies and explores automated data labeling to mitigate labeling challenges. The study envisions a future where root cause analysis follows anomaly detection, unraveling the underlying triggers of anomalies.
arXiv Detail & Related papers (2023-12-22T15:04:20Z)
D-Bot: Database Diagnosis System using Large Language Models [30.20192093986365]
Database administrators (DBAs) play an important role in managing, maintaining and optimizing database systems. Recently large language models (LLMs) have shown great potential in various fields. We propose D-Bot, an LLM-based database diagnosis system that can automatically acquire knowledge from diagnosis documents.
arXiv Detail & Related papers (2023-12-03T16:58:10Z)
Causal Disentanglement Hidden Markov Model for Fault Diagnosis [55.90917958154425]
We propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism. Specifically, we make full use of the time-series data and progressively disentangle the vibration signal into fault-relevant and fault-irrelevant factors. To expand the scope of the application, we adopt unsupervised domain adaptation to transfer the learned disentangled representations to other working environments.
arXiv Detail & Related papers (2023-08-06T05:58:45Z)
Fault Diagnosis using eXplainable AI: a Transfer Learning-based Approach for Rotating Machinery exploiting Augmented Synthetic Data [0.0]
FaultD-XAI is a generic and interpretable approach for classifying faults in rotating machinery based on transfer learning. To provide scalability using transfer learning, synthetic vibration signals are created mimicking the characteristic behavior of failures in operation. The proposed approach not only obtained promising diagnostic performance, but was also able to learn characteristics used by experts to identify conditions.
arXiv Detail & Related papers (2022-10-06T15:02:35Z)
How Can Subgroup Discovery Help AIOps? [0.0]
We study how Subgroup Discovery can help AIOps. This project involves both data mining researchers and practitioners from Infologic, a French software editor.
arXiv Detail & Related papers (2021-09-10T14:41:02Z)
SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples. We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.