Diagnosing Robotics Systems Issues with Large Language Models
- URL: http://arxiv.org/abs/2410.09084v1
- Date: Sun, 6 Oct 2024 11:58:12 GMT
- Title: Diagnosing Robotics Systems Issues with Large Language Models
- Authors: Jordis Emilia Herrmann, Aswath Mandakath Gopinath, Mikael Norrlof, Mark Niklas Müller,
- Abstract summary: Large language models (LLMs) excel at analyzing large amounts of data.
Here, we extend this work to the challenging and largely unexplored domain of robotics systems.
- Score: 5.30112395683561
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quickly resolving issues reported in industrial applications is crucial to minimize economic impact. However, the required data analysis makes diagnosing the underlying root causes a challenging and time-consuming task, even for experts. In contrast, large language models (LLMs) excel at analyzing large amounts of data. Indeed, prior work in AI-Ops demonstrates their effectiveness in analyzing IT systems. Here, we extend this work to the challenging and largely unexplored domain of robotics systems. To this end, we create SYSDIAGBENCH, a proprietary system diagnostics benchmark for robotics, containing over 2500 reported issues. We leverage SYSDIAGBENCH to investigate the performance of LLMs for root cause analysis, considering a range of model sizes and adaptation techniques. Our results show that QLoRA finetuning can be sufficient to let a 7B-parameter model outperform GPT-4 in terms of diagnostic accuracy while being significantly more cost-effective. We validate our LLM-as-a-judge results with a human expert study and find that our best model achieves similar approval ratings as our reference labels.
Related papers
- ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models [43.895478182631116]
Tool-augmented large language models (LLMs) are rapidly being integrated into real-world applications.
To address this challenge, we introduce a comprehensive diagnostic benchmark, ToolBH.
For breadth, we consider three scenarios based on the characteristics of the toolset: missing necessary tools, potential tools, and limited functionality tools.
The results show the significant challenges presented by the ToolBH benchmark.
arXiv Detail & Related papers (2024-06-28T16:03:30Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - Explainable AI for Comparative Analysis of Intrusion Detection Models [20.683181384051395]
This research analyzes various machine learning models to the tasks of binary and multi-class classification for intrusion detection from network traffic.
We trained all models to the accuracy of 90% on the UNSW-NB15 dataset.
We also discover that Random Forest provides the best performance in terms of accuracy, time efficiency and robustness.
arXiv Detail & Related papers (2024-06-14T03:11:01Z) - Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation [113.5002649181103]
Training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology.
For training, we assemble a large dataset of over 697 thousand radiology image-text pairs.
For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation.
The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
arXiv Detail & Related papers (2024-03-12T18:12:02Z) - DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Our DACO-RL algorithm is evaluated by human annotators to produce more helpful answers than SFT model in 57.72% cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z) - Progressing from Anomaly Detection to Automated Log Labeling and
Pioneering Root Cause Analysis [53.24804865821692]
This study introduces a taxonomy for log anomalies and explores automated data labeling to mitigate labeling challenges.
The study envisions a future where root cause analysis follows anomaly detection, unraveling the underlying triggers of anomalies.
arXiv Detail & Related papers (2023-12-22T15:04:20Z) - D-Bot: Database Diagnosis System using Large Language Models [30.20192093986365]
Database administrators (DBAs) play an important role in managing, maintaining and optimizing database systems.
Recently large language models (LLMs) have shown great potential in various fields.
We propose D-Bot, an LLM-based database diagnosis system that can automatically acquire knowledge from diagnosis documents.
arXiv Detail & Related papers (2023-12-03T16:58:10Z) - Causal Disentanglement Hidden Markov Model for Fault Diagnosis [55.90917958154425]
We propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism.
Specifically, we make full use of the time-series data and progressively disentangle the vibration signal into fault-relevant and fault-irrelevant factors.
To expand the scope of the application, we adopt unsupervised domain adaptation to transfer the learned disentangled representations to other working environments.
arXiv Detail & Related papers (2023-08-06T05:58:45Z) - Fault Diagnosis using eXplainable AI: a Transfer Learning-based Approach
for Rotating Machinery exploiting Augmented Synthetic Data [0.0]
FaultD-XAI is a generic and interpretable approach for classifying faults in rotating machinery based on transfer learning.
To provide scalability using transfer learning, synthetic vibration signals are created mimicking the characteristic behavior of failures in operation.
The proposed approach not only obtained promising diagnostic performance, but was also able to learn characteristics used by experts to identify conditions.
arXiv Detail & Related papers (2022-10-06T15:02:35Z) - How Can Subgroup Discovery Help AIOps? [0.0]
We study how Subgroup Discovery can help AIOps.
This project involves both data mining researchers and practitioners from Infologic, a French software editor.
arXiv Detail & Related papers (2021-09-10T14:41:02Z) - SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier
Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.