PyOD 2: A Python Library for Outlier Detection with LLM-powered Model Selection
- URL: http://arxiv.org/abs/2412.12154v1
- Date: Wed, 11 Dec 2024 07:53:20 GMT
- Title: PyOD 2: A Python Library for Outlier Detection with LLM-powered Model Selection
- Authors: Sihan Chen, Zhuangzhuang Qian, Wingchun Siu, Xingcan Hu, Jiaqi Li, Shawn Li, Yuehan Qin, Tiankai Yang, Zhuo Xiao, Wanghao Ye, Yichi Zhang, Yushun Dong, Yue Zhao
- Abstract summary: Outlier detection (OD) is a critical machine learning (ML) task with applications in fraud detection, network intrusion detection, clickstream analysis, recommendation systems, and social network moderation.
PyOD is the most widely adopted library for OD, with over 8,500 GitHub stars, 25 million downloads, and diverse industry usage.
PyOD Version 2 (PyOD 2) integrates 12 state-of-the-art deep learning models into a PyTorch framework and introduces a large language model (LLM)-based pipeline for automated OD model selection.
- Score: 18.108558631930766
- Abstract: Outlier detection (OD), also known as anomaly detection, is a critical machine learning (ML) task with applications in fraud detection, network intrusion detection, clickstream analysis, recommendation systems, and social network moderation. Among open-source libraries for outlier detection, the Python Outlier Detection (PyOD) library is the most widely adopted, with over 8,500 GitHub stars, 25 million downloads, and diverse industry usage. However, PyOD currently faces three limitations: (1) insufficient coverage of modern deep learning algorithms, (2) fragmented implementations across PyTorch and TensorFlow, and (3) no automated model selection, making it hard for non-experts. To address these issues, we present PyOD Version 2 (PyOD 2), which integrates 12 state-of-the-art deep learning models into a unified PyTorch framework and introduces a large language model (LLM)-based pipeline for automated OD model selection. These improvements simplify OD workflows, provide access to 45 algorithms, and deliver robust performance on various datasets. In this paper, we demonstrate how PyOD 2 streamlines the deployment and automation of OD models and sets a new standard in both research and industry. PyOD 2 is accessible at [https://github.com/yzhao062/pyod](https://github.com/yzhao062/pyod). This study aligns with the Web Mining and Content Analysis track, addressing topics such as the robustness of Web mining methods and the quality of algorithmically-generated Web data.
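To make the idea of an outlier detector behind a uniform interface concrete, here is a minimal sketch using only the Python standard library. It is not PyOD's implementation: `KNNDistanceDetector` is a hypothetical class that scores each point by the distance to its k-th nearest training neighbor. The method and attribute names (`fit`, `decision_function`, `predict`, `decision_scores_`, `labels_`, `threshold_`) are chosen to resemble the scikit-learn-style convention that PyOD detectors follow; the data, `k`, and `contamination` values are assumptions made for illustration.

```python
import math
import random


class KNNDistanceDetector:
    """Toy detector (hypothetical, not part of PyOD).

    The outlier score of a point is the distance to its k-th nearest
    training neighbor; larger scores mean "more anomalous".
    """

    def __init__(self, k=3, contamination=0.1):
        self.k = k
        self.contamination = contamination

    def _kth_distance(self, point, exclude_index=None):
        # Distances from `point` to every training point, optionally
        # skipping the point itself when scoring the training set.
        dists = sorted(
            math.dist(point, other)
            for i, other in enumerate(self._train)
            if i != exclude_index
        )
        return dists[self.k - 1]

    def fit(self, X):
        self._train = [tuple(row) for row in X]
        self.decision_scores_ = [
            self._kth_distance(p, exclude_index=i)
            for i, p in enumerate(self._train)
        ]
        # Flag the top `contamination` fraction of training scores.
        n_outliers = max(1, round(self.contamination * len(self._train)))
        self.threshold_ = sorted(self.decision_scores_, reverse=True)[n_outliers - 1]
        self.labels_ = [int(s >= self.threshold_) for s in self.decision_scores_]
        return self

    def decision_function(self, X):
        # Raw outlier scores for unseen points.
        return [self._kth_distance(tuple(row)) for row in X]

    def predict(self, X):
        # Binary labels: 1 = outlier, 0 = inlier.
        return [int(s >= self.threshold_) for s in self.decision_function(X)]


# Assumed toy data: 50 inliers in the unit square around the origin,
# followed by two obvious outliers far away from that cluster.
random.seed(0)
X_train = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(50)]
X_train += [(8.0, 8.0), (-9.0, 7.0)]

detector = KNNDistanceDetector(k=3, contamination=0.04).fit(X_train)
new_labels = detector.predict([(0.0, 0.0), (12.0, 12.0)])
print(sum(detector.labels_), new_labels)
```

Because every detector in such a design exposes the same methods, swapping one algorithm for another only changes the constructor call, which is what makes an automated model-selection step on top of the library practical.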
Related papers
- dtaianomaly: A Python library for time series anomaly detection [5.356944479760106]
dtaianomaly is an open-source Python library for time series anomaly detection.
Our goal is to bridge the gap between academic research and real-world applications.
dtaianomaly offers (1) a broad range of built-in anomaly detectors, (2) support for time series preprocessing, (3) tools for visual analysis, (4) confidence prediction of anomaly scores, (5) runtime and memory profiling, (6) comprehensive documentation, and (7) cross-platform unit testing.
arXiv Detail & Related papers (2025-02-20T09:18:00Z) - PyPulse: A Python Library for Biosignal Imputation [58.35269251730328]
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings.
PyPulse provides a modular and extendable framework with high ease of use for a broad user base, including bioresearchers without a machine learning background.
We released PyPulse under the MIT License on Github and PyPI.
arXiv Detail & Related papers (2024-12-09T11:00:55Z) - DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model [90.71963723884944]
Text-to-image (T2I) generative models have attracted significant attention and found extensive applications within and beyond academic research.
We introduce DiffAgent, an agent designed to select the appropriate T2I API in seconds via API calls.
Our evaluations reveal that DiffAgent not only excels in identifying the appropriate T2I API but also underscores the effectiveness of the SFTA training framework.
arXiv Detail & Related papers (2024-03-31T06:28:15Z) - Exploring Green AI for Audio Deepfake Detection [21.17957700009653]
State-of-the-art audio deepfake detectors leveraging deep neural networks exhibit impressive recognition performance.
Deep NLP models produce around 626k lbs of CO2, which is equivalent to five times the lifetime emissions of an average US car.
This study presents a novel framework for audio deepfake detection that can be seamlessly trained using standard CPU resources.
arXiv Detail & Related papers (2024-03-21T10:54:21Z) - Distributed Inference and Fine-tuning of Large Language Models Over The Internet [91.00270820533272]
Large language models (LLMs) are useful in many NLP tasks and become more capable with size.
These models require high-end hardware, making them inaccessible to most researchers.
We develop fault-tolerant inference algorithms and load-balancing protocols that automatically assign devices to maximize the total system throughput.
arXiv Detail & Related papers (2023-12-13T18:52:49Z) - PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series [0.0]
PyPOTS is an open-source Python library dedicated to data mining and analysis on partially-observed time series.
It provides easy access to diverse algorithms categorized into four tasks: imputation, classification, clustering, and forecasting.
arXiv Detail & Related papers (2023-05-30T07:57:05Z) - PySAD: A Streaming Anomaly Detection Framework in Python [0.0]
PySAD is an open-source Python framework for anomaly detection on streaming data.
PySAD builds upon popular open-source frameworks such as PyOD and scikit-learn.
arXiv Detail & Related papers (2020-09-05T17:41:37Z) - DR-SPAAM: A Spatial-Attention and Auto-regressive Model for Person Detection in 2D Range Data [81.06749792332641]
We propose a person detection network which uses an alternative strategy to combine scans obtained at different times.
DR-SPAAM keeps the intermediate features from the backbone network as a template and recurrently updates the template when a new scan becomes available.
On the DROW dataset, our method outperforms the existing state-of-the-art, while being approximately four times faster.
arXiv Detail & Related papers (2020-04-29T11:01:44Z) - Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural network (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delays.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z) - PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support.
Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space.
It also provides unified interfaces and visualizations for users with or without data science or machine learning background.
arXiv Detail & Related papers (2020-03-12T03:30:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.