Towards an Interpretable Data-driven Trigger System for High-throughput
Physics Facilities
- URL: http://arxiv.org/abs/2104.06622v1
- Date: Wed, 14 Apr 2021 05:01:32 GMT
- Title: Towards an Interpretable Data-driven Trigger System for High-throughput
Physics Facilities
- Authors: Chinmaya Mahesh, Kristin Dona, David W. Miller, Yuxin Chen
- Abstract summary: We introduce a new data-driven approach for designing high-throughput data filtering and trigger systems.
Our goal is to design a data-driven filtering system with a minimal run-time cost for determining which data event to keep.
We introduce key insights from interpretable predictive modeling and cost-sensitive learning in order to account for non-local inefficiencies in the current paradigm.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-intensive science is increasingly reliant on real-time processing
capabilities and machine learning workflows, in order to filter and analyze the
extreme volumes of data being collected. This is especially true at the energy
and intensity frontiers of particle physics where bandwidths of raw data can
exceed 100 Tb/s of heterogeneous, high-dimensional data sourced from hundreds
of millions of individual sensors. In this paper, we introduce a new
data-driven approach for designing and optimizing high-throughput data
filtering and trigger systems such as those in use at physics facilities like
the Large Hadron Collider (LHC). Concretely, our goal is to design a
data-driven filtering system with a minimal run-time cost for determining which
data event to keep, while preserving (and potentially improving upon) the
distribution of the output as generated by the hand-designed trigger system. We
introduce key insights from interpretable predictive modeling and
cost-sensitive learning in order to account for non-local inefficiencies in the
current paradigm and construct a cost-effective data filtering and trigger
model that does not compromise physics coverage.
Related papers
- Enhancing High-Energy Particle Physics Collision Analysis through Graph Data Attribution Techniques [0.0]
This paper uses a simulated particle collision dataset to integrate influence analysis inside the graph classification pipeline.
By using a Graph Neural Network for initial training, we applied a gradient-based data influence method to identify influential training samples.
By analyzing the discarded elements we can provide further insights about the event classification task.
arXiv Detail & Related papers (2024-07-20T12:40:03Z) - Computationally and Memory-Efficient Robust Predictive Analytics Using Big Data [0.0]
This study navigates through the challenges of data uncertainties, storage limitations, and predictive data-driven modeling using big data.
We utilize Robust Principal Component Analysis (RPCA) for effective noise reduction and outlier elimination, and Optimal Sensor Placement (OSP) for efficient data compression and storage.
arXiv Detail & Related papers (2024-03-27T22:39:08Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - A spectrum of physics-informed Gaussian processes for regression in
engineering [0.0]
Despite the growing availability of sensing and data in general, we remain unable to fully characterise many in-service engineering systems and structures from a purely data-driven approach.
This paper pursues the combination of machine learning technology and physics-based reasoning to enhance our ability to make predictive models with limited data.
arXiv Detail & Related papers (2023-09-19T14:39:03Z) - Analysis and Optimization of Wireless Federated Learning with Data
Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z) - Closing the loop: Autonomous experiments enabled by
machine-learning-based online data analysis in synchrotron beamline
environments [80.49514665620008]
Machine learning can be used to enhance research involving large or rapidly generated datasets.
In this study, we describe the incorporation of ML into a closed-loop workflow for X-ray reflectometry (XRR).
We present solutions that provide an elementary data analysis in real time during the experiment without introducing additional software dependencies in the beamline control software environment.
arXiv Detail & Related papers (2023-06-20T21:21:19Z) - Online Data Selection for Federated Learning with Limited Storage [53.46789303416799]
Federated Learning (FL) has been proposed to achieve distributed machine learning among networked devices.
The impact of on-device storage on the performance of FL is still not explored.
In this work, we take the first step to consider the online data selection for FL with limited on-device storage.
arXiv Detail & Related papers (2022-09-01T03:27:33Z) - How Much More Data Do I Need? Estimating Requirements for Downstream
Tasks [99.44608160188905]
Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance?
Overestimating or underestimating data requirements incurs substantial costs that could be avoided with an adequate budget.
Using our guidelines, practitioners can accurately estimate data requirements of machine learning systems to gain savings in both development time and data acquisition costs.
arXiv Detail & Related papers (2022-07-04T21:16:05Z) - Innovations in trigger and data acquisition systems for next-generation
physics facilities [0.6445605125467573]
Data-intensive physics facilities are increasingly reliant on heterogeneous and large-scale data processing systems.
This White Paper aims to highlight the challenges that these facilities face in the design of the trigger and data acquisition instrumentation and systems.
arXiv Detail & Related papers (2022-03-15T03:13:32Z) - Deep Reinforcement Learning Assisted Federated Learning Algorithm for
Data Management of IIoT [82.33080550378068]
The continuously expanding scale of the industrial Internet of Things (IIoT) leads to IIoT equipment generating massive amounts of user data every moment.
How to manage these time series data in an efficient and safe way in the field of IIoT is still an open issue.
This paper studies applications of FL technology to managing IIoT equipment data in wireless network environments.
arXiv Detail & Related papers (2022-02-03T07:12:36Z) - PREPRINT: Comparison of deep learning and hand crafted features for
mining simulation data [7.214140640112874]
This paper addresses the task of extracting meaningful results in an automated manner from high dimensional data sets.
We propose deep learning methods which are capable of processing such data and which can be trained to solve relevant tasks on simulation data.
We compile a large dataset of 2D simulations of the flow field around airfoils, containing 16000 flow fields, on which we test and compare approaches.
arXiv Detail & Related papers (2021-03-11T09:28:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.