Related papers: Using Petri Nets as an Integrated Constraint Mechanism for Reinforcement Learning Tasks


CREDIT: Certified Ownership Verification of Deep Neural Networks Against Model Extraction Attacks [54.04030169323115]
We introduce CREDIT, a certified ownership verification against Model Extraction Attacks (MEAs). We quantify the similarity between DNN models, propose a practical verification threshold, and provide rigorous theoretical guarantees for ownership verification based on this threshold. We extensively evaluate our approach on several mainstream datasets across different domains and tasks, achieving state-of-the-art performance.
arXiv Detail & Related papers (2026-02-23T23:36:25Z)
Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions [37.08071497197165]
Intervention-based model steering offers a lightweight and interpretable alternative to prompting and fine-tuning. We build on the principles of distributed alignment search to propose a new steering method: Concept DAS. We show that Concept DAS does not always outperform preference-optimization methods but may benefit more from increased model scale.
arXiv Detail & Related papers (2026-02-05T02:51:00Z)
Foundation CAN LM: A Pretrained Language Model For Automotive CAN Data [1.1091582432763738]
The Controller Area Network (CAN) bus provides a rich source of vehicular signals for applications in the automotive and auto insurance domains. Existing pipelines largely train isolated task-specific models on raw CAN data. We introduce the foundation CAN model paradigm: large-scale pretraining followed by task-specific adaptation.
arXiv Detail & Related papers (2026-01-31T19:00:48Z)
AI-NativeBench: An Open-Source White-Box Agentic Benchmark Suite for AI-Native Systems [52.65695508605237]
We introduce AI-NativeBench, the first application-centric and white-box AI-Native benchmark suite grounded in Model Context Protocol (MCP) and Agent-to-Agent (A2A) standards. By treating agentic spans as first-class citizens within distributed traces, our methodology enables granular analysis of engineering characteristics beyond simple capabilities. This work provides the first systematic evidence to guide the transition from measuring model capability to engineering reliable AI-Native systems.
arXiv Detail & Related papers (2026-01-14T11:32:07Z)
SPACeR: Self-Play Anchoring with Centralized Reference Models [50.55045557371374]
Sim agent policies are realistic, human-like, fast, and scalable in multi-agent settings. Recent progress in imitation learning with large diffusion-based or tokenized models has shown that behaviors can be captured directly from human driving data. We propose SPACeR, a framework that leverages a pretrained tokenized autoregressive motion model as a central reference policy.
arXiv Detail & Related papers (2025-10-20T19:53:02Z)
Learning Constraints Directly from Network Data [0.34137115855910755]
Rule extraction can improve quality of synthetic data, reduce brittleness of machine learning models, and improve semantic understanding of network measurements. This paper introduces NetNomos, which learns propositional logic constraints directly from raw network measurements. Our evaluations show that NetNomos learns all benchmark rules, including those associated with as little as 0.01% of data points, in under three hours.
arXiv Detail & Related papers (2025-06-30T15:36:22Z)
Origin Tracer: A Method for Detecting LoRA Fine-Tuning Origins in LLMs [3.925661213372832]
We propose a novel detection method that rigorously determines whether a model has been fine-tuned from a specified base model. This framework is the first to provide a formalized approach specifically aimed at pinpointing the sources of model fine-tuning. We empirically validated our method on thirty-one diverse open-source models under conditions that simulate real-world obfuscation scenarios.
arXiv Detail & Related papers (2025-05-26T03:38:14Z)
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models [86.88657425848547]
Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning. We explicitly align models with three meta-abilities: deduction, induction, and abduction, using automatically generated, self-verifiable tasks. Our three-stage pipeline (individual alignment, parameter-space merging, and domain-specific reinforcement learning) boosts performance by over 10% relative to instruction-tuned baselines.
arXiv Detail & Related papers (2025-05-15T17:58:33Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time compute instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning [40.93098780862429]
We show that the strongest results in foundation model fine-tuning (FT) are achieved via a relatively complex, two-stage training procedure. One first trains a reward model (RM) on some dataset (e.g. human preferences) before using it to provide online feedback as part of a downstream reinforcement learning procedure. We find the most support for the explanation that on problems with a generation-verification gap, the combination of the ease of learning the relatively simple RM from the preference data, and the ability of the downstream RL procedure to then filter its search space to the subset of policies that are optimal for
arXiv Detail & Related papers (2025-03-03T00:15:19Z)
Model-based Offline Policy Optimization with Adversarial Network [0.36868085124383626]
We propose a novel Model-based Offline policy optimization framework with Adversarial Network (MOAN). The key idea is to use adversarial learning to build a transition model with better generalization. Our approach outperforms existing state-of-the-art baselines on widely studied offline RL benchmarks.
arXiv Detail & Related papers (2023-09-05T11:49:33Z)
Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems. Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored. We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
Relational Action Bases: Formalization, Effective Safety Verification, and Invariants (Extended Version) [67.99023219822564]
We introduce the general framework of relational action bases (RABs). RABs generalize existing models by lifting both restrictions. We demonstrate the effectiveness of this approach on a benchmark of data-aware business processes.
arXiv Detail & Related papers (2022-08-12T17:03:50Z)
Fully Decentralized Model-based Policy Optimization for Networked Systems [23.46407780093797]
This work aims to improve the data efficiency of multi-agent control by model-based learning. We consider networked systems where agents are cooperative and communicate only locally with their neighbors. In our method, each agent learns a dynamic model to predict future states and broadcasts its predictions by communication, and then the policies are trained under the model rollouts.
arXiv Detail & Related papers (2022-07-13T23:52:14Z)
Residual Pathway Priors for Soft Equivariance Constraints [44.19582621065543]
We introduce Residual Pathway Priors (RPPs) as a method for converting hard architectural constraints into soft priors. RPPs are resilient to approximate or misspecified symmetries, and are as effective as fully constrained models even when symmetries are exact.
arXiv Detail & Related papers (2021-12-02T16:18:17Z)
Explaining a Series of Models by Propagating Local Feature Attributions [9.66840768820136]
Pipelines involving several machine learning models improve performance in many domains but are difficult to understand. We introduce a framework to propagate local feature attributions through complex pipelines of models based on a connection to the Shapley value. Our framework enables us to draw higher-level conclusions based on groups of gene expression features for Alzheimer's and breast cancer histologic grade prediction.
arXiv Detail & Related papers (2021-04-30T22:20:58Z)
COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable. We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions. We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators [19.026312915461553]
We propose a model-based offline reinforcement learning (RL) approach called PerSim. We first learn a personalized simulator for each agent by collectively using the historical trajectories across all agents prior to learning a policy. This representation suggests a simple, regularized neural network architecture to effectively learn the transition dynamics per agent, even with scarce, offline data.
arXiv Detail & Related papers (2021-02-13T17:16:41Z)
Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn. We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology in regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)
MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. We show that an existing model-based RL algorithm already produces significant gains in the offline setting. We propose to modify the existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.