JANUS: Structured Bidirectional Generation for Guaranteed Constraints and Analytical Uncertainty
- URL: http://arxiv.org/abs/2603.03748v1
- Date: Wed, 04 Mar 2026 05:36:11 GMT
- Title: JANUS: Structured Bidirectional Generation for Guaranteed Constraints and Analytical Uncertainty
- Authors: Taha Racicot
- Abstract summary: JANUS (Joint Ancestral Network for Uncertainty and Synthesis) is a framework that unifies fidelity, constraint control, and reliable uncertainty estimation using a DAG of Bayesian Decision Trees. Its key innovation is Reverse-Topological Back-filling, an algorithm that propagates constraints backwards through the causal graph. JANUS achieves state-of-the-art fidelity (Detection Score 0.497), eliminates mode collapse on imbalanced data, and provides exact handling of complex inter-column constraints.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-stakes synthetic data generation faces a fundamental Quadrilemma: achieving Fidelity to the original distribution, Control over complex logical constraints, Reliability in uncertainty estimation, and Efficiency in computational cost -- simultaneously. State-of-the-art Deep Generative Models (CTGAN, TabDDPM) excel at fidelity but rely on inefficient rejection sampling for continuous range constraints. Conversely, Structural Causal Models offer logical control but struggle with high-dimensional fidelity and complex noise inversion. We introduce JANUS (Joint Ancestral Network for Uncertainty and Synthesis), a framework that unifies these capabilities using a DAG of Bayesian Decision Trees. Our key innovation is Reverse-Topological Back-filling, an algorithm that propagates constraints backwards through the causal graph, achieving 100% constraint satisfaction on feasible constraint sets without rejection sampling. This is paired with an Analytical Uncertainty Decomposition derived from Dirichlet priors, enabling 128x faster uncertainty estimation than Monte Carlo methods. Across 15 datasets and 523 constrained scenarios, JANUS achieves state-of-the-art fidelity (Detection Score 0.497), eliminates mode collapse on imbalanced data, and provides exact handling of complex inter-column constraints (e.g., Salary_offered >= Salary_requested) where baselines fail entirely.
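The abstract names two mechanisms concrete enough to sketch. The snippet below is a minimal illustrative sketch of Reverse-Topological Back-filling under strong simplifying assumptions: it handles only interval constraints, and the `Node`/`backfill` names, the monotone-dependency rule used to tighten parent intervals, and the two-column example are ours, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    parents: list[str] = field(default_factory=list)
    low: float = float("-inf")    # current feasible interval
    high: float = float("inf")

def backfill(order: list[Node], constraints: dict[str, tuple[float, float]]) -> None:
    """Tighten feasible intervals in reverse topological order so that a
    subsequent ancestral sampling sweep, drawing each node inside its
    [low, high] interval, satisfies the constraints without rejection."""
    by_name = {n.name: n for n in order}
    for node in reversed(order):                      # reverse-topological pass
        if node.name in constraints:
            lo, hi = constraints[node.name]
            node.low, node.high = max(node.low, lo), min(node.high, hi)
        if node.low > node.high:
            raise ValueError(f"infeasible constraint set at {node.name}")
        for p in node.parents:                        # push bounds upstream
            # Simplifying monotone assumption: a parent value above the
            # child's upper bound could never yield a feasible child.
            by_name[p].high = min(by_name[p].high, node.high)

# The abstract's running example, Salary_offered >= Salary_requested:
req = Node("Salary_requested")
off = Node("Salary_offered", parents=["Salary_requested"])
backfill([req, off], {"Salary_offered": (50_000.0, 90_000.0)})
# req.high is now 90_000; at generation time the sampler would additionally
# condition Salary_offered's lower bound on the drawn Salary_requested value.
```

For the Analytical Uncertainty Decomposition, the abstract says only that it is derived from Dirichlet priors and avoids Monte Carlo. The standard closed form for a Dirichlet posterior, shown below, splits predictive entropy into expected aleatoric entropy plus a mutual-information (epistemic) term; whether JANUS uses exactly these quantities at its tree leaves is an assumption on our part.

```python
import numpy as np
from scipy.special import digamma

def dirichlet_uncertainty(alpha) -> tuple[float, float, float]:
    """Closed-form (total, aleatoric, epistemic) uncertainty in nats for
    class probabilities p ~ Dirichlet(alpha):
        total     = H(E[p])            predictive entropy
        aleatoric = E[H(p)]            expected data uncertainty
        epistemic = total - aleatoric  mutual information
    No sampling is needed, which is what makes large speedups over
    Monte Carlo estimates (the abstract reports 128x) plausible."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    mean = alpha / a0
    total = float(-np.sum(mean * np.log(mean)))
    aleatoric = float(-np.sum(mean * (digamma(alpha + 1.0) - digamma(a0 + 1.0))))
    return total, aleatoric, total - aleatoric

print(dirichlet_uncertainty([30.0, 5.0, 1.0]))  # concentrated: low epistemic
print(dirichlet_uncertainty([1.0, 1.0, 1.0]))   # uniform prior: high epistemic
```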
Related papers
- Sharp Convergence Rates for Masked Diffusion Models [53.117058231393834]
We develop a total-variation based analysis for the Euler method that overcomes limitations. Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees. Overall, our analysis introduces a direct TV-based error decomposition along the CTMC trajectory and a decoupling-based path-wise analysis for FHS.
arXiv Detail & Related papers (2026-02-26T00:47:51Z) - Amortized Reasoning Tree Search: Decoupling Proposal and Decision in Large Language Models [2.5170433424424874]
Reinforcement Learning with Verifiable Rewards has established itself as the dominant paradigm for instilling rigorous reasoning capabilities in Large Language Models. We identify a critical pathology in this alignment process: the systematic suppression of valid but rare (low-likelihood under the base model distribution) reasoning paths. We propose Amortized Reasoning Tree Search (ARTS) to counteract this collapse without discarding the base model's latent diversity.
arXiv Detail & Related papers (2026-02-13T11:52:50Z) - ProbFM: Probabilistic Time Series Foundation Model with Uncertainty Decomposition [0.12489632787815884]
Time Series Foundation Models (TSFMs) have emerged as a promising approach for zero-shot financial forecasting. Current approaches either rely on restrictive distributional assumptions, conflate different sources of uncertainty, or lack principled calibration mechanisms. We present a novel transformer-based probabilistic framework, ProbFM, that leverages Deep Evidential Regression (DER) to provide principled uncertainty quantification.
arXiv Detail & Related papers (2026-01-15T17:02:06Z) - Distributionally Robust Optimization with Adversarial Data Contamination [49.89480853499918]
We focus on optimizing Wasserstein-1 DRO objectives for generalized linear models with convex Lipschitz loss functions. Our primary contribution lies in a novel modeling framework that integrates robustness against training data contamination with robustness against distributional shifts. This work establishes the first rigorous guarantees, supported by efficient computation, for learning under the dual challenges of data contamination and distributional shifts.
arXiv Detail & Related papers (2025-07-14T18:34:10Z) - Bridging Internal Probability and Self-Consistency for Effective and Efficient LLM Reasoning [53.25336975467293]
We present the first theoretical error decomposition analysis of methods such as perplexity and self-consistency. Our analysis reveals a fundamental trade-off: perplexity methods suffer from substantial model error due to the absence of a proper consistency function. We propose Reasoning-Pruning Perplexity Consistency (RPC), which integrates perplexity with self-consistency, and Reasoning Pruning, which eliminates low-probability reasoning paths.
arXiv Detail & Related papers (2025-02-01T18:09:49Z) - Controllable Generation via Locally Constrained Resampling [77.48624621592523]
We propose a tractable probabilistic approach that performs Bayesian conditioning to draw samples subject to a constraint.
Our approach considers the entire sequence, leading to more globally optimal constrained generation than current greedy methods.
We show that our approach is able to steer the model's outputs away from toxic generations, outperforming similar approaches to detoxification.
arXiv Detail & Related papers (2024-10-17T00:49:53Z) - A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t. the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z) - Efficient Transfer Learning via Causal Bounds [8.981637739384674]
Our analysis precisely characterizes when and how causal side-information accelerates online learning, and experiments demonstrate the resulting reduction in data requirements.
arXiv Detail & Related papers (2023-08-07T13:24:50Z) - Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity [39.886149789339335]
Offline reinforcement learning aims to learn to perform decision making from historical data without active exploration.
Due to uncertainties and variabilities of the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the history dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
arXiv Detail & Related papers (2022-08-11T11:55:31Z) - Error-based Knockoffs Inference for Controlled Feature Selection [49.99321384855201]
We propose an error-based knockoff inference method by integrating the knockoff features, the error-based feature importance statistics, and the stepdown procedure together.
The proposed inference procedure does not require specifying a regression model and can handle feature selection with theoretical guarantees.
arXiv Detail & Related papers (2022-03-09T01:55:59Z) - Kidney Exchange with Inhomogeneous Edge Existence Uncertainty [33.17472228570093]
We study a matched cycle and chain packing problem, aiming to identify structures in a directed graph that are robust to edge failure.
Our approach, evaluated on data from the United Network for Organ Sharing (UNOS), provides better performance with the same weights as an SAA-based method.
arXiv Detail & Related papers (2020-07-07T04:08:39Z)