Towards Handling Uncertainty-at-Source in AI -- A Review and Next Steps
for Interval Regression
- URL: http://arxiv.org/abs/2104.07245v1
- Date: Thu, 15 Apr 2021 05:31:10 GMT
- Title: Towards Handling Uncertainty-at-Source in AI -- A Review and Next Steps
for Interval Regression
- Authors: Shaily Kabir, Christian Wagner and Zack Ellerby
- Abstract summary: This paper focuses on linear regression for interval-valued data as a recent growth area.
We conduct an in-depth analysis of state-of-the-art methods, elucidating their behaviour, advantages, and pitfalls when applied to datasets with different properties.
- Score: 6.166295570030645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most of statistics and AI draw insights through modelling discord or variance
between sources of information (i.e., inter-source uncertainty). Increasingly,
however, research is focusing upon uncertainty arising at the level of
individual measurements (i.e., within- or intra-source), such as for a given
sensor output or human response. Here, adopting intervals rather than numbers
as the fundamental data-type provides an efficient, powerful, yet challenging
way forward -- offering systematic capture of uncertainty-at-source, increasing
informational capacity, and ultimately potential for insight. Following recent
progress in the capture of interval-valued data, including from human
participants, conducting machine learning directly upon intervals is a crucial
next step. This paper focuses on linear regression for interval-valued data as
a recent growth area, providing an essential foundation for broader use of
intervals in AI. We conduct an in-depth analysis of state-of-the-art methods,
elucidating their behaviour, advantages, and pitfalls when applied to datasets
with different properties. Specific emphasis is given to the challenge of
preserving mathematical coherence -- i.e., ensuring that models maintain
fundamental mathematical properties of intervals throughout -- and the paper
puts forward extensions to an existing approach to guarantee this. Carefully
designed experiments, using both synthetic and real-world data, are conducted
-- with findings presented alongside novel visualizations for interval-valued
regression outputs, designed to maximise model interpretability. Finally, the
paper makes recommendations concerning method suitability for data sets with
specific properties and highlights remaining challenges and important next
steps for developing AI with the capacity to handle uncertainty-at-source.
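A common family of methods in this area fits separate regressions to interval centres and half-ranges (the centre-and-range idea). The sketch below is a minimal, hypothetical illustration of that idea, not the paper's exact method or its proposed extensions; clipping the predicted half-range at zero is one simple way to preserve the "mathematical coherence" requirement that the lower bound never exceed the upper bound.

```python
import numpy as np

# Minimal centre-and-range style interval regression sketch (illustrative,
# not the surveyed papers' exact algorithms). Each interval [l, u] is
# represented by its centre c = (l + u) / 2 and half-range r = (u - l) / 2.
def fit_interval_regression(X_c, X_r, y_c, y_r):
    """Fit separate least-squares models for centres and half-ranges."""
    A_c = np.column_stack([np.ones(len(X_c)), X_c])
    A_r = np.column_stack([np.ones(len(X_r)), X_r])
    beta_c, *_ = np.linalg.lstsq(A_c, y_c, rcond=None)
    beta_r, *_ = np.linalg.lstsq(A_r, y_r, rcond=None)
    return beta_c, beta_r

def predict_interval(beta_c, beta_r, x_c, x_r):
    c = beta_c[0] + beta_c[1] * x_c
    # Clip the predicted half-range at zero so the output is always a
    # valid interval (lower bound <= upper bound) -- one simple way to
    # keep mathematical coherence.
    r = max(beta_r[0] + beta_r[1] * x_r, 0.0)
    return c - r, c + r
```

With noiseless synthetic data whose centres follow 2x + 1 and whose half-ranges follow 0.5x, the fitted model recovers those relationships and returns valid intervals.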
Related papers
- Deep End-to-End Survival Analysis with Temporal Consistency [49.77103348208835]
We present a novel Survival Analysis algorithm designed to efficiently handle large-scale longitudinal data.
A central idea in our method is temporal consistency, a hypothesis that past and future outcomes in the data evolve smoothly over time.
Our framework uniquely incorporates temporal consistency into large datasets by providing a stable training signal.
arXiv Detail & Related papers (2024-10-09T11:37:09Z) - Causal Contrastive Learning for Counterfactual Regression Over Time [3.3523758554338734]
This paper introduces a unique approach to counterfactual regression over time, emphasizing long-term predictions.
Distinguishing itself from existing models like Causal Transformer, our approach highlights the efficacy of employing RNNs for long-term forecasting.
Our method achieves state-of-the-art counterfactual estimation results using both synthetic and real-world data.
arXiv Detail & Related papers (2024-06-01T19:07:25Z) - Multi-Source Conformal Inference Under Distribution Shift [41.701790856201036]
We consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources.
We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations.
We propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction.
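For context, the plain single-source split-conformal procedure that this work extends can be sketched as follows (a standard textbook construction, not the paper's multi-source, shift-aware method): calibrate a quantile of absolute residuals on held-out data, then widen any point prediction by that amount.

```python
import numpy as np

# Standard split-conformal prediction interval (single source, no
# distribution shift) -- shown only as background for the blurb above.
def conformal_interval(cal_residuals, point_pred, alpha=0.1):
    """Distribution-free (1 - alpha) prediction interval from
    calibration residuals of any fitted model."""
    n = len(cal_residuals)
    # Finite-sample corrected quantile rank.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(np.abs(cal_residuals))[min(k, n) - 1]
    return point_pred - q, point_pred + q
```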
arXiv Detail & Related papers (2024-05-15T13:33:09Z) - Uncertainty for Active Learning on Graphs [70.44714133412592]
Uncertainty Sampling is an Active Learning strategy that aims to improve the data efficiency of machine learning models.
We benchmark Uncertainty Sampling beyond predictive uncertainty and highlight a significant performance gap to other Active Learning strategies.
We develop ground-truth Bayesian uncertainty estimates in terms of the data generating process and prove their effectiveness in guiding Uncertainty Sampling toward optimal queries.
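The generic Uncertainty Sampling strategy that this paper benchmarks can be sketched in a few lines (entropy-based query selection; this is the textbook baseline, not the paper's graph-specific, ground-truth Bayesian estimates):

```python
import numpy as np

# Entropy-based uncertainty sampling -- the generic active learning
# baseline, not the paper's graph-specific method.
def entropy(probs):
    p = np.clip(probs, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def select_query(unlabeled_probs, k=1):
    """Return indices of the k unlabeled points whose predictive
    distribution has the highest entropy (most uncertain)."""
    scores = entropy(np.asarray(unlabeled_probs))
    return np.argsort(scores)[::-1][:k]
```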
arXiv Detail & Related papers (2024-05-02T16:50:47Z) - Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL)
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z) - Differentially Private Linear Regression with Linked Data [3.9325957466009203]
Differential privacy, a mathematical notion from computer science, is an emerging tool offering robust privacy guarantees.
Recent work focuses on developing differentially private versions of individual statistical and machine learning tasks.
We present two differentially private algorithms for linear regression with linked data.
arXiv Detail & Related papers (2023-08-01T21:00:19Z) - Toward Reliable Human Pose Forecasting with Uncertainty [51.628234388046195]
We develop an open-source library for human pose forecasting, including multiple models, supporting several datasets.
We characterize two types of uncertainty in the problem to improve performance and better convey trust.
arXiv Detail & Related papers (2023-04-13T17:56:08Z) - CEDAR: Communication Efficient Distributed Analysis for Regressions [9.50726756006467]
There is growing interest in distributed learning over multiple EHR databases without sharing patient-level data.
We propose a novel communication efficient method that aggregates the local optimal estimates, by turning the problem into a missing data problem.
We provide theoretical investigation for the properties of the proposed method for statistical inference as well as differential privacy, and evaluate its performance in simulations and real data analyses.
arXiv Detail & Related papers (2022-07-01T09:53:44Z) - Accurate and Robust Feature Importance Estimation under Distribution
Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.