Machine Learning (ML) has recently been demonstrated to rival expert-level
human accuracy in prediction and detection tasks in a variety of domains,
including medicine. Despite these impressive findings, however, a key barrier
to the full realization of ML's potential in medical prognoses is technology
acceptance. Recent efforts to produce explainable AI (XAI) have made progress
in improving the interpretability of some ML models, but these efforts suffer
from limitations intrinsic to their design: they work best at identifying why a
system fails, but do poorly at explaining when and why a model's prediction is
correct. We posit that the acceptability of ML predictions in expert domains is
limited by two key factors: the machine's horizon of prediction that extends
beyond human capability, and the inability of machine predictions to incorporate human intuition into their models. We propose the use of a novel ML architecture, Neural Ordinary Differential Equations (NODEs), to enhance human understanding and encourage acceptability. Our approach places human cognitive intuition at the center of algorithm design and offers a distribution of predictions rather than single outputs. We explain how this
approach may significantly improve human-machine collaboration in prediction
tasks in expert domains such as medical prognoses. We propose a model and
demonstrate, by expanding a concrete example from the literature, how our model
advances the vision of future hybrid Human-AI systems.
The vision of being able to accurately predict a patient’s medical trajectory by integrating vast amounts of disparate data has inspired generations of computer scientists and has resulted in a variety of early applications of artiﬁcial intelligence in the form of decision aids and decision support systems.
At the heart of this vision is a collaboration between human and machine; a synergistic hybrid system that aﬀords humans with near superhuman abilities to compute massive amounts of data and with it project far into the future with brilliant accuracy.
This vision of human-machine teaming through the application of human-AI agents is fast becoming a reality today, thanks
largely to ML. Indeed, ML algorithms have proven to be as accurate or better than expert-level predictions in various medical domains, from image classiﬁcation to time-series analysis, and many others [3, 4].
As early as the 1980s, thorough comparisons between computer-generated recommendations and experts had already demonstrated the critical usefulness of artificial decision aids, but the lack of algorithmic transparency caused significant conflicts and inevitable delays that ultimately prevented the widespread adoption of expert systems into mainstream use.
Concerns over low algorithmic transparency and the black-box nature of algorithms such as deep learning have given rise to new interdisciplinary fields of research aimed at improving interpretability and transparency of ML algorithms, so-called explainable artificial intelligence (XAI).
Perhaps driven by lessons learned from earlier generations of clinical decision support failures, XAI has quickly been oﬀered up as the solution, even when the problem is seldom articulated or perhaps not even fully understood.
In order to better understand why explainability and transparency play a central role in the potential widespread adoption of ML, in the following section we illustrate two general scenarios that motivate their importance.
Following this, we discuss why XAI alone is insuﬃcient in achieving the goal of human-AI cooperation for medical prognoses.
We then introduce Neural Ordinary Diﬀerential Equations (NODEs) as a proposed machine learning architecture for use in medical predictions and prognoses, and we illustrate how their use is intrinsically designed for maximum usability, and is superior in supporting human intuition and decision making in medical prognoses.
Sometimes these errors can be traced to a root cause and behaviors can be easily explained.
In other cases, tracing the error is much more diﬃcult, and oftentimes impossible.
This is of immediate concern for makers of industrial-scale autonomous systems such as self-driving cars, but also of great concern in applications that feature machine learning in the role of decision support, as is the case in clinical decision support systems.
Physicians, hospital administrators, and even government regulators, upon seeing the apparent brittleness of ML, are likely to ask themselves: "If this system has such low reliability and unpredictability, how can I ethically justify using it for my patients?" Without some measure of assurance of its reliability, low trust, and in some cases abandonment of the technology, remains the most likely outcome.
While XAI research has resulted in a number of small breakthroughs in terms of ML development techniques, the true beneﬁts from these eﬀorts are limited mostly to programmers and debuggers whose goal is to build more robust and reliable systems.
While important, XAI’s current focus on explaining what went wrong does little to help users determine when and why an algorithmic prediction may be correct, and so does little to help users determine whether or not to use, trust, engage with, or adopt AI moving forward.
For instance, this might come in the form of whether or not a radiologist decides to accept or validate a diagnostic ﬂag created by ML on a medical image, or whether or not to act on ML-based predictions that indicate an aggressive treatment regimen may be warranted in a given patient.
In these situations, users are not aﬀorded the luxury of ground truth, i.e., there is no direct way of knowing whether or not the ML algorithm is accurate because it is projecting a future state that has yet to occur.
A user might ask, for example: "What is the reasoning behind this suggestion to treat with an experimental drug?"
The argument for XAI to address this prospective scenario, therefore, is that the more answers to user questions a system can provide, the greater the degree of trustworthiness the system has, and the greater the likelihood that the system will be used to the extent and in the manner in which it was designed.
Limits to Prospective XAI
Unfortunately, while the majority of XAI work has focused on post-hoc explanation strategies, even the few efforts that are prospective in nature are severely limited in their ability to improve usability and technology acceptance for ML, for at least three reasons.
First, while rules governing a natural phenomenon's evolution lie in a constrained space that can hypothetically be modelled completely, a fully transparent XAI prediction would need to be able to master all possible future states, even in regions that do not seem plausible at first.
Second, a further limitation of XAI approaches stems from how humans reason about causality.
Cognitive scientists have long demonstrated that humans do not typically engage in the kind of deliberate, methodical decision making (i.e., “slow thinking,” or “system two thinking”) that would make use of such a robust and complete XAI system.
In other words, more data is seldom likely to result in better decisions.
Lastly, a limitation of XAI in improving the prospective prediction problem is that developers maximize the predictive accuracy of ML models but do little to address the myriad other human factors that play a role in how humans prognose and make decisions.
The role of intuition in expert decision making has received much focus in the cognitive and
neurosciences for many decades, especially in tasks such as discovery and exploration [11, 12, 13, 14, 15].
Earlier generations of artiﬁcial decision aids that attempted to mimic human decision making ran into trouble because they could not account for information originating from outside of their knowledge base.
Developing expert knowledge seems an elusive target for an artificial system because, as human expertise grows, it also evolves towards more and more intuition and subjectivity, and draws conclusions from information that is broader than merely the data in a patient's medical record.
Any cooperative vision where ML is a trusted component in a cooperative decision making system, such as a fully-integrated clinical decision support system, should feature the strengths of both components (human and machine), rather than limiting the strengths of one over the other.
Although explainability is a vital factor affecting human trust in ML algorithms, it is not by itself sufficient to achieve the true vision of how ML can help humanity by improving our ability to predict the future.
Machine learning has a similar limit to its predictive horizon for the same reason [20, 21, 22, 19].
This limit is unbreakable, in the sense that even with perfect knowledge of the underlying dynamics of the system, it is impossible to make predictions beyond a certain point because the latent errors compound to such an extent that no certainty can be achieved.
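As a standard illustration from dynamical systems theory (a general result, not one specific to the cited works), an initial measurement error \(\delta_0\) in a chaotic system with largest Lyapunov exponent \(\lambda\) grows roughly exponentially, bounding the horizon \(T\) at which the error reaches a tolerance \(\Delta\):

\[
\delta(t) \approx \delta_0\, e^{\lambda t} \qquad\Rightarrow\qquad T \approx \frac{1}{\lambda}\,\ln\frac{\Delta}{\delta_0}.
\]

Since \(T\) grows only logarithmically as measurements improve, a tenfold reduction in \(\delta_0\) buys only \(\ln(10)/\lambda\) of extra horizon, which is why the limit is effectively unbreakable.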
But this instinct also means that our ability to predict future events is ultimately fragile, because our focus on identifying dominant patterns often means that we exclude emerging sub-patterns (precisely what is necessary to make an accurate prognosis).
This information is used to guide our exploration until we ﬁnd an eventual matching pattern, and hence a diagnosis is conﬁrmed.
The primary mechanism through which diagnoses are made, however, is through a ”ruling out” process, which consists largely of seeking evidence to support a main hypothesis, and systematically dismissing other hypotheses that are not supported by the data.
Unfortunately, this projection suffers from the same confirmatory bias as mentioned before.
When attempting to make predictions, research demonstrates that the projection of a series of consecutive states of a phenomenon is usually ruled by a dominant master pattern, to the exclusion of other potentially informative and influential patterns.
Figure 1: Human and machine horizons of predictability (HOP). Machine learning is able to make an accurate prediction at a longer timescale than human beings, but humans often struggle to trust ML outputs because they are difficult to comprehend and do not incorporate all available information, including human intuition.

Studies consistently show that, to determine a prognosis, the prognosis that seems most likely and plausible to a person is the one that arranges the data in the most coherent structure, i.e., the one that tells the most convincing story.
Unfortunately, as has been demonstrated before, data do not always arrange themselves neatly into logical causal relationships that can be quickly appreciated by human beings, which sadly means that a great deal of the time, human beings have a tendency to see connections where there are none.
In summary, our evolutionary drive to seek dominant patterns and our aﬃnity to arrange data into a narrative format is especially useful when it comes to diagnosing, but not especially useful for making prognoses.
In order to achieve true human-machine collaboration where experts conﬁdently leverage the predictive power of ML, the task at hand, therefore, should not be to focus solely on creating more predictive algorithms, or creating more explainable models.
Because machine learning can reason and project out much further than human capabilities, there is a gap between the machine and human horizons of predictability: the limits at which accurate predictions can be made.
Current XAI approaches alone will not narrow this gap because a) they are mostly retrospective in focus and do very little to explain future predictions; and b) we have human cognitive limitations (i.e., we have a tendency to focus on predominant patterns that are familiar to us and therefore ignore emerging new patterns, and we have cognitive limitations in how much data we can process).
To overcome these limitations, we need systems that are speciﬁcally designed with the human predisposition for cognitive intuition in mind in order to enhance acceptability and encourage collaboration.
A system that seeks to augment, as opposed to supplant, intuition would be one that presents its outputs in forms that are easily understandable, to the point of being practically available for humans to use as part of their reasoning.
Rather than developing ways to extract information from intractable models, a plausible solution to encourage better human-machine collaboration with ML is to design machine learning in such a way that its mathematical forms and representations maximize human understanding and comprehension.
For instance, the statement "If a patient has COVID-19, the probability that they will have a positive result on a rapid test is 95%" is often confused with "If a patient has a positive test result, the probability that they have COVID-19 is 95%." The second statement mistakenly reverses the conditional probability, confounding causality and the direction of inference.
For this reason, best practices when displaying statistical risk call for the use of frequency statements (e.g., COVID-19 tests will successfully identify 9 out of 10 people who are infected), as they are more intuitively understood by most people.
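To make the reversal concrete, a short Bayes' rule calculation with hypothetical numbers (1% prevalence, 95% sensitivity, 95% specificity) shows how far apart the two statements can be:

\[
P(\mathrm{COVID} \mid +) = \frac{P(+ \mid \mathrm{COVID})\,P(\mathrm{COVID})}{P(+ \mid \mathrm{COVID})\,P(\mathrm{COVID}) + P(+ \mid \neg\mathrm{COVID})\,P(\neg\mathrm{COVID})} = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \approx 0.16.
\]

Even with a 95% accurate test, a positive result under these assumptions implies only a 16% chance of infection, a gap that frequency statements convey far more naturally.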
The introduction of NODEs as an architecture for machine learning was met with much surprise and critical acclaim from the scientific and computational communities of practice, including a best paper award at the 2018 Conference on Neural Information Processing Systems (NeurIPS).
Applied to machine learning, Neural Ordinary Differential Equations (NODEs) are algorithms that encode the dynamics of a system by learning an ordinary differential equation whose vector field is parameterized by a neural network, as opposed to training a discrete stack of layers.
NODEs have several advantages over other machine learning techniques for providing clear and tractable outputs.
First, they express the solution in continuous time as opposed to models discretizing the timeline into small time steps [35, 36, 37, 38] and can learn on irregular time-series to best match real-world data (for instance biological measurements in the medical ﬁeld).
As opposed to Partial Differential Equations (PDEs) [34, 39, 35], which model the dynamics of a multivariate function, NODEs consider derivatives with respect to a single parameter [40, 41, 42].
Because we are interested in future projections (i.e., predictions or prognosis), the most relevant continuous indexing parameter is time.
Consequently, we posit that using NODEs with all derivatives taken with respect to the time variable will afford users a tremendous benefit in being able to comprehend and trust ML outputs for future predictions.
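To make this concrete, here is a minimal sketch of a NODE in Python, assuming PyTorch and the torchdiffeq package released alongside the original NODE paper; the state dimension, layer sizes, and 48-hour time grid are illustrative choices of ours, not values from the cited works.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # differentiable ODE solvers (Chen et al.)

class ODEFunc(nn.Module):
    """Small network parameterizing the time derivative dh/dt = f(h, t)."""
    def __init__(self, dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, dim))

    def forward(self, t, h):
        # The solver passes the current time t and state h at every step;
        # this simple vector field ignores t (an autonomous ODE).
        return self.net(h)

func = ODEFunc(dim=4)
h0 = torch.randn(1, 4)               # initial state (e.g., an encoded patient status)
t = torch.linspace(0.0, 48.0, 49)    # continuous-time grid over a 48-hour window
trajectory = odeint(func, h0, t)     # shape (49, 1, 4): the state at every queried time
```

The key design point is that the model learns a vector field rather than a fixed sequence of layers, so the trajectory can be evaluated at any time point, including the irregular timestamps typical of medical records.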
Published works [34, 39] (and the companion code) have demonstrated for the first time the use of NODEs in a latent ODE architecture to model patients' trajectories from physiological data recorded in an intensive care unit (ICU).
In this work, NODEs show a better sequence reconstruction and state-of-the-art accuracy when predicting in-hospital mortality or risk of re-admission compared to other deep learning architectures [39, 44].
We summarize in Table 2 the main improvements between existing explainability frameworks and our proposed approach using NODEs.
Table 2: Key changes between the explainability framework and the post-explainability framework presented here.

| Framework | Objective | Time-series analysis technology | Accuracy | Acceptability mechanism |
| Post-explainability | Provide a narrative on how the system is evolving | ODE (Latent ODE) | Greater than state of the art | Query-based human/AI interactions |
| Explainability | Explain where the system is evolving to | RNN (VAE RNN) | Baseline (state of the art) | Fixed set of explanations provided |
Properties of the latent space modelled by latent ODEs
To illustrate and summarize the basic function of NODEs, we briefly discuss latent ODEs and their technical structure. Latent ODEs are used to model the evolution of a process across a time series based on data from an initial latent state.
While RNNs are the go-to solution for modeling regularly sampled time-series data, they do poorly when presented with irregular or inconsistent data, such as the data commonly found in a patient’s medical record.
These RNN-based approaches result in fairly accurate predictions, but without any of the information (particularly the time-related information) necessary to understand the latent variables underlying the prediction.
Latent ODEs, on the other hand, are superior to traditional RNNs because they are ﬂexible with respect to incomplete or inconsistent data, and are especially capable at modeling the future across time.
The resulting latent trajectory should contain information that is both useful for the main classiﬁcation task, and for the reconstruction, thus showing the important features of the original time-series.
Finally, the latent trajectory is decoded into an approximation \((\hat{x}_0, \ldots, \hat{x}_N)\) of the original measurements. The encoder, decoder, and differential equation weights are trained so that \(\hat{x}\) is as close as possible to the real trajectory \(x\).
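A rough sketch of this encode-integrate-decode structure follows (our simplification for illustration, not the authors' exact model; dimensions and layer sizes are arbitrary):

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint

class LatentODE(nn.Module):
    """Simplified latent ODE: RNN encoder -> latent z0 -> ODE -> decoder."""
    def __init__(self, obs_dim: int, latent_dim: int = 6, hidden: int = 32):
        super().__init__()
        # Encoder: a GRU run backwards over the series infers a Gaussian
        # posterior over the initial latent state z0.
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.to_stats = nn.Linear(hidden, 2 * latent_dim)
        # Latent dynamics: a network giving dz/dt, with all derivatives
        # taken with respect to time.
        self.ode_func = nn.Sequential(nn.Linear(latent_dim, hidden), nn.Tanh(),
                                      nn.Linear(hidden, latent_dim))
        # Decoder: maps latent states back to observation space.
        self.decoder = nn.Linear(latent_dim, obs_dim)

    def forward(self, x, t):
        _, h = self.rnn(torch.flip(x, dims=[1]))           # encode series in reverse
        mu, logvar = self.to_stats(h[-1]).chunk(2, dim=-1)
        z0 = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        zs = odeint(lambda s, z: self.ode_func(z), z0, t)      # latent trajectory
        # Training (not shown) minimizes reconstruction error between the
        # decoded trajectory and x, plus a KL term on (mu, logvar), as in a VAE.
        return self.decoder(zs), mu, logvar
```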
It was previously observed in the literature that latent ODEs achieve results that are comparable to or better than state-of-the-art performance on real-life datasets (on the MIMIC-II dataset, see Table 6 of the cited work, reproduced here as Table 3, and similarly for the MIMIC-III dataset).
That analysis focused on the neural network's ability to predict patient mortality.
Our main objective, however, is to show that using NODEs to model a system’s evolution leverages additional information about a patient’s trajectory, which contributes to human-level understandability and therefore improves the acceptability of the output (assuming the output is accurate and deserves to be accepted), while not compromising the predictive power compared to state-of-the-art approaches.
In this work, we focus on the mortality task, predicting whether the patient will die in the hospital, and we also reproduce the study of reconstruction trajectories for ICU patients in order to demonstrate how these data dramatically improve the usability of machine learning predictions.
Oﬀering a probabilistic trajectory helps trigger human capabilities
Due to the probabilistic nature of NODEs, our proposed architecture can afford not only a robust and tractable future patient trajectory, but a distribution of trajectories representing multiple potential futures of the patient, each with an associated probability.
Traditional RNNs provide no such indication as to when a prediction becomes untrustworthy, and systems thus must be programmed to rely on training parameters to set a ﬁxed horizon of events independent of the system’s dynamics.
By providing a timeline with a broad array of potential futures, users can explore these potentials in a way that maximizes and prioritizes their expertise and intuition, because they are now afforded access to multiple potential emerging patterns instead of having a single dominant pattern presented to them.
The strengths of NODEs illustrated here (a distribution of trajectories along a timeline that affords easy access to the predictive boundaries of the machine while allowing multiple potential future scenarios to be explored) emerge as a natural side effect of the architecture.
In other words, in the same way that conveying risk through the use of frequency statements naturally enables people to grasp statistical information and make better decisions, so too do NODE architectures in machine learning.
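One way such a distribution of trajectories could be drawn, reusing the hypothetical LatentODE module sketched above, is to sample repeatedly from the posterior over the initial latent state; the spread of the resulting futures, and its growth over time, is what exposes the machine's horizon of predictability:

```python
import torch
from torchdiffeq import odeint

def sample_futures(model, x, t_future, n_samples: int = 50):
    """Draw n_samples plausible future trajectories for the observed series x."""
    _, h = model.rnn(torch.flip(x, dims=[1]))
    mu, logvar = model.to_stats(h[-1]).chunk(2, dim=-1)
    std = (0.5 * logvar).exp()
    futures = []
    for _ in range(n_samples):
        z0 = mu + torch.randn_like(mu) * std              # one plausible initial state
        zs = odeint(lambda s, z: model.ode_func(z), z0, t_future)
        futures.append(model.decoder(zs))
    return torch.stack(futures)  # (n_samples, len(t_future), batch, obs_dim)
```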
For our study, training time was extended; better and more variable reconstructions were triggered by reducing the noise parameter, thus limiting the power of the encoder and increasing the internal ODE weights.
Each box represents a different measurement category (e.g., inspired O2, heart rate).
The original measurements (blue dots) are displayed.
As the reader can see, some measurements are sparser than others.
This represents the various inaccuracies and inconsistencies of the data.
For example, the arterial blood pressure for patient B is only measured during the second day.
Using these measurements, multiple reconstructions, corresponding to the duration of data fed to the algorithm, are conducted for each feature: the solid lines correspond to the reconstructions where original data is known, whereas the dotted lines of the curves correspond to an extrapolated estimation of the patient’s future.
This shows a direction to improve current NODE models.
Take for example, patient A.
If we look closely at the Glasgow Coma Scale (GCS), we can see that the model initially projects an improvement, as seen by the orange and green curves, which correspond to 1/5 and 2/5 of our 48-hour window (roughly the first 10 and 20 hours, respectively).
We see, however, that these projections quickly become more accurate as enough data is aggregated.
The red line projects what might be considered a median outcome, and has a slightly more distant horizon of predictability, while the purple line and ﬁnally the brown lines show little or no improvement on the Glasgow Coma Scale.
We see that for some features, like the Glasgow Coma Scale or the FiO2, reconstructions for patient B tend to follow the trend of the real feature.
The mortality prediction plot shows the model prediction of the in-hospital mortality.
Given the 48 hours of data, the system is able to predict the death of patient A and the survival of patient B several days later, validating the brown line's prediction.
The last subplot of Figure 2 represents the mortality prediction: for each duration of given data a latent trajectory is drawn by the system, from which a simple neural classiﬁer computes mortality chances.
For patient A, the mortality prediction stays low at the beginning but rises quickly and ultimately crosses the threshold just before the 48-hour mark, indicating that the system predicts patient A will not survive.
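As an illustration of this read-out (a minimal construction of ours, not the published classifier), a small network mapping each latent state to a mortality probability would yield the kind of time-resolved curve and threshold crossing described above:

```python
import torch
import torch.nn as nn

# Hypothetical classifier head over the 6-dimensional latent states.
classifier = nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 1))

def mortality_curve(zs, threshold: float = 0.5):
    """zs: latent trajectory of shape (T, latent_dim) from the latent ODE."""
    p = torch.sigmoid(classifier(zs)).squeeze(-1)  # P(in-hospital death) per time point
    crossings = (p > threshold).nonzero()
    first = crossings[0].item() if len(crossings) else None  # first threshold crossing
    return p, first
```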
We might infer from the data that this prognosis is due to the stable and deteriorated coma state.
Although the data shown here is not suﬃcient alone to make a full cause of death analysis, this simple example demonstrates the ease with which one can access this data and quickly make sense of the underlying connections and their subsequent eﬀects on the predicted outcome.
Both the reconstructions and the mortality predictions demonstrated here illustrate that the latent ODE architecture can handle complex, sparse, real-life data in a manner that is human-understandable and intuitive, while remaining highly accurate.
Conclusion
In the previous section, we illustrated that the NODE architecture is capable of reconstructing a real-life dataset, and demonstrated how an expert might explore the data and produce a narrative in accordance with the NODE's results and predictions.
The ease of use afforded by NODEs, combined with multiple future projections, provides simple but powerful insights that extend the human horizon of predictability beyond normal limits, and does so in a way that minimizes bias and maximizes trust in the data.
For instance, the expert might choose a specific curve that leads to a region of the feature space that is close to a dangerous situation, and make the following query: "If the system crosses the frontier of the dangerous region, what happens next, and how did the system evolve to end up here?"
The latent ODE then constructs a family of most likely trajectories that passes through this newly added point.
This extra information will help experts construct a narrative that is compatible with their knowledge, reinforcing their decision process, or explore the complex family of possible trajectories by asking more specific queries.
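One plausible implementation of such a query (our sketch; the published work does not prescribe a mechanism) is to condition the sampled distribution of futures on the user-supplied waypoint, for example by keeping only trajectories that pass near it:

```python
import torch

def query_trajectories(futures, t_grid, t_query, point, tol: float = 0.5):
    """futures: (n_samples, T, obs_dim); keep samples passing near `point`.

    t_grid: (T,) times for each step; t_query: the queried time;
    point: (obs_dim,) the hypothetical intermediary state added by the user.
    """
    idx = torch.argmin((t_grid - t_query).abs())       # nearest time index
    dist = (futures[:, idx, :] - point).norm(dim=-1)   # distance at that time
    return futures[dist < tol]                          # the conditioned family
```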
This interaction is illustrated in Figure 3. As can be seen in the figure, the system that ends up close to the dangerous region at the end of the third day does not cross the frontier into this region, so the user may be confident that this situation is not a concern.
Predictive agents built on intrinsically explainable ML architectures such as NODEs would offer objectivity when the rational foundations of a prediction are still disputed, and would provide dynamical representations to facilitate early recognition of humanly unpredictable scenarios, while respecting the expert's world view.
As an ultimate result, these proposed predictive agents could allow users to re-code nonrepresentational knowledge (i.e., intuition) into a dynamic representation of the data, thus leveraging the modeling power and advantage of diﬀerential equations.
In our proposed system built on a NODE architecture, a predictive agent could encode the subject’s evolution patterns into a NODE, and run a simulation of the possible future threats, providing then a concise description of the estimated risks.
Our demonstration, we hope, illustrates how the use of NODEs in medical prognoses is superior to attempts to explain black-box models, and also supports users' natural intuition as a consequence of its design.
Figure 3: The new narrative is based on the query's answer: "Given the prediction, the patient should have reached an intermediary state near the danger line in approximately 2 days." (The figure plots real data, RNN, and NODE trajectories against the patient danger line over Days 1 through 5, with the initial prognosis and the latent ODE query marked.)
We have illustrated the benefits, both intrinsic and designed, of such an architecture, and have discussed why these benefits are likely to enhance human-machine teaming and technology acceptance of ML in expert domains such as medicine.
To help the construction of narratives and the interactions with a predictive agent, an interesting direction would be to extract additional variables of interest that are distinct from the measured features.
For instance, in the case of ICU patients, it could be interesting to have machine learning algorithms that extract from the latent trajectory the occurrence of specific events in different physiological systems (respiratory, cardiac, etc.).
To confirm the usefulness of these additionally extracted variables, it would then be necessary to conduct trials: the recommendation system would be tested by experts with and without this add-on and evaluated for machine prediction acceptability.
This sensitivity to prior knowledge also needs to be investigated, in particular for real-world datasets.
Figure 4: Classic approaches are surpassed by NODEs in prediction; NODEs allow interactive agents. (The figure contrasts RNN (VAE RNN): brute force, short-term, stepped curve, with NODE (Latent ODE): interactive query, long-term projection of new narratives, intuitive prediction, smooth curve, detailed narratives; the post-XAI future combines retrospective and prospective access to intermediary states in a human-AI agent.)
References
[1] Mason Marks. "Robots in space: Sharing our world with autonomous delivery vehicles". In: Available at SSRN 3347466 (2019).
[2] Konstantinos G Liakos et al. "Machine learning in agriculture: A review". In: Sensors 18.8 (2018), p. 2674.
[3] Andre Esteva et al. "Dermatologist-level classification of skin cancer with deep neural networks". In: Nature 542.7639 (2017), pp. 115–118.
[4] Anand Avati et al. "Improving palliative care with deep learning". In: BMC Medical Informatics and Decision Making 18.4 (2018), pp. 55–64.
[5] Bruce G Buchanan and Edward H Shortliffe. "Rule-based expert systems: the MYCIN experiments of the Stanford Heuristic Programming Project". (1984).
[6] Alejandro Barredo Arrieta et al.