An information-theoretic approach to uncertainty
The explanation of uncertainty in the previous section relies only on the variance-based notion of uncertainty, but information-theoretic notions exist as well. Incorporating information-theoretic aleatoric uncertainty improves the robustness of the total uncertainty estimate (Gal 2016; Hein, Andriushchenko, and Bitterwolf 2019; van Amersfoort et al. 2020). Total uncertainty is measured by Shannon’s entropy:
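$$H[y \mid x] = -\,p(y \mid x) \cdot \log p(y \mid x) = -\sum_{k=1}^{K} p(y = k \mid x)\,\log p(y = k \mid x)$$
where $\cdot$ is the dot product operator, applied to the vector of predicted class probabilities $p(y \mid x)$, and $K$ is the number of classes.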
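As a minimal sketch of this entropy computation, the NumPy snippet below evaluates the negated dot product of a probability vector with its own logarithm. The function name `predictive_entropy`, the clipping constant `eps`, and the example probabilities are illustrative assumptions, not taken from the text.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) over the last axis of class probabilities.

    `probs` is assumed to hold valid probabilities that sum to 1 along
    the last axis (for example, softmax outputs).
    """
    probs = np.asarray(probs, dtype=float)
    # Clip before the log so zero probabilities do not produce -inf;
    # their contribution is still 0 because they are multiplied by 0.
    log_probs = np.log(np.clip(probs, eps, 1.0))
    # Negated dot product of p and log p, taken over the class axis.
    return -np.sum(probs * log_probs, axis=-1)

# Hypothetical predictions over K = 4 classes.
uniform = predictive_entropy([0.25, 0.25, 0.25, 0.25])  # least certain
confident = predictive_entropy([1.0, 0.0, 0.0, 0.0])    # most certain
```

A uniform prediction attains the maximum entropy, the natural log of the number of classes, while a one-hot prediction has zero entropy.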
The predictive entropy $H[y \mid x]$ is available to both Bayesian and non-Bayesian neural networks. To decompose this total uncertainty into its epistemic and aleatoric components, you must estimate the mutual information $I[y; \theta \mid x]$ between the prediction and the model parameters $\theta$, and this requires a Bayesian approach.
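One common way to estimate this decomposition, sketched here under assumptions the text does not spell out, is to draw several sets of class probabilities from an approximate posterior (for example, MC dropout passes or ensemble members). Total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average entropy of the individual predictions, and their difference estimates the mutual information, which is the epistemic part. The function names and the toy array below are hypothetical.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) over the last axis."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(np.clip(probs, eps, 1.0)), axis=-1)

def decompose_uncertainty(sampled_probs):
    """Split total predictive uncertainty into aleatoric and epistemic parts.

    `sampled_probs` is assumed to have shape (T, N, K): T posterior
    samples, N inputs, K classes.
    """
    sampled_probs = np.asarray(sampled_probs, dtype=float)
    # Total: entropy of the prediction averaged over posterior samples.
    total = entropy(sampled_probs.mean(axis=0))
    # Aleatoric: expected entropy of each individual sample's prediction.
    aleatoric = entropy(sampled_probs).mean(axis=0)
    # Epistemic: the mutual information estimate, total minus aleatoric.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Toy example: T = 2 samples, N = 2 inputs, K = 2 classes.
samples = np.array([
    [[0.5, 0.5], [1.0, 0.0]],  # posterior sample 1
    [[0.5, 0.5], [0.0, 1.0]],  # posterior sample 2
])
total, aleatoric, epistemic = decompose_uncertainty(samples)
```

In this toy case both inputs have the same total uncertainty. For the first input the samples agree on an ambiguous prediction, so the uncertainty is entirely aleatoric. For the second, each sample is confident but the samples disagree, so it is entirely epistemic. A single deterministic network yields only one sample (T = 1), which makes the epistemic term zero by construction.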