A General Method for Measuring Calibration of Probabilistic Neural Regressors

Spencer Young, Porter Jenkins
Abstract: As machine learning systems become increasingly integrated into real-world applications, accurately representing uncertainty is crucial for enhancing their robustness and reliability. Neural networks are effective at fitting high-dimensional probability distributions but often suffer from poor calibration, leading to overconfident predictions. In the regression setting, we find that existing metrics for quantifying model calibration, such as Expected Calibration Error (ECE) and Negative Log Likelihood (NLL), introduce bias, require parametric assumptions, and suffer from information-theoretic bounds on their estimating power. We propose a new approach using conditional kernel mean embeddings to measure calibration discrepancies without these shortcomings. Preliminary experiments on synthetic data demonstrate the method's potential, with future work planned for more complex applications.
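As context for the regression-calibration metrics this abstract critiques, the sketch below shows one common baseline: quantile-based calibration error computed from probability integral transform (PIT) values. This is a generic illustration, not the paper's embedding-based method; the function name and inputs are made up for the example.

```python
def quantile_calibration_error(pit_values, taus):
    """Average absolute gap between nominal coverage tau and the
    empirical coverage of the predicted tau-quantiles, computed from
    PIT values F_i(y_i) of a probabilistic regressor."""
    n = len(pit_values)
    gaps = []
    for tau in taus:
        coverage = sum(1 for p in pit_values if p <= tau) / n
        gaps.append(abs(coverage - tau))
    return sum(gaps) / len(gaps)

# A perfectly calibrated model yields uniformly distributed PIT values,
# so the error is ~0; a badly miscalibrated one scores far higher.
pit = [(i + 0.5) / 100 for i in range(100)]
taus = [0.1 * k for k in range(1, 10)]
print(quantile_calibration_error(pit, taus))  # ~0.0
```

Such binned coverage checks share the finite-sample estimation issues the abstract points to, which is what motivates the kernel-embedding alternative.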

Flexible Heteroscedastic Count Regression with Deep Double Poisson Networks

Spencer Young, Porter Jenkins, Longchao Da, Jeff Dotson, Hua Wei
Abstract: Neural networks that can produce accurate, input-conditional uncertainty representations are critical for real-world applications. Recent progress on heteroscedastic continuous regression has shown great promise for calibrated uncertainty quantification on complex tasks, like image regression. However, when these methods are applied to discrete regression tasks, such as crowd counting, ratings prediction, or inventory estimation, they tend to produce predictive distributions with numerous pathologies. We propose to address these issues by training a neural network to output the parameters of a Double Poisson distribution, which we call the Deep Double Poisson Network (DDPN). In contrast to existing methods that are trained to minimize Gaussian negative log likelihood (NLL), DDPNs produce a proper probability mass function over discrete outputs. Additionally, DDPNs naturally model under-, over-, and equi-dispersion, unlike networks trained with the more rigid Poisson and Negative Binomial parameterizations. We show DDPNs 1) vastly outperform existing discrete models; 2) meet or exceed the accuracy and flexibility of networks trained with Gaussian NLL; 3) produce proper predictive distributions over discrete counts; and 4) exhibit superior out-of-distribution detection.
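A minimal sketch of the Double Poisson negative log likelihood (Efron's parameterization, with the near-1 normalizing constant dropped) that a DDPN-style head would minimize. Here mu and phi are plain function arguments standing in for the two network outputs; this is an illustration of the loss, not the paper's implementation.

```python
import math

def double_poisson_nll(y, mu, phi):
    """NLL of Efron's Double Poisson, up to the ~1 normalizing
    constant. mu is the (approximate) mean; the variance is roughly
    mu / phi, so phi > 1 models under-dispersion and phi < 1
    over-dispersion, with phi = 1 recovering the Poisson."""
    ylogy = y * math.log(y) if y > 0 else 0.0
    dispersion_term = phi * y * (1.0 + math.log(mu) - math.log(y)) if y > 0 else 0.0
    return (-0.5 * math.log(phi) + phi * mu + y - ylogy
            + math.lgamma(y + 1) - dispersion_term)

# At phi = 1 the Double Poisson NLL reduces exactly to the Poisson NLL:
poisson_nll = 2.5 - 3 * math.log(2.5) + math.lgamma(4)
print(abs(double_poisson_nll(3, 2.5, 1.0) - poisson_nll))  # ~0
```

In a DDPN, mu and phi would come from two output heads (typically through exponential or softplus activations to keep them positive), and this loss would be averaged over the batch.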

Learning Graph Structures and Uncertainty for Accurate and Calibrated Time-series Forecasting

Harshavardhan Kamarthi, Lingkai Kong, Alexander Rodríguez, Chao Zhang, B. Aditya Prakash
Abstract: Multi-variate time series forecasting is an important problem with a wide range of applications. Recent works model the relations between time-series as graphs and have shown that propagating information over the relation graph can improve time series forecasting. However, in many cases, relational information is not available or is noisy and unreliable. Moreover, most works ignore the underlying uncertainty of time-series, both for structure learning and for deriving the forecasts, so the learned structure fails to capture uncertainty and the resulting forecast distributions have poor uncertainty estimates. We tackle this challenge and introduce STOIC, which leverages stochastic correlations between time-series to learn the underlying structure between time-series and to provide well-calibrated and accurate forecasts. Over a wide range of benchmark datasets, STOIC provides around 16% more accurate and 14% better-calibrated forecasts. STOIC also shows better adaptation to noise in data during inference and captures important and useful relational information in various benchmarks.

ForeCal: Random Forest-based Calibration for DNNs

Dhruv Nigam
Abstract: Deep neural network (DNN)-based classifiers do extremely well in discriminating between observations, resulting in higher ROC AUC and accuracy metrics, but their outputs are often miscalibrated with respect to true event likelihoods. Post-hoc calibration algorithms are often used to calibrate the outputs of these classifiers. Methods like isotonic regression, Platt scaling, and temperature scaling have been shown to be effective in some cases but are limited by their parametric assumptions and/or their inability to capture complex non-linear relationships. We propose ForeCal, a novel post-hoc calibration algorithm based on random forests. ForeCal exploits two unique properties of random forests: the ability to enforce weak monotonicity, and range preservation. It is more powerful in achieving calibration than current state-of-the-art methods, is non-parametric, and can incorporate exogenous information as features to learn a better calibration function. Through experiments on 43 diverse datasets from the UCI ML repository, we show that ForeCal outperforms existing methods in terms of Expected Calibration Error (ECE) with minimal impact on the discriminative power of the base DNN as measured by AUC.
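For reference, a minimal sketch of the Expected Calibration Error used to score post-hoc calibrators like ForeCal, using equal-width confidence binning for a binary classifier. This is the generic metric, not the paper's code; the function name and binning choice are illustrative.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE for a binary classifier: bin predictions by confidence,
    then average |empirical accuracy - mean confidence| per bin,
    weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p = 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        conf = sum(p for p, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        ece += abs(acc - conf) * len(b) / n
    return ece

# 90% confidence with 90% empirical accuracy is perfectly calibrated:
print(expected_calibration_error([0.9] * 10, [1] * 9 + [0]))  # ~0.0
```

A calibrator such as ForeCal would be fit to map raw DNN scores to calibrated probabilities, and this metric compared before and after the mapping.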

Addressing Graph Anomaly Detection via Causal Edge Separation and Spectrum

Zengyi Wo, Wenjun Wang, Minglai Shao, Chang Liu, Yumeng Wang, Yueheng Sun
Abstract: In the real world, anomalous entities often add more legitimate connections while hiding direct links with other anomalous entities, leading to heterophilic structures in anomalous networks that most GNN-based techniques fail to address. Several works have been proposed to tackle this issue in the spatial domain. However, these methods overlook the complex relationships between node structure encoding, node features, and their contextual environment, and lack principled guidance; research on solving heterophilic problems in the spectral domain remains limited. This study analyzes the spectral distribution of nodes with different degrees of heterophily and discovers that the heterophily of anomalous nodes causes the spectral energy to shift from low to high frequencies. To address the above challenges, we propose a spectral neural network, CES2-GAD, based on causal edge separation for anomaly detection on heterophilic graphs. First, CES2-GAD separates the original graph into homophilic and heterophilic edges using causal interventions. Subsequently, various hybrid-spectrum filters are used to capture signals from the segmented graphs. Finally, representations from multiple signals are concatenated and input into a classifier to predict anomalies. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method.

Uncertainty-aware segmentation for rainfall prediction post processing

Simone Monaco, Luca Monaco, Daniele Apiletti
Abstract: Accurate precipitation forecasts are crucial for applications such as flood management, agricultural planning, water resource allocation, and weather warnings. Despite advances in numerical weather prediction (NWP) models, they still exhibit significant biases and uncertainties, especially at high spatial and temporal resolutions. To address these limitations, we explore uncertainty-aware deep learning models for post-processing daily cumulative quantitative precipitation forecasts to obtain forecast uncertainties that lead to a better trade-off between accuracy and reliability. Our study compares different state-of-the-art models, and we propose a variant of the well-known SDE-Net, called SDE U-Net, tailored to segmentation problems like ours. We evaluate its performance for both typical and intense precipitation events. Our results show that all deep learning models significantly outperform the average baseline NWP solution, with our implementation of the SDE U-Net showing the best trade-off between accuracy and reliability. Integrating these models, which account for uncertainty, into operational forecasting systems can improve decision-making and preparedness for weather-related events.

Out-of-Distribution Detection for Heterogeneous Graph Neural Networks

Tao Yin, Chen Zhao, Minglai Shao
Abstract: Heterogeneous Graph Neural Networks (HGNNs) effectively extract rich node and structural information from heterogeneous graphs. However, in real-world scenarios, due to biased sampling, distribution shifts, and anomalies, there exist out-of-distribution (OOD) nodes in heterogeneous graphs. Although existing HGNNs have achieved good performance in in-distribution (ID) node classification tasks, no prior research has focused on the problem of OOD detection in heterogeneous graphs. Therefore, we propose a method for OOD detection in heterogeneous graphs (OODHG), which aims to identify OOD nodes while classifying ID nodes. Specifically, we calculate the energy score of each node and propagate these scores, fully considering the structural information of the heterogeneous graph. Experimental results demonstrate that our method outperforms baselines in both OOD detection and ID node classification tasks.
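A minimal sketch of the energy score that OODHG-style methods start from, plus one simple propagation step over graph neighbors. The energy formula is the standard logit-based score; the averaging propagation rule here is a generic assumption for illustration, not the paper's exact scheme.

```python
import math

def energy_score(logits, temperature=1.0):
    """E(x) = -T * logsumexp(logits / T); higher energy suggests OOD."""
    m = max(logits)
    lse = m / temperature + math.log(
        sum(math.exp((z - m) / temperature) for z in logits))
    return -temperature * lse

def propagate(energies, neighbors, alpha=0.5):
    """One propagation step: mix each node's energy with the mean
    energy of its neighbors, injecting structural information."""
    out = []
    for i, e in enumerate(energies):
        nbr = neighbors[i]
        nbr_mean = sum(energies[j] for j in nbr) / len(nbr) if nbr else e
        out.append(alpha * e + (1 - alpha) * nbr_mean)
    return out

# A confidently classified (peaked) node has lower energy than a flat one:
print(energy_score([10.0, 0.0, 0.0]) < energy_score([0.0, 0.0, 0.0]))  # True
```

After a few propagation steps, a threshold on the smoothed energies separates OOD nodes from ID nodes while the classifier head handles ID classification.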

HyperDG: A Hypergraph-Based Approach for Dynamic Graph Node Classification under Spatio-Temporal Shift

Xiaoxu Ma, Chen Zhao, Minglai Shao
Abstract: Accurate node classification on dynamic graphs, where node structure, attributes, and labels change across space and time, remains a challenging problem. Existing methods based on RNNs and self-attention mechanisms struggle to effectively capture the diverse dynamic variations in dynamic graphs. To address this, we propose a novel multi-scale hypergraph-based dynamic graph node classification algorithm (HyperDG). This algorithm uses two modules for hypergraph modeling of dynamic graph nodes: the individual-level hypergraph captures diverse temporal representations among individual nodes, while the group-level hypergraph captures multi-scale temporal group representations among similar nodes. Each hyperedge connects multiple nodes within specific time ranges to capture dependencies at different scales. By propagating and aggregating weighted information through hypergraph neural networks, more accurate temporal dependency representations are obtained. Extensive experiments on five dynamic graph datasets, conducted using two backbone models, demonstrate the superiority of our proposed framework.

How to Leverage Predictive Uncertainty Estimates for Reducing Catastrophic Forgetting in Online Continual Learning

Giuseppe Serra, Ben Werner, Florian Buettner
Abstract: Many real-world applications require machine-learning models to deal with non-stationary data distributions and thus learn autonomously over an extended period of time, often in an online setting. One of the main challenges in this scenario is the so-called catastrophic forgetting (CF), whereby the learning model tends to focus on the most recent tasks while experiencing predictive degradation on older ones. In the online setting, the most effective solutions employ a fixed-size memory buffer to store old samples used for replay when training on new tasks. Many approaches have been presented to tackle this problem. However, it is not clear how predictive uncertainty information for memory management can be leveraged in the most effective manner, and conflicting strategies have been proposed to populate the memory. Are the easiest-to-forget or the easiest-to-remember samples more effective in combating CF? Starting from the intuition that predictive uncertainty indicates a sample's location in the decision space, this work presents an in-depth analysis of different uncertainty estimates and strategies for populating the memory. The investigation provides a better understanding of the characteristics data points should have to alleviate CF. We then propose an alternative method for estimating predictive uncertainty via the generalised variance induced by the negative log-likelihood. Finally, we demonstrate that the use of predictive uncertainty measures helps reduce CF in different settings.
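As a concrete illustration of uncertainty-driven memory population, the sketch below ranks samples by predictive entropy and keeps the most uncertain ones for replay. This is a generic baseline selector written for this summary; the paper's proposed generalised-variance estimate from the negative log-likelihood is not shown, and all names here are illustrative.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a softmax output; higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_memory(samples, probs, k):
    """Keep the k most uncertain samples (one of the two conflicting
    strategies above; an easiest-to-remember strategy would instead
    sort in ascending order of entropy)."""
    ranked = sorted(zip(samples, probs),
                    key=lambda sp: predictive_entropy(sp[1]),
                    reverse=True)
    return [s for s, _ in ranked[:k]]

samples = ["a", "b", "c"]
probs = [[0.98, 0.01, 0.01], [1/3, 1/3, 1/3], [0.6, 0.3, 0.1]]
print(select_for_memory(samples, probs, 1))  # ['b'] (most uncertain)
```

Flipping the sort order implements the competing strategy, which is exactly the trade-off the paper's analysis investigates.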

Measuring Aleatoric and Epistemic Uncertainty in LLMs: Empirical Evaluation on ID and OOD QA Tasks

Kevin Wang, Subre Abdoul Moktar, Jia Li, Kangshuo Li, Feng Chen
Abstract: Large Language Models (LLMs) have become increasingly pervasive, finding applications across many industries and disciplines. Ensuring the trustworthiness of LLM outputs is paramount, and Uncertainty Estimation (UE) plays a key role. In this work, we conduct a comprehensive empirical study of the robustness and effectiveness of diverse UE measures for aleatoric and epistemic uncertainty in LLMs. The study covers twelve different UE methods and four generation quality metrics, including LLMScore from LLM critics, to evaluate the uncertainty of LLM-generated answers in Question-Answering (QA) tasks on both in-distribution (ID) and out-of-distribution (OOD) datasets. Our analysis reveals that information-based methods, which leverage token and sequence probabilities, perform exceptionally well in ID settings due to their alignment with the model's understanding of the data. Conversely, density-based methods and the P(True) metric exhibit superior performance in OOD contexts, highlighting their effectiveness in capturing the model's epistemic uncertainty. Semantic consistency methods, which assess variability in generated answers, show reliable performance across different datasets and generation metrics. These methods generally perform well but may not be optimal for every situation.
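A minimal sketch of the information-based UE family referenced above: length-normalised sequence negative log likelihood and its exponential, perplexity, computed from per-token probabilities. The token probabilities here are made-up inputs; real scores would come from the model's logits over the generated answer.

```python
import math

def sequence_nll(token_probs, length_normalise=True):
    """Negative log likelihood of a generated answer, from the
    probabilities the model assigned to each emitted token."""
    nll = -sum(math.log(p) for p in token_probs)
    return nll / len(token_probs) if length_normalise else nll

def perplexity(token_probs):
    """exp of the mean token NLL; higher = more uncertain answer."""
    return math.exp(sequence_nll(token_probs))

# Two maximally ambiguous binary choices give a perplexity of 2:
print(perplexity([0.5, 0.5]))  # ~2.0
```

Density-based methods and P(True), by contrast, do not reduce to token probabilities, which is one reason the two families behave differently under distribution shift.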

Bayesian Disease Progression Modeling That Accounts For Health Disparities

Erica Chiang, Ashley Beecy, Gabriel Sayer, Nir Uriel, Deborah Estrin, Nikhil Garg, Emma Pierson
Abstract: Disease progression models, in which a patient's latent severity is modeled as progressing over time and producing observed symptoms, hold great potential to help with disease detection, prediction, and drug development. However, a significant limitation of existing models is that they do not typically account for healthcare disparities that can bias the observed data. We draw attention to three key disparities: certain patient populations may (1) start receiving care only when their disease is more severe, (2) experience faster disease progression even while receiving care, or (3) receive care less frequently conditional on disease severity. To address this, we develop an interpretable Bayesian disease progression model that captures these three disparities. We show theoretically and empirically that our model correctly estimates disparities and severity from observed data, and that failing to account for these disparities produces biased estimates of severity.

IDGG: Invariant Learning for Out-of-Distribution Generalization on Graphs

Qin Tian, Wenjun Wang, Minglai Shao, Chen Zhao, Dong Li
Abstract: Traditional machine learning methods rely heavily on the independent and identically distributed (i.i.d.) assumption, which poses limitations when test distributions differ from training distributions. To address this, out-of-distribution (OOD) generalization has made significant progress, aiming to maintain performance despite unknown distribution shifts. However, OOD methods for graph-structured data are underexplored due to challenges such as simultaneous distribution shifts in node attributes and graph topology, and the difficulty in capturing invariant information across different distribution shifts. To tackle these challenges, we introduce IDGG, a framework designed to (1) diversify variations across domains by modeling potential variations in attribute distributions and topological structure, and (2) minimize variation discrepancy in a representation space for predicting semantic factors that do not vary with distribution shifts. We validate the effectiveness of IDGG on a real-world graph dataset, and results show that our model outperforms baseline methods in node-level OOD generalization.

Semantic OOD Detection under Covariate Shift on Graphs with Diffusion Model

Zhixia He, Chen Zhao, Minglai Shao, Yujie Lin, Dong Li
Abstract: Most existing deep learning methods are based on the closed-world assumption, which assumes that testing data are sampled from the same distribution as the training data. However, when models are deployed in open-world scenarios, testing data can exhibit various distribution shifts. Here, we mainly consider two types of distribution shifts: covariate shift and semantic shift. We raise a new question for OOD detection: semantic OOD detection under covariate shift. Unlike existing graph OOD algorithms, which consider only one type of distribution shift, our work addresses semantic shift and covariate shift simultaneously to fully utilize label and environment information. We propose the Graph Diffusion Augmentation (GDA) framework, which leverages a diffusion model to simulate these two types of distribution shifts via semantic augmentation and covariate augmentation. We design a perturbation control element and a label control element to guide the denoising diffusion process at different intermediate timesteps, generating pseudo-ID and pseudo-OOD data. The GDA model generates diverse pseudo graphs with consistent predictive relationships, helping the predictor learn invariant relationships related to the label information while ignoring spurious relationships with the environment.