Publications

A list of my publications in reverse chronological order.

2025

  1. Causal Lifting of Neural Representations: Zero-Shot Generalization for Causal Inferences
    Riccardo Cadei; Ilker Demirel; Piersilvio De Bartolomeis; Lukas Lindorfer; Sylvia Cremer; Cordelia Schmid; Francesco Locatello

    arXiv (under review)
    Workshop on (i) Spurious Correlation and Shortcut Learning and (ii) XAI4Science at ICLR, 2025

    A plethora of real-world scientific investigations is waiting to scale with the support of trustworthy predictive models that can reduce the need for costly data annotations. We focus on causal inferences on a target experiment with unlabeled factual outcomes, retrieved by a predictive model fine-tuned on a similar, labeled experiment. First, we show that factual outcome estimation via Empirical Risk Minimization (ERM) may fail to yield valid causal inferences on the target population, even in a randomized controlled experiment with infinite training samples. Then, we propose to leverage the observed experimental settings during training to empower generalization to downstream interventional investigations, "Causal Lifting" the predictive model. We propose Deconfounded Empirical Risk Minimization (DERM), a simple new learning procedure that minimizes the risk over a fictitious target population, preventing potential confounding effects. We validate our method on both synthetic and real-world scientific data. Notably, for the first time, we zero-shot generalize causal inferences on the ISTAnt dataset (without annotation) by causally lifting a predictive model on our experiment variant.
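    As a rough sketch of the contrast (notation mine, not necessarily the paper's): ERM fits the training distribution as-is, so the predictor can absorb associations between the treatment T and the other experimental settings E; a deconfounded risk instead targets a fictitious population in which the treatment is independent of those settings.

      \hat{f}_{\mathrm{ERM}}  = \arg\min_{f} \; \mathbb{E}_{(X,Y) \sim P_{\mathrm{train}}} \big[ \ell(f(X), Y) \big]
      \hat{f}_{\mathrm{DERM}} = \arg\min_{f} \; \mathbb{E}_{(X,Y) \sim \tilde{P}} \big[ \ell(f(X), Y) \big],
      \qquad \tilde{P} \text{ reweighted so that } T \perp\!\!\!\perp E

    Under \tilde{P}, prediction errors can no longer correlate with treatment through the experimental settings, which is what protects the downstream causal estimate on the target experiment.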
  2. Unifying Causal Representation Learning with the Invariance Principle
    Dingling Yao; Dario Rancati; Riccardo Cadei; Marco Fumero; Francesco Locatello

    International Conference on Learning Representations (ICLR), 2025
    Workshop on (i) Causal Representation Learning and (ii) UniReps at NeurIPS, 2024

    Causal representation learning aims at recovering latent causal variables from high-dimensional observations to solve causal downstream tasks, such as predicting the effect of new interventions or more robust classification. A plethora of methods have been developed, each tackling carefully crafted problem settings that lead to different types of identifiability. The folklore is that these different settings are important, as they are often linked to different rungs of Pearl's causal hierarchy, although not all neatly fit. Our main contribution is to show that many existing causal representation learning approaches methodologically align the representation to known data symmetries. Identification of the variables is guided by equivalence classes across different data pockets that are not necessarily causal. This result suggests important implications, allowing us to unify many existing approaches in a single method that can mix and match different assumptions, including non-causal ones, based on the invariances relevant to our application. It also significantly benefits applicability, which we demonstrate by improving treatment effect estimation on real-world high-dimensional ecological data. Overall, this paper clarifies the role of causality assumptions in the discovery of causal variables and shifts the focus to preserving data symmetries.
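    Schematically (my notation, simplified): the unifying view reads identification constraints as invariances, i.e., an encoder is required to map observations from different "data pockets" that share latent content to matching representations, whether those pockets arise from interventions or from non-causal transformations.

      \phi^{\star} \in \arg\min_{\phi} \; \sum_{(i,j) \in \mathcal{S}} \mathbb{E} \big[ d\big( \phi(X^{(i)}),\, \phi(X^{(j)}) \big) \big]

    Here \mathcal{S} indexes pairs of pockets assumed to share invariant latents and d is a discrepancy on the part of the representation that should be preserved; mixing and matching assumptions amounts to changing \mathcal{S} and d.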

2024

  1. Smoke and Mirrors in Causal Downstream Tasks
    Riccardo Cadei; Lukas Lindorfer; Sylvia Cremer; Cordelia Schmid; Francesco Locatello

    Advances in Neural Information Processing Systems (NeurIPS), 2024
    Workshop on AI for Science: Scaling in AI for Scientific Discovery at ICML, 2024

    Machine Learning and AI have the potential to transform data-driven scientific discovery, enabling accurate predictions for several scientific phenomena. As many scientific questions are inherently causal, this paper looks at the causal inference task of treatment effect estimation, where we assume binary effects that are recorded as high-dimensional images in a Randomized Controlled Trial (RCT). Despite being the simplest possible setting and a perfect fit for deep learning, we theoretically find that many common choices in the literature may lead to biased estimates. To test the practical impact of these considerations, we recorded the first real-world benchmark for causal inference downstream tasks on high-dimensional observations as an RCT studying how garden ants (Lasius neglectus) respond to microparticles applied onto their colony members by hygienic grooming. Comparing 6,480 models fine-tuned from state-of-the-art visual backbones, we find that the sampling and modeling choices significantly affect the accuracy of the causal estimate, and that classification accuracy is not a proxy thereof. We further validated the analysis, repeating it on a synthetically generated visual dataset controlling the causal model. Our results suggest that future benchmarks should carefully consider real downstream scientific questions, especially causal ones. Further, we highlight guidelines for representation learning methods to help answer causal questions in the sciences. All code and data will be released.
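    A minimal sketch of the downstream estimand in the RCT setting above (hypothetical variable names): the target is a difference in means of predicted binary outcomes between arms, which is a different objective from maximizing classification accuracy.

      import numpy as np

      def ate_from_predictions(y_hat: np.ndarray, treated: np.ndarray) -> float:
          """Difference-in-means ATE estimate in a randomized experiment.

          y_hat   -- model-predicted binary outcomes (0/1) per unit, e.g. a
                     grooming behavior detected in each recording
          treated -- 0/1 treatment assignment from the RCT
          """
          return y_hat[treated == 1].mean() - y_hat[treated == 0].mean()

    Two classifiers with identical accuracy can induce very different biases here: prediction errors that are symmetric across arms cancel in the difference, while errors correlated with treatment do not.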
  2. NetworkCausalTree: an R package for estimating heterogeneous effects under interference
    Costanza Tortù; Falco J. Bargagli Stoffi; Riccardo Cadei; Laura Forastiere

    GitHub (under review)

    The NetworkCausalTree package introduces a machine learning method that uses tree-based algorithms and a Horvitz-Thompson estimator to assess the heterogeneity of treatment and spillover effects under clustered network interference. Causal inference studies typically assume no interference between individuals, but in real-world scenarios where individuals are interconnected through social, physical, or virtual ties, the effect of a treatment can spill over to other connected individuals in the network. To avoid biased estimates of treatment effects, interference should be accounted for. Understanding the heterogeneity of treatment and spillover effects can help policy-makers scale up interventions, target strategies more effectively, and generalize treatment and spillover effects to other populations.
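    For orientation, a schematic of the Horvitz-Thompson estimator family used under clustered network interference (the standard form from the interference literature; the package's exact implementation may differ): with individual treatment W_i, a neighborhood-exposure summary G_i, and a known randomization design,

      \hat{\mu}(w, g) = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathbb{1}\{ W_i = w,\, G_i = g \}\, Y_i}{\Pr(W_i = w,\, G_i = g)},
      \qquad \hat{\tau} = \hat{\mu}(w, g) - \hat{\mu}(w', g')

    Contrasts in w capture direct treatment effects while contrasts in g capture spillovers; the inverse assignment probabilities keep the estimates unbiased under the design, and the tree splits then search for covariates along which these contrasts vary.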

2023

  1. Projecting the climate penalty on PM2.5 pollution with spatial deep learning
    Mauricio Tec; Riccardo Cadei; Francesca Dominici; Corwin Zigler

    Workshop on Tackling Climate Change with Machine Learning at ICLR, 2023

    The climate penalty measures the effects of a changing climate on air quality due to the interaction of pollution with climate factors, independently of future changes in emissions. This work introduces a statistical framework for estimating the climate penalty on soot pollution (PM2.5), which has been linked to respiratory and cardiovascular diseases and premature mortality. The framework is used to evaluate the disparities in future PM2.5 exposure across racial/ethnic and income groups. The findings of this study have the potential to inform mitigation policy aiming to protect public health and promote environmental equity in addressing the effects of climate change. The proposed methodology significantly improves upon existing statistical-based methods for estimating the climate penalty. It will use higher-resolution climate inputs---which current statistical approaches cannot accommodate---via an expressive and scalable predictive model based on spatial deep learning with spatiotemporal trend estimation. It will also integrate additional predictive data sources such as demographics and geology. This approach allows us to consider regional dependencies and synoptic weather patterns that influence PM2.5, and deconvolve them from the effects of exogenous factors, such as the trends in increasing air quality regulations and other sources of unmeasured spatial heterogeneity.
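    The defining contrast can be written as a worked equation (my formalization of the first sentence above): the climate penalty is the change in expected PM2.5 attributable to climate alone, with emissions held at a fixed reference level e_0,

      \Delta_{\mathrm{penalty}} = \mathbb{E}\big[ \mathrm{PM}_{2.5} \mid \mathrm{climate}_{\mathrm{future}},\, \mathrm{emissions} = e_0 \big] - \mathbb{E}\big[ \mathrm{PM}_{2.5} \mid \mathrm{climate}_{\mathrm{present}},\, \mathrm{emissions} = e_0 \big]

    so the predictive model only has to extrapolate over climate inputs, not over emission scenarios.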
  2. CRE: An R package for interpretable discovery and inference of heterogeneous treatment effects
    Riccardo Cadei*; Naeem Khoshnevis*; Kwonsang Lee; Daniela Maria Garcia; Falco J. Bargagli Stoffi

    Journal of Open Source Software, 2023

    In health and social sciences, it is critically important to identify interpretable subgroups of the study population where a treatment has notable heterogeneity in the causal effects with respect to the average treatment effect (ATE). Several approaches have already been proposed for heterogeneous treatment effect (HTE) discovery, either by first estimating the conditional average treatment effect (CATE) and then identifying heterogeneous subgroups in a second stage, or via a direct data-driven procedure. Many of these methodologies are based on decision trees. Tree-based approaches are based on efficient and easily implementable recursive mathematical programming (e.g., HTE maximization), can be easily tweaked and adapted to different scenarios depending on the research question of interest, and guarantee a high degree of interpretability---i.e., the degree to which a human can understand the cause of a decision. Despite these appealing features, single-tree heterogeneity discovery has two main limitations: instability in the identification of the subgroups and reduced exploration of the potential heterogeneity. To address these shortcomings, Bargagli-Stoffi et al. (2023) proposed Causal Rule Ensemble, a new method for interpretable HTE characterization in terms of decision rules, via an extensive exploration of heterogeneity patterns by an ensemble-of-trees approach, enforcing high stability in the discovery. CRE is an R package providing a flexible implementation of the Causal Rule Ensemble algorithm.
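    Schematically (my notation; see the companion methods paper below), the package represents treatment-effect heterogeneity as an additive model over interpretable decision rules,

      \tau(x) \approx \bar{\tau} + \sum_{m=1}^{M} \alpha_m \, r_m(x), \qquad r_m(x) \in \{0, 1\}

    where each r_m is a binary rule harvested from an ensemble of trees (e.g., "age > 65 AND smoker") and \alpha_m quantifies that subgroup's deviation from the average effect \bar{\tau}.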
  3. Causal Rule Ensemble: Interpretable Discovery and Inference of Heterogeneous Treatment Effects
    Falco J. Bargagli-Stoffi*; Riccardo Cadei*; Kwonsang Lee; Francesca Dominici

    arXiv (under review)

    In health and social sciences, it is critically important to identify subgroups of the study population where a treatment has notable heterogeneity in the causal effects with respect to the average treatment effect. Data-driven discovery of heterogeneous treatment effects (HTE) via decision tree methods has been proposed for this task. Despite its high interpretability, the single-tree discovery of HTE tends to be highly unstable and to find an oversimplified representation of treatment heterogeneity. To accommodate these shortcomings, we propose Causal Rule Ensemble (CRE), a new method to discover heterogeneous subgroups through an ensemble-of-trees approach. CRE has the following features: 1) provides an interpretable representation of the HTE; 2) allows extensive exploration of complex heterogeneity patterns; and 3) guarantees high stability in the discovery. The discovered subgroups are defined in terms of interpretable decision rules, and we develop a general two-stage approach for subgroup-specific conditional causal effects estimation, providing theoretical guarantees. Via simulations, we show that the CRE method has a strong discovery ability and a competitive estimation performance when compared to state-of-the-art techniques. Finally, we apply CRE to discover subgroups most vulnerable to the effects of exposure to air pollution on mortality for 35.3 million Medicare beneficiaries across the contiguous U.S.
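    A minimal sketch of the two-stage discovery-then-inference template on toy data (generic scikit-learn components and leaf-membership indicators stand in for the paper's specific rule-extraction and estimation choices):

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.linear_model import LassoCV
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)

      # Toy inputs: in practice tau_pseudo would be pseudo-effects from a
      # first-step CATE estimator; here they follow a one-rule ground truth.
      X = rng.normal(size=(2000, 5))
      tau_pseudo = 2.0 * ((X[:, 0] > 0) & (X[:, 1] > 0)) + rng.normal(scale=0.5, size=2000)

      # Honest split: rules are discovered and their effects estimated on
      # disjoint halves, which is what stabilizes the discovery.
      X_d, X_i, t_d, t_i = train_test_split(X, tau_pseudo, test_size=0.5, random_state=0)

      # Stage 1 (discovery): shallow trees; each leaf defines a candidate
      # interpretable subgroup ("rule").
      forest = RandomForestRegressor(n_estimators=20, max_depth=2, random_state=0)
      forest.fit(X_d, t_d)

      # Binary rule matrix on the inference half: leaf membership per tree.
      leaves_d = forest.apply(X_d)
      leaves_i = forest.apply(X_i)
      R_i = np.hstack([
          (leaves_i[:, [t]] == np.unique(leaves_d[:, t])).astype(float)
          for t in range(leaves_d.shape[1])
      ])

      # Stage 2 (inference): sparse regression of pseudo-effects on rules;
      # nonzero coefficients flag stable heterogeneous subgroups.
      alpha = LassoCV(cv=5).fit(R_i, t_i).coef_
      print(f"{(alpha != 0).sum()} rules selected out of {R_i.shape[1]}")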

2022

  1. Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective
    Yuejiang Liu; Riccardo Cadei; Jonas Schweizer; Sherwin Bahmani; Alexandre Alahi

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022
    Workshop on Distribution Shifts at NeurIPS, 2021

    Learning behavioral patterns from observational data has been a de-facto approach to motion forecasting. Yet, the current paradigm suffers from two shortcomings: brittleness under covariate shift and inefficiency in knowledge transfer. In this work, we propose to address these challenges from a causal representation perspective. We first introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables, namely invariant mechanisms, style confounders, and spurious features. We then introduce a learning framework that treats each group separately: (i) unlike the common practice of merging datasets collected from different locations, we exploit their subtle distinctions by means of an invariance loss encouraging the model to suppress spurious correlations; (ii) we devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph; (iii) we introduce a style consistency loss that not only enforces the structure of style representations but also serves as a self-supervisory signal for test-time refinement on the fly. Experimental results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations, outperforming prior state-of-the-art motion forecasting models for out-of-distribution generalization and low-shot transfer.
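    As a compact summary of the three components (my notation; the paper's exact loss terms differ in detail), training combines a per-location prediction risk, an invariance penalty across locations, and a style consistency term:

      \mathcal{L} = \sum_{e \in \mathcal{E}} \mathcal{L}^{e}_{\mathrm{pred}} \;+\; \lambda_{\mathrm{inv}} \, \mathrm{pen}\big( \{ \mathcal{L}^{e}_{\mathrm{pred}} \}_{e \in \mathcal{E}} \big) \;+\; \lambda_{\mathrm{sty}} \, \mathcal{L}_{\mathrm{style}}

    where \mathcal{E} indexes the collection locations, the penalty discourages risks that differ across locations (suppressing spurious correlations), and \mathcal{L}_{\mathrm{style}} both structures the style representation and provides the self-supervised signal used for test-time refinement.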

2021

  1. Quantification of the available area for rooftop photovoltaic installation from overhead imagery using convolutional neural networks
    Roberto Castello; Alina Walch; Raphael Attias; Riccardo Cadei; Shasha Jiang; Jean-Louis Scartezzini

    Journal of Physics: Conference Series, 2021

    The integration of solar technology in the built environment is realized mainly through rooftop-installed panels. In this paper, we combine state-of-the-art Machine Learning and computer vision techniques with high-resolution overhead images to provide a geo-localization of the available rooftop surfaces for solar panel installation. We further associate them to the corresponding buildings by means of a geospatial post-processing approach. The stand-alone Convolutional Neural Network used to segment suitable rooftop areas reaches an intersection over union of 64% and an accuracy of 93%, while a post-processing step using a building database improves the rejection of false positives. The model is applied to a case study area in the canton of Geneva and the results are compared with another recent method used in the literature to derive the available area.
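    For reference, the two reported metrics are computed as follows for binary masks (a minimal, self-contained sketch with synthetic arrays; variable names are hypothetical):

      import numpy as np

      def iou(pred: np.ndarray, target: np.ndarray) -> float:
          """Intersection over union for binary {0,1} segmentation masks."""
          inter = np.logical_and(pred, target).sum()
          union = np.logical_or(pred, target).sum()
          return inter / union if union > 0 else 1.0

      def pixel_accuracy(pred: np.ndarray, target: np.ndarray) -> float:
          """Fraction of pixels labeled correctly."""
          return float((pred == target).mean())

      rng = np.random.default_rng(0)
      target = (rng.random((256, 256)) < 0.1).astype(int)   # sparse suitable-roof pixels
      pred = target.copy()
      flip = rng.random((256, 256)) < 0.05                  # 5% noisy pixels
      pred[flip] = 1 - pred[flip]
      print(f"IoU: {iou(pred, target):.2f}, accuracy: {pixel_accuracy(pred, target):.2f}")

    Because suitable rooftop area covers only a small fraction of each image, pixel accuracy can sit well above the intersection over union, which is consistent with the 93% vs. 64% reported above; the two numbers are complementary rather than redundant.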
