From Tokens to Policy: Causal and Interpretable Heterogeneous Treatment Effects Identification

Riccardo Cadei¹, Frank Otchere², Nyasha Tirivayi², Gustavo Angeles Tagliaferro², Falco J. Bargagli-Stoffi³, Francesco Locatello¹
¹ISTA   ²UNICEF   ³UCLA

Satellite imagery reveals environmental drivers of anti-poverty program impact invisible to surveys.

[Image gallery: satellite examples of the discovered modifiers]
  • Perennial river presence: Apac, Uganda (2007). Uganda · YOP, outcome: skilled employment
  • Vegetation spatial heterogeneity: Apac, Uganda (2007). Uganda · YOP, outcome: skilled employment
  • Structured agricultural landscape: Apac, Uganda (2007). Uganda · YOP, outcome: business assets
  • Ephemeral waterways: Bongo, Ghana (2015). Ghana · LEAP 1000, outcome: expenditure
  • Closed-canopy forest: Karaga, Ghana (2015). Ghana · LEAP 1000, outcome: expenditure

Introduction

TL;DR: We show how to identify the causal drivers of treatment-effect heterogeneity when they are entangled in complex measurements (e.g., multi-modal or unstructured data), yielding prescriptive guidelines for policy adaptation.

Contributions

  • Causal and interpretable framework
    More data doesn't help if we ask the wrong question. We distinguish variables correlated with treatment-effect heterogeneity from those that causally drive it, and show when and how finding the latter is achievable end-to-end — from raw multi-modal measurements to interpretable, actionable policy guidelines.
  • New method
    We introduce NEXIS, an iterative forward-backward selection algorithm that conditions on competing candidates to isolate the true (causal) heterogeneity drivers, with provable selection consistency. We test and ablate it extensively on semi-synthetic experiments and provide practical guidelines for different data regimes.
  • New discoveries in development economics
    We apply NEXIS to two anti-poverty program evaluations in Sub-Saharan Africa, extending both with satellite imagery to capture environmental modifiers. In both programs, environmental features invisible to standard survey analyses emerge as the dominant effect drivers, with concrete prescriptions for program redesign.

Experimental Power Paradox

Effect Modifier

A variable across whose strata the treatment effect varies. It is associated with how much a unit benefits from the treatment.
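In potential-outcome notation the definition can be written compactly. The symbols below (Y(1), Y(0), X, A, and the conditional average treatment effect τ) are notation assumed here for illustration and may differ from the paper's own.

```latex
\tau(x) \;=\; \mathbb{E}\big[\,Y(1) - Y(0) \mid X = x\,\big],
\qquad
X \text{ is an effect modifier} \iff \exists\, x, x' :\ \tau(x) \neq \tau(x').

% For a binary feature A (e.g. an "active" vs "inactive" dictionary atom),
% the group average treatment effects (GATEs) and their contrast are
\mathrm{GATE}_{a} \;=\; \mathbb{E}\big[\,Y(1) - Y(0) \mid A = a\,\big], \quad a \in \{0, 1\},
\qquad
\Delta \;=\; \mathrm{GATE}_{1} - \mathrm{GATE}_{0}.
```

Under this reading, a binary feature is an effect modifier exactly when Δ ≠ 0. Nothing in the definition says whether the feature drives the heterogeneity or merely correlates with a driver.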

Direct Effect Modifier (Interactor)

It is an effect modifier that causally drives the heterogeneity by directly interacting with the treatment. Crucial for policy optimization, as intervening on it or targeting it amplifies the policy impact.

Example: Consider a cash transfer program targeting mothers with a newborn, designed to increase the monthly household expenditure on infant care. Market access is a latent direct modifier — communities near markets can convert the cash into goods for the newborn more effectively. Road density, visible from satellite imagery, correlates with market presence and surfaces as an effect modifier, but only by common cause. A naive interpretation suggests building roads to amplify the program effect; the actual driver is market presence, so a road built where no market exists leaves the effect unchanged.

Experimental Power Paradox

Marginal screening tests each variable independently and cannot distinguish effect modifiers from interactors: as experimental power grows, it flags every variable correlated with the treatment effect — including non-causal proxies like road density. More data makes spurious discoveries more significant, not fewer, especially in learned representations which are intrinsically entangled.

[Interactive figure: marginal-screening metric under a sample-size sweep (fixed η) and an effect-size sweep (fixed n).]
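The paradox can be reproduced in a small simulation of the road-density example. This is a minimal sketch under assumed settings, not the authors' benchmark: the data-generating process, coefficients, and sample sizes are hypothetical, and a linear interaction t-test with classical standard errors stands in for whatever test is used in practice. The conditional test also assumes the direct modifier (market access) is among the measured candidates.

```python
import numpy as np

def interaction_tstats(y, t, X):
    """Fit y ~ 1 + t + X + t*X by OLS; return t-statistics of the t*X terms."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), t, X, t[:, None] * X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - design.shape[1])
    # Classical (homoskedastic) standard errors, for simplicity.
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
    return (beta / se)[-k:]

def simulate(n, rng):
    urban = rng.normal(size=n)                        # common cause
    market = 0.8 * urban + 0.6 * rng.normal(size=n)   # direct effect modifier
    roads = 0.8 * urban + 0.6 * rng.normal(size=n)    # non-causal proxy
    t = rng.integers(0, 2, size=n).astype(float)      # randomized treatment
    y = t * (1.0 + 0.5 * market) + rng.normal(size=n) # only market interacts with t
    return y, t, market, roads

rng = np.random.default_rng(0)
results = {}
for n in (500, 5_000, 50_000):
    y, t, market, roads = simulate(n, rng)
    # Marginal screening: test roads on its own.
    marginal = interaction_tstats(y, t, roads[:, None])[0]
    # Conditional test: test roads while conditioning on market.
    conditional = interaction_tstats(y, t, np.column_stack([market, roads]))[1]
    results[n] = (marginal, conditional)
    print(f"n={n:>6}  marginal t(roads)={marginal:6.2f}  conditional t(roads)={conditional:6.2f}")
```

In this setup the marginal statistic for roads should grow roughly with the square root of n, while the conditional statistic should stay near zero at every sample size.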

Method

From Tokens to Policy: explaining effect heterogeneity

Procedure:

  1. Experiment. Design and run a controlled experiment, measuring any candidate treatment interactors.
  2. Represent. Represent these pre-treatment measurements in an interpretable dictionary with minimal or no human supervision, e.g., by training a Sparse Autoencoder (SAE) on a frozen domain-specific foundation model.
  3. Identify. Neural EXposure Interactors Search (NEXIS): retrieve the minimal and sufficient heterogeneous-effect characterization by iterative CATE-equivalence testing on the representation coordinates. Interpret the selected coordinates a posteriori, e.g., by querying a VLM on the most and least activating observations.
    Theoretical Guarantee

    Given a controlled experiment, faithfulness, and a valid conditional independence test, and assuming:

    • Principal Alignment in the representation
    NEXIS asymptotically retrieves a minimal and sufficient HTE characterization with high probability  (selection consistency)

    If we further assume:

    • Measurement and Representation Sufficiency
    NEXIS characterization is causal and interpretable by construction  (causal identification)
  4. Policy. Optimize the treatment assignment accordingly.
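The identification step can be illustrated with a generic forward-backward search. This is a simplified stand-in, not the NEXIS algorithm itself: a linear treatment-by-feature interaction t-test replaces the CATE-equivalence test, and the function names, the threshold of 4.0, and the synthetic data are all hypothetical choices made for this sketch.

```python
import numpy as np

def interaction_tstats(y, t, X):
    """Fit y ~ 1 + t + X + t*X by OLS; return t-statistics of the t*X terms."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), t, X, t[:, None] * X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - design.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
    return (beta / se)[-k:]

def forward_backward(y, t, X, threshold=4.0, max_iter=50):
    """Greedy forward-backward selection of treatment interactors."""
    selected = []
    for _ in range(max_iter):
        # Forward: add the candidate with the strongest interaction,
        # conditional on everything already selected.
        best_j, best_stat = None, threshold
        for j in range(X.shape[1]):
            if j in selected:
                continue
            stat = abs(interaction_tstats(y, t, X[:, selected + [j]])[-1])
            if stat > best_stat:
                best_j, best_stat = j, stat
        if best_j is None:
            break  # no remaining candidate passes the threshold
        selected.append(best_j)
        # Backward: drop features that no longer matter given the others.
        stats = np.abs(interaction_tstats(y, t, X[:, selected]))
        selected = [j for j, s in zip(selected, stats) if s > threshold]
    return sorted(selected)

# Synthetic check: six correlated candidates, only columns 0 and 3 interact.
rng = np.random.default_rng(0)
n, p = 4000, 6
common = rng.normal(size=(n, 1))
X = 0.7 * common + np.sqrt(0.51) * rng.normal(size=(n, p))
t = rng.integers(0, 2, size=n).astype(float)
y = t * (1.0 + 0.8 * X[:, 0] - 0.6 * X[:, 3]) + rng.normal(size=n)

selected = forward_backward(y, t, X)
print("selected interactors:", selected)
```

Because every test conditions on the features already selected, correlated proxies that would pass a marginal screen tend to be rejected once the true interactors are in the set.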

Controlled Experiments — CelebA Benchmark

[Interactive figure: CelebA benchmark results with selectable TopK, activation, and metric, under a sample-size sweep (fixed η) and an effect-size sweep (fixed n).]

Application 1: Youth Opportunities Program (Uganda)

Program. The Youth Opportunities Program (YOP) was a randomised cash-grant experiment conducted in post-conflict Northern Uganda in 2008. The government awarded one-time grants (about US$7,500 per group) to randomly selected groups of young adults, enabling them to pursue skilled artisan or business activity. The program targeted communities in five linguistic sub-regions severely affected by the LRA conflict.

Experiment. The trial enrolled 439 groups across 331 communities (422 geo-coded sites), with individual-level baseline and endline surveys covering labour-market outcomes (skilled employment, business assets) and pre-treatment demographic covariates (age, education, household composition, linguistic group).

Data fusion (ours). We extend the trial with Landsat-7 imagery (2005–2007), embedded via Prithvi-EO and summarised by a 1,024-atom SAE, adding 146 active SAE atoms and spectral indices to the demographic covariates — 170 total NEXIS candidates. The map below shows trial communities colored by linguistic group or treatment assignment.

[Interactive map: trial communities, colored by linguistic group or treatment assignment.]
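The data-fusion step can be sketched as encoding site embeddings with a TopK sparse autoencoder, keeping the atoms that fire often enough, and appending them to the survey covariates. Everything below is illustrative: the weights are random rather than trained, the embedding width, k, the number of covariates, and the 5% firing-rate rule for "active" atoms are hypothetical, and only the 422 sites and 1,024 atoms come from the text.

```python
import numpy as np

def topk_sae_encode(embeddings, W_enc, b_enc, k):
    """Keep the k largest ReLU activations per row and zero out the rest."""
    acts = np.maximum(embeddings @ W_enc + b_enc, 0.0)
    top = np.argpartition(acts, -k, axis=1)[:, -k:]   # columns of the k largest values
    codes = np.zeros_like(acts)
    np.put_along_axis(codes, top, np.take_along_axis(acts, top, axis=1), axis=1)
    return codes

rng = np.random.default_rng(0)
n_sites, d_embed, n_atoms, k = 422, 64, 1024, 16
embeddings = rng.normal(size=(n_sites, d_embed))       # stand-in for foundation-model embeddings
W_enc = rng.normal(size=(d_embed, n_atoms)) / np.sqrt(d_embed)  # untrained encoder weights
b_enc = np.zeros(n_atoms)

codes = topk_sae_encode(embeddings, W_enc, b_enc, k)

# Hypothetical activity rule: keep atoms that fire on at least 5% of sites.
active = (codes > 0).mean(axis=0) >= 0.05

covariates = rng.normal(size=(n_sites, 5))             # stand-in for survey covariates
candidates = np.column_stack([covariates, codes[:, active]])
print(codes.shape, int(active.sum()), candidates.shape)
```

The resulting `candidates` matrix plays the role of the candidate set handed to the selection step, with one column per covariate or retained atom.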

Results

Outcome of interest: Skilled Employment

Key finding. NEXIS selects 5 direct modifiers: 3 ethnolinguistic-group interactions (Karamojong, Lugbara, Pallisa) pointing to post-conflict market connectivity, and 2 satellite-derived environmental features following an outside-option logic: program impact on skilled employment is strongest where subsistence alternatives are scarcest. Marginal screening returns 71 features from the same 170 candidates.
GATE and Δ in probability points (s.e. in parentheses)

Direct Effect Modifier               GATE active       GATE inactive     Δ                 p-value
Linguistic group
  Karamojong                         −0.030 (0.060)    +0.372 (0.022)    −0.403 (0.063)    7.7×10⁻¹⁰
  Lugbara                            +0.092 (0.061)    +0.347 (0.023)    −0.255 (0.065)    1.6×10⁻⁵
  Pallisa                            +0.674 (0.058)    +0.288 (0.022)    +0.386 (0.062)    6.3×10⁻⁸
Satellite atoms
  Perennial river presence           +0.089 (0.098)    +0.330 (0.021)    −0.242 (0.100)    2.1×10⁻⁴
  Vegetation spatial heterogeneity   +0.214 (0.038)    +0.373 (0.025)    −0.159 (0.045)    6.7×10⁻⁵

Karamojong. Under an active government disarmament campaign through 2011 and isolated from post-LRA reconstruction flows, Karamoja had the weakest market recovery of all subgroups. Without functioning markets to channel the grant into skilled trade, the program had no positive effect on skilled employment in these communities: the sharpest dampening across all discovered modifiers.

Lugbara. West Nile's geographic isolation at the DRC–South Sudan border, historical underservice by national infrastructure, and constrained cross-border trade reduced how much of the grant translated into skilled employment. The program remained positive but well below the baseline rate, reflecting barriers to market participation that skill grants alone cannot overcome.

Pallisa. Located outside the northern conflict core in eastern Uganda, Pallisa's mixed Iteso–Bagwere farming economy and shorter supply chains to central markets provided conditions for skill-grant success. Where markets had recovered, the grant compounded into durable employment gains: the strongest amplifying effect among all discovered modifiers.

Perennial river presence. River corridors sustain subsistence fishing and bush-crop farming, persistent outside options that reduce the marginal utility of entering skilled trade. In communities along permanent watercourses the program effect on skilled employment is markedly dampened (outside-option mechanism).

Vegetation spatial heterogeneity. A mosaic of agricultural patches and bush signals diverse subsistence livelihoods. Where land cover is varied, households can sustain mixed farming strategies without the grant, dampening its incentive to enter skilled trade (outside-option mechanism).
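The Δ column can be cross-checked from the two GATE columns. The sketch below assumes Δ is the difference of two independent subgroup estimates, so that its standard error is the root sum of squares of the two reported standard errors. That is our reading of the reported numbers, not a statement of the authors' estimator; the function name `gate_contrast` is hypothetical.

```python
import math

# (GATE active, s.e.), (GATE inactive, s.e.), reported (Δ, s.e.), copied from the results above.
rows = {
    "Karamojong": ((-0.030, 0.060), (0.372, 0.022), (-0.403, 0.063)),
    "Lugbara": ((0.092, 0.061), (0.347, 0.023), (-0.255, 0.065)),
    "Pallisa": ((0.674, 0.058), (0.288, 0.022), (0.386, 0.062)),
    "Perennial river presence": ((0.089, 0.098), (0.330, 0.021), (-0.242, 0.100)),
    "Vegetation spatial heterogeneity": ((0.214, 0.038), (0.373, 0.025), (-0.159, 0.045)),
}

def gate_contrast(active, inactive):
    """Difference of two GATEs and its s.e., assuming independent estimates."""
    (g1, se1), (g0, se0) = active, inactive
    return g1 - g0, math.hypot(se1, se0)  # hypot = sqrt(se1**2 + se0**2)

recomputed = {}
for name, (active, inactive, reported) in rows.items():
    recomputed[name] = gate_contrast(active, inactive)
    delta, se = recomputed[name]
    print(f"{name:34s} Δ={delta:+.3f} (s.e. {se:.3f})   reported {reported[0]:+.3f} ({reported[1]:.3f})")
```

The recomputed values agree with the reported ones up to rounding of the inputs.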

Application 2: Livelihood Empowerment Against Poverty (Ghana)

Program. The Livelihood Empowerment Against Poverty 1000 (LEAP 1000) is Ghana's national unconditional cash transfer program, targeting pregnant women and mothers of children under one year of age in the country's poorest districts. A quasi-experimental evaluation using a regression discontinuity design was conducted in Northern Ghana between 2015 and 2017, measuring the program's impact on household consumption and welfare.

Experiment. The evaluation enrolled 2,331 households in 162 communities across the North East, Northern, Savannah, and Upper East regions. Baseline (2015) and endline (2017) surveys provide household-level consumption outcomes and 24 pre-treatment covariates (demographics, asset ownership, household composition). The overall difference-in-differences ATE is +7.35 GH₵/month.
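For readers unfamiliar with the estimand, the textbook unadjusted difference-in-differences contrast between baseline and endline means is shown below. The notation is assumed here for illustration, and the evaluation's actual specification (for example covariate adjustment or the regression-discontinuity sample restriction) may differ.

```latex
\widehat{\tau}_{\mathrm{DiD}}
\;=\;
\big(\bar{Y}^{\,\mathrm{treated}}_{2017} - \bar{Y}^{\,\mathrm{treated}}_{2015}\big)
\;-\;
\big(\bar{Y}^{\,\mathrm{comparison}}_{2017} - \bar{Y}^{\,\mathrm{comparison}}_{2015}\big)
```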

Data fusion (ours). We extend the trial with Landsat-8 2015 imagery, embedded via Prithvi-EO and summarised by a 4,096-atom SAE, combined with the 24 survey covariates — 155 total NEXIS candidates. The map below shows GPS centroids of the 162 trial communities, colored by study region.

[Interactive map: GPS centroids of the 162 trial communities, colored by study region.]

Results

Key finding. NEXIS selects 2 satellite-derived direct modifiers: ephemeral waterways and closed-canopy forest, rare endowments in this predominantly arid savannah landscape that let households direct the transfer into productive investment. No survey covariate survives. Active communities show 6–8× the average program effect. Marginal screening returns 50+ features from the same 155 candidates.
GATE and Δ in GH₵/month, deflated to constant Greater Accra Aug-2017 prices (s.e. in parentheses)

Direct Effect Modifier    GATE active     GATE inactive   Δ               p-value
Ephemeral waterways       +42.9 (14.9)    +6.0 (1.0)      +36.9 (14.9)    2.1×10⁻⁸
Closed-canopy forest      +56.2 (20.8)    +6.4 (1.0)      +49.8 (20.8)    3.7×10⁻⁷

Ephemeral waterways. Narrow seasonal streams and wetland corridors enable micro-irrigation of adjacent plots during the agricultural season. In Northern Ghana's rainfed smallholder system, the transfer can be invested in seeds, fertiliser, or tools that water-adjacent households productively deploy, driving far larger consumption gains than in waterless communities. Active in 6 communities (83 households); temporal satellite analysis confirms cropland expansion and denser riparian vegetation by 2017 in three of them.

Closed-canopy forest. Dense, continuous forest canopy is a rare endowment in this predominantly savannah landscape. Forest-adjacent households access non-timber forest products, fuelwood, and forest-edge cultivation. Because these resources buffer basic consumption needs, the transfer can be redirected from emergency food purchases toward productive assets or dietary diversity, driving substantially larger welfare gains than in open-savannah communities. Active in 5 communities (42 households).
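The "6–8×" figure in the key finding can be checked by dividing each active-group GATE by the overall ATE of +7.35 GH₵/month. This assumes the GATEs and the ATE are expressed in the same units, which the text suggests but does not state outright.

```latex
\frac{\mathrm{GATE}_{\text{waterways, active}}}{\mathrm{ATE}} = \frac{42.9}{7.35} \approx 5.8,
\qquad
\frac{\mathrm{GATE}_{\text{forest, active}}}{\mathrm{ATE}} = \frac{56.2}{7.35} \approx 7.6
```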

Takeaways

Theoretical. Causal and interpretable heterogeneous treatment effect identification is now within reach in controlled experiments — made possible by more extensive pre-treatment measurements and scalable representations.
Applied. Across two anti-poverty programs, satellite-derived environmental features are the dominant direct effect modifiers, explaining treatment heterogeneity not captured by survey covariates.

Citation

@article{cadei2026nexis,
  title   = {From Tokens to Policy: Causal and Interpretable Heterogeneous Treatment Effects Identification},
  author  = {Cadei, Riccardo and Otchere, Frank and Tirivayi, Nyasha and Angeles Tagliaferro, Gustavo and Bargagli-Stoffi, Falco J. and Locatello, Francesco},
  year    = {2026},
  url     = {https://arxiv.org/abs/2606.17010},
  note    = {Under review}
}