Research
Conference
Parallel Reweighted Wake-Sleep (2023), UAI oral
Reweighted wake-sleep (RWS) can do Bayesian inference in a very general class of models. It samples from an approximate posterior Q, uses importance weighting to estimate the true posterior P, then updates Q towards the importance-weighted estimate. But the sheer number of samples you need for the importance weighting rules out any realistic-size model. We develop "massively parallel RWS", which draws K samples for each latent variable and then reasons over all possible combinations of those samples. You can do this in polynomial time by exploiting conditional independencies. We get a dramatic speedup over standard RWS, which samples the full joint. Authors: Thomas Heap, Gavin Leech, Laurence Aitchison. Total hours I contributed: 300 (mostly on an abandoned branch).
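A minimal sketch of the standard RWS step this builds on, for a toy one-latent Gaussian model in PyTorch (the model, sample count, and learning rate are illustrative; the massively parallel machinery is not shown):

```python
import torch

x = torch.tensor(1.5)                        # a single observation, for illustration
q_mu = torch.zeros(1, requires_grad=True)    # Q: z ~ N(q_mu, 1)
p_mu = torch.zeros(1, requires_grad=True)    # P: z ~ N(0, 1), x ~ N(z + p_mu, 1)
opt = torch.optim.Adam([q_mu, p_mu], lr=1e-2)

K = 32                                       # number of importance samples
z = (q_mu + torch.randn(K)).detach()         # wake-phase samples from Q (no reparameterisation gradient)
log_q = torch.distributions.Normal(q_mu, 1.0).log_prob(z)
log_p = (torch.distributions.Normal(0.0, 1.0).log_prob(z)
         + torch.distributions.Normal(z + p_mu, 1.0).log_prob(x))
w = torch.softmax((log_p - log_q).detach(), dim=0)   # self-normalised importance weights

# Wake-phase updates: move both P and Q towards the importance-weighted posterior estimate.
loss = -(w * (log_p + log_q)).sum()
opt.zero_grad(); loss.backward(); opt.step()
```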
How robust are estimates of Covid policies? (2020), NeurIPS Spotlight
Full title: How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19? COVID-19 policy studies mostly don't do proper validation: very few papers check their performance on held-out data, and the sensitivity checks they perform are usually very limited. We re-ran one of the famous models, plus several variations of our own, and found that the famous model's results depend quite a lot on analysis decisions (ours is a bit more robust). Also a couple of theorems about how to interpret the estimated effects: an estimate isn't the unconditional effect of doing policy p, it's the average additional effect of p if you implement it alongside the average set of existing policies (the average in your dataset). Authors: Mrinank Sharma, Sören Mindermann, Jan M. Brauner, Gavin Leech, Anna B. Stephenson, Tomáš Gavenciak, Jan Kulveit, Yee Whye Teh, Leonid Chindelevitch, Yarin Gal. Total hours I contributed: 120.
Journal
Mass mask-wearing may have reduced Covid transmission (2022), PNAS
Full title: Mask wearing in community settings reduces SARS-CoV-2 transmission. A puzzle: studies measuring individual people tended to find nice big reductions in COVID transmission from masks, like 50%. But society-level studies found results all over the place, anywhere in [-2%, 40%]. It turns out that the proxy for mask-wearing people were using was pretty weak. So we got a much, much better proxy, using Facebook's reach to collect 20 million data points on where and when people were actually wearing masks. We ran a complicated regression model on 56 countries (not counting the analysis that treats the US states as countries), and checked it in 22 ways to make sure our result wasn't just cherry-picking or a pure correlation. We find that mask-wearing can be confidently linked to a 6%-43% reduction in transmission, though we can't really say what the effect of mandates was. (For comparison, the difference between summer and winter is about 42%, and the effect of all government interventions in the first wave was about 80%.) Authors: Gavin Leech, Charlie Rogers-Smith, Jonas B Sandbrink, Ben Snodin, Rob Zinkov, Benjamin Rader, John S. Brownstein, Yarin Gal, Samir Bhatt, Mrinank Sharma, Soren Mindermann, Jan Markus Brauner, Laurence Aitchison. Total hours I contributed: 565.
Seasonal variation in Covid transmission (2022), PLOS Computational Biology
Full title: Seasonal variation in SARS-CoV-2 transmission in temperate climates: a Bayesian modelling study in 143 European regions. We reconstruct the ridiculously complicated causal web involved in making COVID less bad in the summer, then ignore it and estimate one scalar instead. It turns out to be big but not big enough: only about 40% reduced transmission in summer. This provides a really important adjuster for observational studies, and updates the unadjusted estimates from the previous year. Authors: Tomas Gavenciak, Joshua Teperowski Monrad, Gavin Leech, Mrinank Sharma, Soren Mindermann, Jan Markus Brauner, Samir Bhatt, Jan Kulveit. Total hours I contributed: 110.
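A rough gloss on that one scalar, assuming a sinusoidal multiplicative form peaking in midwinter (my assumption for illustration, not necessarily the paper's exact parameterisation): an amplitude of about 0.25 reproduces the ~40% winter-to-summer drop.

```python
import numpy as np

gamma = 0.25                                              # assumed seasonal amplitude (illustrative)
day = np.arange(365)
multiplier = 1 + gamma * np.cos(2 * np.pi * day / 365)    # scales transmission; peaks on day 0 (midwinter)
reduction = 1 - multiplier.min() / multiplier.max()
print(f"winter-to-summer reduction in transmission: {reduction:.0%}")   # ~40%
```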
Dataset on Covid interventions (2022), Nature Scientific Data
Full title: A dataset of non-pharmaceutical interventions on SARS-CoV-2 in Europe. The dataset from the second-wave paper below. Authors: George Altman, Janvi Ahuja, Joshua Monrad, Gurpreet Dhaliwal, Charlie Rogers-Smith, Gavin Leech, Benedict Snodin, Jonas B. Sandbrink, Lukas Finnveden, Alexander John Norman, Sebastian B. Oehm, Julia Fabienne Sandkühler, Jan Kulveit, Seth Flaxman, Yarin Gal, Swapnil Mishra, Samir Bhatt, Mrinank Sharma, Sören Mindermann & Jan Markus Brauner.
Interventions against Covid, 2nd wave (2021), Nature Comms
Full title: Understanding the effectiveness of government interventions against the resurgence of COVID-19 in Europe. We looked at how policy effects changed in the second wave (late 2020). This time we worked at a finer level, with the unit being roughly 1/20 of a country rather than a whole country. Policies got a bit weaker overall (a 66% combined reduction, compared to 80% in spring). The best reading of this is that safety measures and individual protective behaviour persisted after the first wave, even once governments said it was ok to stop. School closure was notably weaker (10% instead of 35%), which probably means that the safety measures schools enforced from spring onwards really did make them safer. Authors: Mrinank Sharma, Sören Mindermann, Charlie Rogers-Smith, Gavin Leech, Benedict Snodin, Janvi Ahuja, Jonas B. Sandbrink, Joshua Teperowski Monrad, George Altman, Gurpreet Dhaliwal, Lukas Finnveden, Alexander John Norman, Sebastian B. Oehm, Julia Fabienne Sandkühler, Thomas Mellan, Jan Kulveit, Leonid Chindelevitch, Seth Flaxman, Yarin Gal, Swapnil Mishra, Jan Markus Brauner, Samir Bhatt. Total hours I contributed: 100.
Covid lineages and the rise of Delta (2021), Lancet E-Clinical Medicine
Full title: Changing composition of SARS-CoV-2 lineages and rise of Delta variant in England. By looking at tests and sewage data from early 2021, we saw that "the English variant" of COVID (B.1.1.7), which took over England in December, was itself being displaced by other nasty variants. The main worry was that one of the new variants would be resistant to the vaccines. Authors: Swapnil Mishra, Sören Mindermann, Mrinank Sharma, Charles Whittaker, Thomas Mellan, Thomas Wilton, Dimitra Klapsa, Ryan Mate, Martin Fritzsche, Maria Zambon, Janvi Ahuja, Adam Howes, Xenia Miscouridou, Guy P Nason, Oliver Ratmann, Gavin Leech, Julia Fabienne Sandkühler, Charlie Rogers-Smith, Michaela Vollmer, Juliette T Unwin, Yarin Gal, Meera Chand, Axel Gandy, Javier Martin, Erik Volz, Neil M Ferguson, Samir Bhatt, Jan M Brauner, Seth Flaxman. Total hours I contributed: 10.
Government interventions against Covid (2020), Science
Full title: Inferring the effectiveness of government interventions against COVID-19. We used a hierarchical Bayesian model to see what worked in the first wave of the pandemic. Up to then, people hadn't been able to pick apart the individual effects of anti-COVID policies, instead using "lockdown" to name all ~20 different things that governments tried in spring 2020 (when it should really just mean stay-at-home orders). We collected a big new dataset covering 41 countries. We were among the first to spot the really large effect of closing schools and universities, back when people were hoping that children were magically not infectious. Our validation was unusually extensive and rigorous for epidemiology. Stay-at-home orders did surprisingly little (a 0 to 25% reduction) if you had already closed schools, restaurants, and big events. We initially did a cost-benefit analysis of each policy, surveying (American) people on how much each one interferes with their lives, but this wasn't done well enough to make the final paper. To my knowledge it still hasn't been done (outside of secret government documents), despite it being impossible to make good decisions without it. Authors: Jan M. Brauner, Sören Mindermann, Mrinank Sharma, David Johnston, John Salvatier, Tomáš Gavenčiak, Anna B. Stephenson, Gavin Leech, George Altman, Vladimir Mikulik, Alexander John Norman, Joshua Teperowski Monrad, Tamay Besiroglu, Hong Ge, Meghan A. Hartwick, Yee Whye Teh, Leonid Chindelevitch, Yarin Gal, Jan Kulveit. Total hours I contributed: 130.
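A sketch of the multiplicative form such models use: each policy scales R down by its own factor, so individual effects combine like this (the per-policy numbers below are made up for illustration, not our estimates):

```python
import numpy as np

# Hypothetical per-policy reductions in R (illustrative only).
reductions = {"school & university closure": 0.35,
              "gathering limits": 0.25,
              "business closure": 0.20,
              "stay-at-home order": 0.10}
remaining = np.prod([1 - r for r in reductions.values()])
print(f"combined reduction in R: {1 - remaining:.0%}")   # effects multiply rather than add
```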
Workshop
Replications and Reversals in the Social Sciences (2021), AIMOS workshop
Research results are often not reproducible and/or replicable. Intense self-criticism in psychology over the last seven years or so has shown that only 40-65% of classic results replicate; call the overturned results "reversals". Shockingly, these are not incorporated into research training or undergraduate education. Authors: Helena Hartmann, Shilaan Alzahawi, Meng Liu, Mahmoud Elsherif, Alaa AlDoh, Gavin Leech, Flavio Azevedo. Total hours I contributed: about 200, to the underlying dataset.
Safety Properties of Inductive Logic Programming (2020), AAAI SafeAI workshop
We look at an obscure kind of machine learning, inductive logic programming (ILP), to see whether it is (or could be) safer than neural networks. We take an existing framework for thinking about AI safety and formalise it a bit to allow the comparison. Upsides: ILP is convenient for specification, is robust to some syntactic input changes, gives greater control over the inductive bias, can actually be formally verified, and its results are pretty interpretable (you can read the model and see how it is built). But ILP is (so far) limited to domains with nice neat symbolic data, it can't do architecture search, and its performance lags far behind NNs on almost all tasks. Hybrid systems of ILP and NNs look like they would lose most of what we like about ILP in the first place. Authors: Gavin Leech, Nandi Schoots, Joar Skalse. Total hours I contributed: 100 (plus a crapton of hours learning Metagol and learnability stuff).
Preprint
Steering Language Models Without Optimisation (2023)
Full title: Activation Addition: Steering Language Models Without Optimization. We investigate activation engineering: modifying the activations of a language model at inference time to predictably alter its behavior. It works by adding a bias to the forward pass: a 'steering vector' implicitly specified through normal prompts. Activation Addition (ActAdd) computes these vectors by taking the difference in activations between pairs of prompts. We get control over high-level properties of the output without damaging the model's performance. ActAdd takes far less compute and implementation effort than finetuning or RLHF, lets nontechnical users provide natural-language specifications, and scales naturally with model size. It is the first (superficial) alignment method that needs neither training data nor gradient descent. Authors: Alex Turner, Lisa Thiergart, David Udell, Gavin Leech, Ulisse Mini, Monte MacDiarmid. Total hours I contributed: 190.
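A minimal sketch of the idea on GPT-2 via HuggingFace transformers (the contrast pair, layer, and coefficient are illustrative choices, not the paper's settings):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER, COEFF = 6, 4.0                      # illustrative layer and injection coefficient

def acts(prompt: str) -> torch.Tensor:
    """Residual-stream activations just after block LAYER for a prompt."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, output_hidden_states=True).hidden_states[LAYER + 1]

# Steering vector: difference of activations for a contrast pair of prompts.
plus, minus = acts("Love"), acts("Hate")
n = min(plus.shape[1], minus.shape[1])     # align lengths (the paper pads the shorter prompt instead)
steer = COEFF * (plus[:, :n] - minus[:, :n])

def hook(module, inputs, output):
    h = output[0]
    if h.shape[1] >= n:                    # only the prompt pass; cached generation steps are untouched
        h[:, :n] += steer.to(h.dtype)
    return (h,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(hook)
ids = tok("I went to the shop and", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=30)[0]))
handle.remove()
```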
Decision trees compensate for misspecification (2021)
My friend Hugh was convinced that linear models should be able to match decision-tree ensembles in many settings, and we had both seen things like depth-12 trees used in industry, which just don't make sense on a naive account. We contrived some neat stochastic processes to test ways that tree depth could be compensating for realistic data flaws. It turns out that depth partially fixes a bunch of modelling and training errors: if you don't run training long enough, if you use the wrong link function, if you assume a single response when the response is actually composite, if you assume homogeneity in a heterogeneous population, or if there are missing variables. Also a couple of simple extensions to Gaussian mixture modelling with lines. Our linear fixes didn't work that well on randomly selected real data, though. Total hours I contributed: 50.
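A toy illustration of just one of those flaws (the wrong link function), not our actual experiments: a depth-12 tree should track data generated through an exponential link much better than a plain linear fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
y = np.exp(X @ np.array([1.0, -0.5, 0.25])) + rng.normal(scale=0.1, size=5000)  # exponential link, mild noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for m in (LinearRegression(), DecisionTreeRegressor(max_depth=12, random_state=0)):
    m.fit(X_tr, y_tr)
    print(type(m).__name__, round(r2_score(y_te, m.predict(X_te)), 2))  # deep tree should score far higher
```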
Legally Grounded Fairness (2020)
We try to work around the famous impossibility results in algorithmic fairness by using legal damages as a signal about all-things-considered unfairness. This lets us use multiple definitions of fairness at once and set the weight on each in a non-arbitrary way. A human picks a set of fairness definitions and gives the algorithm a set of past cases, along with the damages awarded in each; LGFO then works out how much weight to give each kind of fairness, and so produces a classifier which is relatively fair, insofar as we trust the legal system to judge this relatively well. Authors: Dylan Holden-Sim, Gavin Leech, Laurence Aitchison. Total hours I contributed: 20.
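A toy rendering of that recipe (not the paper's actual algorithm; the definitions and numbers below are made up): regress the damages awarded in past cases on per-definition unfairness scores and read the fitted non-negative coefficients as the weights.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n_cases, n_defs = 200, 3                               # e.g. parity, equalised odds, calibration
unfairness = rng.uniform(size=(n_cases, n_defs))       # hypothetical violation scores per past case
damages = unfairness @ np.array([5.0, 2.0, 0.5]) + rng.normal(scale=0.3, size=n_cases)

weights, _ = nnls(unfairness, damages)                 # non-negative weight per fairness definition
print(weights / weights.sum())                         # relative importance, learned from the case outcomes
```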
Popular
- Can policymakers trust forecasters? (2023)
- Total rewrite of the AI alignment wikipedia page with Mantas Manzeikas and Sören Mindermann (2022)
- Some shallow investigations of modern psychometrics and talent scouting (2023).
- Scoring the Big Three (2022)
- Learning from crisis (2022)
- Comparing top forecasters and domain experts (2022)
- Reversals in psychology (2020)
- The academic contribution to AI safety seems large (2020)
- Existential risk as common cause (2018)
- Side effects in Gridworlds (2018). Developed further.
Service
- Briefed the UK Cabinet Office COVID-19 Task Force on mask policy.
- Briefed the UK Cabinet Office on AI economics.
- Reviewer for PNAS, Machine Learning, BMJ Global Health, BMC Medicine, AI Safety Camp, PIBBSS.
- Created a 1000-paper bibliography on every angle of the AI problem.
Media
- Clearer Thinking
- Masks: BBC, ACX, New York Times, Wired, Guardian, Mail, Marginal Revolution, Gelman
- Psychology: Nature, Gelman, Coyne, Everything Hertz, Stronger by Science.
Teaching
- 2022, 2023: Lead instructor for ESPR. (Metaphilosophy and metamathematics.)
- 2021: Course designer and instructor, ESPR. (Metascience, cultural literacy, speculative cosmology.)
- 2020: Lead TA for COMS20010: Algorithms 2.
- 2019: TA for the fearsome COMS30007: Bayesian Machine Learning.
Credit to James Walsh for the academic SVGs.