Benjamin Sturgeon

Me in 10 seconds

I'm a researcher working on AI safety, with a focus on understanding agency: both agency in AI systems and how powerful AI will affect human agency. I'm currently pursuing an MPhil in Applied Mathematics at the University of Cape Town with Jonathan Shock, where I'm studying how mechanistic interpretability can be applied to understand RL models in depth. I also have a strong interest in evaluative frameworks and creative ways of measuring and understanding LLMs.

I am also passionate about teaching and growing the field of AI safety, and co-founded AI Safety Cape Town to contribute to that effort.

Outside of research, I draw inspiration from Buddhist philosophy, Stoicism, and Kantian ethics in thinking about how to make meaningful contributions to the field of AI safety. To recharge, I play beach volleyball, watch anime, read epic fantasy and sci-fi, and hike.

I welcome feedback on how I'm doing! If you'd like to share, please feel encouraged to do so using this feedback form.

Me in many seconds

link to my about page

Now Now Now

What I'm doing now

Inkhaven: 30 Days of Posts

Why Are Anglophone Countries Unhappier Than Their European Counterparts?

From Output to Connection

Attachment Theory Is Extremely Cool and Useful

Q&A

Linkpost: Sanity-Checking 'Incompressible Knowledge Probes'

All posts

Posts

Whole Brain Emulation as an Anchor for AI Welfare

Lessons from my first 10 day Vipassana

Learning how to learn

Why pursue conceptions of agency for AI safety

Vipassana Meditation and Active Inference: A Framework for Understanding Suffering and its Cessation

Projects

Building and training a word embedding system

Creating a small GPT from scratch in PyTorch

Adding vision and navigation to an autonomous farm robot

Developing toy models of agency. A mechanistic interpretability project.

Talks

An intro to AI Safety

Papers

Investigating Factored Cognition in Large Language Models For Answering Ethically Nuanced Questions

A Security Analysis of the Linux RNG Protocol in Virtual Machines

HumanAgencyBench: Do Language Models Support Human Agency?

Pictures

My Resumé

Download my resumé

Connect