Categories: r = research · p/f = policy/fieldbuilding · s-i = self-improvement · c = culture · o = other
An Apple-Picking Model of AI R&D | Tom Cunningham
r
11 pieces of advice for children — LessWrong
s-i
The state of AI safety in four fake graphs — LessWrong
p/f
Product Alignment is not Superintelligence Alignment (and we need the latter to survive) — LessWrong
r
Academic Proof-of-Work in the Age of LLMs — LessWrong
o
Fitness-Seekers: Generalizing the Reward-Seeking Threat Model
r
AI Safety Talent Needs in 2026: Insights for Field-Building Organizations
p/f
Facing the Precipice of History - Chanden Climaco
c
AI Safety Needs Startups - by Joshua Landes and LTM
p/f
Your Work Will Change You Whether You Like It Or Not
s-i
Strategic Taste
s-i
Are AIs more likely to pursue on-episode or beyond-episode reward?
r
There should be ‘general managers’ for more of the world’s important problems
p/f
We don't need more founders in AI safety - by Gauraventh
p/f
Lessons from a year of university AI safety field building — LessWrong
p/f
Separating Prediction from Goal-Seeking — LessWrong
r
Two Skillsets You Need to Launch an Impactful AI Safety Project — LessWrong
p/f
How to Design Environments for Understanding Model Motives — LessWrong
r
Martian Interpretability Challenge: The Core Problems In Interpretability — LessWrong
r
Prefill awareness: can LLMs tell when “their” message history has been tampered with? — LessWrong
r
Don't Let LLMs Write For You — LessWrong
s-i
Tell Culture — LessWrong
c
How to win a best paper award (or, an opinionated take on how to do important research)
r
Current activation oracles are hard to use — LessWrong
r
The current SOTA model was released without safety evals — LessWrong
p/f
Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers — LessWrong
r
Good Ideas Aren't Enough in AI Policy - Andrew Wei
p/f
Instant LLM Updates with Doc-to-LoRA and Text-to-LoRA
r
The Stakes of Our Work - by Celeste Li - heart of hearts
p/f
Why AI won’t go well unless sensible people like you speak up and act.
p/f
Persona Parasitology — LessWrong
r
Value systematization: how values become coherent (and misaligned) — AI Alignment Forum
r
The Persona Selection Model: Why AI Assistants might Behave like Humans
r
The 2026 Global Intelligence Crisis - Citadel Securities
o
Questionable practices in machine learning
r
Managed vs Unmanaged Agency — LessWrong
r
Mapping LLM attractor states — LessWrong
r
METR's 14h 50% Horizon Impacts The Economy More Than ASI Timelines — LessWrong
o
Changing the world for the worse — LessWrong
c
You don't create a culture – Signal v. Noise
c
Minimal-trust investigations
c
Alignment to Evil — LessWrong
r
If you don't feel deeply confused about AGI risk, something's wrong — LessWrong
p/f
Aligning to Virtues — LessWrong
r
Did Claude 3 Opus align itself via gradient hacking? — LessWrong
r
My six stages of learning to be a socially normal person
s-i
Good conversations have lots of doorknobs
c
21 Facts About Throwing Good Parties
c
How to Manage Relationships Like a Psychopath
s-i
Most of Your Efforts are Wasted. Here’s the Framework to Fix It.
s-i
Will reward-seekers respond to distant incentives? — LessWrong
r
Two Buckets - by Gauraventh - Dhaniya
p/f
How I've run major projects | benkuhn.net
s-i
Frontier Safety Framework Report - Gemini 3 Pro (November, 2025) v2
p/f
Status Is The Game Of The Losers' Bracket — LessWrong
o
Where is the Capital? An Overview — LessWrong
o
Omniscaling to MNIST — LessWrong
r
Legible vs. Illegible AI Safety Problems — LessWrong
p/f