An Apple-Picking Model of AI R&D | Tom Cunningham [r]
11 pieces of advice for children — LessWrong [s-i]
The state of AI safety in four fake graphs — LessWrong [p/f]
Product Alignment is not Superintelligence Alignment (and we need the latter to survive) — LessWrong [r]
Academic Proof-of-Work in the Age of LLMs — LessWrong [o]
Fitness-Seekers: Generalizing the Reward-Seeking Threat Model [r]
AI Safety Talent Needs in 2026: Insights for Field-Building Organizations [p/f]
Facing the Precipice of History - Chanden Climaco [c]
AI Safety Needs Startups - by Joshua Landes and LTM [p/f]
Your Work Will Change You Whether You Like It Or Not [s-i]
Strategic Taste [s-i]
Are AIs more likely to pursue on-episode or beyond-episode reward? [r]
There should be ‘general managers’ for more of the world’s important problems [p/f]
We don't need more founders in AI safety - by Gauraventh [p/f]
Lessons from a year of university AI safety field building — LessWrong [p/f]
Separating Prediction from Goal-Seeking — LessWrong [r]
Two Skillsets You Need to Launch an Impactful AI Safety Project — LessWrong [p/f]
How to Design Environments for Understanding Model Motives — LessWrong [r]
Martian Interpretability Challenge: The Core Problems In Interpretability — LessWrong [r]
Prefill awareness: can LLMs tell when “their” message history has been tampered with? — LessWrong [r]
Don't Let LLMs Write For You — LessWrong [s-i]
Tell Culture — LessWrong [c]
How to win a best paper award (or, an opinionated take on how to do important research) [r]
Current activation oracles are hard to use — LessWrong [r]
The current SOTA model was released without safety evals — LessWrong [p/f]
Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers — LessWrong [r]
Good Ideas Aren't Enough in AI Policy - Andrew Wei [p/f]
Instant LLM Updates with Doc-to-LoRA and Text-to-LoRA [r]
The Stakes of Our Work - by Celeste Li - heart of hearts [p/f]
Why AI won’t go well unless sensible people like you speak up and act. [p/f]
Persona Parasitology — LessWrong [r]
Value systematization: how values become coherent (and misaligned) — AI Alignment Forum [r]
The Persona Selection Model: Why AI Assistants might Behave like Humans [r]
The 2026 Global Intelligence Crisis - Citadel Securities [o]
Questionable practices in machine learning [r]
Managed vs Unmanaged Agency — LessWrong [r]
Mapping LLM attractor states — LessWrong [r]
METR's 14h 50% Horizon Impacts The Economy More Than ASI Timelines — LessWrong [o]
Changing the world for the worse — LessWrong [c]
You don't create a culture – Signal v. Noise [c]
Minimal-trust investigations [c]
Alignment to Evil — LessWrong [r]
If you don't feel deeply confused about AGI risk, something's wrong — LessWrong [p/f]
Aligning to Virtues — LessWrong [r]
Did Claude 3 Opus align itself via gradient hacking? — LessWrong [r]
My six stages of learning to be a socially normal person [s-i]
Good conversations have lots of doorknobs [c]
21 Facts About Throwing Good Parties [c]
How to Manage Relationships Like a Psychopath [s-i]
Most of Your Efforts are Wasted. Here’s the Framework to Fix It. [s-i]
Will reward-seekers respond to distant incentives? — LessWrong [r]
Two Buckets - by Gauraventh - Dhaniya [p/f]
How I've run major projects | benkuhn.net [s-i]
Frontier Safety Framework Report - Gemini 3 Pro (November, 2025) v2 [p/f]
Status Is The Game Of The Losers' Bracket — LessWrong [o]
Where is the Capital? An Overview — LessWrong [o]
Omniscaling to MNIST — LessWrong [r]
Legible vs. Illegible AI Safety Problems — LessWrong [p/f]