Reading — Kaustubh Kislay

Favorites

Strategy as a Wicked Problem (Camillus, 2008)
Why You Can’t Just Do Things - Octopusyarn
Strategic Taste
How to win a best paper award (or, an opinionated take on how to do important research)
How I've run major projects | benkuhn.net

Everything I've read

[r]The behavioral selection model for predicting AI motivations — LessWrong
[r]Your Left Brain Doesn't Trade With Your Right — LessWrong
[p/f]escaping flatland: career advice for CS undergrads
[c]It's nice of you to worry about me, but I really do have a life — LessWrong
[p/f]How to Actually Spend Billions on AI Safety - by Sophie Kim
[p/f]A playbook for field strategy - by Dewi Erwan
[r]Natural Language Autoencoders \ Anthropic
[r]Risk from fitness-seeking AIs: mechanisms and mitigations
[r]Not a Paper: "Frontier Lab CEOs are Capable of In-Context Scheming" — LessWrong
[p/f]Not All Compute is Created Equal
[r]Fail safe(r) at alignment by channeling reward-hacking into a "spillway" motivation
[s-i]Half A Month Of Consolation Writing Advice
[c]The Orange - The Gladdest Thing
[s-i]An extremely non-comprehensive list of how to increase your surface area for luck and magic (and instantly sprinkle fairy dust on your life)
[r]From personas to intentions: towards a science of motivations for AI models — LessWrong
[c]Annoyingly Principled People, and what befalls them — LessWrong
[r]Hidden Role Games as a Trusted Model Eval - James Lucassen's Blog
[r]Model organisms researchers should check whether high LRs defeat their model organisms — LessWrong
[r]Steering Might Stop Working Soon — LessWrong
[s-i]Do Thing, Do One Thing
[p/f]What 3,654 Job Postings Tell Us About Talent Needs in AI Safety — EA Forum
[r]If Mythos actually made Anthropic employees 4x more productive, I would radically shorten my timelines
[c]Why You Can’t Just Do Things - Octopusyarn
[r]Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes
[p/f]AI Populism's Warning Shots - by Jasmine Sun
[s-i]How to walk through walls - by Henrik Karlsson
[r][2603.20639] Agentic AI and the next intelligence explosion
[s-i]A woefully incomplete guide to technical upskilling
[r]An Apple-Picking Model of AI R&D | Tom Cunningham
[c]11 pieces of advice for children — LessWrong
[p/f]The state of AI safety in four fake graphs — LessWrong
[r]Product Alignment is not Superintelligence Alignment (and we need the latter to survive) — LessWrong
[c]Academic Proof-of-Work in the Age of LLMs — LessWrong
[r]Fitness-Seekers: Generalizing the Reward-Seeking Threat Model
[p/f]AI Safety Talent Needs in 2026: Insights for Field-Building Organizations
[c]Facing the Precipice of History - Chanden Climaco
[p/f]AI Safety Needs Startups - by Joshua Landes and LTM
[s-i]Your Work Will Change You Whether You Like It Or Not
[s-i]Strategic Taste
[r]Are AIs more likely to pursue on-episode or beyond-episode reward?
[p/f]There should be ‘general managers’ for more of the world’s important problems
[p/f]We don't need more founders in AI safety - by Gauraventh
[p/f]Lessons from a year of university AI safety field building — LessWrong
[r]Separating Prediction from Goal-Seeking — LessWrong
[p/f]Two Skillsets You Need to Launch an Impactful AI Safety Project — LessWrong
[r]How to Design Environments for Understanding Model Motives — LessWrong
[r]Martian Interpretability Challenge: The Core Problems In Interpretability — LessWrong
[r]Prefill awareness: can LLMs tell when “their” message history has been tampered with? — LessWrong
[c]Don't Let LLMs Write For You — LessWrong
[c]Tell Culture — LessWrong
[r]How to win a best paper award (or, an opinionated take on how to do important research)
[r]Current activation oracles are hard to use — LessWrong
[p/f]The current SOTA model was released without safety evals — LessWrong
[r]Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers — LessWrong
[p/f]Good Ideas Aren't Enough in AI Policy - Andrew Wei
[r]Instant LLM Updates with Doc-to-LoRA and Text-to-LoRA
[p/f]The Stakes of Our Work - by Celeste Li - heart of hearts
[p/f]Why AI won’t go well unless sensible people like you speak up and act.
[r]Persona Parasitology — LessWrong
[r]Value systematization: how values become coherent (and misaligned) — AI Alignment Forum
[r]The Persona Selection Model: Why AI Assistants might Behave like Humans
[c]The 2026 Global Intelligence Crisis - Citadel Securities
[r]Questionable practices in machine learning
[r]Managed vs Unmanaged Agency — LessWrong
[r]Mapping LLM attractor states — LessWrong
[r]METR's 14h 50% Horizon Impacts The Economy More Than ASI Timelines — LessWrong
[c]Changing the world for the worse — LessWrong
[c]You don't create a culture – Signal v. Noise
[c]Minimal-trust investigations
[r]Alignment to Evil — LessWrong
[p/f]If you don't feel deeply confused about AGI risk, something's wrong — LessWrong
[r]Aligning to Virtues — LessWrong
[r]Did Claude 3 Opus align itself via gradient hacking? — LessWrong
[s-i]My six stages of learning to be a socially normal person
[c]Good conversations have lots of doorknobs
[c]21 Facts About Throwing Good Parties
[s-i]How to Manage Relationships Like a Psychopath
[s-i]Most of Your Efforts are Wasted. Here’s the Framework to Fix It.
[r]Will reward-seekers respond to distant incentives? — LessWrong
[p/f]Two Buckets - by Gauraventh - Dhaniya
[s-i]How I've run major projects | benkuhn.net
[r]Frontier Safety Framework Report - Gemini 3 Pro (November, 2025) v2
[c]Status Is The Game Of The Losers' Bracket — LessWrong
[c]Where is the Capital? An Overview — LessWrong
[r]Omniscaling to MNIST — LessWrong
[r]Legible vs. Illegible AI Safety Problems — LessWrong