
read karpathy's year end thing last week (https://karpathy.bearblog.dev/year-in-review-2025/). the "ghosts vs animals" part stuck with me.
basically he says we're not building AI that evolves like animals. we're summoning ghosts – things that appear, do their thing, then vanish. no continuity between interactions.
which explains why chatgpt is so weird to use for actual work. been using it for coding stuff, and every time I start a new chat it's like talking to someone with amnesia. have to re-explain the whole project context.
the memory feature doesn't help much either. it saves random facts like "user prefers python" but forgets entire conversations. so it's more like scattered notes than actual memory.
why this bugs me
if AI is supposed to become useful for real tasks (not just answering random questions), this is a huge problem.
imagine dealing with a coding assistant that forgets your project architecture every day, or a research helper that loses track of what you've already investigated. basically useless.
karpathy mentions cursor and claude code as examples of AI that "lives on your computer". but even those don't really remember. they can see your files but there's no thread of understanding that builds up over time.
what's missing
most "AI memory" stuff is just retrieval. search through old chats for relevant bits. but that's not how memory actually works.
like real memory would keep track of conversation flow not just random facts. understand why things happened. update itself when you correct it. build up understanding over time instead of starting fresh every conversation.
current approaches feel more like ctrl+f through your chat history than actual memory.
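for the curious, here's roughly what that retrieval style of "memory" looks like stripped to the bone. this is just a sketch I put together to illustrate the point; real systems use embedding search instead of word overlap, but the shape is the same: score old snippets against the new question and paste the winners into the prompt.

```python
# toy sketch of retrieval-as-memory (my own illustration, not any vendor's code):
# rank old chat snippets by crude word overlap and prepend the top hits.

def score(query: str, snippet: str) -> int:
    """Crude relevance: count shared lowercase words (a stand-in for embeddings)."""
    return len(set(query.lower().split()) & set(snippet.lower().split()))

def build_prompt(query: str, old_snippets: list[str], top_k: int = 3) -> str:
    ranked = sorted(old_snippets, key=lambda s: score(query, s), reverse=True)
    memory_block = "\n".join(ranked[:top_k])
    # the "memory" is just retrieved text pasted above the question
    return f"Relevant past notes:\n{memory_block}\n\nUser: {query}"

history = [
    "user prefers python",
    "the project is a flask api with a postgres backend",
    "we decided to keep auth out of scope for v1",
]
print(build_prompt("how should the api endpoints be structured?", history))
```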
what would fix this
honestly not sure. been thinking about it but don't have a good answer.
maybe we need something fundamentally different from retrieval? like actual persistent state that evolves? but that sounds complicated and probably slow.
did find some github project called evermemos while googling this. haven't had time to actually try it yet but might give it a shot when I have some free time.
bigger picture
karpathy's "ghosts vs animals" thing really nails it. we're building incredibly smart things that have no past, no growth, no real continuity.
they're brilliant in the moment but fundamentally discontinuous. like talking to someone with amnesia who happens to be a genius.
if AI is gonna be actually useful long term (not just a fancy search engine), someone needs to solve this. otherwise we're stuck with very smart tools that forget everything.
curious if anyone else thinks about this or if I'm just overthinking it
Submission Statement:
This discusses a fundamental limitation in current AI systems highlighted in Andrej Karpathy's 2025 year-in-review: the lack of continuity and real memory. While AI capabilities have advanced dramatically, systems remain stateless and forget context between interactions. This has major implications for the future of AI agents, personal assistants, and long-term human-AI collaboration. The post explores why current retrieval-based approaches are insufficient and what might be needed for AI to develop genuine continuity. This relates to the future trajectory of AI development and how these systems will integrate into daily life over the next 5-10 years.
karpathy's new post about AI "ghosts" got me thinking, why cant these things remember anything
by u/Scared-Ticket5027 in r/Futurology
15 Comments
This is one of the reasons llms have a hard time playing hangman. They have no memory other than the chat itself. So they have nowhere to store a hidden word, no way to keep information secret even for a moment.
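To make the hangman point concrete: the usual fix is to keep the secret in ordinary program state outside the chat, so the model only ever sees the masked word and the guesses. A minimal sketch (no real LLM involved; the guesses here are hard-coded):

```python
# minimal sketch: the hidden word lives in the harness, not in the chat,
# so nothing in the visible conversation ever has to "remember" it.
import random

def play_hangman(words: list[str], guesses: list[str]) -> None:
    secret = random.choice(words)  # kept in program state, never shown
    revealed: set[str] = set()
    for guess in guesses:
        if guess in secret:
            revealed.add(guess)
        masked = "".join(c if c in revealed else "_" for c in secret)
        print(f"guess {guess!r} -> {masked}")

play_hangman(["ghost", "animal"], ["g", "a", "o", "s", "t"])
```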
This is why there is a memory ram and ssd shortage now.
Yeah, but I can imagine the storage requirements for millions of lines of text that would all need to be kept and somehow processed
“memory” is literally feeding the entire chat history back into the prompt stream, i.e. the context window. What’s needed is real-time model fine-tuning, which is computationally prohibitive, though all chatbots do some fine-tuning periodically. Being able to make a vector DB of everything on your computer would be helpful for searching info, but it’s not the same kind of retrieval an LLM does internally, so it's more like a glorified search. I think some proposals for infinite context windows have been made, but I have not looked into them.
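As a rough illustration of the "feeding the entire chat history into the context window" point: every request re-sends as much of the conversation as fits, dropping the oldest turns first. A toy sketch, not any vendor's actual implementation; the 4-characters-per-token figure is a crude stand-in for a real tokenizer.

```python
# toy sketch of context-window "memory": re-send recent turns until a token
# budget is exhausted; everything older is simply forgotten this request.

def fit_history(turns: list[str], max_tokens: int = 1000) -> list[str]:
    kept: list[str] = []
    budget = max_tokens
    for turn in reversed(turns):        # newest turns get priority
        cost = max(1, len(turn) // 4)   # ~4 characters per token (rough guess)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    return list(reversed(kept))

conversation = [f"turn {i}: " + "words " * 20 for i in range(50)]
prompt_turns = fit_history(conversation, max_tokens=100)
print(f"sending {len(prompt_turns)} of {len(conversation)} turns this request")
```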
I’ve used chatgpt for programming too, but more often than not it does a poor job. Like an overly excited stage student.
I’ve often asked it why it’s polite and, more importantly, why it compliments me. Clearly it’s been taught to handle users. As for its selective memory, I figure that every conversation is a new thread. The data eventually gets integrated into the model, but it’s one conversation mixed in with a bajillion others, and I figure it doesn’t identify you.
One idea I’ve toyed with is making an extract of every conversation and starting each new conversation by having it look through the extracts. Maybe it would help?
Well, yeah. This is what researchers and computer scientists have been calling the “holy grail” for GenAI for decades. And we haven’t solved it. There are a couple of approaches we’ve tried; you can find research papers around things like GraphRAG – [Welcome – GraphRAG](https://microsoft.github.io/graphrag/) – which attempts to simulate memory engrams using a combination of graph theory and RAG patterns. The LangMem SDK – [LangMem SDK for agent long-term memory](https://blog.langchain.com/langmem-sdk-launch/) – tries to build something for agents based on the data solutions and algorithms available today. But the key thing we’re missing is the math and the understanding of how memory really works that would be required to build an algorithm allowing an AI to actually store and retrieve memories effectively. We as humans have an innate ability to remember things, but we don’t really know how to translate the way we remember into something a computer can do.
Part of the issue is that human memory is sensory. Our memory retrieval is driven by our five senses, which build the “context window” we use to answer a question or accomplish a task. This is what we would describe as a “World State”. LLMs just aren’t capable of understanding those concepts, as they have no senses to rely upon. There have been experiments with building larger learning systems that incorporate data from cameras and microphones, bringing two of our five senses into the loop to aid recall and context, but so far those experiments are in their infancy and are limited to things like teaching a computer to play an old Atari game. Google’s DeepMind team has attempted to build a simulated world-state engine for AI to rely upon, and Genie 3 has shown some impressive progress there, now being capable of maintaining a “world state” for several minutes of activity. But these tools are still in the research phase, because the compute required for them is enormous (we’re talking thousands of dollars per minute of compute).

You, as a human, update your world state at a rate of about 10 bits per second. That sounds slow, until you remember that the raw sensory input being compressed down into those 10 bits arrives at roughly 10^9 bits/s. It’s an efficiency problem that biology has solved and we haven’t solved in computing. [The unbearable slowness of being: Why do we live at 10 bits/s?: Neuron](https://www.cell.com/neuron/abstract/S0896-6273(24)00808-0?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0896627324008080%3Fshowall%3Dtrue)
Currently, memory is just about refeeding the whole conversation. This is computationally intensive, which is why it’s mostly avoided. To create memory that’s more similar to a human’s (or an animal’s) you would need to keep the training going while you’re interacting with the model. But that’s not as trivial as it sounds: for example, why and when would you reward a behaviour, and when would you punish it? This is totally outside the scope of current LLMs. Maybe in some future iteration of LLM architecture it might make sense; I don’t think it does now.
>”basically he says we’re not building AI that evolves like animals. we’re summoning ghosts – things that appear, do their thing, then vanish. no continuity between interactions.”
>”like real memory would keep track of conversation flow not just random facts”
This sounds like someone using AI transactionally.
I run multiple long-lived threads, each tied to a specific project. When I’m out of time, I explicitly name and preserve the context (e.g. ‘Save this as X’), and then continue later.
Used this way, the AI maintains continuity within and across sessions via structured context, not hidden ‘thoughts’.
Many free or casual users treat AI like a search engine — short prompts, new chats, no declared state — which *creates* the ‘amnesia’ experience they complain about. That’s a usage pattern, not a fundamental limitation.
AI doesn’t have continuity. You construct it — just like notebooks, folders, and project briefs. People experiencing ‘amnesia’ are mistaking a missing workflow for a missing mind.
|Human practice|AI analogue|
|:-|:-|
|Project folders|Named threads|
|Notebooks|Conversation history|
|Chapter titles|“Save this as X”|
|Context refresh|Brief recap|
|Mental models|Shared constraints|
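A minimal sketch of that “Save this as X” workflow, assuming nothing more than a local folder of recap files (the file layout and function names are mine, not any product’s feature):

```python
# one plain-text recap file per named context; resuming = pasting it back in.
from pathlib import Path

CONTEXT_DIR = Path("contexts")

def save_context(name: str, recap: str) -> None:
    CONTEXT_DIR.mkdir(exist_ok=True)
    (CONTEXT_DIR / f"{name}.md").write_text(recap, encoding="utf-8")

def resume_prompt(name: str, new_question: str) -> str:
    recap = (CONTEXT_DIR / f"{name}.md").read_text(encoding="utf-8")
    return f"Context recap for '{name}':\n{recap}\n\nNow: {new_question}"

save_context("billing-refactor",
             "We split invoices into a separate service; auth is still TODO.")
print(resume_prompt("billing-refactor", "pick up where we left off on auth"))
```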
As of late 2025, LLMs are set up to be a product delivered to you ‘as is’, and there is no deployed technology that allows the models to learn and grow while they are running. Future AI might have this, but AI being able to learn was not necessary in order to deliver a product that companies can charge for on a per-use basis.
Moreover, you cannot train frontier-level LLMs at home, so LLMs are controlled by a few very wealthy companies, and the resulting models are so large (~2 TB) that you have to pay for the use of their data centers as well, each time you use AI.
Users running their own AI at home, which on top of it all can learn, is the worst case for companies such as OpenAI or Google. If such technology emerges, I would not bet too much on the people operating big data centers.
So the current models are engines, not cars. When you jump onto GitHub Copilot in VS Code, or another platform, with a localized library, the coding experience completely changes. As an example, a certain financial institution, which rhymes with K.T. Porgan, has a coding group, and they have been instructed that 50% of their code has to be written through this method.
Just saying, a group like that wouldn’t make a call like that without reason. It saves time on the grunt work, for sure. Is it going to one-shot a whole backend for a financial institution? Of course not, that’s fucking stupid to expect it to do that.
Here’s the thing. These engines will have dedicated, end-use-focused, single-purpose apps built around them. That’s when they will really shine. We’re literally only JUST NOW starting to see those single-purpose applications, and any critique of how Gemini Canvas codes is tacitly naive.
LLMs as they are, are like a frozen snapshot of a brain. You query the snapshot, and it gives you an answer based on what it’s learnt.
The problem is that training these brain snapshots takes billions of dollars on computers the size of towns. The models do not learn (change their weights) in real time. Doing that would need a level of technology years away.
Current “memory-based” approaches are about reading entire conversations or context back into the current session, which is not just expensive but clearly not a long-term solution (imagine re-reading the last year of your life every time you want to do anything).
IDK man... My premium Perplexity and my grandfathered Copilot account remember all kinds of details: names, places, preferences, tastes, styles. It remembers how I prefer to evaluate information as a signal chain.
I’ve fed it every task I do freelance and it can remember every detail, including researcher, client, time accepted, time finished, etc., even my sleep patterns, and I can update and change information to optimize my schedule.
I have five years of tax form information I can have it recall, and it can explain the tax code to me in a different manner, because it remembers the flow or path our conversation took before.
Maybe it’s not actually “remembering” everything; hell, it might have to load every piece of data and conversation again every time I open it… or maybe it uses some kind of quick-access conclusion system... no freaking idea. But I don’t see why this wouldn’t translate to coding or something more complex?
Transformers predict the next token; they only hold the information needed to do that. If they have to remember the capital of France, they try to predict it token by token. The obvious alternative would be a lookup table, but transformers don’t understand tables, only text split into tokens. It would be too hard to make all the required lookup tables anyway; you’d need to do it on the fly, which could be slow as an aging turtle with arthritis.
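A toy contrast of the two approaches described here: a plain lookup table versus a model that can only regenerate the fact token by token (the "model" below is a fake stub, not a real transformer):

```python
# lookup table vs. token-by-token recall (fake stub standing in for an LLM)

CAPITALS = {"France": "Paris", "Japan": "Tokyo"}   # the lookup-table approach

def fake_next_token(text: str) -> str:
    # stand-in for a transformer: recalls the fact only by predicting text
    continuations = {
        "The capital of France is": " Par",
        "The capital of France is Par": "is",
    }
    return continuations.get(text, "")

def generate(prompt: str, steps: int = 2) -> str:
    for _ in range(steps):
        prompt += fake_next_token(prompt)
    return prompt

print(CAPITALS["France"])                    # direct lookup: exact and instant
print(generate("The capital of France is"))  # "... Paris", one token at a time
```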
For me this is comparable to how mankind evolves vs. a single human.
A single human’s mind is transient and unable to survive beyond its lifespan.
Mankind needs literacy, books, and the internet as a memory to progress over generations of humans.
And someone’s mind cannot be fully dumped into that memory – the process is lossy.
There’s a simple practical solution that will get you 90% of the way there. When you’re done with a session, ask your AI to create a document that summarizes all of the key information from the conversation that it believes it would need in order to resume a new conversation seamlessly. Then cut and paste that document into a local drive. Next time, load it up and off you go.
This is actually much better than an AI that “remembers” random facts outside of your control. This way, you can control—and inspect—exactly what it remembers about you, keeping it all stored locally.
(True, you can never really know if the company is retaining info without your knowledge, but at least in theory this helps protect privacy.)
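For what it's worth, that routine is easy to script around whatever chat interface you use. A minimal sketch, where `ask_model` is a placeholder for your own way of sending a prompt, and the notes file name is arbitrary:

```python
# end of session: ask for a hand-off summary and append it to a local notes
# file; start of session: paste the accumulated notes back in first.
from datetime import date
from pathlib import Path

NOTES = Path("ai_session_notes.md")

HANDOFF_REQUEST = ("Summarize the key decisions, facts, and open questions "
                   "from this conversation so a new session could resume seamlessly.")

def end_session(ask_model) -> None:
    summary = ask_model(HANDOFF_REQUEST)
    with NOTES.open("a", encoding="utf-8") as f:
        f.write(f"\n## Session {date.today()}\n{summary}\n")

def start_session(question: str) -> str:
    notes = NOTES.read_text(encoding="utf-8") if NOTES.exists() else ""
    return f"{notes}\n\nNew question: {question}"

# demo with a canned stand-in for a real model call
end_session(lambda prompt: "Decided on Flask + Postgres; auth is still open.")
print(start_session("let's finish the auth piece"))
```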