

I started this project last week to make Epstein documents easily searchable and create an archive in case data is removed from official sources. This quickly escalated into a much larger project than expected, from a time, effort, and cost perspective :). I also managed to archive a lot of the House Oversight committee's documents, including from the epstein estate.
I scraped everything, ran it through OpenAI's batch API, and built a full-text search with network graphs leveraging PostgreSQL full text search.
Now at 1,317,893 documents indexed with 238,163 people identified (lots of dupes, working on deduping these now). I'm also currently importing non PDF data (like videos etc).
Feedback is welcome, this is my first large dataset project with AI. I've written tons of automation scripts in python, and built out the website for searching, added some caching to speed things up.
Posted by indienow
![I mapped connections between 238,000 unique people across 1.3 million Epstein documents [OC] I mapped connections between 238,000 unique people across 1.3 million Epstein documents [OC]](https://www.byteseu.com/wp-content/uploads/2026/02/7ris9gvnqoig1-1024x922.png)
6 Comments
All of this data came from the DOJ’s Epstein Transparency Act releases, and the House Oversight Committee’s public releases. I used D3 for the visualizations.
I like how there’s C, S, a, N, B, E, and #1 in the second image. It’s cool to have a code name.
This is amazing, great work!
Why does [Ehud Barak](https://epsteingraph.com/people/ehud-barak) show up as ‘person not found’?
FYI for anyone looking for Trump, his Executive Assistant is Rhona Groff, the 2nd largest connection to Epstein beside Ghislaine Maxwell.
Trump doesn’t use email, he has his EA do it all, or he uses the phone so as not to leave a paper trail.
God damn I was hoping not to find Self in there. I swear, I didn’t do anything.