Changelog neural search

Search is one of the most important breakthroughs of the internet. Some are saying a list of blue links is not enough, and that AI will overthrow search. I don't know if we're about to witness a revolution. But as with most things, there's only one way to find out: build it and use it myself.

I like podcasts a lot. There's really nothing like hearing people talk about things I know nothing about. Some of my favorite podcasts are produced by the Changelog network. More than once, I've had to use their search engine when researching something that was said during an episode.

One of the best things about the Changelog is that the whole thing is open source, from the podcast engine itself to the transcripts of every episode. Why not take all of these transcripts and build an AI-powered™ search engine around them?

Neural search for the changelog

changelog.duarteocarmo.com

How it's built

Before I describe the stack, let's get the obvious out of the way: the whole thing is open source. Both the back-end and the front-end. If you prefer to go and poke around the code yourself, be my guest.

I love the chunk-embed-search-retrieve dance as much as the next guy, but for this one, I wanted to keep things a bit simpler, so I'm letting SuperDuperDB do most of the heavy lifting. With it, all I really need to do is add the embedding model to my serverless MongoDB instance, and it handles the rest:

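# Add the embedding model to the DB as a vector index
# (db, collection, index_id, key and limit are defined elsewhere)
model = SentenceTransformer(...)
db.add(
    VectorIndex(
        identifier=index_id,
        indexing_listener=Listener(
            model=model,  # embeds the `key` field of each document
            key=key,
            select=collection.find(),  # index every document in the collection
        ),
    )
)

# Search the DB: filter to one podcast, then rank by vector similarity
cur = db.execute(
    collection.find({"podcast": {"$regex": "practicalai"}}).like(
        {"text": "What are embeddings"}, n=limit, vector_index=index_id
    )
)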
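
For the curious, here is roughly what that chunk-embed-search-retrieve dance looks like when done by hand. This is a minimal sketch and not the code behind the site: the `chunk`, `embed`, `cosine` and `search` helpers are hypothetical names, the two transcripts are made up, and the bag-of-words `embed` is only a toy stand-in for a real sentence-embedding model so that the example runs with nothing but the standard library.

```python
import math
import re
from collections import Counter


def chunk(text, size=8):
    """Split a transcript into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index, query, n=3, podcast=None):
    """Return the `n` chunks most similar to `query`, optionally for one podcast."""
    q = embed(query)
    candidates = [d for d in index if podcast is None or d["podcast"] == podcast]
    ranked = sorted(candidates, key=lambda d: cosine(q, d["vector"]), reverse=True)
    return ranked[:n]


# Hypothetical transcripts, made up for illustration.
transcripts = [
    {"podcast": "practicalai", "text": "Embeddings are vectors that capture the meaning of text"},
    {"podcast": "gotime", "text": "Goroutines make concurrency in Go feel lightweight"},
]

# Chunk and embed every transcript once, up front.
index = [
    {"podcast": t["podcast"], "text": c, "vector": embed(c)}
    for t in transcripts
    for c in chunk(t["text"])
]

results = search(index, "What are embeddings", n=1, podcast="practicalai")
print(results[0]["text"])
```

Every one of those steps (chunking, storing vectors next to the documents, filtering, ranking) is something I would otherwise have to write and keep in sync myself, which is exactly the part I wanted to hand off.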

For the front-end, I finally took Next.js for a spin. I love the productivity gains, and Vercel is absolutely nailing the developer experience. On the other hand, I have no clue how most of the magic works, and I'm not sure that's a good thing.


October 6, 2023