Governo Sombra transcripts

Seven years. That's how long I've lived in Denmark. I love it here, but Portugal is still close to my heart. As an emigrant, I find it hard to stay connected to what's going on back home. What are people talking about? What's in the news? What worries people? What is everyone arguing about over morning coffee?

One of the ways I like to stay in touch is by listening to Governo Sombra (after changing networks, it is now cleverly called "Program whose name we are legally prevented from saying"). It's a weekly show where three guests and a host comment on Portuguese and world news. The show is funny, and I also love that the three guests represent different parts of the political spectrum, so I get a good idea of how most people are feeling.

Inspired by Lexicap, I decided to build a website with the transcripts of every episode of the show. More than once I've listened to a particular part of an episode and wanted to share it with a friend. Now I can.
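Sharing a particular part of an episode mostly comes down to attaching a timestamp to each transcript segment and linking to it. The sketch below is not the code from the repo. It assumes segment times come in 10 ms units (the way whisper.cpp reports segment start/end times), and the `Segment` struct, the helper functions and the `/episodes/{id}#t=SECONDS` URL scheme are all hypothetical.

```rust
/// One transcript segment. Assumption: start/end are in 10 ms units
/// (centiseconds), which is how whisper.cpp reports segment times.
#[derive(Debug, Clone, PartialEq)]
struct Segment {
    start_cs: i64,
    end_cs: i64,
    text: String,
}

/// Render a time given in centiseconds as HH:MM:SS.
fn format_timestamp(cs: i64) -> String {
    let total = cs / 100;
    format!("{:02}:{:02}:{:02}", total / 3600, (total % 3600) / 60, total % 60)
}

/// Hypothetical URL scheme: the fragment carries the start time in whole seconds.
fn share_link(base_url: &str, episode: u32, seg: &Segment) -> String {
    format!("{}/episodes/{}#t={}", base_url, episode, seg.start_cs / 100)
}

/// Return the segment that is playing at `secs` seconds, if any.
fn segment_at(segments: &[Segment], secs: i64) -> Option<&Segment> {
    let cs = secs * 100;
    segments.iter().find(|s| s.start_cs <= cs && cs < s.end_cs)
}

fn main() {
    // Placeholder segments; real ones would come from the transcriber.
    let segments = vec![
        Segment { start_cs: 0, end_cs: 450, text: "first segment".into() },
        Segment { start_cs: 450, end_cs: 1_230, text: "second segment".into() },
    ];
    for seg in &segments {
        println!("[{}] {}", format_timestamp(seg.start_cs), seg.text);
    }
    if let Some(seg) = segment_at(&segments, 5) {
        println!("{}", share_link("https://example.org", 1, seg));
    }
}
```

Linking by start time rather than by segment index keeps links stable even if an episode is re-transcribed with a better model and the segment boundaries shift.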

For the transcription, I used OpenAI's open-source Whisper model, with one small caveat: the whole thing (serving and transcribing) had to run on my 20 EUR/month VM, so it needed to be small and efficient.
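One way to keep serving and transcribing on the same small VM is to have a single background worker transcribe one episode at a time, so the web server never competes with more than one transcription job for CPU and RAM. This is a minimal std-only sketch of that idea, not necessarily how the repo does it; `transcribe` is a hypothetical stand-in for the real whisper.cpp call.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the real (slow) whisper.cpp transcription.
fn transcribe(episode_id: u32) -> String {
    format!("transcript of episode {}", episode_id)
}

/// Spawn one worker that processes queued episode IDs strictly one at a time.
/// Returns the job queue, the results channel and the worker's join handle.
fn spawn_worker() -> (Sender<u32>, Receiver<(u32, String)>, JoinHandle<()>) {
    let (job_tx, job_rx) = mpsc::channel::<u32>();
    let (done_tx, done_rx) = mpsc::channel::<(u32, String)>();
    let handle = thread::spawn(move || {
        // The loop ends once every Sender for the job queue has been dropped.
        for id in job_rx {
            let text = transcribe(id);
            if done_tx.send((id, text)).is_err() {
                break; // nobody is listening for results any more
            }
        }
    });
    (job_tx, done_rx, handle)
}

fn main() {
    let (jobs, results, worker) = spawn_worker();
    for id in 101..=103 {
        jobs.send(id).expect("worker should still be running");
    }
    drop(jobs); // close the queue so the worker can finish
    worker.join().expect("worker thread panicked");
    for (id, text) in results {
        println!("episode {}: {}", id, text);
    }
}
```

Because the queue is a plain channel, new episodes can be pushed as they come out, and dropping the sender lets the worker shut down cleanly.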

I like Python, but given those constraints Rust was the obvious choice. For the transcription part, I used Rust bindings for whisper.cpp. For serving the app, I went with Actix Web: it's small, efficient, and reminds me a lot of Flask. It's incredible how a small Linux box can transcribe 60+ minute episodes without much hiccuping.
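Whichever bindings you use, whisper.cpp consumes mono 32-bit float samples at 16 kHz, so podcast audio has to be converted first. A minimal std-only sketch of that step, assuming the audio has already been decoded to 16-bit PCM and resampled to 16 kHz by some other tool (for example ffmpeg); `prepare` and its helpers are hypothetical names, not part of any bindings' API.

```rust
/// whisper.cpp works on mono, 16 kHz, f32 samples in [-1.0, 1.0].
const WHISPER_SAMPLE_RATE: u32 = 16_000;

/// Convert signed 16-bit PCM to f32 in [-1.0, 1.0).
fn i16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| s as f32 / 32768.0).collect()
}

/// Average interleaved stereo frames (L, R, L, R, ...) down to mono.
/// A trailing incomplete frame, if any, is ignored.
fn stereo_to_mono(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| (frame[0] + frame[1]) / 2.0)
        .collect()
}

/// Hypothetical helper: turn decoded PCM into the buffer whisper.cpp expects.
/// Resampling is out of scope, so any other sample rate is rejected.
fn prepare(pcm: &[i16], channels: u16, sample_rate: u32) -> Result<Vec<f32>, String> {
    if sample_rate != WHISPER_SAMPLE_RATE {
        return Err(format!(
            "expected {} Hz, got {} Hz; resample first",
            WHISPER_SAMPLE_RATE, sample_rate
        ));
    }
    let floats = i16_to_f32(pcm);
    match channels {
        1 => Ok(floats),
        2 => Ok(stereo_to_mono(&floats)),
        n => Err(format!("unsupported channel count: {}", n)),
    }
}

fn main() {
    // A tiny fake stereo clip: two (L, R) frames.
    let pcm: [i16; 4] = [16384, 0, -16384, -16384];
    match prepare(&pcm, 2, 16_000) {
        Ok(mono) => println!("{:?}", mono),
        Err(e) => eprintln!("{}", e),
    }
}
```

Rejecting other sample rates instead of resampling naively keeps the sketch honest: a poor resampler would quietly hurt transcription quality.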

I'd rate the transcription quality at around 6/10. I used the base model, so there is clearly room for improvement. Maybe when I get a dedicated box.

The entire thing is up on GitHub.

April 19, 2023
