MFM Podcast Chat

AI-powered semantic search over all 450+ My First Million (MFM) podcast episodes 💰

Website GitHub


Table of Contents

    📝 About
    💻 How to build
    🚀 Next steps
    🔧 Tools used
    👤 Contact

        About

        Semantic search over all 450+ My First Million (MFM) podcast episodes. The same approach can be used for any YouTube playlist: you can combine multiple channels, parts of channels, or just an assortment of videos of your choice. I chose a podcast for this hackathon because it is something most people are familiar with.

        Videos are transcribed using some hacky Python scripts, combined with their associated metadata, and pre-processed (cleaned). The transcripts are then chunked by tokens, converted to text embeddings with ~16k dimensions, and stored in a vector database. This approach has limitations; for those who want to dig deeper into the topic, read the Milvus documentation.
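
        The chunking step could look roughly like the sketch below. It is not the project's actual script: the `Chunk` fields, the `chunk_transcript` name, the default window sizes, and the use of whitespace-separated words as a stand-in for model tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    video_id: str  # hypothetical metadata field carried along with the text
    index: int     # position of the chunk within its transcript
    text: str


def chunk_transcript(video_id: str, transcript: str,
                     max_tokens: int = 200, overlap: int = 20) -> List[Chunk]:
    """Split a transcript into overlapping windows of at most max_tokens tokens.

    Tokens are approximated by whitespace-separated words (an assumption);
    a real pipeline would likely use the embedding model's own tokenizer.
    """
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    tokens = transcript.split()
    step = max_tokens - overlap  # how far the window advances each time
    chunks: List[Chunk] = []
    for index, start in enumerate(range(0, len(tokens), step)):
        window = tokens[start:start + max_tokens]
        chunks.append(Chunk(video_id, index, " ".join(window)))
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the transcript
    return chunks
```

        Each resulting chunk would then be sent to the embedding model and stored alongside its metadata. The overlap is a design choice that keeps a sentence cut at a window boundary searchable from either side.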

        Next Steps & Feedback

        Some of my plans to improve this project:

        • Moving away from the YouTube V3 API towards a faster transcription solution. Whisper is good but expensive, and pytube and other Python packages will probably be used once the amount of video content exceeds a certain storage capacity.
        • Adding visual elements to the search experience (e.g. thumbnail generation specific to the exact timestamp) using Puppeteer or some other solution.

        🔧 Tools Used

        Python, YouTube V3 API, OpenAI, Milvus, Next.js


        👤 Contact

        Email Twitter