Perspicacious AI - Speak to the Top G

AI-Powered semantic search over hours of YouTube podcasts and interviews from Andrew Tate 🚬💸💬

Website Github


Table of Contents​

    📝 About
    💻 How to build
    🚀 Next steps
    🔧 Tools used
    👤 Contact

        📝About​

        AI semantic search over hours of podcasts and interviews on YouTube from Andrew Tate.

        • Transcripts may not be perfect (blame YouTube API's stringent ban on non-OAuth caption access lol)

        💻How to Build​

        This project uses basic Python scripts, a vector database, and k-nearest-neighbor (KNN) semantic search.
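
To make the KNN idea concrete, here is a minimal brute-force sketch in plain Python. It is not the project's code: in the real pipeline the vector database (Milvus) performs the search with its own indexes, and the names `cosine_similarity` and `knn_search`, plus the `(text, vector)` corpus layout, are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_search(
    query: Sequence[float],
    corpus: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the k corpus entries most similar to the query, best first.

    `corpus` is a hypothetical in-memory list of (text, embedding) pairs;
    a vector database replaces this linear scan with an index.
    """
    scored = [(cosine_similarity(query, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real ones have far more dimensions.
    toy_corpus = [
        ("chunk about cars", [1.0, 0.0]),
        ("chunk about chess", [0.0, 1.0]),
        ("chunk about racing", [0.9, 0.1]),
    ]
    for score, text in knn_search([1.0, 0.0], toy_corpus, k=2):
        print(f"{score:.3f}  {text}")
```

A linear scan like this is fine for a handful of vectors, but it compares the query against every stored chunk, which is the cost a vector database is meant to avoid.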

        Videos are transcribed using some hacky Python scripts, combined with associated metadata, and pre-processed (cleaned). The transcripts are chunked by tokens, converted to text embeddings with ~16k dimensions, and stored in a vector database. There are limitations; for those who care more about this topic, read the Milvus documentation.
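
The chunking step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual script: whitespace-separated words stand in for real tokenizer tokens, and the segment fields `text` and `start`, the function name `chunk_transcript`, and the `max_tokens` and `overlap` defaults are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

Segment = Dict[str, Union[str, float]]


def chunk_transcript(
    segments: List[Segment],
    max_tokens: int = 200,
    overlap: int = 20,
) -> List[Segment]:
    """Split caption segments into overlapping chunks of at most max_tokens.

    Each chunk keeps the start time (in seconds) of its first token so a
    search hit can be linked back to a position in the video.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")

    # Flatten segments into (token, start_time) pairs.
    # Assumption: whitespace splitting approximates real tokenization.
    tokens: List[Tuple[str, float]] = []
    for seg in segments:
        for word in str(seg["text"]).split():
            tokens.append((word, float(seg["start"])))

    chunks: List[Segment] = []
    step = max_tokens - overlap
    for i in range(0, len(tokens), step):
        window = tokens[i:i + max_tokens]
        chunks.append({
            "text": " ".join(word for word, _ in window),
            "start": window[0][1],
        })
        if i + max_tokens >= len(tokens):
            break  # the last window already reached the end of the transcript
    return chunks


if __name__ == "__main__":
    demo = [
        {"text": "a b c", "start": 0.0},
        {"text": "d e f", "start": 5.0},
    ]
    for chunk in chunk_transcript(demo, max_tokens=4, overlap=1):
        print(chunk)
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of storing some tokens twice. Each chunk's text would then be sent to an embedding model and the resulting vector stored alongside the metadata.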


        🚀Next Steps​

        Some of my plans to improve this project:

        • Moving away from the YouTube V3 API towards a faster transcription solution. Whisper is good but expensive; pytube and other Python packages will probably be used once the amount of video content exceeds a certain storage capacity.
        • Adding visual elements to the search experience (e.g. thumbnail generation specific to the exact timestamp) using Puppeteer or some other solution.

        🔧Tools Used​

        Python, YouTube V3 API, OpenAI, Milvus, Next.js


        👤Contact​

        Email Twitter