
CS186 AI Chat

AI semantic search/chat over all lectures from CS186 Spring 2023.

Website Github


Table of Contents

    📝 About
    💻 How to build
    🚀 Next steps
    🔧 Tools used
    👤 Contact

      📝About

      A more natural way for students to study for exams, review weekly content, and generate similar practice problems tailored to their preferences. Trained on the weekly notes for CS 186, it's like talking to your professor. Good for those who struggle with note-taking. CS186 students, staff, and anyone else can clone this repo and adjust it to their liking.

      🔗 Official Course Website
      UC Berkeley 🐻🔵🟡 • CS186: Introduction to Database Systems • Spring 2023


      💻How to Build

      Initial setup

      Clone the repo and install dependencies.

      git clone https://github.com/vdutts7/cs186-ai-chat
      cd cs186-ai-chat
      pnpm install

      Create a .env file and add your API keys (see .env.local.example for the template):

      OPENAI_API_KEY=""
      NEXT_PUBLIC_SUPABASE_URL=""
      NEXT_PUBLIC_SUPABASE_ANON_KEY=""
      SUPABASE_SERVICE_ROLE_KEY=""
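
A quick way to catch a missing key early is to check the environment at startup. This is a minimal sketch, not part of the repo; `missingEnvKeys` is a hypothetical helper.

```typescript
// Hypothetical helper (not in the repo): returns the names of any
// required environment variables that are unset or empty.
function missingEnvKeys(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((key) => !env[key]);
}

// Fail fast if a key from .env was not loaded.
const required = [
  "OPENAI_API_KEY",
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
];
const missing = missingEnvKeys(process.env, required);
if (missing.length > 0) {
  console.error(`Missing env vars: ${missing.join(", ")}`);
}
```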

      Get API keys from OpenAI and from your Supabase project's settings.

      IMPORTANT: Verify that .gitignore contains .env.

      Prepare Supabase environment

      I used Supabase as my vectorstore. Alternatives include Pinecone, Qdrant, Weaviate, and Chroma.

      You should have already created a Supabase project to get your API keys. Inside the project's SQL editor, create a new query and run schema.sql. You should now have a documents table with 4 columns.
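
As a rough sketch of what such a schema typically contains (the pgvector extension plus a 4-column documents table), something along these lines; the column names here are assumptions, so check the actual schema.sql in the repo:

```sql
-- Hypothetical sketch; see schema.sql in the repo for the real schema.
create extension if not exists vector;

create table documents (
  id bigserial primary key,
  content text,           -- scraped page text
  metadata jsonb,         -- e.g. the source URL
  embedding vector(1536)  -- text-embedding-ada-002 output
);
```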

      Embedding and upserting

      Inside the config folder is class-website-urls.ts; modify it to your liking. The project is set up to handle HTML pages in a consistent HTML/CSS format, which are scraped using cheerio, a jQuery-like HTML parsing library. Modify /utils/custom_web_loader.ts to control which CSS elements of each page's text get scraped.

      Manually run scrape-embed.ts from the scripts folder, OR run the package script from the terminal:

      npm run scrape-embed

      This is a one-time process; depending on the size of your data, it can take up to a few minutes. Check the documents table in your Supabase project and you should see rows populated with the embeddings that were just created.

      Technical explanation

      The scrape-embed.ts script:

      • Retrieves URLs from /config/class-website-urls.ts and extracts the HTML/CSS data via cheerio, as specified in /utils/custom_web_loader.ts
      • Vectorizes and embeds the data using OpenAI's Embeddings API (text-embedding-ada-002). This produces vectors of 1536 dimensions, optimized for cosine-similarity search.
      • Upserts the embeddings into documents (the Supabase vectorstore). The upsert operation inserts new rows and overwrites existing ones.
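
Cosine similarity, the metric these 1536-dimension vectors are ranked by at query time, is simple to state: the dot product of two vectors divided by the product of their lengths. A small illustrative implementation:

```typescript
// Illustrative only: cosine similarity between two embedding vectors.
// In production this comparison happens inside the database (pgvector),
// not in application code.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical directions score 1, orthogonal vectors score 0, which is why nearest-neighbor search over these embeddings surfaces the most semantically similar notes.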

      (Flow chart visualizing the scrape-and-embed pipeline)

      Run app

      npm run dev

      Go to http://localhost:3000. You should be able to type and ask questions now. Done ✅

      🚀Next Steps

      • UI/UX: change to your liking.
      • Bot behavior: edit the prompt template in /utils/makechain.ts to fine-tune and gain greater control over the bot's outputs.
      • Data: change the URLs to scrape whatever pages you want.
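
To give a feel for the prompt-template idea, here is a hypothetical template and a minimal stand-in for interpolation. The repo's actual template in /utils/makechain.ts will differ; `QA_TEMPLATE` and `fillTemplate` are illustrations, with `{context}` and `{question}` filled in at query time.

```typescript
// Hypothetical prompt template (not the one in /utils/makechain.ts).
const QA_TEMPLATE = `You are a helpful teaching assistant for CS186.
Use only the context below to answer the question.
If the answer is not in the context, say you don't know.

Context:
{context}

Question: {question}
Helpful answer:`;

// Minimal stand-in for template interpolation: replaces {name} slots
// with values from the vars map, leaving unknown slots empty.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_m: string, name: string) => vars[name] ?? "");
}

// Usage:
// const prompt = fillTemplate(QA_TEMPLATE, { context: docs, question: userQuestion });
```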

      🔧Tools Used

      OpenAI • cheerio • Supabase • LangChain • Next.js

      👤Contact

      Email Twitter