
CS186 AI Chat

AI semantic search/chat over all lectures from CS186 Spring 2023.

Website Github


Table of Contents

    📝 About
    💻 How to build
    🚀 Next steps
    🔧 Tools used
    👤 Contact

      📝About

      A more natural way for students to study for exams, review weekly content, and generate similar practice problems tailored to their preferences. Trained on the weekly notes for CS 186, it's like talking to your professor. Good for those who struggle with note-taking. CS186 students, staff, and anyone else can clone this repo and adjust it to their liking.

      🔗 Official Course Website
      UC Berkeley 🐻🔵🟡 • CS186: Introduction to Database Systems • Spring 2023


      💻How to Build

      Initial setup

      Clone the repo and install dependencies.

      git clone https://github.com/vdutts7/cs186-ai-chat
      cd cs186-ai-chat
      pnpm install

      Create a .env file and add your API keys (see .env.local.example for the template):

      OPENAI_API_KEY=""
      NEXT_PUBLIC_SUPABASE_URL=""
      NEXT_PUBLIC_SUPABASE_ANON_KEY=""
      SUPABASE_SERVICE_ROLE_KEY=""
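
A quick way to catch a missing key early is to check the environment at startup. This is a minimal sketch, not part of the repo; `missingEnvKeys` is a hypothetical helper.

```typescript
// Hypothetical helper (not in the repo): returns the names of any
// required environment variables that are unset or empty.
function missingEnvKeys(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((key) => !env[key]);
}

// Fail fast if a key from .env was not loaded.
const required = [
  "OPENAI_API_KEY",
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
];
const missing = missingEnvKeys(process.env, required);
if (missing.length > 0) {
  console.error(`Missing env vars: ${missing.join(", ")}`);
}
```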

      Get API keys from OpenAI and from your Supabase project's settings.

      IMPORTANT: Verify that .gitignore contains .env.

      Prepare Supabase environment

      I used Supabase as my vectorstore. Alternatives include Pinecone, Qdrant, Weaviate, and Chroma.

      You should have already created a Supabase project to get your API keys. Inside the project's SQL editor, create a new query and run schema.sql. You should now have a documents table with 4 columns.
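
As a rough sketch of what such a schema typically contains (the pgvector extension plus a 4-column documents table), something along these lines; the column names here are assumptions, so check the actual schema.sql in the repo:

```sql
-- Hypothetical sketch; see schema.sql in the repo for the real schema.
create extension if not exists vector;

create table documents (
  id bigserial primary key,
  content text,           -- scraped page text
  metadata jsonb,         -- e.g. the source URL
  embedding vector(1536)  -- text-embedding-ada-002 output
);
```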

      Embedding and upserting

      Inside the config folder is class-website-urls.ts; modify it to your liking. The project is set up to handle HTML pages in a consistent HTML/CSS format, which are scraped using cheerio, a jQuery-like HTML parsing library. Modify /utils/custom_web_loader.ts to control which CSS elements of each page's text get scraped.

      Manually run scrape-embed.ts from the scripts folder, OR run the package script from the terminal:

      npm run scrape-embed

      This is a one-time process; depending on the size of your data, it can take up to a few minutes. Check the documents table in your Supabase project and you should see rows populated with the embeddings that were just created.

      Technical explanation

      The scrape-embed.ts script:

      • Retrieves URLs from /config/class-website-urls.ts and extracts the HTML/CSS data via cheerio, as specified in /utils/custom_web_loader.ts
      • Vectorizes and embeds the data using OpenAI's Embeddings API (text-embedding-ada-002). This produces vectors of 1536 dimensions, optimized for cosine-similarity search.
      • Upserts the embeddings into documents (the Supabase vectorstore). The upsert operation inserts new rows and overwrites existing ones.
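
Cosine similarity, the metric these 1536-dimension vectors are ranked by at query time, is simple to state: the dot product of two vectors divided by the product of their lengths. A small illustrative implementation:

```typescript
// Illustrative only: cosine similarity between two embedding vectors.
// In production this comparison happens inside the database (pgvector),
// not in application code.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical directions score 1, orthogonal vectors score 0, which is why nearest-neighbor search over these embeddings surfaces the most semantically similar notes.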

      (Flow chart visualizing the scrape-and-embed pipeline)

      Run app

      npm run dev

      Go to http://localhost:3000. You should be able to type and ask questions now. Done ✅

      🚀Next Steps

      • UI/UX: change to your liking.
      • Bot behavior: edit the prompt template in /utils/makechain.ts to fine-tune and gain greater control over the bot's outputs.
      • Data: change the URLs to scrape whatever pages you want.
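
To give a feel for the prompt-template idea, here is a hypothetical template and a minimal stand-in for interpolation. The repo's actual template in /utils/makechain.ts will differ; `QA_TEMPLATE` and `fillTemplate` are illustrations, with `{context}` and `{question}` filled in at query time.

```typescript
// Hypothetical prompt template (not the one in /utils/makechain.ts).
const QA_TEMPLATE = `You are a helpful teaching assistant for CS186.
Use only the context below to answer the question.
If the answer is not in the context, say you don't know.

Context:
{context}

Question: {question}
Helpful answer:`;

// Minimal stand-in for template interpolation: replaces {name} slots
// with values from the vars map, leaving unknown slots empty.
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_m: string, name: string) => vars[name] ?? "");
}

// Usage:
// const prompt = fillTemplate(QA_TEMPLATE, { context: docs, question: userQuestion });
```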

      🔧Tools Used

      OpenAI • cheerio • Supabase • LangChain • Next.js

      👤Contact

      Email Twitter