📝About
A more natural way to help students study for exams, review weekly content, and generate similar practice problems tailored to their preferences. Built on the weekly notes for CS 186. It's like talking to your professor. Good for those who suck at taking notes. CS186 students, staff, and anyone else can clone this repo and adjust it to their liking.
🔗 Official Course Website
UC Berkeley 🐻🔵🟡 • CS186: Introduction to Database Systems • Spring 2023
💻How to Build
Initial setup
Clone the repo and install dependencies.
git clone https://github.com/vdutts7/cs186-ai-chat
cd cs186-ai-chat
pnpm install
Create a .env file and add your API keys (see .env.local.example for the template):
OPENAI_API_KEY=""
NEXT_PUBLIC_SUPABASE_URL=""
NEXT_PUBLIC_SUPABASE_ANON_KEY=""
SUPABASE_SERVICE_ROLE_KEY=""
Get API keys from your OpenAI and Supabase accounts.
IMPORTANT: Verify that .gitignore contains .env so your keys are never committed.
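A missing or blank key usually only shows up later as a confusing API error, so it can help to check the environment up front. The following is a minimal sketch, not code from this repo: `missingKeys` and `assertEnv` are hypothetical helper names, and the environment is passed in as a plain object so the check stays independent of how the variables are loaded.

```typescript
// Names of the keys listed in the .env template above.
const REQUIRED_KEYS = [
  "OPENAI_API_KEY",
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
] as const;

type EnvKey = (typeof REQUIRED_KEYS)[number];
type Env = Record<string, string | undefined>;

// Hypothetical helper (not part of the repo): returns the required keys
// that are unset or blank in the given environment object.
function missingKeys(env: Env): EnvKey[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// Hypothetical helper: throws one readable error naming every missing key.
function assertEnv(env: Env): void {
  const missing = missingKeys(env);
  if (missing.length > 0) {
    throw new Error("Missing environment variables: " + missing.join(", "));
  }
}
```

In a Node script, one would call `assertEnv(process.env)` before creating any OpenAI or Supabase client.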
Prepare Supabase environment
I used Supabase as my vector store. Alternatives: Pinecone, Qdrant, Weaviate, Chroma, etc.
You should have already created a Supabase project to get your API keys. Inside the project's SQL editor, create a new query and run schema.sql. You should now have a documents table with 4 columns.
Embedding and upserting
Inside the config folder is class-website-urls.ts. Modify it to your liking. The project is set up to handle HTML pages in a consistent HTML/CSS format, which are scraped using cheerio, a jQuery-style HTML parsing package. Modify /utils/custom_web_loader.ts to control which CSS elements of the webpages' text get scraped.
Manually run scrape-embed.ts from the scripts folder OR run the package script from the terminal:
npm run scrape-embed
This is a one-time process and, depending on the size of the data, can take up to a few minutes. Check the documents table in your Supabase project and you should see rows populated with the embeddings that were just created.
Technical explanation
The scrape-embed.ts script:
- Retrieves URLs from /config/class-website-urls.ts and extracts the page text via cheerio, as specified in /utils/custom_web_loader.ts.
- Embeds the extracted text using OpenAI's Embeddings API (text-embedding-ada-002). This produces vectors with 1536 dimensions, suited to cosine similarity search.
- Upserts the embeddings into documents (the Supabase vector store). The upsert operation inserts new rows and overwrites existing rows.
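To make the upsert and cosine similarity ideas above concrete, here is a small in-memory sketch. It is an illustration under stated assumptions, not the repo's code: the `DocRow` shape, the `Table` type, and the `upsert`, `cosineSimilarity`, and `topK` functions are all hypothetical, and a plain object stands in for the Supabase table. Real embeddings have 1536 components; toy vectors work the same way.

```typescript
// Assumed row shape for illustration only; the real columns come from schema.sql.
interface DocRow {
  id: number;
  content: string;
  embedding: number[];
}

// A plain object keyed by row id stands in for the documents table.
type Table = Record<string, DocRow>;

// Upsert: insert the row if its id is new, otherwise overwrite the existing row.
function upsert(table: Table, row: DocRow): void {
  table[String(row.id)] = row;
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimensionality");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k rows whose embeddings are most similar to the query vector.
function topK(table: Table, query: number[], k: number): DocRow[] {
  return Object.keys(table)
    .map((key) => table[key])
    .map((row) => ({ row, score: cosineSimilarity(row.embedding, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.row);
}
```

At question time, the user's question would be embedded with the same model and passed as `query`; the returned rows supply the context for the answer. In the actual project this ranking would happen inside the vector store rather than in application code.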
Run app
npm run dev
Go to http://localhost:3000. You should be able to type and ask questions now. Done ✅
🚀Next Steps
- UI/UX: change to your liking.
- Bot behavior: edit the prompt template in /utils/makechain.ts to fine-tune and gain greater control over the bot's outputs.
- Data: change the URLs to handle whatever pages you want.
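As a rough idea of what editing the prompt template involves, the sketch below fills a template with retrieved notes and the user's question. It is an assumption-laden illustration: `QA_TEMPLATE`, `PromptValues`, and `fillTemplate` are hypothetical names, and the wording is invented here; the actual template and its placeholders are whatever /utils/makechain.ts defines.

```typescript
// Hypothetical template text; the real wording lives in /utils/makechain.ts.
const QA_TEMPLATE = [
  "You are a teaching assistant for CS 186.",
  "Answer using only the course notes below.",
  "If the notes do not contain the answer, say you are not sure.",
  "",
  "Notes:",
  "{context}",
  "",
  "Question: {question}",
  "Answer:",
].join("\n");

type PromptValues = { context: string; question: string };

// Fill the {context} and {question} placeholders in a single pass, so that
// placeholder-like text inside the inserted values is left untouched.
function fillTemplate(template: string, values: PromptValues): string {
  return template.replace(
    /\{(context|question)\}/g,
    (_match: string, key: keyof PromptValues) => values[key]
  );
}
```

Tightening the instructions at the top of the template (tone, length, what to do when the notes are silent) is typically where most of the control over the bot's outputs comes from.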