About
A more natural way to help students study for exams, review weekly content, and recreate similar problems customized to their preference. Trained on the weekly notes for EE16B. It's like talking to your professor. Good for those who suck at taking notes. EE16B students, staff, and more generally anyone can clone this repo and adjust it to their liking.
Official Course Website
UC Berkeley • EE16B: Designing Information Devices and Systems II • Spring 2023
How to Build
Note: these instructions are for macOS; adjust accordingly for Windows / Linux.
Initial setup
Clone the repo and install dependencies.
git clone https://github.com/vdutts7/ee16b-ai-chat
cd ee16b-ai-chat
pnpm install
Create a .env file and add your API keys (use .env.local.example as a template):
OPENAI_API_KEY=""
NEXT_PUBLIC_SUPABASE_URL=""
NEXT_PUBLIC_SUPABASE_ANON_KEY=""
SUPABASE_SERVICE_ROLE_KEY=""
Get your API keys from OpenAI and from your Supabase project settings.
IMPORTANT: Verify that .gitignore contains .env.
Prepare Supabase environment
I used Supabase as my vector store. Alternatives: Pinecone, Qdrant, Weaviate, Chroma, etc.
You should have already created a Supabase project to get your API keys. Inside the project's SQL editor, create a new query and run schema.sql. You should now have a documents table with 4 columns.
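Optionally, you can sanity-check the table from Python. The sketch below is illustrative and assumes the standard LangChain Supabase schema (id, content, metadata, embedding); adjust the column names if your schema.sql differs:

```python
# Optional sanity check that the `documents` table exists.
# Column names assume the standard LangChain Supabase schema (id, content, metadata, embedding);
# adjust if your schema.sql differs.
import os
from supabase import create_client

supabase = create_client(
    os.environ["NEXT_PUBLIC_SUPABASE_URL"],
    os.environ["SUPABASE_SERVICE_ROLE_KEY"],
)

# Selecting a single (possibly empty) row is enough to confirm the table and columns exist.
res = supabase.table("documents").select("id, content, metadata").limit(1).execute()
print(res.data)  # [] on a freshly created table
```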
Embed and upsert
Inside the config folder is the transcripts folder, with all lectures as .txt files and the corresponding JSON files for the metadata. The .txt files were scraped from the lecture recordings separately ahead of time (OpenAI's Whisper is a great package for speech-to-text transcription). Change according to your preferences. pageContent and metadata are by default stored in Supabase, along with an int8 type for the 'id' column.
Manually run the embed-script.ipynb notebook in the scripts folder, or run the package script from the terminal:
npm run embed
This is a one-time process; depending on the size of the data you wish to upsert, it can take a few minutes. Check the Supabase database to see the updates reflected in the rows of your table.
Technical explanation
This code performs the following (a sketch of the full flow appears after this list):

1. Installs the supabase Python library using pip. This allows interaction with a Supabase database.
2. Loads various libraries:
   - supabase - for interacting with Supabase
   - langchain - for text processing and vectorization
   - json - for loading the JSON metadata files
3. Loads the Supabase URL and API key from .env. These are used to create a supabase_client that connects to the Supabase database.
4. Loads text data from the .txt lecture transcripts and the JSON metadata files.
5. Uses a RecursiveCharacterTextSplitter to split the lecture text into chunks, breaking the text into manageable pieces for processing. Chunk size and chunk overlap can be changed according to preference and essentially control the amount of specificity: a larger chunk size and smaller overlap result in fewer, broader chunks, while a smaller chunk size and larger overlap produce more, narrower chunks.
6. Creates OpenAI text-embedding-ada-002 embeddings. These are vectors of 1536 dimensions optimized for cosine-similarity search. The vectors are combined with the metadata from the JSON files, along with other lecture-specific info, and upserted to the database as vector embeddings in tabular row format, i.e. a SupabaseVectorStore.
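For reference, here is a minimal sketch of that flow, assuming a standard LangChain + Supabase setup. The file paths, the metadata file naming convention, and the chunk_size / chunk_overlap values are illustrative, not the repo's exact values:

```python
# Minimal sketch of the embed/upsert pipeline, assuming a standard LangChain + Supabase setup.
# File paths, the metadata naming convention, and chunk_size/chunk_overlap are illustrative.
import glob
import json
import os

from dotenv import load_dotenv                      # pip install python-dotenv
from supabase import create_client                  # pip install supabase
from langchain.docstore.document import Document
from langchain.embeddings import OpenAIEmbeddings   # reads OPENAI_API_KEY from the environment
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import SupabaseVectorStore

load_dotenv()  # pulls the keys from .env
supabase_client = create_client(
    os.environ["NEXT_PUBLIC_SUPABASE_URL"],
    os.environ["SUPABASE_SERVICE_ROLE_KEY"],
)

# Load each lecture transcript plus its JSON metadata file
# (assumes lectureN.txt has a matching lectureN.json next to it).
docs = []
for txt_path in sorted(glob.glob("config/transcripts/*.txt")):
    with open(txt_path) as f:
        text = f.read()
    with open(txt_path.replace(".txt", ".json")) as f:
        metadata = json.load(f)
    docs.append(Document(page_content=text, metadata=metadata))

# Split into overlapping chunks. Larger chunk_size + smaller chunk_overlap -> fewer, broader
# chunks; smaller chunk_size + larger chunk_overlap -> more, narrower chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# Embed (OpenAIEmbeddings defaults to text-embedding-ada-002, 1536 dimensions)
# and upsert everything into the `documents` table as a SupabaseVectorStore.
embeddings = OpenAIEmbeddings()
SupabaseVectorStore.from_documents(
    chunks,
    embeddings,
    client=supabase_client,
    table_name="documents",
)
```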
Run app
Run the app and verify everything went smoothly:
npm run dev
Go to http://localhost:3000. You should be able to type and ask questions now. Done!
Next steps
Customizations
UI/UX: change to your liking.
Bot behavior: edit the prompt template in /utils/makechain.ts to fine-tune and gain greater control over the bot's outputs.
Data: modify the .txt files in /config/transcripts and the main script in /scripts/embed-script.ipynb.