WIP: DocsGPT POC #12056

Merged
merged 64 commits into from Feb 6, 2023

@kiwicopple kiwicopple commented Jan 30, 2023

(copied from OP: #12054)

End-to-end POC for DocsGPT project.

Pre-reqs

Config/secrets

Update your ./apps/docs/.env.local file with the following keys (use .env.sample as a base):

NEXT_PUBLIC_SUPABASE_URL=http://localhost:54321
NEXT_PUBLIC_SUPABASE_ANON_KEY=
OPENAI_KEY=

# TODO: merge META_SUPABASE_URL with NEXT_PUBLIC_SUPABASE_URL
# Currently separate in order to run edge function locally
# (runs in container with different network namespace)
META_SUPABASE_URL=http://supabase_kong_supabase:8000
META_SUPABASE_SERVICE_KEY=

To get the OpenAI key, you will need an OpenAI account. Once you have one, create a key here:
https://beta.openai.com/account/api-keys
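Since several of the keys above start out empty, a quick check before starting anything can save a confusing failure later. The following is a minimal sketch, not part of this PR: `findMissingKeys` and `REQUIRED_KEYS` are hypothetical names, and the parsing only handles simple `KEY=value` lines like the ones shown above.

```typescript
// Hypothetical helper (not in this PR): reports which required keys are
// missing or empty in the text of a dotenv-style file.
const REQUIRED_KEYS: string[] = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "OPENAI_KEY",
  "META_SUPABASE_URL",
  "META_SUPABASE_SERVICE_KEY",
];

function findMissingKeys(
  envText: string,
  required: string[] = REQUIRED_KEYS
): string[] {
  const values = new Map<string, string>();
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    // Ignore lines that are not KEY=value pairs.
    if (eq === -1) continue;
    values.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  // A key counts as missing when it is absent or has an empty value.
  return required.filter((key) => !values.get(key));
}
```

Passing the contents of ./apps/docs/.env.local to `findMissingKeys` returns the names of any keys that still need a value, in the order listed in `REQUIRED_KEYS`.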

Run local Supabase stack

We have extended ./supabase to include DB migrations required for DocsGPT. Be sure to run a local Supabase stack. From the project root:

$ supabase start

Generate embeddings (first time only)

The first time, you will need to pre-generate embeddings for the documents (guides only for now). Run the following script from ./apps/docs:

$ npm run build:embeddings

You can safely call this multiple times. It uses a checksum to determine whether it has already generated an embedding for each document, and skips the document if the embedding is already there.
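The checksum-based skip could be sketched roughly as follows. This is an illustration under stated assumptions rather than the script's actual code: SHA-256 is assumed as the checksum, and `StoredPage`, `checksumOf`, and `needsEmbedding` are hypothetical names standing in for whatever the script and the `page` table really use.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; the real table columns are not shown here.
interface StoredPage {
  path: string;
  checksum: string;
}

// Assumption: SHA-256 over the raw document text. The PR description
// only says "a checksum" and does not name the algorithm.
function checksumOf(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Returns true when the document is new or its content has changed,
// meaning an embedding should be (re)generated for it.
function needsEmbedding(
  path: string,
  content: string,
  stored: Map<string, StoredPage>
): boolean {
  const existing = stored.get(path);
  return existing === undefined || existing.checksum !== checksumOf(content);
}
```

With a check like this, unchanged documents never reach the embeddings endpoint, which is what makes repeated runs cheap.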

In the future this will most likely be called from a CI pipeline.

Note: Each run has a (very small) cost, because it queries OpenAI's embeddings endpoint to generate embeddings. If you find yourself constantly restarting your Postgres instance, you can use the following commands to quickly back up and restore the data without regenerating embeddings every time:

Backup:

pg_dump --column-inserts --data-only -h localhost -p 54322 -U postgres -t page -t page_section > backup.sql

Restore:

psql -h localhost -p 54322 -U postgres -q -f backup.sql

Run edge function

A server-side edge function was built to handle DocsGPT queries (search embeddings in Postgres, inject the matches as context in the prompt, send the prompt request to OpenAI).

You will need to run this function locally and pass in the above environment variables. From the project root:

$ supabase functions serve clippy-search --env-file ./apps/docs/.env
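The search and context-injection steps that the edge function performs could be sketched as below. This is an in-memory illustration only: the real function searches embeddings in Postgres and sends the prompt to OpenAI, neither of which is shown. Cosine similarity is assumed as the ranking measure, and `Section`, `topSections`, `buildPrompt`, and the prompt wording are all hypothetical.

```typescript
// Hypothetical in-memory stand-in for a stored document section.
interface Section {
  content: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Step 1: rank sections by similarity to the query embedding, keep the top k.
function topSections(query: number[], sections: Section[], k: number): Section[] {
  return [...sections]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) -
        cosineSimilarity(query, x.embedding)
    )
    .slice(0, k);
}

// Step 2: inject the matched sections into the prompt as context.
// The wording of this prompt is an assumption, not the PR's prompt.
function buildPrompt(question: string, context: Section[]): string {
  const contextText = context.map((s) => s.content).join("\n---\n");
  return (
    "Answer the question using only the context below.\n\n" +
    `Context:\n${contextText}\n\n` +
    `Question: ${question}\nAnswer:`
  );
}
```

The string returned by `buildPrompt` is what step 3 would send to OpenAI. Keeping the ranking and prompt assembly as pure functions makes them easy to test without a database or an API key.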

Run docs project

Finally, run the docs project to use the frontend. From ./apps/docs:

$ npm run dev

@kiwicopple kiwicopple requested a review from a team as a code owner January 30, 2023 16:30

kiwicopple and others added 3 commits February 6, 2023 21:57
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

A new PostgreSQL extension is now available in Supabase: [`pgvector`](https://github.com/pgvector/pgvector), an open-source vector similarity search.

The exponential progress of AI functionality over the past year has inspired many new real world applications. One specific challenge has been the ability to store and query _embeddings_ at scale.

[prettier] reported by reviewdog 🐶

Suggested change
The exponential progress of AI functionality over the past year has inspired many new real world applications. One specific challenge has been the ability to store and query _embeddings_ at scale.
The exponential progress of AI functionality over the past year has inspired many new real world applications. One specific challenge has been the ability to store and query _embeddings_ at scale.

@kiwicopple kiwicopple merged commit b0b3212 into master Feb 6, 2023
@kiwicopple kiwicopple deleted the feat/docs-gpt-poc branch February 6, 2023 21:20