How to use GPT to search documents and databases?

Ramin
2 min readMay 16, 2023

--

CharGPT is very useful, but what if you want it to use it to search relevant segments in a big document or database?
In this article, I explain how to do this using openAI API.

Outline

  • Segment out the database or big chunk of text or document into smaller chunks that consist of a few sentences.
  • Assign a vector to each segment using openAI embedding. This assigns a numeric vector based on the meaning of the chunk of text. So segments whose meaning are close semantically, will have embedding vectors that are close to each other (in L2 norm sense)

At query time:

  • Assign an embedding vector to the query
  • Find the K nearest neighbors in the list of chunks in embedding space. Call this “context”
  • Construct a prompt for ChatGPT that reads something like this:
    system = f"""
Answer questions based on the context provided below. :

Context: {context}
"""

Ask openAI chatCompletion API to create a response based on the prompt.

You can also create reference to your original database/document by saving the indexes to the K nearest chunks which was computed above.

Code

This is the code for generating the embeddings associated with the big database/document:

This is the code for searching and using CharGPT to come up with the response and index to the original database/document:

In this example, I am using this method to create a web app for answering questions from Tesla car manuals. You can easily modify it to work with your own document or database. The web app is here:

I used streamlit for creating this site and hosting it. It’s a fantastic way of creating web pages with python backend. Please don’t judge me on the quality and cleanliness of my code :)
Also, I should mention that significant chunks of this code is from the sample codes provided by openAI

Shameless plug: If you are interested in this kind of stuff, you may like my most recent app which summarizes web articles right from the browser on iPhone/iPad. It also puts chatGPT on your watch:

--

--