This notebook is prepared for a scenario where:
- Your data is already in Weaviate
- You want to use Weaviate with the Generative OpenAI module (generative-openai).
This cookbook covers only Generative Search examples; it doesn't cover configuration and data imports.
In order to make the most of this cookbook, please complete the Getting Started cookbook first, where you will learn the essentials of working with Weaviate and import the demo data.
Checklist:
- a Weaviate instance (with the demo data imported)
- an OpenAI API key

Prepare your OpenAI API key

The OpenAI API key is used for vectorization of your data at import, and for running queries.
If you don't have an OpenAI API key, you can get one from https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY.
# Export OpenAI API Key
!export OPENAI_API_KEY="your key"
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.
import os
# Note. alternatively you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = 'your-key-goes-here'
if os.getenv("OPENAI_API_KEY") is not None:
print ("OPENAI_API_KEY is ready")
else:
print ("OPENAI_API_KEY environment variable not found")
In this section, we will:
- test the OPENAI_API_KEY environment variable – make sure you completed the step in #Prepare-your-OpenAI-API-key
- connect to your Weaviate instance with your OpenAI API key

After this step, the client object will be used to perform all Weaviate-related operations.
import weaviate
from datasets import load_dataset
import os
# Connect to your Weaviate instance
client = weaviate.Client(
url="https://your-wcs-instance-name.weaviate.network/",
# url="http://localhost:8080/",
auth_client_secret=weaviate.auth.AuthApiKey(api_key="<YOUR-WEAVIATE-API-KEY>"), # comment out this line if you are not using authentication for your Weaviate instance (i.e. for locally deployed instances)
additional_headers={
"X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")
}
)
# Check if your instance is live and ready
# This should return `True`
client.is_ready()
Weaviate offers a Generative Search OpenAI module, which generates responses based on the data stored in your Weaviate instance.
The way you construct a generative search query is very similar to a standard semantic search query in Weaviate.
For example:
result = (
    client.query
    .get("Article", ["title", "content", "url"])
    .with_near_text({"concepts": ["football clubs"]})
    .with_limit(5)
    # generative query will go here
    .do()
)
Now, you can add the with_generate() function to apply a generative transformation. with_generate takes either:
- single_prompt – to generate a response for each returned object,
- grouped_task – to generate a single response from all returned objects.

def generative_search_per_item(query, collection_name):
    prompt = "Summarize in a short tweet the following content: {content}"
    result = (
        client.query
        .get(collection_name, ["title", "content", "url"])
        .with_near_text({"concepts": [query], "distance": 0.7})
        .with_limit(5)
        .with_generate(single_prompt=prompt)
        .do()
    )
    # Check for errors
    if "errors" in result:
        print("\033[91mYou probably have run out of OpenAI API calls for the current minute – the limit is set at 60 per minute.")
        raise Exception(result["errors"][0]["message"])
    return result["data"]["Get"][collection_name]
query_result = generative_search_per_item("football clubs", "Article")
for i, article in enumerate(query_result):
    print(f"{i+1}. {article['title']}")
    print(article['_additional']['generate']['singleResult'])  # print generated response
    print("-----------------------")
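The error check above simply raises when the rate limit is hit. If you expect to exceed the per-minute limit, you can wrap the query in a small retry helper. This is a hypothetical sketch (`with_retries` and `run_query` are not part of the Weaviate client); it assumes the query function returns the same result dict shape used above:

```python
import time

def with_retries(run_query, max_attempts=3, base_delay=2.0):
    # Hypothetical helper: re-run a zero-argument query function that
    # returns a Weaviate-style result dict, backing off while the
    # response contains an "errors" key.
    result = run_query()
    for attempt in range(1, max_attempts):
        if "errors" not in result:
            break
        time.sleep(base_delay * attempt)  # simple linear backoff
        result = run_query()
    return result
```

You could then call, for example, `with_retries(lambda: client.query.get(...).do())` instead of invoking `.do()` directly.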
def generative_search_group(query, collection_name):
    generateTask = "Explain what these have in common"
    result = (
        client.query
        .get(collection_name, ["title", "content", "url"])
        .with_near_text({"concepts": [query], "distance": 0.7})
        .with_generate(grouped_task=generateTask)
        .with_limit(5)
        .do()
    )
    # Check for errors
    if "errors" in result:
        print("\033[91mYou probably have run out of OpenAI API calls for the current minute – the limit is set at 60 per minute.")
        raise Exception(result["errors"][0]["message"])
    return result["data"]["Get"][collection_name]
query_result = generative_search_group("football clubs", "Article")
print(query_result[0]['_additional']['generate']['groupedResult'])
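For reference, the generated text comes back inside each object's `_additional` field: `single_prompt` results appear as `singleResult` on every returned object, while a `grouped_task` result appears as `groupedResult`, typically attached only to the first returned object. A minimal sketch with a mocked response dict (the shape matches what the queries above return; the titles and text are made-up illustrative values):

```python
# Mocked Weaviate response (illustrative values only)
mock_result = {
    "data": {"Get": {"Article": [
        {"title": "A", "_additional": {"generate": {"groupedResult": "Both are football clubs."}}},
        {"title": "B", "_additional": {"generate": None}},
    ]}}
}

articles = mock_result["data"]["Get"]["Article"]
# The grouped response lives on the first object only:
summary = articles[0]["_additional"]["generate"]["groupedResult"]
```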
Thanks for following along; you're now equipped to set up your own vector databases and use embeddings to do all kinds of cool things - enjoy! For more complex use cases, please continue working through the other cookbook examples in this repo.