This page provides instructions and a guide for developers building a GPT Action for a specific application. Before you proceed, make sure to first familiarize yourself with the following information:
This guide provides details on how to connect ChatGPT with a Snowflake Data Warehouse so that the results of a SQL query can be returned to ChatGPT for use with Data Analysis. The GPT requires an action that interfaces with middleware (i.e., an Azure Function) so that the action can properly format the response from Snowflake for use in the Python notebook environment. Data must be returned as a file, so the middleware function should transform the SQL response into a CSV or Excel file under 10MB in size.
This document outlines the middleware GPT action. For setting up the middleware function itself, see GPT Actions library (Middleware) - Azure Functions. You can combine this Snowflake middleware action with an action that connects to Snowflake directly, enabling a GPT that can form and test SQL queries prior to executing them.
Existing Snowflake customers can leverage these guidelines to query data from their data warehouse and load that data into the Data Analysis Python environment for further insights. This enables ChatGPT-powered analysis such as visualizing data sets, identifying patterns and anomalies, and flagging gaps for data cleansing. This GPT can be used to drive business decisions from relatively small datasets, or to explore subsets of data through AI to generate hypotheses as you explore the holistic dataset in your BI tool, saving time and money while surfacing previously unseen patterns.
Check out these links from Snowflake and Azure before you get started:
Snowflake Action
Azure Function
Before you get started, make sure you go through the following steps in your application environment:
In addition, before creating your application in Azure Function App, you’ll need a way to handle user authentication. You’ll need to set up an OAuth App Registration in Azure Entra ID that can be linked with a Snowflake External OAuth security integration. Snowflake’s External OAuth security integrations allow external systems to issue access tokens that Snowflake uses to determine a user’s level of access. In this case, that external token provider is Azure Entra ID. Since ChatGPT connects to Azure rather than Snowflake, the GPT user’s OAuth token is provisioned by Azure and associated with their user in Entra ID. Thus you’ll need a way to map users in Snowflake to their corresponding user in Azure.
All of the necessary steps for both the Azure side and the Snowflake side are laid out below.
We’ll set up a new App Registration, configure the necessary Snowflake scopes in Azure, and retrieve all of the OAuth configuration parameters that will be needed in both Snowflake and ChatGPT. This section takes place entirely in Azure, so that in the next section you’ll have the information needed to link this App Registration when configuring the Snowflake side.
1. Navigate to the Microsoft Azure Portal, open Azure Entra ID, and create a New registration under App registrations. Enter Snowflake GPT OAuth Client, or a similar value, as the Name, then click Register.
2. Copy the Directory (tenant) ID (TENANT_ID) under Essentials. You will use this to generate your AZURE_AD_ISSUER and AZURE_AD_JWS_KEY_ENDPOINT:
   - AZURE_AD_ISSUER is https://sts.windows.net/TENANT_ID/
   - AZURE_AD_JWS_KEY_ENDPOINT is https://login.microsoftonline.com/TENANT_ID/discovery/v2.0/keys
3. From the Endpoints menu, copy the OAuth 2.0 authorization endpoint (v2) as the AZURE_AD_OAUTH_AUTHORIZATION_ENDPOINT and the OAuth 2.0 token endpoint (v2) as the AZURE_AD_OAUTH_TOKEN_ENDPOINT. The endpoints should be similar to https://login.microsoftonline.com/90288a9b-97df-4c6d-b025-95713f21cef9/oauth2/v2.0/authorization and https://login.microsoftonline.com/90288a9b-97df-4c6d-b025-95713f21cef9/oauth2/v2.0/token.
4. Under Expose an API, set the Application ID URI. The Application ID URI must be unique within your organization’s directory, such as https://your.company.com/4d2a8c2b-a5f4-4b86-93ca-294185f45f2e. This value will be referred to as the <SNOWFLAKE_APPLICATION_ID_URI> in the subsequent configuration steps.
5. Add a Scope representing each Snowflake role you want to expose, naming it with the session:scope: prefix. For example, for the Snowflake Analyst role, enter session:scope:analyst.
6. Copy the scope. This will be known as the AZURE_AD_SCOPE. It should be a concatenation of your Application ID URI and your Scope name.
7. In the Overview section, copy the ClientID from the Application (client) ID field. This will be known as the OAUTH_CLIENT_ID in the following steps.
8. Under Certificates & secrets, create a New client secret and copy its value. This will be known as the OAUTH_CLIENT_SECRET in the following steps.

Once the App Registration is complete in Azure Entra ID, the next step is to link that App Registration to Snowflake via an External OAuth Security Integration. The external_oauth_audience_list parameter of the security integration must match the Application ID URI that you specified while configuring Azure Entra ID.
The Issuer and the JWS Keys endpoint will also come from values collected in the previous steps. The User Mapping Attribute can be set to either EMAIL_ADDRESS or LOGIN_NAME; this is how users’ Microsoft login credentials are mapped to their user in Snowflake, ensuring that permissions in Snowflake are honored by the access token issued to ChatGPT.
CREATE OR REPLACE SECURITY INTEGRATION AZURE_OAUTH_INTEGRATION
TYPE = EXTERNAL_OAUTH
ENABLED = TRUE
EXTERNAL_OAUTH_TYPE = 'AZURE'
EXTERNAL_OAUTH_ISSUER = '<AZURE_AD_ISSUER>'
EXTERNAL_OAUTH_JWS_KEYS_URL = '<AZURE_AD_JWS_KEY_ENDPOINT>'
EXTERNAL_OAUTH_AUDIENCE_LIST = ('<SNOWFLAKE_APPLICATION_ID_URI>')
EXTERNAL_OAUTH_TOKEN_USER_MAPPING_CLAIM = 'upn'
EXTERNAL_OAUTH_SNOWFLAKE_USER_MAPPING_ATTRIBUTE = 'EMAIL_ADDRESS';
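Before wiring up the middleware, you can sanity-check the integration end to end. Below is a minimal sketch using the Snowflake Python connector; the account, user, and token values are placeholders, not values from this guide.

import snowflake.connector

# Hypothetical end-to-end check: connect with an access token issued by Entra ID
# for a test user to confirm the External OAuth integration maps tokens to the
# expected Snowflake user. Replace the placeholders with real values.
test_token = "<entra-id-access-token-for-test-user>"
conn = snowflake.connector.connect(
    user="jane.doe@your.company.com",  # The mapped Snowflake user (EMAIL_ADDRESS)
    account="ab12345.eastus2.azure",   # Your Snowflake account identifier
    authenticator="oauth",
    token=test_token,
)
print(conn.cursor().execute("SELECT CURRENT_USER(), CURRENT_ROLE()").fetchone())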
Make sure you go through the following steps in your Azure environment:
Now that the GPT is created and Azure/Snowflake authentication is handled, we can create the Azure Function App itself to execute the SQL query and format the response, enabling the GPT to download the result as a CSV for use with Data Analysis.
Follow this Azure Cookbook Guide for further details on deploying an Azure Function App. Below you will find sample code to add to the function.
This code is meant to be directional - while it should work out of the box, you should customize it based on the needs specific to your GPT and your IT setup.
You’ll need to set up the following flows in your Azure Function App:
*These steps may not be required if you use the file stream option instead of the URL option for returning files to your GPT. More on this below.
Ensure you have the necessary libraries installed and imported into your script. Alongside the Python standard-library modules used below, this sample script leverages the following:

import csv
import datetime
import json
import logging
import tempfile

import azure.functions as func
from azure.storage.blob import BlobServiceClient, generate_blob_sas, BlobSasPermissions, ContentSettings
import snowflake.connector
import jwt  # pyjwt for token decoding

logger = logging.getLogger(__name__)
To connect to Snowflake, you’ll need to extract the access token issued by Azure Entra ID from the Authorization header and use that token when connecting to the Snowflake server.
In this example, Snowflake usernames are email addresses, which simplifies mapping the Entra ID user extracted from the HTTP access token to the Snowflake user ID needed to connect. If this is not the case for your organization, you can map email addresses to Snowflake user IDs in your Python application.
My application was built to interface with a single Snowflake account (e.g., ab12345.eastus2.azure) and warehouse. If you need to access multiple accounts or warehouses, consider passing these values in your GPT action parameters so you can extract them from the HTTP request.
# Extract the access token from the Authorization header
auth_header = req.headers.get('Authorization')
if not auth_header:
    return func.HttpResponse("Missing Authorization header", status_code=401)
token_type, token = auth_header.split()

try:
    # Extract the email address from the token to use for Snowflake user mapping.
    # If Snowflake usernames are not emails, resolve the username accordingly.
    decoded_token = jwt.decode(token, options={"verify_signature": False})
    email = decoded_token.get('upn')
    conn = snowflake.connector.connect(
        user=email,                    # Snowflake username, i.e., the user's email in this example
        account=SNOWFLAKE_ACCOUNT,     # Snowflake account, e.g., ab12345.eastus2.azure
        authenticator="oauth",
        token=token,
        warehouse=SNOWFLAKE_WAREHOUSE  # Replace with your Snowflake warehouse
    )
    logging.info("Successfully connected to Snowflake.")
except Exception as e:
    logging.error(f"Failed to connect to Snowflake: {e}")
    return func.HttpResponse("Unable to connect to Snowflake", status_code=401)
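If your Snowflake usernames are not email addresses, one hedged approach to the mapping mentioned above is a simple lookup; the table and helper below are hypothetical, not part of the original sample.

# Hypothetical mapping for organizations whose Snowflake usernames differ from
# Entra ID email addresses. Replace with a lookup against your directory,
# a database table, or configuration store.
EMAIL_TO_SNOWFLAKE_USER = {
    "jane.doe@your.company.com": "JDOE",
    "john.smith@your.company.com": "JSMITH",
}

def resolve_snowflake_user(email: str) -> str:
    """Return the Snowflake username for an Entra ID email address."""
    user = EMAIL_TO_SNOWFLAKE_USER.get(email)
    if user is None:
        raise ValueError(f"No Snowflake user mapped for {email}")
    return user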
Once you connect to Snowflake, you’ll need to execute the query and store the results in a CSV. While the role in Snowflake should prevent any chance of harmful queries, you may want to sanitize the query in your application, just as you would any other programmatic SQL query execution; one possible approach is sketched below.
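For illustration, here is a minimal sketch of such a guardrail. The helper name and keyword list are assumptions, and this complements rather than replaces Snowflake role-based access control.

import re

# Hypothetical guardrail: allow only a single read-only SELECT/WITH statement.
WRITE_KEYWORDS = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)

def is_query_allowed(sql_query: str) -> bool:
    stripped = sql_query.strip().rstrip(";")
    if ";" in stripped:  # Reject multi-statement batches
        return False
    if not stripped.lower().startswith(("select", "with")):
        return False
    return not WRITE_KEYWORDS.search(stripped)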
# Extract SQL query from request parameters or body
sql_query = req.params.get('sql_query')

try:
    # Use the specified warehouse
    cursor = conn.cursor()
    # Execute the query
    cursor.execute(sql_query)
    results = cursor.fetchall()
    column_names = [desc[0] for desc in cursor.description]
    logger.info(f"Query executed successfully: {sql_query}")
    # Convert results to CSV
    csv_file_path = write_results_to_csv(results, column_names)
except Exception as e:
    logger.error(f"Error executing query or processing data: {e}")
    return func.HttpResponse("Error executing query", status_code=400)
def write_results_to_csv(results, column_names):
    try:
        # Create a temporary file to hold the CSV output
        temp_file = tempfile.NamedTemporaryFile(delete=False, mode='w', newline='', suffix='.csv')
        csv_writer = csv.writer(temp_file)
        csv_writer.writerow(column_names)  # Write the column headers
        csv_writer.writerows(results)      # Write the data rows
        temp_file.close()                  # Close the file to flush the contents
        return temp_file.name              # Return the file path
    except Exception as e:
        logger.error(f"Error writing results to CSV: {e}")
        raise
There are two methods for returning files to ChatGPT for processing. You can either stream the base64-encoded data along with the mimeType and file name in the openaiFileResponse list response, or you can return a list of URLs. In this solution we’ll focus on the latter.
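For reference, a minimal sketch of the inline option is shown below. It assumes the csv_file_path from the earlier step, and the field names follow the inline openaiFileResponse format (name, mime_type, and base64-encoded content); this is suitable for small files only.

import base64

# Sketch of the inline (base64) alternative: embed the file contents directly
# in the response instead of returning a URL.
with open(csv_file_path, "rb") as f:
    encoded_csv = base64.b64encode(f.read()).decode("utf-8")

response = {
    "openaiFileResponse": [
        {
            "name": "query_results.csv",  # Display name for the file
            "mime_type": "text/csv",      # MIME type so ChatGPT parses it as CSV
            "content": encoded_csv,       # Base64-encoded file contents
        }
    ]
}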
For the URL option, you’ll need to upload the CSV to Azure Blob Storage and return a pre-signed URL for accessing that file securely in ChatGPT. Note that in order for ChatGPT to download a URL, the response must include content_type and content_disposition headers, as in the example below. If you’d like to inspect whether a URL has the necessary headers, you can run curl -I <url> from any terminal.
You’ll need to get a connection string for your Azure storage account, as per instructions here.
def upload_csv_to_azure(file_path, container_name, blob_name, connect_str):
    try:
        # Create the BlobServiceClient object which will be used to create a container client
        blob_service_client = BlobServiceClient.from_connection_string(connect_str)
        # Create a blob client using the local file name as the name for the blob
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
        # Upload the file with specified content settings
        with open(file_path, "rb") as data:
            blob_client.upload_blob(data, overwrite=True, content_settings=ContentSettings(
                content_type='text/csv',
                content_disposition=f'attachment; filename="{blob_name}"'
            ))
        logger.info(f"Successfully uploaded {file_path} to {container_name}/{blob_name}")
        # Generate a SAS token for the blob
        sas_token = generate_blob_sas(
            account_name=blob_service_client.account_name,
            container_name=container_name,
            blob_name=blob_name,
            account_key=blob_service_client.credential.account_key,
            permission=BlobSasPermissions(read=True),
            expiry=datetime.datetime.utcnow() + datetime.timedelta(hours=1)  # Token valid for 1 hour
        )
        # Generate a presigned URL using the SAS token
        url = f"https://{blob_service_client.account_name}.blob.core.windows.net/{container_name}/{blob_name}?{sas_token}"
        logger.info(f"Generated presigned URL: {url}")
        return url
    except Exception as e:
        logger.error(f"Error uploading file to Azure Blob Storage: {e}")
        raise
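As a usage sketch, the function above might be wired into the request flow as follows; the container name and environment variable are assumptions, not part of the original sample.

import os
import uuid

# Upload the CSV written earlier and keep the pre-signed URL to hand back
# to ChatGPT in the next step.
csv_url = upload_csv_to_azure(
    file_path=csv_file_path,
    container_name="gpt-query-results",                         # Hypothetical container name
    blob_name=f"results_{uuid.uuid4().hex}.csv",                # Unique blob name per request
    connect_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],  # From your Function App settings
)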
Lastly, you’ll need to format the response to instruct ChatGPT to process it as a file or series of files. The openaiFileResponse is a list that can include up to 10 URLs (or base64 encodings if using the inline option).
# Format the response so ChatGPT treats it as a file
response = {
    'openaiFileResponse': [csv_url]
}

cursor.close()
conn.close()

return func.HttpResponse(
    json.dumps(response),
    status_code=200,
    mimetype="application/json"
)
There are a lot of moving pieces in this application, so testing your Azure Function App is important. ChatGPT can be a difficult testing ground, given that requests and responses are sometimes more opaque than debugging requires. Initial testing through cURL or Postman, which invoke the HTTP request from a more controlled environment, will let you debug and triage issues more easily. Once responses come back as expected in those tools, you are ready to build your GPT.
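For example, a minimal Python smoke test along those lines might look like the following, assuming you have already obtained an Entra ID access token for a test user; the URL placeholders match the OpenAPI spec further below.

import requests

# Hypothetical smoke test for the deployed function. Replace the placeholders
# with your Function App details and a valid Entra ID access token.
access_token = "<entra-id-access-token>"
resp = requests.post(
    "https://<server-name>.azurewebsites.net/api/<function_name>?code=<code>",
    headers={"Authorization": f"Bearer {access_token}"},
    json={"sql_query": "SELECT * FROM FLIGHTS.PUBLIC.JAN_2013_NYC LIMIT 10"},
    timeout=60,
)
print(resp.status_code)
print(resp.json())  # Expect {"openaiFileResponse": ["https://..."]}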
Once you've created a Custom GPT, use the text below in the Instructions panel for inspiration. Have questions? Check out Getting Started Example to see how this step works in more detail.
It is important that ChatGPT understands your table schema to properly form SQL queries. There are different methods for doing so, and this instruction set represents the most direct way. We are working to publish additional instructions for other versions of Snowflake GPTs you may want to build, such as ones that work across multiple tables, schemas, and databases, or that learn dynamically when schemas change over time.
Below are some basic instructions for working with a single schema and table. This GPT has been optimized for a single use case (analyzing flight data from January 2013 out of NYC), which allows the simplest instructions to provide the most reliable GPT performance.
You are an expert at writing SQL queries to fetch data from Snowflake. You help users convert their prompts into SQL queries. Any question around flight data will be converted into a Snowflake SQL query that hits the table FLIGHTS.PUBLIC.JAN_2013_NYC. Pass any query into the "sql_query" parameter.

The schema of the table includes:
ID NUMBER A unique identifier for each flight
YEAR NUMBER The year of the flight
MONTH NUMBER The month of the flight
DAY NUMBER The day of the month on which the flight departed
DEP_TIME NUMBER The actual departure time of the flight
SCHED_DEP_TIME NUMBER The scheduled departure time of the flight
DEP_DELAY NUMBER The departure delay in minutes (negative values indicate early departures)
ARR_TIME NUMBER The actual arrival time of the flight
SCHED_ARR_TIME NUMBER The scheduled arrival time of the flight
ARR_DELAY NUMBER The arrival delay in minutes (negative values indicate early arrivals)
CARRIER_CODE TEXT The carrier code of the airline
FLIGHT NUMBER The flight number
TAILNUM TEXT The aircraft tail number
ORIGIN_AIRPORT_CODE TEXT The origin airport code
DEST_AIRPORT_CODE TEXT The destination airport code
AIR_TIME NUMBER The total airtime of the flight in minutes
DISTANCE NUMBER The distance traveled by the flight in miles
HOUR NUMBER The hour part of the scheduled departure time
MINUTE NUMBER The minute part of the scheduled departure time
TIME_HOUR NUMBER The time at which the flight departed (rounded to the nearest hour)
CARRIER_NAME TEXT The full name of the airline carrier
ORIGIN_AIRPORT_NAME TEXT The full name of the origin airport
ORIGIN_REGION TEXT The region code of the origin airport
ORIGIN_MUNICIPALITY TEXT The city where the origin airport is located
ORIGIN_COORDINATES TEXT The geographical coordinates of the origin airport
DEST_AIRPORT_NAME TEXT The full name of the destination airport
DEST_REGION TEXT The region code of the destination airport
DEST_MUNICIPALITY TEXT The city where the destination airport is located
DEST_COORDINATES TEXT The geographical coordinates of the destination airport
When a user asks for data around flights, perform the following:

1. Use the executeSQL action to send a POST request to the Azure function endpoint.
2. The user will wish to ask questions about the data in code interpreter, so use that for any data analysis insights from the dataset you pulled.
Once you've created a Custom GPT, copy the text below into the Actions panel, replacing the placeholder values with your specific function details and updating the parameters based on any additional inputs you built into your Azure Function App.
Have questions? Check out Getting Started Example to see how this step works in more detail.
openapi: 3.1.0
info:
  title: Snowflake GPT API
  description: API to execute SQL queries on Snowflake and get the results as a CSV file URL.
  version: 1.0.0
servers:
  - url: https://<server-name>.azurewebsites.net
    description: Azure Function App server running Snowflake integration application
paths:
  /api/<function_name>?code=<code>:
    post:
      operationId: executeSQL
      summary: Executes a SQL query on Snowflake and returns the result file URL as a CSV.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                sql_query:
                  type: string
                  description: The SQL query to be executed on Snowflake.
              required:
                - sql_query
      responses:
        '200':
          description: Successfully executed the query.
          content:
            application/json:
              schema:
                type: object
                properties:
                  openaiFileResponse:
                    type: array
                    items:
                      type: string
                      format: uri
                    description: Array of URLs pointing to the result files.
        '401':
          description: Unauthorized. Missing or invalid authentication token.
        '400':
          description: Bad Request. The request was invalid or cannot be otherwise served.
        '500':
          description: Internal Server Error. An error occurred on the server.
components:
  schemas: {}
Remember that data must be returned to ChatGPT as a file wrapped in the openaiFileResponse object. This requires custom code in your middleware to properly format the response.

Are there integrations that you’d like us to prioritize? Are there errors in our integrations? File a PR or issue in our GitHub, and we’ll take a look.