OpenAI now offers function calling using reasoning models. Reasoning models are trained to follow logical chains of thought, making them better suited for complex or multi-step tasks.
Reasoning models like o3 and o4-mini are LLMs trained with reinforcement learning to think before they answer, producing a long internal chain of thought before responding to the user. They excel at complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows, and they're also the best models for Codex CLI, our lightweight coding agent.
For the most part, using these models via the API is very simple and comparable to using familiar 'chat' models.
However, there are some nuances to bear in mind, particularly when it comes to using features such as function calling.
All examples in this notebook use the newer Responses API, which provides convenient abstractions for managing conversation state. However, the principles here are also relevant when using the older Chat Completions API.
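Before we start, here's a minimal setup sketch showing what we assume has been run first: it creates the client and the MODEL_DEFAULTS dictionary used throughout. The model choice and reasoning settings here are assumptions you can adjust.

```python
import json
from uuid import uuid4
from typing import Callable

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumed defaults for every call in this notebook: a reasoning model,
# stored responses (so we can chain them by ID), low reasoning effort,
# and automatic reasoning summaries.
MODEL_DEFAULTS = {
    "model": "o4-mini",
    "store": True,
    "reasoning": {"effort": "low", "summary": "auto"},
}
```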
Let's make a simple call to a reasoning model using the Responses API.
We specify a low reasoning effort and retrieve the response with the helpful output_text attribute.
We can ask follow-up questions and use the previous_response_id parameter to let OpenAI manage the conversation history automatically:
response = client.responses.create(input="Which of the last four Olympic host cities has the highest average temperature?",**MODEL_DEFAULTS)print(response.output_text)response = client.responses.create(input="what about the lowest?",previous_response_id=response.id,**MODEL_DEFAULTS)print(response.output_text)
Among the last four Summer Olympic host cities (Beijing 2008, London 2012, Rio de Janeiro 2016 and Tokyo 2020), Rio de Janeiro has by far the highest mean annual temperature—around 23 °C, compared with about 16 °C in Tokyo, 13 °C in Beijing and 11 °C in London.
Of those four, London has the lowest mean annual temperature, at roughly 11 °C.
Nice and easy!
We're asking relatively complex questions that may require the model to reason out a plan and proceed through it in steps, but this reasoning is hidden from us - we simply wait a little longer before being shown the response.
However, if we inspect the output we can see that the model has made use of a hidden set of 'reasoning' tokens that were included in the model context window, but not exposed to us as end users.
We can see these tokens and a summary of the reasoning (but not the literal tokens used) in the response.
```python
print(next(rx for rx in response.output if rx.type == 'reasoning').summary[0].text)
response.usage.to_dict()
```
**Determining Olympic cities**
The user is asking about the last four Olympic host cities, assuming it’s for the Summer Olympics. Those would be Beijing in 2008, London in 2012, Rio in 2016, and Tokyo in 2020. They’re interested in the lowest average temperature, which I see is London at around 11°C. Beijing is about 13°C, Tokyo 16°C, but London has the lowest. I should clarify it's the mean annual temperature. So, I'll present it neatly that London is the answer.
It is important to be aware of these reasoning tokens: they mean we consume the available context window more quickly than with traditional chat models.
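If you want to see exactly how many tokens the hidden reasoning consumed, the usage object breaks this out. A small sketch, assuming the output_tokens_details field exposed by current versions of the Responses API:

```python
# Hidden reasoning tokens count against the context window even though
# they are never shown to us
usage = response.usage
reasoning_tokens = usage.output_tokens_details.reasoning_tokens
print(f"{reasoning_tokens} of {usage.output_tokens} output tokens were hidden reasoning "
      f"({usage.total_tokens} tokens used in total)")
```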
What happens if we ask the model a complex request that also requires the use of custom tools?
Let's imagine we have more questions about Olympic Cities, but we also have an internal database that contains IDs for each city.
It's possible that the model will need to invoke our tool partway through its reasoning process before returning a result.
Let's make a function that produces a random UUID and ask the model to reason about these UUIDs.
```python
def get_city_uuid(city: str) -> str:
    """Just a fake tool to return a fake UUID"""
    uuid = str(uuid4())
    return f"{city} ID: {uuid}"

# The tool schema that we will pass to the model
tools = [
    {
        "type": "function",
        "name": "get_city_uuid",
        "description": "Retrieve the internal ID for a city from the internal database. Only invoke this function if the user needs to know the internal ID for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The name of the city to get information about"}
            },
            "required": ["city"],
        },
    }
]

# This is a general practice - we need a mapping of the tool names we tell
# the model about, and the functions that implement them.
tool_mapping = {"get_city_uuid": get_city_uuid}

# Let's add this to our defaults so we don't have to pass it every time
MODEL_DEFAULTS["tools"] = tools

response = client.responses.create(
    input="What's the internal ID for the lowest-temperature city?",
    previous_response_id=response.id,
    **MODEL_DEFAULTS,
)
print(response.output_text)
```
We didn't get an output_text this time. Let's look at the response output.
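Here's a quick sketch of how we might dump the raw output items:

```python
# Each output item has a type - here we expect a 'reasoning' item followed
# by a 'function_call' item, rather than a final 'message'
[rx.to_dict() for rx in response.output]
```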
Along with the reasoning step, the model has correctly identified the need for a tool call and passed back the name and arguments we need to invoke our function.
Let's invoke the function and send the results to the model so it can continue reasoning.
Function responses are a special kind of message, so we need to structure our next message as a special kind of input:
```python
# Extract the function call(s) from the response
new_conversation_items = []
function_calls = [rx for rx in response.output if rx.type == 'function_call']
for function_call in function_calls:
    target_tool = tool_mapping.get(function_call.name)
    if not target_tool:
        raise ValueError(f"No tool found for function call: {function_call.name}")
    arguments = json.loads(function_call.arguments)  # Load the arguments as a dictionary
    tool_output = target_tool(**arguments)  # Invoke the tool with the arguments
    new_conversation_items.append({
        "type": "function_call_output",
        "call_id": function_call.call_id,  # We map the call_id back to the original function call
        "output": tool_output,
    })

# Send the function results back so the model can finish reasoning
response = client.responses.create(
    input=new_conversation_items,
    previous_response_id=response.id,
    **MODEL_DEFAULTS,
)
print(response.output_text)
```
The internal ID for London is ce863d03-9c01-4de2-9af8-96b123852aec.
This works great here - as we know that a single function call is all that is required for the model to respond - but we also need to account for situations where multiple tool calls might need to be executed for the reasoning to complete.
Some OpenAI models support the parallel_tool_calls parameter, which allows the model to return an array of function calls that we can then execute in parallel. However, reasoning models may produce a sequence of function calls that must be made in series, particularly as some steps may depend on the results of previous ones.
As such, we ought to define a general pattern which we can use to handle arbitrarily complex reasoning workflows. At each step in the conversation, we initialise a loop:

- If the response contains function calls, we must assume the reasoning is ongoing and should feed the function results (and any intermediate reasoning) back into the model for further inference.
- If there are no function calls and we instead receive a Response.output item with a type of 'message', we can safely assume the agent has finished reasoning and we can break out of the loop.
```python
# Let's wrap our logic above into a function which we can use to invoke tool calls.
def invoke_functions_from_response(response,
                                   tool_mapping: dict[str, Callable] = tool_mapping
                                   ) -> list[dict]:
    """Extract all function calls from the response, look up the
    corresponding tool functions and execute them.
    (This would be a good place to handle asynchronous tool calls, or
    ones that take a while to execute.)
    This returns a list of messages to be added to the conversation history.
    """
    intermediate_messages = []
    for response_item in response.output:
        if response_item.type == 'function_call':
            target_tool = tool_mapping.get(response_item.name)
            if target_tool:
                try:
                    arguments = json.loads(response_item.arguments)
                    print(f"Invoking tool: {response_item.name}({arguments})")
                    tool_output = target_tool(**arguments)
                except Exception as e:
                    # Feed errors back to the model so it can react rather than crash
                    tool_output = f"Error executing function call: {response_item.name}: {e}"
                intermediate_messages.append({
                    "type": "function_call_output",
                    "call_id": response_item.call_id,
                    "output": tool_output,
                })
            else:
                print(f"ERROR - No tool registered for function call: {response_item.name}")
    return intermediate_messages
```
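As the docstring hints, this is also where you could run slow tools concurrently. Below is a hedged sketch (not used in the rest of this notebook) that offloads each synchronous tool to a thread with asyncio, so that multiple function calls in a single response execute in parallel:

```python
import asyncio

async def invoke_functions_concurrently(response,
                                        tool_mapping: dict[str, Callable]) -> list[dict]:
    """Sketch: execute all function calls from a response concurrently."""
    function_calls = [rx for rx in response.output if rx.type == 'function_call']

    async def run_one(call):
        target_tool = tool_mapping[call.name]
        arguments = json.loads(call.arguments)
        # asyncio.to_thread lets slow, synchronous tools run side by side
        output = await asyncio.to_thread(target_tool, **arguments)
        return {"type": "function_call_output", "call_id": call.call_id, "output": output}

    return await asyncio.gather(*(run_one(call) for call in function_calls))
```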
Now let's demonstrate the loop concept we discussed before.
initial_question ="What are the internal IDs for the cities that have hosted the Olympics in the last 20 years, and which cities have IDs beginning with the number '2'. Use your internal tools to look up the IDs?"# We fetch a response and then kick off a loop to handle the responseresponse = client.responses.create(input=initial_question,**MODEL_DEFAULTS,)whileTrue: function_responses = invoke_functions_from_response(response) messages = [rx.to_dict() for rx in response.output if rx.type =='message']iflen(function_responses) ==0: # We're done reasoningprint(response.output_text)breakelse:print("More reasoning required, continuing...") response = client.responses.create(input=function_responses,previous_response_id=response.id,**MODEL_DEFAULTS )
Invoking tool: get_city_uuid({'city': 'Turin'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Beijing'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Vancouver'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'London'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Sochi'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Pyeongchang'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Tokyo'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Paris'})
More reasoning required, continuing...
Here are the internal IDs for the cities that have hosted the Olympics in the last 20 years:
• Turin: 53c0e635-7a1c-478b-84ca-742a6f0df830
• Beijing: 2c48757a-a1ed-48e7-897f-9edecf4909b5
• Vancouver: cc8be1f1-5154-46f4-8879-451e97f771c7
• London: a24addb0-4dd4-444c-a4a9-199612e0aca8
• Sochi: da7386b3-2283-45cc-9244-c1e0f4121782
• Rio de Janeiro: 01f60ec2-0efd-40b8-bb85-e63c2d2ddf4c
• Pyeongchang: f5d3687a-0097-4551-800c-aec66c37e8db
• Tokyo: 15aa0b12-7f7c-43d0-9ba3-b91250cafe48
• Paris: 56d062f2-8835-4707-a826-5d68d8be9d3f
Of these, the only city whose ID begins with “2” is:
• Beijing: 2c48757a-a1ed-48e7-897f-9edecf4909b5
So far so good! It's really cool to watch the model pause execution to run a function before continuing.
In practice the example above is quite trivial, and production use cases may be much more complex:

- Our context window may grow too large, and we may wish to prune older and less relevant messages or summarize the conversation so far (see the sketch after this list)
- We may wish to allow users to navigate back and forth through the conversation and re-generate answers
- We may wish to store messages in our own database for audit purposes rather than relying on OpenAI's storage and orchestration
- etc.
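As a taste of the first point, here is a hedged sketch of pruning at user-turn boundaries. Because reasoning and function call items must stay paired with their outputs (more on this below), the safest unit to discard is a whole turn, from one user message up to the next:

```python
def prune_conversation(conversation: list[dict], max_turns: int = 5) -> list[dict]:
    """Sketch: keep only the most recent user turns.

    A 'turn' starts at a user message and includes every reasoning,
    function_call, and function_call_output item that follows it, so we
    never orphan a function call from its output. Production code might
    summarize the dropped turns rather than discarding them outright.
    """
    turn_starts = [i for i, item in enumerate(conversation) if item.get("role") == "user"]
    if len(turn_starts) <= max_turns:
        return conversation
    return conversation[turn_starts[-max_turns]:]
```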
In these situations we may wish to take full control of the conversation. Rather than using previous_response_id, we can instead treat the API as 'stateless' and maintain an array of conversation items ourselves, sending the full array to the model as input each time.
This introduces some reasoning-model-specific nuances to consider.
In particular, it is essential that we preserve any reasoning and function call responses in our conversation history.
This is how the model keeps track of what chain-of-thought steps it has run through. The API will error if these are not included.
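Concretely, after one tool-using turn, the conversation array we send back should contain items shaped roughly like this (a sketch with IDs and content elided):

```python
conversation = [
    {"role": "user", "type": "message", "content": "..."},
    {"type": "reasoning", "id": "rs_...", "summary": [...]},  # must be preserved
    {"type": "function_call", "call_id": "call_...", "name": "get_city_uuid", "arguments": "..."},
    {"type": "function_call_output", "call_id": "call_...", "output": "..."},  # ours, matched by call_id
    {"role": "assistant", "type": "message", "content": [...]},
]
```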
Let's run through the example above again, orchestrating the messages ourselves and tracking token usage.
Note that the code below is structured for readability - in practice you may wish to consider a more sophisticated workflow to handle edge cases.
```python
# Let's initialise our conversation with the first user message
total_tokens_used = 0
user_messages = [
    "Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a prime number? Use your available tools to look up the IDs for each city.",
    "Great thanks! We've just updated the IDs - could you please check again?",
]

conversation = []
for message in user_messages:
    conversation_item = {
        "role": "user",
        "type": "message",
        "content": message,
    }
    print(f"{'*' * 79}\nUser message: {message}\n{'*' * 79}")
    conversation.append(conversation_item)
    while True:  # Response loop
        response = client.responses.create(
            input=conversation,
            **MODEL_DEFAULTS,
        )
        total_tokens_used += response.usage.total_tokens
        reasoning = [rx.to_dict() for rx in response.output if rx.type == 'reasoning']
        function_calls = [rx.to_dict() for rx in response.output if rx.type == 'function_call']
        messages = [rx.to_dict() for rx in response.output if rx.type == 'message']
        if len(reasoning) > 0:
            print("More reasoning required, continuing...")
            # Ensure we capture any reasoning steps
            conversation.extend(reasoning)
            print('\n'.join(s['text'] for r in reasoning for s in r['summary']))
        if len(function_calls) > 0:
            function_outputs = invoke_functions_from_response(response)
            # Preserve the order of function calls and outputs, in case of multiple function
            # calls (currently not supported by reasoning models, but worth considering)
            interleaved = [val for pair in zip(function_calls, function_outputs) for val in pair]
            conversation.extend(interleaved)
        if len(messages) > 0:
            print(response.output_text)
            conversation.extend(messages)
        if len(function_calls) == 0:
            # No more function calls - we're done reasoning and ready for the next user message
            break
print(f"Total tokens used: {total_tokens_used} ({total_tokens_used / 200_000:.2%} of o4-mini's context window)")
```
*******************************************************************************
User message: Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a prime number? Use your available tools to look up the IDs for each city.
*******************************************************************************
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Beijing'})
Invoking tool: get_city_uuid({'city': 'London'})
Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})
Invoking tool: get_city_uuid({'city': 'Tokyo'})
Invoking tool: get_city_uuid({'city': 'Paris'})
More reasoning required, continuing...
Here are the UUIDs for each Summer Olympic host city since 2005, with the leading numeric prefix highlighted and assessed for primality:
• Beijing (2008): 11ab370c-2f59-4c35-b557-f845e22c847b
– Leading digits “11” → 11 is prime
• London (2012): 0fdff00b-cbfb-4b82-bdd8-2107c4100319
– Leading digit “0” → 0 is not prime
• Rio de Janeiro (2016): 9c2202c4-00ab-46ee-a954-a17505e32d64
– Leading digit “9” → 9 is not prime
• Tokyo (2020): c4bf0281-7e84-4489-88e4-750e07211334
– No leading digit → N/A
• Paris (2024): b8c4b88e-dece-435d-b398-94f0ff762c88
– No leading digit → N/A
Conclusion: Only Beijing’s ID begins with a prime number (“11”).
*******************************************************************************
User message: Great thanks! We've just updated the IDs - could you please check again?
*******************************************************************************
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Beijing'})
Invoking tool: get_city_uuid({'city': 'London'})
Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})
Invoking tool: get_city_uuid({'city': 'Tokyo'})
Invoking tool: get_city_uuid({'city': 'Paris'})
Here are the updated UUIDs and their leading numeric prefixes:
• Beijing (2008): 30b0886f-c4da-431c-8983-33e8bbb4c352
– Leading “30” → 30 is not prime
• London (2012): 72ff5a9d-d147-4ba8-9a87-64e3572ba3bc
– Leading “72” → 72 is not prime
• Rio de Janeiro (2016): 7a45a392-b43a-41be-8eaf-07ec44d42a2b
– Leading “7” → 7 is prime
• Tokyo (2020): f725244f-079f-44e1-a91c-5c31c270c209
– Leading “f” → no numeric prefix
• Paris (2024): b0230ad4-bc35-48be-a198-65a9aaf28fb5
– Leading “b” → no numeric prefix
Conclusion: After the update, only Rio de Janeiro’s ID begins with a prime number (“7”).
Total tokens used: 9734 (4.87% of o4-mini's context window)
In this cookbook, we showed how to combine function calling with OpenAI's reasoning models to tackle multi-step tasks that depend on external data sources.
Importantly, we covered reasoning-model-specific nuances in the function calling process, specifically that:

- The model may choose to make multiple function calls or reasoning steps in series, and some steps may depend on the results of previous ones
- We cannot know how many of these steps there will be, so we must process responses with a loop
- The Responses API makes orchestration easy using the previous_response_id parameter, but where manual control is needed, it's important to maintain the correct order of conversation items to preserve the 'chain-of-thought'
The examples used here are rather simple, but you can imagine how this technique could be extended to more real-world use cases, such as:

- Looking up a customer's transaction history and recent correspondence to determine if they are eligible for a promotional offer
- Calling recent transaction logs, geolocation data, and device metadata to assess the likelihood of a transaction being fraudulent
- Reviewing internal HR databases to fetch an employee's benefits usage, tenure, and recent policy changes to answer personalized HR questions
- Reading internal dashboards, competitor news feeds, and market analyses to compile a daily executive briefing tailored to their focus areas