Apr 25, 2025

Handling Function Calls with Reasoning Models


OpenAI now offers function calling using reasoning models. Reasoning models are trained to follow logical chains of thought, making them better suited for complex or multi-step tasks.

Reasoning models like o3 and o4-mini are LLMs trained with reinforcement learning to perform reasoning. Reasoning models think before they answer, producing a long internal chain of thought before responding to the user. Reasoning models excel in complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows. They're also the best models for Codex CLI, our lightweight coding agent.

For the most part, using these models via the API is very simple and comparable to using familiar 'chat' models.

However, there are some nuances to bear in mind, particularly when it comes to using features such as function calling.

All examples in this notebook use the newer Responses API, which provides convenient abstractions for managing conversation state. However, the principles here are equally relevant when using the older Chat Completions API.
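For reference, here's a minimal sketch of an equivalent call made via Chat Completions. Note that Chat Completions accepts reasoning effort as a flat reasoning_effort parameter, rather than the nested reasoning object used by the Responses API.

# A minimal sketch of an equivalent call via the older Chat Completions API.
# Reasoning effort is passed as a flat `reasoning_effort` parameter here.
from openai import OpenAI

client = OpenAI()
completion = client.chat.completions.create(
    model="o4-mini",
    reasoning_effort="low",
    messages=[{"role": "user", "content": "Which of the last four Olympic host cities has the highest average temperature?"}],
)
print(completion.choices[0].message.content)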

# pip install openai
# Import libraries 
import json
from openai import OpenAI
from uuid import uuid4
from typing import Callable

client = OpenAI()
MODEL_DEFAULTS = {
    "model": "o4-mini", # 200,000 token context window
    "reasoning": {"effort": "low", "summary": "auto"}, # Automatically summarise the reasoning process. Can also choose "detailed" or "none"
}

Let's make a simple call to a reasoning model using the Responses API. We specify a low reasoning effort and retrieve the response with the helpful output_text attribute. We can then ask follow-up questions, using the previous_response_id to let OpenAI manage the conversation history automatically.

response = client.responses.create(
    input="Which of the last four Olympic host cities has the highest average temperature?",
    **MODEL_DEFAULTS
)
print(response.output_text)

response = client.responses.create(
    input="what about the lowest?",
    previous_response_id=response.id,
    **MODEL_DEFAULTS
)
print(response.output_text)
Among the last four Summer Olympic host cities (Beijing 2008, London 2012, Rio de Janeiro 2016 and Tokyo 2020), Rio de Janeiro has by far the highest mean annual temperature—around 23 °C, compared with about 16 °C in Tokyo, 13 °C in Beijing and 11 °C in London.
Of those four, London has the lowest mean annual temperature, at roughly 11 °C.

Nice and easy!

We're asking relatively complex questions that may require the model to reason out a plan and proceed through it in steps, but this reasoning is hidden from us - we simply wait a little longer before being shown the response.

However, if we inspect the output we can see that the model has made use of a hidden set of 'reasoning' tokens that were included in the model context window, but not exposed to us as end users. We can see a count of these tokens, and a summary of the reasoning (but not the literal tokens used), in the response.

print(next(rx for rx in response.output if rx.type == 'reasoning').summary[0].text)
response.usage.to_dict()
**Determining Olympic cities**

The user is asking about the last four Olympic host cities, assuming it’s for the Summer Olympics. Those would be Beijing in 2008, London in 2012, Rio in 2016, and Tokyo in 2020. They’re interested in the lowest average temperature, which I see is London at around 11°C. Beijing is about 13°C, Tokyo 16°C, but London has the lowest. I should clarify it's the mean annual temperature. So, I'll present it neatly that London is the answer.
{'input_tokens': 109,
 'input_tokens_details': {'cached_tokens': 0},
 'output_tokens': 89,
 'output_tokens_details': {'reasoning_tokens': 64},
 'total_tokens': 198}

It is important to know about these reasoning tokens, because it means we will consume our available context window more quickly than with traditional chat models.
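For example, using the usage fields shown above, we can monitor how much of the window each call consumes. A minimal sketch, taking o4-mini's 200,000-token context window from the model defaults above:

# A minimal sketch: track reasoning-token usage and context window consumption
# using the usage fields returned on the response above.
CONTEXT_WINDOW = 200_000  # o4-mini's context window

usage = response.usage
reasoning_tokens = usage.output_tokens_details.reasoning_tokens
print(f"{reasoning_tokens} of {usage.output_tokens} output tokens were hidden reasoning tokens")
print(f"{usage.total_tokens / CONTEXT_WINDOW:.2%} of the context window consumed")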

Calling custom functions

What happens if we ask the model a complex request that also requires the use of custom tools?

  • Let's imagine we have more questions about Olympic Cities, but we also have an internal database that contains IDs for each city.
  • It's possible that the model will need to invoke our tool partway through its reasoning process before returning a result.
  • Let's make a function that produces a random UUID and ask the model to reason about these UUIDs.

def get_city_uuid(city: str) -> str:
    """Just a fake tool to return a fake UUID"""
    uuid = str(uuid4())
    return f"{city} ID: {uuid}"

# The tool schema that we will pass to the model
tools = [
    {
        "type": "function",
        "name": "get_city_uuid",
        "description": "Retrieve the internal ID for a city from the internal database. Only invoke this function if the user needs to know the internal ID for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The name of the city to get information about"}
            },
            "required": ["city"]
        }
    }
]

# A general practice: keep a mapping between the tool names we advertise to the model and the functions that implement them.
tool_mapping = {
    "get_city_uuid": get_city_uuid
}

# Let's add this to our defaults so we don't have to pass it every time
MODEL_DEFAULTS["tools"] = tools

response = client.responses.create(
    input="What's the internal ID for the lowest-temperature city?",
    previous_response_id=response.id,
    **MODEL_DEFAULTS)
print(response.output_text)

We didn't get an output_text this time. Let's look at the response output:

response.output
[ResponseReasoningItem(id='rs_680bcde645a08191bbb8b42ba4613aef07423969e3977116', summary=[], type='reasoning', status=None),
 ResponseFunctionToolCall(arguments='{"city":"London"}', call_id='call_VcyIJQnP7HW2gge7Nh8HmPNG', name='get_city_uuid', type='function_call', id='fc_680bcde7cda48191ada496d462ca7c5407423969e3977116', status='completed')]

Along with the reasoning step, the model has successfully identified the need for a tool call and passed back the name and arguments we should use to invoke our function.

Let's invoke the function and send the results to the model so it can continue reasoning. Function responses are a special kind of conversation item, so we need to structure our next input as follows:

{
    "type": "function_call_output",
    "call_id": function_call.call_id,
    "output": tool_output
}
# Extract the function call(s) from the response
new_conversation_items = []
function_calls = [rx for rx in response.output if rx.type == 'function_call']
for function_call in function_calls:
    target_tool = tool_mapping.get(function_call.name)
    if not target_tool:
        raise ValueError(f"No tool found for function call: {function_call.name}")
    arguments = json.loads(function_call.arguments) # Load the arguments as a dictionary
    tool_output = target_tool(**arguments) # Invoke the tool with the arguments
    new_conversation_items.append({
        "type": "function_call_output",
        "call_id": function_call.call_id, # We map the call_id back to the original function call
        "output": tool_output
    })
response = client.responses.create(
    input=new_conversation_items,
    previous_response_id=response.id,
    **MODEL_DEFAULTS
)
print(response.output_text)
The internal ID for London is ce863d03-9c01-4de2-9af8-96b123852aec.

This works great here, as we know that a single function call is all that is required for the model to respond, but we also need to account for situations where multiple tool calls might need to be executed before the reasoning completes.

Executing multiple functions in series

Some OpenAI models support the parameter parallel_tool_calls which allows the model to return an array of functions which we can then execute in parallel. However, reasoning models may produce a sequence of function calls that must be made in series, particularly as some steps may depend on the results of previous ones. As such, we ought to define a general pattern which we can use to handle arbitrarily complex reasoning workflows:

  • At each step in the conversation, initialise a loop
  • If the response contains function calls, we must assume the reasoning is ongoing and we should feed the function results (and any intermediate reasoning) back into the model for further inference
  • If there are no function calls and we instead receive a Response.output with a type of 'message', we can safely assume the agent has finished reasoning and we can break out of the loop
# Let's wrap our logic above into a function which we can use to invoke tool calls.
def invoke_functions_from_response(response,
                                   tool_mapping: dict[str, Callable] = tool_mapping
                                   ) -> list[dict]:
    """Extract all function calls from the response, look up the corresponding tool function(s) and execute them.
    (This would be a good place to handle asynchronous tool calls, or ones that take a while to execute.)
    This returns a list of messages to be added to the conversation history.
    """
    intermediate_messages = []
    for response_item in response.output:
        if response_item.type == 'function_call':
            target_tool = tool_mapping.get(response_item.name)
            if target_tool:
                try:
                    arguments = json.loads(response_item.arguments)
                    print(f"Invoking tool: {response_item.name}({arguments})")
                    tool_output = target_tool(**arguments)
                except Exception as e:
                    tool_output = f"Error executing function call: {response_item.name}: {e}"
            else:
                tool_output = f"ERROR - No tool registered for function call: {response_item.name}"
                print(tool_output)
            # Every function_call must be answered with a function_call_output,
            # even if the tool errored, otherwise the next API call will fail.
            intermediate_messages.append({
                "type": "function_call_output",
                "call_id": response_item.call_id,
                "output": tool_output
            })
    return intermediate_messages

Now let's demonstrate the loop concept we discussed before.

initial_question = "What are the internal IDs for the cities that have hosted the Olympics in the last 20 years, and which of those cities have IDs beginning with the number '2'? Use your internal tools to look up the IDs."

# We fetch a response and then kick off a loop to handle the response
response = client.responses.create(
    input=initial_question,
    **MODEL_DEFAULTS,
)
while True:
    function_responses = invoke_functions_from_response(response)
    if len(function_responses) == 0: # We're done reasoning
        print(response.output_text)
        break
    else:
        print("More reasoning required, continuing...")
        response = client.responses.create(
            input=function_responses,
            previous_response_id=response.id,
            **MODEL_DEFAULTS
        )
Invoking tool: get_city_uuid({'city': 'Turin'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Beijing'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Vancouver'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'London'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Sochi'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Pyeongchang'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Tokyo'})
More reasoning required, continuing...
Invoking tool: get_city_uuid({'city': 'Paris'})
More reasoning required, continuing...
Here are the internal IDs for the cities that have hosted the Olympics in the last 20 years:

• Turin: 53c0e635-7a1c-478b-84ca-742a6f0df830  
• Beijing: 2c48757a-a1ed-48e7-897f-9edecf4909b5  
• Vancouver: cc8be1f1-5154-46f4-8879-451e97f771c7  
• London: a24addb0-4dd4-444c-a4a9-199612e0aca8  
• Sochi: da7386b3-2283-45cc-9244-c1e0f4121782  
• Rio de Janeiro: 01f60ec2-0efd-40b8-bb85-e63c2d2ddf4c  
• Pyeongchang: f5d3687a-0097-4551-800c-aec66c37e8db  
• Tokyo: 15aa0b12-7f7c-43d0-9ba3-b91250cafe48  
• Paris: 56d062f2-8835-4707-a826-5d68d8be9d3f  

Of these, the only city whose ID begins with “2” is:
• Beijing: 2c48757a-a1ed-48e7-897f-9edecf4909b5

Manual conversation orchestration

So far so good! It's really cool to watch the model pause execution to run a function before continuing. In practice the example above is quite trivial, and production use cases may be much more complex:

  • Our context window may grow too large and we may wish to prune older and less relevant messages, or summarize the conversation so far
  • We may wish to allow users to navigate back and forth through the conversation and re-generate answers
  • We may wish to store messages in our own database for audit purposes rather than relying on OpenAI's storage and orchestration
  • etc.

In these situations we may wish to take full control of the conversation. Rather than using previous_response_id, we can instead treat the API as 'stateless' and build and maintain the array of conversation items ourselves, sending the full list to the model as input each time.

This introduces some reasoning-model-specific nuances to consider.

  • In particular, it is essential that we preserve any reasoning and function call responses in our conversation history.
  • This is how the model keeps track of what chain-of-thought steps it has run through. The API will error if these are not included - see the sketch below
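To illustrate, here is the shape of a manually-managed conversation after a single tool-calling turn. This is a sketch only: the reasoning and function_call items are copied verbatim from response.output, and the IDs are placeholders.

# Illustrative sketch: a manually-managed conversation after one tool-calling turn.
# The reasoning and function_call items must be preserved exactly as returned by
# the model, in order, ahead of the matching function_call_output. IDs are placeholders.
conversation = [
    {"role": "user", "type": "message", "content": "What's the internal ID for London?"},
    {"type": "reasoning", "id": "rs_...", "summary": []},  # the model's reasoning item
    {"type": "function_call", "id": "fc_...", "call_id": "call_...",
     "name": "get_city_uuid", "arguments": '{"city": "London"}'},  # the model's tool call
    {"type": "function_call_output", "call_id": "call_...", "output": "London ID: ..."},  # our tool's result
]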

Let's run through the example above again, orchestrating the messages ourselves and tracking token usage.


Note that the code below is structured for readability - in practice you may wish to consider a more sophisticated workflow to handle edge cases.

# Let's initialise our conversation with the first user message
total_tokens_used = 0
user_messages = [
    "Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a prime number? Use your available tools to look up the IDs for each city.",
    "Great thanks! We've just updated the IDs - could you please check again?"
    ]

conversation = []
for message in user_messages:
    conversation_item = {
        "role": "user",
        "type": "message",
        "content": message
    }
    print(f"{'*' * 79}\nUser message: {message}\n{'*' * 79}")
    conversation.append(conversation_item)
    while True: # Response loop
        response = client.responses.create(
            input=conversation,
            **MODEL_DEFAULTS
        )
        total_tokens_used += response.usage.total_tokens
        reasoning = [rx.to_dict() for rx in response.output if rx.type == 'reasoning']
        function_calls = [rx.to_dict() for rx in response.output if rx.type == 'function_call']
        messages = [rx.to_dict() for rx in response.output if rx.type == 'message']
        if len(reasoning) > 0:
            print("More reasoning required, continuing...")
            # Ensure we capture any reasoning steps
            conversation.extend(reasoning)
            print('\n'.join(s['text'] for r in reasoning for s in r['summary']))
        if len(function_calls) > 0:
            function_outputs = invoke_functions_from_response(response)
            # Preserve order of function calls and outputs in case of multiple function calls (currently not supported by reasoning models, but worth considering)
            interleaved = [val for pair in zip(function_calls, function_outputs) for val in pair]
            conversation.extend(interleaved)
        if len(messages) > 0:
            print(response.output_text)
            conversation.extend(messages)
        if len(function_calls) == 0:  # No more functions = We're done reasoning and we're ready for the next user message
            break
print(f"Total tokens used: {total_tokens_used} ({total_tokens_used / 200_000:.2%} of o4-mini's context window)")
*******************************************************************************
User message: Of those cities that have hosted the summer Olympic games in the last 20 years - do any of them have IDs beginning with a prime number? Use your available tools to look up the IDs for each city.
*******************************************************************************
More reasoning required, continuing...

Invoking tool: get_city_uuid({'city': 'Beijing'})
Invoking tool: get_city_uuid({'city': 'London'})
Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})
Invoking tool: get_city_uuid({'city': 'Tokyo'})
Invoking tool: get_city_uuid({'city': 'Paris'})
More reasoning required, continuing...

Here are the UUIDs for each Summer Olympic host city since 2005, with the leading numeric prefix highlighted and assessed for primality:

• Beijing (2008): 11ab370c-2f59-4c35-b557-f845e22c847b  
  – Leading digits “11” → 11 is prime  
• London (2012): 0fdff00b-cbfb-4b82-bdd8-2107c4100319  
  – Leading digit “0” → 0 is not prime  
• Rio de Janeiro (2016): 9c2202c4-00ab-46ee-a954-a17505e32d64  
  – Leading digit “9” → 9 is not prime  
• Tokyo (2020): c4bf0281-7e84-4489-88e4-750e07211334  
  – No leading digit → N/A  
• Paris (2024): b8c4b88e-dece-435d-b398-94f0ff762c88  
  – No leading digit → N/A  

Conclusion: Only Beijing’s ID begins with a prime number (“11”).
*******************************************************************************
User message: Great thanks! We've just updated the IDs - could you please check again?
*******************************************************************************
More reasoning required, continuing...

Invoking tool: get_city_uuid({'city': 'Beijing'})
Invoking tool: get_city_uuid({'city': 'London'})
Invoking tool: get_city_uuid({'city': 'Rio de Janeiro'})
Invoking tool: get_city_uuid({'city': 'Tokyo'})
Invoking tool: get_city_uuid({'city': 'Paris'})
Here are the updated UUIDs and their leading numeric prefixes:

• Beijing (2008): 30b0886f-c4da-431c-8983-33e8bbb4c352  
  – Leading “30” → 30 is not prime  
• London (2012): 72ff5a9d-d147-4ba8-9a87-64e3572ba3bc  
  – Leading “72” → 72 is not prime  
• Rio de Janeiro (2016): 7a45a392-b43a-41be-8eaf-07ec44d42a2b  
  – Leading “7” → 7 is prime  
• Tokyo (2020): f725244f-079f-44e1-a91c-5c31c270c209  
  – Leading “f” → no numeric prefix  
• Paris (2024): b0230ad4-bc35-48be-a198-65a9aaf28fb5  
  – Leading “b” → no numeric prefix  

Conclusion: After the update, only Rio de Janeiro’s ID begins with a prime number (“7”).
Total tokens used: 9734 (4.87% of o4-mini's context window)

Summary

In this cookbook, we showed how to combine function calling with OpenAI's reasoning models to complete multi-step tasks that depend on external data sources.

Importantly, we covered reasoning-model specific nuances in the function calling process, specifically that:

  • The model may choose to make multiple function calls or reasoning steps in series, and some steps may depend on the results of previous ones
  • We cannot know how many of these steps there will be, so we must process responses with a loop
  • The Responses API makes orchestration easy using the previous_response_id parameter, but where manual control is needed, it's important to maintain the correct order of conversation items to preserve the 'chain-of-thought'

The examples used here are rather simple, but you can imagine how this technique could be extended to more real-world use cases, such as:

  • Looking up a customer's transaction history and recent correspondence to determine if they are eligible for a promotional offer
  • Calling recent transaction logs, geolocation data, and device metadata to assess the likelihood of a transaction being fraudulent
  • Reviewing internal HR databases to fetch an employee’s benefits usage, tenure, and recent policy changes to answer personalized HR questions
  • Reading internal dashboards, competitor news feeds, and market analyses to compile a daily executive briefing tailored to each executive's focus areas