How to handle rate limits

Sep 10, 2022

When you call the OpenAI API repeatedly, you may encounter error messages that say 429: 'Too Many Requests' or RateLimitError. These error messages come from exceeding the API's rate limits.

This guide shares tips for avoiding and handling rate limit errors.

To see an example script for throttling parallel requests to avoid rate limit errors, see api_request_parallel_processor.py.

Why rate limits exist

Rate limits are a common practice for APIs, and they're put in place for a few different reasons.

  • First, they help protect against abuse or misuse of the API. For example, a malicious actor could flood the API with requests in an attempt to overload it or cause disruptions in service. By setting rate limits, OpenAI can prevent this kind of activity.
  • Second, rate limits help ensure that everyone has fair access to the API. If one person or organization makes an excessive number of requests, it could bog down the API for everyone else. By throttling the number of requests that a single user can make, OpenAI ensures that everyone has an opportunity to use the API without experiencing slowdowns.
  • Lastly, rate limits can help OpenAI manage the aggregate load on its infrastructure. If requests to the API increase dramatically, it could tax the servers and cause performance issues. By setting rate limits, OpenAI can help maintain a smooth and consistent experience for all users.

Although hitting rate limits can be frustrating, rate limits exist to protect the reliable operation of the API for its users.

Default rate limits

Your rate limit and spending limit (quota) are automatically adjusted based on a number of factors. As your usage of the OpenAI API goes up and you successfully pay the bill, we automatically increase your usage tier. You can find specific information regarding rate limits using the resources below.
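
If you want to check the limits that currently apply to your account programmatically, the API reports them in response headers. Below is a minimal sketch, assuming the openai Python package v1+ and the x-ratelimit-* header names from OpenAI's rate limit documentation (header names may change over time).

# minimal sketch: read rate limit headers from a normal API response
import openai

client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment

# .with_raw_response exposes the underlying HTTP response, including headers
raw = client.chat.completions.with_raw_response.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=5,
)

for header in (
    "x-ratelimit-limit-requests",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-tokens",
):
    print(header, "=", raw.headers.get(header))

completion = raw.parse()  # the usual ChatCompletion object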

Other rate limit resources

Read more about OpenAI's rate limits in these other resources:

Requesting a rate limit increase

If you'd like your organization's rate limit increased, please visit your Limits settings page to see how you can increase your usage tier.

import openai
import os

client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))

Example rate limit error

A rate limit error occurs when API requests are sent too quickly. If you are using the OpenAI Python library, the error will look something like this:

RateLimitError: Rate limit reached for default-codex in organization org-{id} on requests per min. Limit: 20.000000 / min. Current: 24.000000 / min. Contact support@openai.com if you continue to have issues or if you’d like to request an increase.

Below is example code for triggering a rate limit error.

# request a bunch of completions in a loop
for _ in range(100):
    client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=10,
    )
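
As a minimal illustration (not part of the original example), the same call can be wrapped in a try/except block to catch the error explicitly:

# catch the rate limit error explicitly instead of letting it crash the loop
try:
    client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=10,
    )
except openai.RateLimitError as e:
    print(f"Rate limit reached; back off and retry later: {e}")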

How to avoid rate limit errors

Retrying with exponential backoff

One easy way to avoid rate limit errors is to automatically retry requests with a random exponential backoff. Retrying with exponential backoff means performing a short sleep when a rate limit error is hit, then retrying the unsuccessful request. If the request is still unsuccessful, the sleep length is increased and the process is repeated. This continues until the request is successful or until a maximum number of retries is reached.

This approach has many benefits:

  • Automatic retries mean you can recover from rate limit errors without crashes or missing data
  • Exponential backoff means that your first retries happen quickly, while you still benefit from longer delays if your first few retries fail
  • Adding random jitter to the delay helps prevent all of the retries from hitting at the same time

Note that unsuccessful requests contribute to your per-minute limit, so continuously resending a request won’t work.

Below are a few example solutions.

Example #1: Using the Tenacity library

Tenacity is an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.

To add exponential backoff to your requests, you can use the tenacity.retry decorator. The following example uses the tenacity.wait_random_exponential function to add random exponential backoff to a request.

Note that the Tenacity library is a third-party tool, and OpenAI makes no guarantees about its reliability or security.

from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)  # for exponential backoff

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(**kwargs):
    return client.chat.completions.create(**kwargs)


completion_with_backoff(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Once upon a time,"}])
ChatCompletion(id='chatcmpl-8PAu6anX2JxQdYmJRzps38R8u0ZBC', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='in a small village nestled among green fields and rolling hills, there lived a kind-hearted and curious young girl named Lily. Lily was known for her bright smile and infectious laughter, bringing joy to everyone around her.\n\nOne sunny morning, as Lily played in the meadows, she stumbled upon a mysterious book tucked away beneath a tall oak tree. Intrigued, she picked it up and dusted off its weathered cover to reveal intricate golden patterns. Without hesitation, she opened it, discovering that its pages were filled with magical tales and enchanting adventures.\n\nAmong the stories she found, one particularly caught her attention—a tale of a long-lost treasure hidden deep within a mysterious forest. Legend had it that whoever found this hidden treasure would be granted one wish, no matter how big or small. Excited by the prospect of finding such treasure and fulfilling her wildest dreams, Lily decided to embark on a thrilling journey to the forest.\n\nGathering her courage, Lily told her parents about the magical book and her quest to find the hidden treasure. Though concerned for their daughter\'s safety, they couldn\'t help but admire her spirit and determination. They hugged her tightly and blessed her with love and luck, promising to await her return.\n\nEquipped with a map she found within the book, Lily ventured into the depths of the thick forest. The trees whispered tales of forgotten secrets, and the enchanted creatures hidden within watched her every step. But Lily remained undeterred, driven by her desire to discover what lay ahead.\n\nDays turned into weeks as Lily traversed through dense foliage, crossed swift rivers, and climbed treacherous mountains. She encountered mystical beings who offered guidance and protection along her perilous journey. With their help, she overcame countless obstacles and grew braver with each passing day.\n\nFinally, after what felt like an eternity, Lily reached the heart of the forest. There, beneath a jeweled waterfall, she found the long-lost treasure—a magnificent chest adorned with sparkling gemstones. Overwhelmed with excitement, she gently opened the chest to reveal a brilliant light that illuminated the forest.\n\nWithin the glow, a wise voice echoed, "You have proven your courage and pure heart, young Lily. Make your wish, and it shall be granted."\n\nLily thought deeply about her wish, realizing that her true treasure was the love and happiness she felt in her heart. Instead of making a wish for herself, she asked for the wellbeing and prosperity of her village, spreading joy and harmony to everyone living there.\n\nAs the light faded, Lily knew her quest was complete. She retraced her steps through the forest, returning home to find her village flourishing. Fields bloomed with vibrant flowers, and laughter filled the air.\n\nThe villagers greeted Lily with open arms, recognizing her selflessness and the magic she had brought into their lives. 
From that day forward, they told the tale of Lily\'s journey, celebrating her as a heroine who embodied the power of love, kindness, and the belief that true treasure lies within oneself.\n\nAnd so, the story of Lily became an everlasting legend, inspiring generations to follow their dreams, be selfless, and find the true treasures that lie within their hearts.', role='assistant', function_call=None, tool_calls=None))], created=1701010806, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=641, prompt_tokens=12, total_tokens=653))

Example #2: Using the backoff library

Another library that provides function decorators for backoff and retry is backoff.

Like Tenacity, the backoff library is a third-party tool, and OpenAI makes no guarantees about its reliability or security.

import backoff  # for exponential backoff

@backoff.on_exception(backoff.expo, openai.RateLimitError)
def completions_with_backoff(**kwargs):
    return client.chat.completions.create(**kwargs)


completions_with_backoff(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Once upon a time,"}])
ChatCompletion(id='chatcmpl-8PAwkg7Q9pPeAkvVuAZ8AyA108WhR', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content="in a small village, there lived a young girl named Lily. She had fiery red hair, lively green eyes, and a spirit as wild as the rushing river nearby. Lily was known for her curious nature and her desire to explore the world beyond the village boundaries.\n\nOne day, while playing near the river, Lily spotted an injured bird nested on a branch. Its wing was broken, and it seemed unable to fly away. Lily's heart filled with sadness, and she knew she couldn't leave the bird alone.\n\nCarefully, she climbed up the tree and gently placed the bird inside her pocket. Lily brought it home and made a cozy bed for it in a small wooden box. She named the bird Ruby, after its shimmering red feathers.\n\nDays turned into weeks, and Ruby's wing slowly healed under Lily's constant care and attention. As they spent time together, a deep bond grew between them. Ruby would chirp happily whenever Lily approached, and she would spend hours talking to the bird, sharing stories of her adventures, dreams, and fears.\n\nOne evening, as Lily was about to go to bed, a peculiar thing happened. Ruby hopped out of his box and fluttered onto the windowsill. He turned to face Lily with his bright eyes and began to sing a beautiful melody.\n\nLily was astonished. Never before had she heard Ruby sing. The tune was so captivating that it filled the room and made the quiet night come alive. The magical music seemed to touch Lily's soul, awakening a deep sense of wonder and wanderlust within her.\n\nFilled with an undeniable urge to explore, Lily decided it was time to go on an adventure with her newfound friend, Ruby. She packed a small bag and bid farewell to her family and friends, promising to return one day.\n\nTogether, Lily and Ruby embarked on a grand journey, soaring across expansive skies, diving into lush forests, and exploring hidden caves. They encountered magnificent landscapes, unique creatures, and encountered kind-hearted individuals who shared their wisdom and stories.\n\nThroughout their journey, Ruby's song continued to inspire and guide them. It became a symbol of hope, reminding them to embrace bravery, follow their dreams, and always remain true to themselves.\n\nAs the years passed, Lily and Ruby traversed the world, weaving their stories into the tapestry of time. They became renowned for their extraordinary bond and the magic they shared with everyone they encountered.\n\nEventually, it was time for Lily to return to her village, a place eagerly awaiting her return. She had grown wise, learned many lessons, and gained a deeper understanding of herself and the world around her.\n\nWith Ruby perched on her shoulder, they descended upon the village like a ray of sunshine, bringing joy and wonder to every heart. Lily shared the wisdom she had acquired and inspired others to embrace their own adventures, no matter how big or small.\n\nAnd so, the tale of Lily and Ruby became legend, passed down from generation to generation. Their story reminded people to cherish the connections they make, to nurture their dreams, and to believe in the magic that lies within them.", role='assistant', function_call=None, tool_calls=None))], created=1701010970, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=621, prompt_tokens=12, total_tokens=633))

Example #3: Manual backoff implementation

If you don't want to use third-party libraries, you can implement your own backoff logic.

# imports
import random
import time

# define a retry decorator
def retry_with_exponential_backoff(
    func,
    initial_delay: float = 1,
    exponential_base: float = 2,
    jitter: bool = True,
    max_retries: int = 10,
    errors: tuple = (openai.RateLimitError,),
):
    """Retry a function with exponential backoff."""

    def wrapper(*args, **kwargs):
        # Initialize variables
        num_retries = 0
        delay = initial_delay

        # Loop until a successful response or max_retries is hit or an exception is raised
        while True:
            try:
                return func(*args, **kwargs)

            # Retry on specified errors
            except errors as e:
                # Increment retries
                num_retries += 1

                # Check if max retries has been reached
                if num_retries > max_retries:
                    raise Exception(
                        f"Maximum number of retries ({max_retries}) exceeded."
                    )

                # Increment the delay
                delay *= exponential_base * (1 + jitter * random.random())

                # Sleep for the delay
                time.sleep(delay)

            # Raise exceptions for any errors not specified
            except Exception as e:
                raise e

    return wrapper


@retry_with_exponential_backoff
def completions_with_backoff(**kwargs):
    return client.chat.completions.create(**kwargs)


completions_with_backoff(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Once upon a time,"}])
ChatCompletion(id='chatcmpl-8PAxGvV3GbLpnOoKSvJ00XCUdOglM', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content="in a faraway kingdom, there lived a young princess named Aurora. She was known for her beauty, grace, and kind heart. Aurora's kingdom was filled with lush green meadows, towering mountains, and sparkling rivers. The princess loved spending time exploring the enchanting forests surrounding her castle.\n\nOne day, while Aurora was wandering through the woods, she stumbled upon a hidden clearing. At the center stood a majestic oak tree, its branches reaching towards the sky. Aurora approached the tree with curiosity, and as she got closer, she noticed a small door at its base.\n\nIntrigued, she gently pushed open the door and was amazed to find herself in a magical realm. The forest transformed into a breathtaking wonderland, with colorful flowers blooming in every direction and woodland creatures frolicking joyously. Aurora's eyes widened with wonder as she explored this extraordinary world.\n\nAs she explored further, Aurora came across a small cottage in the distance. Curiosity overcame her, and she cautiously approached the cottage. To her surprise, an elderly woman with twinkling eyes and a warm smile stood in the doorway, welcoming her inside.\n\nThe woman revealed herself to be a fairy named Luna. Luna informed Aurora that she had been chosen to undertake a quest that would bring harmony to both her kingdom and the mystical realm. Aurora, eager to help, listened intently as Luna explained that a powerful enchantress had cast a spell on the kingdom, causing darkness and despair to loom over the land.\n\nTo break the curse, Aurora had to embark on a journey to retrieve a magical crystal hidden deep within the heart of an ancient cave. Without hesitation, the princess agreed and bid farewell to Luna, promising to return victorious.\n\nWith newfound determination, Aurora set off on her quest. Along the way, she encountered numerous challenges and obstacles but never lost hope. She often drew strength from the enchanting woodland creatures who accompanied her on this journey, reminding her that she was not alone.\n\nAfter a long and arduous journey, Aurora reached the entrance of the ancient cave. Inside, she faced a series of tests that pushed her physical and emotional limits. With sheer determination and unwavering courage, she overcame each trial, paving her way to the crystal's resting place.\n\nAs Aurora held the crystal in her hands, its warmth spread through her body. The artifact contained unimaginable power that could shatter the enchantress's curse and restore light to her kingdom. Brimming with joy and newfound strength, she made her way back to Luna's cottage.\n\nUpon her return, Aurora and Luna performed a powerful ritual, using the crystal's magic to break the curse. Waves of light and color spread across the kingdom, banishing darkness and despair. The once-gray skies turned blue, and laughter filled the air once again. The kingdom rejoiced, thanking Princess Aurora for her bravery and selflessness.\n\nFrom that day forward, Aurora was hailed as a hero, not only in her kingdom but also in the mystical realm. 
She continued to be a beacon of hope and kindness, reminding everyone that true courage lies within, waiting to be awakened.\n\nAnd so, Princess Aurora's tale lived on as a timeless reminder that even in the darkest of times, there is always light and hope to be found.", role='assistant', function_call=None, tool_calls=None))], created=1701011002, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=657, prompt_tokens=12, total_tokens=669))

How to maximize throughput of batch processing given rate limits

If you're processing real-time requests from users, backoff and retry is a great strategy to minimize latency while avoiding rate limit errors.

However, if you're processing large volumes of batch data, where throughput matters more than latency, there are a few other things you can do in addition to backoff and retry.

Proactively adding delay between requests

If you are constantly hitting the rate limit, then backing off, then hitting the rate limit again, then backing off again, it's possible that a good fraction of your request budget will be 'wasted' on requests that need to be retried. This limits your processing throughput, given a fixed rate limit.

Here, one potential solution is to calculate your rate limit and add a delay equal to its reciprocal (e.g., if your rate limit is 20 requests per minute, add a delay of 3 seconds, or slightly more, to each request). This can help you operate near the rate limit ceiling without hitting it and incurring wasted requests.

Example of adding delay to a request

# imports
import time

# Define a function that adds a delay to a chat completion API call
def delayed_completion(delay_in_seconds: float = 1, **kwargs):
    """Delay a completion by a specified amount of time."""

    # Sleep for the delay
    time.sleep(delay_in_seconds)

    # Call the Completion API and return the result
    return client.chat.completions.create(**kwargs)


# Calculate the delay based on your rate limit
rate_limit_per_minute = 20
delay = 60.0 / rate_limit_per_minute

delayed_completion(
    delay_in_seconds=delay,
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Once upon a time,"}]
)
ChatCompletion(id='chatcmpl-8PAyCR1axKsomV0e349XiCN1Z81pH', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content="in a small village, there lived a young girl named Maya. Maya was known for her kindness and love for nature. She spent hours exploring the forests surrounding the village, admiring the vibrant flowers and talking to the animals.\n\nOne sunny day, as Maya was picking wildflowers, she stumbled upon a wounded blackbird with a broken wing. Feeling sorry for the bird, Maya gently picked it up and cradled it in her hands. She knew she had to help the bird, so she hurried back to her cottage.\n\nMaya set up a cozy nest for the blackbird and carefully splinted its wing. She fed it worms and berries, doing everything she could to nurse it back to health. Each day, she would sing lullabies and tell stories to keep the blackbird company. Slowly, the bird's wing healed, and before long, it was ready to fly again.\n\nOn a beautiful morning, Maya opened the window of her cottage and released the blackbird into the sky. As the bird soared into the air, Maya's heart filled with joy and gratitude. Little did she know, this act of kindness would change her life forever.\n\nThe following night, a mysterious glowing light illuminated Maya's room. Startled, she sat up and saw a magical creature standing before her. It was a fairy, tiny yet radiating warmth and light.\n\nThe fairy introduced herself as Luna, the Guardian of the Forest. She had witnessed Maya's kindness towards the blackbird and had been watching her ever since. Luna explained that she had come to reward Maya for her selflessness.\n\nWith a wave of her wand, Luna granted Maya the ability to communicate with animals. Maya's eyes widened with amazement as she realized she could now understand the language of nature. Birds chirped melodies, rabbits whispered secrets, and trees shared their ancient wisdom.\n\nOver time, Maya's ability made her beloved by both humans and animals. Farmers sought her advice on how to care for their crops, and children flocked to her for stories of her enchanting encounters with the forest creatures. Maya used her gift to teach others about the importance of living in harmony with nature.\n\nAs years passed, Maya became known as the Village Guardian. She dedicated herself to protecting the surrounding forests from harm and educating others on sustainable living. The village flourished under Maya's guidance, and animals and humans lived side by side peacefully.\n\nAnd so, Maya's story became a legend passed down through generations. Her kindness, love for nature, and her ability to communicate with animals inspired people to treat the world around them with compassion and care.", role='assistant', function_call=None, tool_calls=None))], created=1701011060, model='gpt-3.5-turbo-0613', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=524, prompt_tokens=12, total_tokens=536))

Batching requests

The OpenAI API has separate limits for requests per minute and tokens per minute.

If you're hitting the limit on requests per minute, but have headroom on tokens per minute, you can increase your throughput by batching multiple tasks into each request. This will allow you to process more tokens per minute, especially with the smaller models.

Sending in a batch of prompts works exactly the same as a normal API call, except that you pass in a list of strings to the prompt parameter instead of a single string. Note that this applies to the legacy Completions endpoint; the Chat Completions endpoint does not accept a list of prompts (a rough chat-based alternative is sketched after the batched example below).

Warning: the response object may not return completions in the order of the prompts, so always remember to match responses back to prompts using the index field.

Example without batching

num_stories = 10
content = "Once upon a time,"

# serial example, with one story completion per request
for _ in range(num_stories):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": content}],
        max_tokens=20,
    )

    # print story
    print(content + response.choices[0].message.content)
Once upon a time,in a small village nestled between rolling green hills, there lived a young girl named Lily. She had
Once upon a time,in a small village nestled in the heart of a lush forest, lived a young girl named Evelyn.
Once upon a time,in a faraway kingdom, there lived a young princess named Aurora. She was known for her kind
Once upon a time,in a faraway kingdom called Enchantia, there lived a young girl named Ella. Ella was
Once upon a time,in a small village nestled among the rolling hills, lived a young woman named Lucy. Lucy was known
Once upon a time,in a small village nestled between rolling hills, there lived a young girl named Ava. Ava was a
Once upon a time,in a faraway kingdom, there lived a wise and just king named Arthur. King Arthur ruled over
Once upon a time,in a small village nestled among towering mountains, lived a young girl named Lily. She was known for
Once upon a time,in a small village nestled in the heart of a lush forest, there lived a young girl named Lily
Once upon a time,in a far-off kingdom, there lived a kind and beloved queen named Isabella. She ruled with

Example with batching

num_stories = 10
prompts = ["Once upon a time,"] * num_stories

# batched example, with 10 story completions per request
response = client.completions.create(
    model="curie",
    prompt=prompts,
    max_tokens=20,
)

# match completions to prompts by index
stories = [""] * len(prompts)
for choice in response.choices:
    stories[choice.index] = prompts[choice.index] + choice.text

# print stories
for story in stories:
    print(story)
Once upon a time, I lived in hope. I convinced myself I knew best, because, naive as it might sound,
Once upon a time, Thierry Henry was invited to have a type of frosty exchange with English fans, in which
Once upon a time, and a long time ago as well, PV was passively cooled because coils cooled by use of metal driving
Once upon a time, there was a land called Texas. It was about the size of Wisconsin. It contained, however,
Once upon a time, there was an old carpenter who had three sons. The locksmith never learned to read or write
Once upon a time, there was a small farming town called Moonridge Village, far West across the great vast plains that lay
Once upon a time, California’s shorelines, lakes, and valleys were host to expanses of untamed wilderness
Once upon a time, she said. It started with a simple question: Why don’t we know any stories?
Once upon a time, when I was a young woman, there was a movie named Wuthering Heights. Stand by alleges
Once upon a time, a very long time I mean, in the year 1713, died a beautiful Duchess called the young
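
The prompt-list approach above only works with the legacy Completions endpoint. If you're using the Chat Completions endpoint, a rough, hypothetical alternative is to pack several short tasks into a single message and parse the answers back out; the prompt format and parsing below are illustrative assumptions, and the model is not guaranteed to follow the requested format exactly.

# pack several short tasks into one chat request
questions = [
    "Name one primary color.",
    "Name one prime number under 10.",
    "Name one ocean.",
]

batched_prompt = "Answer each question on its own numbered line:\n" + "\n".join(
    f"{i + 1}. {q}" for i, q in enumerate(questions)
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": batched_prompt}],
    max_tokens=100,
)

# parse the numbered lines back out; real code should validate this more carefully
for line in response.choices[0].message.content.splitlines():
    line = line.strip()
    if line and line[0].isdigit():
        print(line)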

Example parallel processing script

We've written an example script for parallel processing large quantities of API requests: api_request_parallel_processor.py.

The script combines some handy features:

  • Streams requests from file, to avoid running out of memory for giant jobs
  • Makes requests concurrently, to maximize throughput
  • Throttles both request and token usage, to stay under rate limits
  • Retries failed requests, to avoid missing data
  • Logs errors, to diagnose problems with requests

Feel free to use it as is or modify it to suit your needs.
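
To give a flavor of the core idea, below is a simplified, hypothetical sketch of throttled concurrent requests using asyncio and the async client. It is not the actual api_request_parallel_processor.py script, and the concurrency and rate figures are assumptions you should replace with your own limits.

# simplified sketch of throttled concurrent requests
import asyncio
import openai

async_client = openai.AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

CONCURRENCY = 2            # maximum number of requests in flight at once (assumed)
REQUESTS_PER_MINUTE = 20   # assumed request-per-minute limit
# Each slot is held for at least this long, keeping the aggregate rate
# at or below REQUESTS_PER_MINUTE
MIN_SECONDS_PER_REQUEST = 60.0 * CONCURRENCY / REQUESTS_PER_MINUTE


async def throttled_completion(semaphore: asyncio.Semaphore, content: str) -> str:
    """Run one chat completion while holding a concurrency slot."""
    async with semaphore:
        response = await async_client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": content}],
            max_tokens=20,
        )
        # keep the slot occupied long enough to respect the request rate
        await asyncio.sleep(MIN_SECONDS_PER_REQUEST)
        return response.choices[0].message.content


async def main() -> list[str]:
    semaphore = asyncio.Semaphore(CONCURRENCY)
    tasks = [throttled_completion(semaphore, "Once upon a time,") for _ in range(10)]
    return await asyncio.gather(*tasks)


# In a script: results = asyncio.run(main())
# In a notebook (which already runs an event loop): results = await main()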