Addressing transcription misspellings: prompt vs post-processing

Aug 11, 2023
Open in Github

We are addressing the problem of enhancing the precision of transcriptions, particularly when it comes to company names and product references. Our solution involves a dual strategy that utilizes both the Whisper prompt parameter and GPT-4's post-processing capabilities.

Two approaches to correct inaccuracies are:

  • We input a list of correct spellings directly into Whisper's prompt parameter to guide the initial transcription.

  • We utilized GPT-4 to fix misspellings post transcription, again using the same list of correct spellings in the prompt.

These strategies aimed at ensuring precise transcription of unfamilar proper nouns.

Setup

To get started, let's:

  • Import the OpenAI Python library (if you don't have it, you'll need to install it with pip install openai)
  • Download the audio file example
# imports
from openai import OpenAI  # for making OpenAI API calls
import urllib  # for downloading example audio files
import os  # for accessing environment variables

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
# set download paths
ZyntriQix_remote_filepath = "https://cdn.openai.com/API/examples/data/ZyntriQix.wav"


# set local save locations
ZyntriQix_filepath = "data/ZyntriQix.wav"

# download example audio files and save locally
urllib.request.urlretrieve(ZyntriQix_remote_filepath, ZyntriQix_filepath)
('data/ZyntriQix.wav', <http.client.HTTPMessage at 0x10559a910>)

Setting our baseline with a fictitious audio recording

Our reference point is a monologue, which was generated by ChatGPT from prompts given by the author. The author then voiced this content. So, the author both guided the ChatGPT's output with prompts and brought it to life by speaking it.

Our fictitious company, ZyntriQix, offers a range of tech products. These include Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, and DigiFractal Matrix. We also spearhead several initiatives such as PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., and F.L.I.N.T.

# define a wrapper function for seeing how prompts affect transcriptions
def transcribe(prompt: str, audio_filepath) -> str:
    """Given a prompt, transcribe the audio file."""
    transcript = client.audio.transcriptions.create(
        file=open(audio_filepath, "rb"),
        model="whisper-1",
        prompt=prompt,
    )
    return transcript.text
# baseline transcription with no prompt
transcribe(prompt="", audio_filepath=ZyntriQix_filepath)
"Have you heard of ZentricX? This tech giant boasts products like Digi-Q+, Synapse 5, VortiCore V8, Echo Nix Array, and not to forget the latest Orbital Link 7 and Digifractal Matrix. Their innovation arsenal also includes the Pulse framework, Wrapped system, they've developed a brick infrastructure court system, and launched the Flint initiative, all highlighting their commitment to relentless innovation. ZentricX, in just 30 years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?"

Whisper transcribed our company name, product names, and miscapitalized our acronyms incorrectly. Let's pass the correct names as a list in the prompt.

# add the correct spelling names to the prompt
transcribe(
    prompt="ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T.",
    audio_filepath=ZyntriQix_filepath,
)
"Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system. They've developed a B.R.I.C.K. infrastructure, Q.U.A.R.T. system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix in just 30 years has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?"

When passing the list of product names, some of the product names are transcribed correctly while others are still misspelled.

# add a full product list to the prompt
transcribe(
    prompt="ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T.",
    audio_filepath=ZyntriQix_filepath,
)
"Have you heard of ZentricX? This tech giant boasts products like DigiCube Plus, Synapse 5, VortiCore V8, EchoNix Array, and not to forget the latest Orbital Link 7 and Digifractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system. They've developed a brick infrastructure court system and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZentricX in just 30 years has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?"

You can use GPT-4 to fix spelling mistakes

Leveraging GPT-4 proves especially useful when the speech content is unknown beforehand and we have a list of product names readily available.

The post-processing technique using GPT-4 is notably more scalable than depending solely on Whisper's prompt parameter, which has a token limit of 244. GPT-4 allows us to process larger lists of correct spellings, making it a more robust method for handling extensive product lists.

However, this post-processing technique isn't without limitations. It's constrained by the context window of the chosen model, which may pose challenges when dealing with vast numbers of unique terms. For instance, companies with thousands of SKUs may find that the context window of GPT-4 is insufficient to handle their requirements, and they might need to explore alternative solutions.

Interestingly, the GPT-4 post-processing technique seems more reliable than using Whisper alone. This method, which leverages a product list, enhances the reliability of our results. However, this increased reliability comes at a price, as using this approach can increase costs and can result in higher latency.

# define a wrapper function for seeing how prompts affect transcriptions
def transcribe_with_spellcheck(system_message, audio_filepath):
    completion = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": system_message},
            {
                "role": "user",
                "content": transcribe(prompt="", audio_filepath=audio_filepath),
            },
        ],
    )
    return completion.choices[0].message.content

Now, let's input the original product list into GPT-4 and evaluate its performance. By doing so, we aim to assess the AI model's ability to correctly spell the proprietary product names, even with no prior knowledge of the exact terms to appear in the transcription. In our experiment, GPT-4 was successful in correctly spelling our product names, confirming its potential as a reliable tool for ensuring transcription accuracy.

system_prompt = "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T."
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)
print(new_text)
Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system, they've developed a B.R.I.C.K. infrastructure court system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix, in just 30 years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?

In this case, we supplied a comprehensive product list that included all the previously used spellings, along with additional new names. This scenario simulates a real-life situation where we have a substantial SKU list and uncertain about the exact terms to appear in the transcription. Feeding this extensive list of product names into the system resulted in a correctly transcribed output.

system_prompt = "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array,  OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, GigaLink TwentySeven, FusionMatrix TwentyEight, InfiniFractal TwentyNine, MetaSync Thirty, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T. Only add necessary punctuation such as periods, commas, and capitalization, and use only the context provided."
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)
print(new_text)
Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system, they've developed a B.R.I.C.K. infrastructure court system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix, in just 30 years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?

We are employing GPT-4 as a spell checker, using the same list of correct spellings that was previously used in the prompt.

system_prompt = "You are a helpful assistant for the company ZyntriQix. Your first task is to list the words that are not spelled correctly according to the list provided to you and to tell me the number of misspelled words. Your next task is to insert those correct words in place of the misspelled ones. List: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array,  OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, GigaLink TwentySeven, FusionMatrix TwentyEight, InfiniFractal TwentyNine, MetaSync Thirty, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T."
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)
print(new_text)
The misspelled words are: ZentricX, Digi-Q+, Synapse 5, VortiCore V8, Echo Nix Array, Orbital Link 7, Digifractal Matrix, Pulse, Wrapped, brick, Flint, and 30. The total number of misspelled words is 12.

The corrected paragraph is:

Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, and not to forget the latest OrbitalLink Seven and DigiFractal Matrix. Their innovation arsenal also includes the PULSE framework, RAPT system, they've developed a B.R.I.C.K. infrastructure court system, and launched the F.L.I.N.T. initiative, all highlighting their commitment to relentless innovation. ZyntriQix, in just MetaSync Thirty years, has soared from a startup to a tech titan, serving us tech marvels alongside a stimulating linguistic challenge. Quite an adventure, wouldn't you agree?