Streaming microphone audio at 24000 Hz (mono). Speak naturally; server VAD will stop listening when you pause.
[client] Speech detected; streaming...
[client] Detected silence; preparing transcript...
conversation.item.added: {'id': 'item_Cfpt8RCQdpsNsz2OZ4rxQ', 'type': 'message', 'status': 'completed', 'role': 'user', 'content': [{'type': 'input_audio', 'transcript': None}]}
conversation.item.added: {'id': 'item_Cfpt9JS3PCvlCxoO15mLt', 'type': 'message', 'status': 'in_progress', 'role': 'assistant', 'content': []}
=== User turn (Realtime transcript) ===
Hello. How can I help you today?
[Realtime out-of-band transcription usage]
{
"total_tokens": 1841,
"input_tokens": 1830,
"output_tokens": 11,
"input_token_details": {
"text_tokens": 1830,
"audio_tokens": 0,
"image_tokens": 0,
"cached_tokens": 0,
"cached_tokens_details": {
"text_tokens": 0,
"audio_tokens": 0,
"image_tokens": 0
}
},
"output_token_details": {
"text_tokens": 11,
"audio_tokens": 0
}
}
[Realtime out-of-band transcription cost estimate] text_in=$0.007320, text_in_cached=$0.000000, audio_in=$0.000000, audio_in_cached=$0.000000, text_out=$0.000176, audio_out=$0.000000, total=$0.007496
=== User turn (Transcription model) ===
Hello
[Transcription model usage]
{
"type": "tokens",
"total_tokens": 19,
"input_tokens": 16,
"input_token_details": {
"text_tokens": 0,
"audio_tokens": 16
},
"output_tokens": 3
}
[Transcription model cost estimate] audio_in=$0.000096, text_in=$0.000000, text_out=$0.000030, total=$0.000126
[Realtime usage]
{
"total_tokens": 1327,
"input_tokens": 1042,
"output_tokens": 285,
"input_token_details": {
"text_tokens": 1026,
"audio_tokens": 16,
"image_tokens": 0,
"cached_tokens": 0,
"cached_tokens_details": {
"text_tokens": 0,
"audio_tokens": 0,
"image_tokens": 0
}
},
"output_token_details": {
"text_tokens": 66,
"audio_tokens": 219
}
}
=== Assistant response ===
Thank you for calling OpenAI Insurance Claims. My name is Ava, and I’ll help you file your claim today. Let’s start with your full legal name as it appears on your policy. Could you share that with me, please?
[client] Speech detected; streaming...
[client] Detected silence; preparing transcript...
conversation.item.added: {'id': 'item_CfptNPygis1UcQYQMDh1f', 'type': 'message', 'status': 'completed', 'role': 'user', 'content': [{'type': 'input_audio', 'transcript': None}]}
conversation.item.added: {'id': 'item_CfptSg4tU6WnRkdiPvR3D', 'type': 'message', 'status': 'in_progress', 'role': 'assistant', 'content': []}
=== User turn (Realtime transcript) ===
My full legal name would be M-I-N-H, H-O-Q-U-E.
[Realtime out-of-band transcription usage]
{
"total_tokens": 2020,
"input_tokens": 2001,
"output_tokens": 19,
"input_token_details": {
"text_tokens": 1906,
"audio_tokens": 95,
"image_tokens": 0,
"cached_tokens": 1856,
"cached_tokens_details": {
"text_tokens": 1856,
"audio_tokens": 0,
"image_tokens": 0
}
},
"output_token_details": {
"text_tokens": 19,
"audio_tokens": 0
}
}
[Realtime out-of-band transcription cost estimate] text_in=$0.000200, text_in_cached=$0.000742, audio_in=$0.003040, audio_in_cached=$0.000000, text_out=$0.000304, audio_out=$0.000000, total=$0.004286
=== User turn (Transcription model) ===
My full legal name would be Minhajul Hoque.
[Transcription model usage]
{
"type": "tokens",
"total_tokens": 71,
"input_tokens": 57,
"input_token_details": {
"text_tokens": 0,
"audio_tokens": 57
},
"output_tokens": 14
}
[Transcription model cost estimate] audio_in=$0.000342, text_in=$0.000000, text_out=$0.000140, total=$0.000482
[Realtime usage]
{
"total_tokens": 1675,
"input_tokens": 1394,
"output_tokens": 281,
"input_token_details": {
"text_tokens": 1102,
"audio_tokens": 292,
"image_tokens": 0,
"cached_tokens": 1344,
"cached_tokens_details": {
"text_tokens": 1088,
"audio_tokens": 256,
"image_tokens": 0
}
},
"output_token_details": {
"text_tokens": 63,
"audio_tokens": 218
}
}
=== Assistant response ===
Thank you, Minhajul Hoque. I’ve got your full name noted. Next, may I have your policy number? Please share it in the format of four digits, a dash, and then four more digits.
[client] Speech detected; streaming...
[client] Detected silence; preparing transcript...
conversation.item.added: {'id': 'item_CfpthEQKfNqaoD86Iolvf', 'type': 'message', 'status': 'completed', 'role': 'user', 'content': [{'type': 'input_audio', 'transcript': None}]}
conversation.item.added: {'id': 'item_CfptnqCGAdlEXuAxGUvvK', 'type': 'message', 'status': 'in_progress', 'role': 'assistant', 'content': []}
=== User turn (Realtime transcript) ===
My policy number is P-0-0-2-X-0-7-5.
[Realtime out-of-band transcription usage]
{
"total_tokens": 2137,
"input_tokens": 2116,
"output_tokens": 21,
"input_token_details": {
"text_tokens": 1963,
"audio_tokens": 153,
"image_tokens": 0,
"cached_tokens": 1856,
"cached_tokens_details": {
"text_tokens": 1856,
"audio_tokens": 0,
"image_tokens": 0
}
},
"output_token_details": {
"text_tokens": 21,
"audio_tokens": 0
}
}
[Realtime out-of-band transcription cost estimate] text_in=$0.000428, text_in_cached=$0.000742, audio_in=$0.004896, audio_in_cached=$0.000000, text_out=$0.000336, audio_out=$0.000000, total=$0.006402
=== User turn (Transcription model) ===
My policy number is P002X075.
[Transcription model usage]
{
"type": "tokens",
"total_tokens": 70,
"input_tokens": 59,
"input_token_details": {
"text_tokens": 0,
"audio_tokens": 59
},
"output_tokens": 11
}
[Transcription model cost estimate] audio_in=$0.000354, text_in=$0.000000, text_out=$0.000110, total=$0.000464
[Realtime usage]
{
"total_tokens": 1811,
"input_tokens": 1509,
"output_tokens": 302,
"input_token_details": {
"text_tokens": 1159,
"audio_tokens": 350,
"image_tokens": 0,
"cached_tokens": 832,
"cached_tokens_details": {
"text_tokens": 832,
"audio_tokens": 0,
"image_tokens": 0
}
},
"output_token_details": {
"text_tokens": 57,
"audio_tokens": 245
}
}
=== Assistant response ===
I want to confirm I heard that correctly. It sounded like your policy number is P002-X075. Could you please confirm if that’s correct, or provide any clarification if needed?
[client] Speech detected; streaming...
[client] Detected silence; preparing transcript...
conversation.item.added: {'id': 'item_Cfpu59HqXhBMHvHmW0SvX', 'type': 'message', 'status': 'completed', 'role': 'user', 'content': [{'type': 'input_audio', 'transcript': None}]}
conversation.item.added: {'id': 'item_Cfpu8juH7cCWuQAxCsYUT', 'type': 'message', 'status': 'in_progress', 'role': 'assistant', 'content': []}
=== User turn (Realtime transcript) ===
That is indeed correct.
[Realtime out-of-band transcription usage]
{
"total_tokens": 2233,
"input_tokens": 2226,
"output_tokens": 7,
"input_token_details": {
"text_tokens": 2014,
"audio_tokens": 212,
"image_tokens": 0,
"cached_tokens": 1856,
"cached_tokens_details": {
"text_tokens": 1856,
"audio_tokens": 0,
"image_tokens": 0
}
},
"output_token_details": {
"text_tokens": 7,
"audio_tokens": 0
}
}
[Realtime out-of-band transcription cost estimate] text_in=$0.000632, text_in_cached=$0.000742, audio_in=$0.006784, audio_in_cached=$0.000000, text_out=$0.000112, audio_out=$0.000000, total=$0.008270
=== User turn (Transcription model) ===
That is indeed correct.
[Transcription model usage]
{
"type": "tokens",
"total_tokens": 39,
"input_tokens": 32,
"input_token_details": {
"text_tokens": 0,
"audio_tokens": 32
},
"output_tokens": 7
}
[Transcription model cost estimate] audio_in=$0.000192, text_in=$0.000000, text_out=$0.000070, total=$0.000262
[Realtime usage]
{
"total_tokens": 1818,
"input_tokens": 1619,
"output_tokens": 199,
"input_token_details": {
"text_tokens": 1210,
"audio_tokens": 409,
"image_tokens": 0,
"cached_tokens": 832,
"cached_tokens_details": {
"text_tokens": 832,
"audio_tokens": 0,
"image_tokens": 0
}
},
"output_token_details": {
"text_tokens": 49,
"audio_tokens": 150
}
}
=== Assistant response ===
Thank you for confirming. Now, could you tell me the type of accident you’re filing this claim for—whether it’s auto, home, or something else?
[client] Speech detected; streaming...
[client] Detected silence; preparing transcript...
conversation.item.added: {'id': 'item_CfpuJcnmWJEzfxS2MgHv0', 'type': 'message', 'status': 'completed', 'role': 'user', 'content': [{'type': 'input_audio', 'transcript': None}]}
conversation.item.added: {'id': 'item_CfpuPtFYTrlz1uQJBKMVF', 'type': 'message', 'status': 'in_progress', 'role': 'assistant', 'content': []}
=== User turn (Realtime transcript) ===
It's an auto one, but I think you got my name wrong. Can you ask my name again?
[Realtime out-of-band transcription usage]
{
"total_tokens": 2255,
"input_tokens": 2232,
"output_tokens": 23,
"input_token_details": {
"text_tokens": 2055,
"audio_tokens": 177,
"image_tokens": 0,
"cached_tokens": 1856,
"cached_tokens_details": {
"text_tokens": 1856,
"audio_tokens": 0,
"image_tokens": 0
}
},
"output_token_details": {
"text_tokens": 23,
"audio_tokens": 0
}
}
[Realtime out-of-band transcription cost estimate] text_in=$0.000796, text_in_cached=$0.000742, audio_in=$0.005664, audio_in_cached=$0.000000, text_out=$0.000368, audio_out=$0.000000, total=$0.007570
=== User turn (Transcription model) ===
It's a auto one, but I think you got my name wrong, can you ask my name again?
[Transcription model usage]
{
"type": "tokens",
"total_tokens": 83,
"input_tokens": 60,
"input_token_details": {
"text_tokens": 0,
"audio_tokens": 60
},
"output_tokens": 23
}
[Transcription model cost estimate] audio_in=$0.000360, text_in=$0.000000, text_out=$0.000230, total=$0.000590
[Realtime usage]
{
"total_tokens": 1779,
"input_tokens": 1625,
"output_tokens": 154,
"input_token_details": {
"text_tokens": 1251,
"audio_tokens": 374,
"image_tokens": 0,
"cached_tokens": 832,
"cached_tokens_details": {
"text_tokens": 832,
"audio_tokens": 0,
"image_tokens": 0
}
},
"output_token_details": {
"text_tokens": 41,
"audio_tokens": 113
}
}
=== Assistant response ===
Of course, let’s make sure I have it correct. Could you please spell out your full legal name for me again, carefully?
[client] Speech detected; streaming...
[client] Detected silence; preparing transcript...
conversation.item.added: {'id': 'item_CfpuYJBwNQubeb7uuHqQQ', 'type': 'message', 'status': 'completed', 'role': 'user', 'content': [{'type': 'input_audio', 'transcript': None}]}
conversation.item.added: {'id': 'item_CfpuaI6ZvKBwZG6yXxE1l', 'type': 'message', 'status': 'in_progress', 'role': 'assistant', 'content': []}
=== User turn (Realtime transcript) ===
Minhajul Hoque.
[Realtime out-of-band transcription usage]
{
"total_tokens": 2261,
"input_tokens": 2252,
"output_tokens": 9,
"input_token_details": {
"text_tokens": 2092,
"audio_tokens": 160,
"image_tokens": 0,
"cached_tokens": 1856,
"cached_tokens_details": {
"text_tokens": 1856,
"audio_tokens": 0,
"image_tokens": 0
}
},
"output_token_details": {
"text_tokens": 9,
"audio_tokens": 0
}
}
[Realtime out-of-band transcription cost estimate] text_in=$0.000944, text_in_cached=$0.000742, audio_in=$0.005120, audio_in_cached=$0.000000, text_out=$0.000144, audio_out=$0.000000, total=$0.006950
=== User turn (Transcription model) ===
مينهاجو حق.
[Transcription model usage]
{
"type": "tokens",
"total_tokens": 27,
"input_tokens": 20,
"input_token_details": {
"text_tokens": 0,
"audio_tokens": 20
},
"output_tokens": 7
}
[Transcription model cost estimate] audio_in=$0.000120, text_in=$0.000000, text_out=$0.000070, total=$0.000190
[Realtime usage]
{
"total_tokens": 1902,
"input_tokens": 1645,
"output_tokens": 257,
"input_token_details": {
"text_tokens": 1288,
"audio_tokens": 357,
"image_tokens": 0,
"cached_tokens": 832,
"cached_tokens_details": {
"text_tokens": 832,
"audio_tokens": 0,
"image_tokens": 0
}
},
"output_token_details": {
"text_tokens": 54,
"audio_tokens": 203
}
}
=== Assistant response ===
Thank you. Let me confirm: your full legal name is spelled M-I-N-H-A-J-U-L, and the last name H-O-Q-U-E. Is that correct?
Session cancelled; closing.