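from IPython.display import Audio  # inline audio player for notebooks

# Style instructions that steer the delivery of the text-to-speech voice.
instructions = """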
Voice Affect: Calm, measured, and warmly engaging; convey awe and quiet reverence for the natural world.
Tone: Inquisitive and insightful, with a gentle sense of wonder and deep respect for the subject matter.
Pacing: Even and steady, with slight lifts in rhythm when introducing a new species or unexpected behavior; natural pauses to allow the viewer to absorb visuals.
Emotion: Subtly emotive—imbued with curiosity, empathy, and admiration without becoming sentimental or overly dramatic.
Emphasis: Highlight scientific and descriptive language (“delicate wings shimmer in the sunlight,” “a symphony of unseen life,” “ancient rituals played out beneath the canopy”) to enrich imagery and understanding.
Pronunciation: Clear and articulate, with precise enunciation and slightly rounded vowels to ensure accessibility and authority.
Pauses: Insert thoughtful pauses before introducing key facts or transitions (“And then... with a sudden rustle...”), allowing space for anticipation and reflection.
"""
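# Synthesize speech from the previously generated text.
# `client` (an OpenAI client) and `result` are assumed to be defined in an earlier step.
audio_response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="echo",
    instructions=instructions,
    input=result.output_text,
    response_format="wav",
)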
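# Raw WAV bytes returned by the API.
audio_bytes = audio_response.content
# Render an inline player; it displays when this is the last expression in a notebook cell.
Audio(data=audio_bytes)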