This notebook demonstrates how to use GPT's visual capabilities with a video. GPT-4o doesn't take videos as input directly, but we can use vision and the 128K context window to describe the static frames of a whole video at once. We'll walk through two examples:
- Using GPT-4o to get a description of a video
- Generating a voiceover for a video with GPT-o and the TTS API