Eval Driven System Design - From Prototype to Production
Completions, Evals, Functions, Responses
Jun 2, 2025

Selecting a Model Based on Stripe Conversion – A Practical Eval for Startups
Evals
Jun 2, 2025

Reinforcement Fine-Tuning for Conversational Reasoning with the OpenAI API
Evals, Fine-tuning
May 21, 2025

Evals API Use-case - Responses Evaluation
Evals, Responses
May 13, 2025

Evals API Use-case - Detecting prompt regressions
Completions, Evals
Apr 8, 2025

Evals API Use-case - Bulk model and prompt experimentation
Completions, Evals
Apr 8, 2025

Evals API Use-case - Monitoring stored completions
Completions, Evals
Apr 8, 2025

Evaluating Agents with Langfuse
Agents SDK, Evals
Mar 31, 2025

Custom LLM as a Judge to Detect Hallucinations with Braintrust
Completions, Evals
Oct 14, 2024