OpenAI API Updates April 2026: GPT-5.4 Mini/Nano, Server-Side Compaction, and Sora Batch API
April 2026 is one of the busiest months for the OpenAI developer platform. Two new models, a fundamentally new way to manage context in the Responses API, video generation at scale, and a countdown clock on the Assistants API all landed within weeks of each other. This post covers every API-facing change, with pricing tables, code examples, and migration guidance.
Summary of April 2026 Changes
| Category | Update | Developer Impact |
|---|---|---|
| Models | GPT-5.4 mini and GPT-5.4 nano released | High - cheaper alternatives for production workloads |
| Models | gpt-realtime-1.5 added to Realtime API | Medium - new option for voice applications |
| Responses API | Server-side compaction launched | High - automatic context management for long conversations |
| Responses API | Skills support added | Medium - local and hosted container execution |
| Sora API | Batch API support for video generation | Medium - async video rendering at scale |
| Sora API | Character references, 20s clips, 1080p | Medium - richer video generation options |
| Images | Batch API for GPT Image models | Medium - async image generation at scale |
| Realtime API | DTMF key press support | Low - telephony integration improvement |
| Codex | Pay-as-you-go Codex-only seats | Medium - flexible pricing for code-focused teams |
| Deprecations | Assistants API sunset confirmed for August 26, 2026 | High - migration to Responses API required |
| Deprecations | Older Codex models removed from model picker | Low - affects gpt-5.2-codex and earlier |
GPT-5.4 Mini and Nano: The Headline Model Launches
GPT-5.4 mini and nano bring GPT-5.4-class intelligence to lower price points. Mini targets production workloads that need strong reasoning at reduced cost. Nano is built for high-volume, latency-sensitive tasks where speed matters more than depth.
Pricing Comparison
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Tool Search | Computer Use |
|---|---|---|---|---|---|
| GPT-5.4 | ~$7.50 | ~$30.00 | 1M | Yes | Yes |
| GPT-5.4 mini | $0.75 | $4.50 | 1M | Yes | Yes |
| GPT-5.4 nano | $0.20 | $1.25 | 400K | No | No |
GPT-5.4 mini costs roughly 90% less than the full GPT-5.4 on input tokens and 85% less on output tokens. Nano drops the price further but trades away tool search and computer use capabilities.
Benchmark Highlights
| Benchmark | GPT-5.4 | GPT-5.4 mini | GPT-5.4 nano |
|---|---|---|---|
| SWE-Bench Pro | ~58% | 54.4% | N/A |
| OSWorld-Verified (desktop navigation) | 75.0% | 72.1% | N/A |
| Intelligence Index | ~55 | ~50 | 44.4 |
GPT-5.4 mini nearly matches the flagship on coding benchmarks at 85-90% lower token prices. The OSWorld gap of less than 3 percentage points makes mini a strong candidate for agentic workflows that involve computer use.
Both Models Support Compaction
Both mini and nano support server-side compaction (covered below), making them suitable for long-running agent conversations where context management is critical.
Quick Start
```python
from openai import OpenAI

client = OpenAI()

# GPT-5.4 mini
response = client.responses.create(
    model="gpt-5.4-mini",
    input="Explain the tradeoffs between SSR and SSG in Next.js"
)
print(response.output_text)

# GPT-5.4 nano for high-volume classification
response = client.responses.create(
    model="gpt-5.4-nano",
    input="Classify this support ticket as billing, technical, or general: 'I can't log in'"
)
print(response.output_text)
```
Server-Side Compaction in the Responses API
Server-side compaction is the most significant infrastructure change this month. Instead of manually calling /responses/compact to shrink context, you can now configure automatic compaction directly in your responses.create call.
How It Works
Set a compact_threshold in the context_management parameter. When the rendered token count crosses that threshold during streaming, the server automatically runs a compaction pass, emits a compaction output item in the same stream, and prunes context before continuing inference.
```python
response = client.responses.create(
    model="gpt-5.4-mini",
    input=[
        {"role": "user", "content": "Summarize the project status"}
    ],
    previous_response_id=prev_id,
    context_management={
        "compact_threshold": 80000  # trigger compaction at 80K tokens
    },
    stream=True
)
```
Context Management Flow
After receiving a compaction item, you can drop all input items that came before it on your next request. The compaction item carries enough context to continue the conversation without the original messages.
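That pruning step can be sketched with a small helper. This assumes the compaction item is surfaced with an item type of "compaction"; the exact type string and item shape are assumptions here, not a documented contract:

```python
def prune_before_compaction(items: list[dict]) -> list[dict]:
    """Drop every input item that precedes the last compaction item.

    Assumes compaction items carry type == "compaction" (an assumption
    about the wire format; check the actual stream events you receive).
    """
    last = None
    for i, item in enumerate(items):
        if item.get("type") == "compaction":
            last = i
    if last is None:
        return items  # no compaction yet; keep full history
    return items[last:]  # keep the compaction item and everything after


history = [
    {"type": "message", "role": "user", "content": "hi"},
    {"type": "compaction", "summary": "earlier turns"},
    {"type": "message", "role": "assistant", "content": "hello"},
]
pruned = prune_before_compaction(history)  # drops the first message only
```

On the next turn, only the pruned items (plus any new user input) need to be sent, since the compaction item stands in for the dropped messages.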
When to Use Server-Side Compaction
Server-side compaction works best for long-running agent loops and multi-turn conversations where context grows over time. For short, single-turn requests, it adds no value and should be left disabled.
Skills in the Responses API
April also brought Skills support to the Responses API. Skills allow you to define reusable tool configurations that execute either locally or in hosted containers. This is a step toward making the Responses API the single surface for all agentic workflows, replacing the Assistants API pattern of predefined tool sets.
Sora API: Batch Video Generation and Character References
The Sora API expanded significantly, though developers should note that Sora 2 models and the Videos API are deprecated with a shutdown date of September 24, 2026.
New Video API Capabilities
| Feature | Details |
|---|---|
| Character references | Upload a character once, reuse across videos with consistent appearance |
| Longer clips | Up to 20 seconds per generation (previously 10s) |
| 1080p output | Available on sora-2-pro |
| Video extensions | Continue existing scenes |
| Aspect ratios | 16:9 and 9:16 exports |
| Batch API | Async video generation via POST /v1/videos through the Batch API |
Batch Video Generation Example
```python
# Submit a batch video job
import json

from openai import OpenAI

client = OpenAI()

# Create the video requests, one per output video
requests = [
    {
        "custom_id": "video-1",
        "method": "POST",
        "url": "/v1/videos",
        "body": {
            "model": "sora-2",
            "prompt": "A drone shot over a mountain lake at sunrise",
            "duration": 15,
            "resolution": "1080p",
            "aspect_ratio": "16:9"
        }
    }
]

# Write the requests to disk as JSONL, one request per line
with open("video_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Upload and submit batch
batch_file = client.files.create(
    file=open("video_requests.jsonl", "rb"),
    purpose="batch"
)
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/videos",
    completion_window="24h"
)
```
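Batch jobs complete asynchronously, so the job above has to be polled. A minimal polling helper using the standard batches.retrieve call (the poll interval is arbitrary; adjust it to your latency needs):

```python
import time


def wait_for_batch(client, batch_id, poll_seconds=60):
    """Poll a Batch API job until it reaches a terminal state."""
    terminal = {"completed", "failed", "expired", "cancelled"}
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in terminal:
            return batch
        time.sleep(poll_seconds)
```

Once the status is completed, the batch's output file (batch.output_file_id) references the results and can be downloaded with client.files.content.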
Deprecation Notice
The Sora 2 video generation models and the Videos API will shut down on September 24, 2026. If you are building new video workflows, plan your timeline around this date.
GPT Image Batch API Support
Batch API support was extended to all GPT Image models, enabling async image generation for large-scale production workflows.
Supported models:
- gpt-image-1.5
- chatgpt-image-latest
- gpt-image-1
- gpt-image-1-mini
This is particularly useful for e-commerce, marketing, and content pipelines that need to generate hundreds or thousands of images without managing rate limits manually.
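The submission flow mirrors the video batch above: write requests to JSONL, upload with purpose="batch", and create the batch. As a sketch (the endpoint path and body fields here are assumptions modeled on the video example, not confirmed request schemas):

```python
import json

# Hypothetical request bodies for an image batch; the "url" and "body"
# fields are assumptions mirroring the video batch example above.
prompts = [
    "a red sneaker on a white background",
    "a blue backpack on a white background",
]
image_requests = [
    {
        "custom_id": f"img-{i}",
        "method": "POST",
        "url": "/v1/images/generations",
        "body": {"model": "gpt-image-1-mini", "prompt": p, "size": "1024x1024"},
    }
    for i, p in enumerate(prompts)
]

# One JSON object per line, as the Batch API expects
with open("image_requests.jsonl", "w") as f:
    for req in image_requests:
        f.write(json.dumps(req) + "\n")
```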
Realtime API: DTMF and gpt-realtime-1.5
Two updates hit the Realtime API in April:
DTMF key press support: The Realtime API now emits DTMF events when using a sideband connection. This is essential for telephony integrations where callers interact through their phone keypad (press 1 for sales, press 2 for support, etc.).
gpt-realtime-1.5: A new model option for the Realtime API, offering improved voice interaction quality.
```python
# Handling DTMF events in a Realtime session
async for event in realtime_session:
    if event.type == "input_audio.dtmf":
        digit = event.digit  # "1", "2", "#", "*", etc.
        if digit == "1":
            await route_to_sales(realtime_session)
        elif digit == "2":
            await route_to_support(realtime_session)
```
Codex: Pay-As-You-Go Seats
OpenAI expanded ChatGPT Business and Enterprise plans with Codex-only seats. These seats use pay-as-you-go pricing with no rate limits, giving teams access to Codex without a fixed per-seat fee. This lowers the barrier for teams that want Codex for code review, generation, and debugging without committing to full ChatGPT Enterprise seats.
Model Picker Cleanup
Starting April 7, several older Codex models no longer appear in the model picker:
- gpt-5.2-codex
- gpt-5.1-codex-mini
- gpt-5.1-codex-max
- gpt-5.1-codex
- gpt-5.1
- gpt-5
These models will be fully removed from Codex for ChatGPT sign-in on April 14. If your workflows reference these model IDs, update them to current models.
Assistants API Sunset: August 26, 2026
The biggest long-term change this month is the confirmation that the Assistants API will sunset on August 26, 2026. OpenAI first announced this deprecation in August 2025, giving developers a full year to migrate.
Migration Path
The recommended path is to move from Assistants API (/v1/assistants, /v1/threads) to the Responses API. Key advantages of the migration:
| Aspect | Assistants API | Responses API |
|---|---|---|
| Execution model | Async Runs (poll for completion) | Synchronous or streamed responses |
| Context management | Server-managed threads | Client-managed with compaction |
| Cache utilization | Standard | 40% to 80% improvement in internal tests |
| Tool execution | Predefined tool sets | Flexible Skills + inline tools |
| Cost | Higher due to thread overhead | Lower with better caching |
OpenAI provides a migration guide at platform.openai.com/docs/guides/migrate-to-responses.
Migration Checklist
- Audit current usage: Identify all endpoints hitting /v1/assistants and /v1/threads
- Map tools: Convert Assistant tool definitions to Responses API tool parameters or Skills
- Replace threads: Move from server-managed threads to previous_response_id chaining with compaction
- Update polling logic: Replace Run polling with streaming or synchronous responses
- Test cache performance: Measure token savings from improved cache utilization
- Set a deadline: Complete migration well before August 26, 2026
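As one concrete piece of the tool/thread mapping, a small helper (hypothetical, not an official migration utility) can flatten Assistants-style thread messages, whose text lives in content parts of type "text", into the plain role/content items the Responses API accepts as input:

```python
def thread_to_input(messages: list[dict]) -> list[dict]:
    """Convert Assistants-style thread messages to Responses API input items.

    Assumes each thread message has a "role" and a list of content parts,
    with text parts shaped like {"type": "text", "text": {"value": ...}};
    adapt to whatever your actual thread payloads look like.
    """
    items = []
    for msg in messages:
        text = "".join(
            part.get("text", {}).get("value", "")
            for part in msg.get("content", [])
            if part.get("type") == "text"
        )
        items.append({"role": msg["role"], "content": text})
    return items


thread = [
    {"role": "user", "content": [{"type": "text", "text": {"value": "Hi"}}]},
    {"role": "assistant", "content": [{"type": "text", "text": {"value": "Hello!"}}]},
]
items = thread_to_input(thread)  # ready to pass as input= to responses.create
```

From there, subsequent turns chain with previous_response_id instead of appending to a server-side thread.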
What This Means for Developers Using Fazm
If you use Fazm to monitor your development workflow, these API changes surface in a few ways. Teams using OpenAI models through their IDE or CI/CD pipelines will see changes in model availability as older Codex models are removed. The GPT-5.4 mini and nano models offer new price/performance options for AI-assisted code review and generation tasks that Fazm can help you track and optimize.
Timeline of April 2026 OpenAI API Changes
| Date | Change |
|---|---|
| March 18 | GPT-5.4 mini and nano released |
| April 1 | Server-side compaction available in Responses API |
| April 1 | Skills support launched in Responses API |
| April 7 | Older Codex models removed from model picker |
| April 7 | Sora API expanded with character references, 20s clips, Batch API |
| April 7 | GPT Image Batch API support added |
| April 7 | gpt-realtime-1.5 released |
| April 14 | Older Codex models removed from Codex for ChatGPT sign-in |
| August 26 | Assistants API sunset (confirmed) |
| September 24 | Sora 2 / Videos API shutdown |
Key Takeaways
The April 2026 updates push developers toward two clear directions: the Responses API as the single unified surface for all OpenAI interactions, and smaller, cheaper models for production workloads. GPT-5.4 mini at $0.75/$4.50 per million tokens with near-flagship performance makes it the default choice for most production use cases. Server-side compaction removes the complexity of manual context management. And the Assistants API sunset clock is now under five months away, making migration planning urgent for any team still on threads and runs.