OpenAI API Updates April 2026: GPT-5.4 Mini/Nano, Server-Side Compaction, and Sora Batch API
April 2026 is one of the busiest months for the OpenAI developer platform. Two new models, a fundamentally new way to manage context in the Responses API, video generation at scale, and a countdown clock on the Assistants API all landed within weeks of each other. This post covers every API-facing change, with pricing tables, code examples, and migration guidance.
Summary of April 2026 Changes
| Category | Update | Developer Impact |
|---|---|---|
| Models | GPT-5.4 mini and GPT-5.4 nano released | High - cheaper alternatives for production workloads |
| Models | gpt-realtime-1.5 added to Realtime API | Medium - new option for voice applications |
| Responses API | Server-side compaction launched | High - automatic context management for long conversations |
| Responses API | Skills support added | Medium - local and hosted container execution |
| Sora API | Batch API support for video generation | Medium - async video rendering at scale |
| Sora API | Character references, 20s clips, 1080p | Medium - richer video generation options |
| Images | Batch API for GPT Image models | Medium - async image generation at scale |
| Realtime API | DTMF key press support | Low - telephony integration improvement |
| Codex | Pay-as-you-go Codex-only seats | Medium - flexible pricing for code-focused teams |
| Deprecations | Assistants API sunset confirmed for August 26, 2026 | High - migration to Responses API required |
| Deprecations | Older Codex models removed from model picker | Low - affects gpt-5.2-codex and earlier |
GPT-5.4 Mini and Nano: The Headline Model Launches
GPT-5.4 mini and nano bring GPT-5.4-class intelligence to lower price points. Mini targets production workloads that need strong reasoning at reduced cost. Nano is built for high-volume, latency-sensitive tasks where speed matters more than depth.
Pricing Comparison
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Tool Search | Computer Use |
|---|---|---|---|---|---|
| GPT-5.4 | ~$7.50 | ~$30.00 | 1M | Yes | Yes |
| GPT-5.4 mini | $0.75 | $4.50 | 1M | Yes | Yes |
| GPT-5.4 nano | $0.20 | $1.25 | 400K | No | No |
GPT-5.4 mini costs roughly 90% less than the full GPT-5.4 on input tokens and 85% less on output tokens. Nano drops the price further but trades away tool search and computer use capabilities.
Benchmark Highlights
| Benchmark | GPT-5.4 | GPT-5.4 mini | GPT-5.4 nano |
|---|---|---|---|
| SWE-Bench Pro | ~58% | 54.4% | N/A |
| OSWorld-Verified (desktop navigation) | 75.0% | 72.1% | N/A |
| Intelligence Index | ~55 | ~50 | 44.4 |
GPT-5.4 mini nearly matches the flagship on coding benchmarks at 85-90% lower token prices. The OSWorld gap of less than 3 percentage points makes mini a strong candidate for agentic workflows that involve computer use.
Both Models Support Compaction
Both mini and nano support server-side compaction (covered below), making them suitable for long-running agent conversations where context management is critical.
Quick Start
```python
from openai import OpenAI

client = OpenAI()

# GPT-5.4 mini
response = client.responses.create(
    model="gpt-5.4-mini",
    input="Explain the tradeoffs between SSR and SSG in Next.js"
)
print(response.output_text)

# GPT-5.4 nano for high-volume classification
response = client.responses.create(
    model="gpt-5.4-nano",
    input="Classify this support ticket as billing, technical, or general: 'I can't log in'"
)
print(response.output_text)
```
Server-Side Compaction in the Responses API
Server-side compaction is the most significant infrastructure change this month. Instead of manually calling /responses/compact to shrink context, you can now configure automatic compaction directly in your responses.create call.
How It Works
Set a compact_threshold in the context_management parameter. When the rendered token count crosses that threshold during streaming, the server automatically runs a compaction pass, emits a compaction output item in the same stream, and prunes context before continuing inference.
```python
response = client.responses.create(
    model="gpt-5.4-mini",
    input=[
        {"role": "user", "content": "Summarize the project status"}
    ],
    previous_response_id=prev_id,
    context_management={
        "compact_threshold": 80000  # trigger compaction at 80K tokens
    },
    stream=True
)
```
Context Management Flow
After receiving a compaction item, you can drop all input items that came before it on your next request. The compaction item carries enough context to continue the conversation without the original messages.
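That pruning step can be sketched with a small helper. This assumes the compaction item is surfaced with an item type of "compaction"; the exact type string and item shape are assumptions here, not a documented contract:

```python
def prune_before_compaction(items: list[dict]) -> list[dict]:
    """Drop every input item that precedes the last compaction item.

    Assumes compaction items carry type == "compaction" (an assumption
    about the wire format; check the actual stream events you receive).
    """
    last = None
    for i, item in enumerate(items):
        if item.get("type") == "compaction":
            last = i
    if last is None:
        return items  # no compaction yet; keep full history
    return items[last:]  # keep the compaction item and everything after


history = [
    {"type": "message", "role": "user", "content": "hi"},
    {"type": "compaction", "summary": "earlier turns"},
    {"type": "message", "role": "assistant", "content": "hello"},
]
pruned = prune_before_compaction(history)  # drops the first message only
```

On the next turn, only the pruned items (plus any new user input) need to be sent, since the compaction item stands in for the dropped messages.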
When to Use Server-Side Compaction
Server-side compaction works best for long-running agent loops and multi-turn conversations where context grows over time. For short, single-turn requests, it adds no value and should be left disabled.
Skills in the Responses API
April also brought Skills support to the Responses API. Skills allow you to define reusable tool configurations that execute either locally or in hosted containers. This is a step toward making the Responses API the single surface for all agentic workflows, replacing the Assistants API pattern of predefined tool sets.
Sora API: Batch Video Generation and Character References
The Sora API expanded significantly, though developers should note that Sora 2 models and the Videos API are deprecated with a shutdown date of September 24, 2026.
New Video API Capabilities
| Feature | Details |
|---|---|
| Character references | Upload a character once, reuse across videos with consistent appearance |
| Longer clips | Up to 20 seconds per generation (previously 10s) |
| 1080p output | Available on sora-2-pro |
| Video extensions | Continue existing scenes |
| Aspect ratios | 16:9 and 9:16 exports |
| Batch API | Async video generation via POST /v1/videos through the Batch API |
Batch Video Generation Example
```python
# Submit a batch video job
import json

from openai import OpenAI

client = OpenAI()

# Create the video requests, one per output video
requests = [
    {
        "custom_id": "video-1",
        "method": "POST",
        "url": "/v1/videos",
        "body": {
            "model": "sora-2",
            "prompt": "A drone shot over a mountain lake at sunrise",
            "duration": 15,
            "resolution": "1080p",
            "aspect_ratio": "16:9"
        }
    }
]

# Write the requests to disk as JSONL, one request per line
with open("video_requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")

# Upload and submit batch
batch_file = client.files.create(
    file=open("video_requests.jsonl", "rb"),
    purpose="batch"
)
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/videos",
    completion_window="24h"
)
```
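Batch jobs complete asynchronously, so the job above has to be polled. A minimal polling helper using the standard batches.retrieve call (the poll interval is arbitrary; adjust it to your latency needs):

```python
import time


def wait_for_batch(client, batch_id, poll_seconds=60):
    """Poll a Batch API job until it reaches a terminal state."""
    terminal = {"completed", "failed", "expired", "cancelled"}
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in terminal:
            return batch
        time.sleep(poll_seconds)
```

Once the status is completed, the batch's output file (batch.output_file_id) references the results and can be downloaded with client.files.content.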
Deprecation Notice
The Sora 2 video generation models and the Videos API will shut down on September 24, 2026. If you are building new video workflows, plan your timeline around this date.
GPT Image Batch API Support
Batch API support was extended to all GPT Image models, enabling async image generation for large-scale production workflows.
Supported models:
- gpt-image-1.5
- chatgpt-image-latest
- gpt-image-1
- gpt-image-1-mini
This is particularly useful for e-commerce, marketing, and content pipelines that need to generate hundreds or thousands of images without managing rate limits manually.
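The submission flow mirrors the video batch above: write requests to JSONL, upload with purpose="batch", and create the batch. As a sketch (the endpoint path and body fields here are assumptions modeled on the video example, not confirmed request schemas):

```python
import json

# Hypothetical request bodies for an image batch; the "url" and "body"
# fields are assumptions mirroring the video batch example above.
prompts = [
    "a red sneaker on a white background",
    "a blue backpack on a white background",
]
image_requests = [
    {
        "custom_id": f"img-{i}",
        "method": "POST",
        "url": "/v1/images/generations",
        "body": {"model": "gpt-image-1-mini", "prompt": p, "size": "1024x1024"},
    }
    for i, p in enumerate(prompts)
]

# One JSON object per line, as the Batch API expects
with open("image_requests.jsonl", "w") as f:
    for req in image_requests:
        f.write(json.dumps(req) + "\n")
```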
Realtime API: DTMF and gpt-realtime-1.5
Two updates hit the Realtime API in April:
DTMF key press support: The Realtime API now emits DTMF events when using a sideband connection. This is essential for telephony integrations where callers interact through their phone keypad (press 1 for sales, press 2 for support, etc.).
gpt-realtime-1.5: A new model option for the Realtime API, offering improved voice interaction quality.
```python
# Handling DTMF events in a Realtime session
async for event in realtime_session:
    if event.type == "input_audio.dtmf":
        digit = event.digit  # "1", "2", "#", "*", etc.
        if digit == "1":
            await route_to_sales(realtime_session)
        elif digit == "2":
            await route_to_support(realtime_session)
```
Codex: Pay-As-You-Go Seats
OpenAI expanded ChatGPT Business and Enterprise plans with Codex-only seats. These seats use pay-as-you-go pricing with no rate limits, giving teams access to Codex without a fixed per-seat fee. This lowers the barrier for teams that want Codex for code review, generation, and debugging without committing to full ChatGPT Enterprise seats.
Model Picker Cleanup
Starting April 7, several older Codex models no longer appear in the model picker:
- gpt-5.2-codex
- gpt-5.1-codex-mini
- gpt-5.1-codex-max
- gpt-5.1-codex
- gpt-5.1
- gpt-5
These models will be fully removed from Codex for ChatGPT sign-in on April 14. If your workflows reference these model IDs, update them to current models.
Assistants API Sunset: August 26, 2026
The biggest long-term change this month is the confirmation that the Assistants API will sunset on August 26, 2026. OpenAI first announced this deprecation in August 2025, giving developers a full year to migrate.
Migration Path
The recommended path is to move from Assistants API (/v1/assistants, /v1/threads) to the Responses API. Key advantages of the migration:
| Aspect | Assistants API | Responses API |
|---|---|---|
| Execution model | Async Runs (poll for completion) | Synchronous or streamed responses |
| Context management | Server-managed threads | Client-managed with compaction |
| Cache utilization | Standard | 40% to 80% improvement in internal tests |
| Tool execution | Predefined tool sets | Flexible Skills + inline tools |
| Cost | Higher due to thread overhead | Lower with better caching |
OpenAI provides a migration guide at platform.openai.com/docs/guides/migrate-to-responses.
Migration Checklist
- Audit current usage: Identify all endpoints hitting /v1/assistants and /v1/threads
- Map tools: Convert Assistant tool definitions to Responses API tool parameters or Skills
- Replace threads: Move from server-managed threads to previous_response_id chaining with compaction
- Update polling logic: Replace Run polling with streaming or synchronous responses
- Test cache performance: Measure token savings from improved cache utilization
- Set a deadline: Complete migration well before August 26, 2026
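As one concrete piece of the tool/thread mapping, a small helper (hypothetical, not an official migration utility) can flatten Assistants-style thread messages, whose text lives in content parts of type "text", into the plain role/content items the Responses API accepts as input:

```python
def thread_to_input(messages: list[dict]) -> list[dict]:
    """Convert Assistants-style thread messages to Responses API input items.

    Assumes each thread message has a "role" and a list of content parts,
    with text parts shaped like {"type": "text", "text": {"value": ...}};
    adapt to whatever your actual thread payloads look like.
    """
    items = []
    for msg in messages:
        text = "".join(
            part.get("text", {}).get("value", "")
            for part in msg.get("content", [])
            if part.get("type") == "text"
        )
        items.append({"role": msg["role"], "content": text})
    return items


thread = [
    {"role": "user", "content": [{"type": "text", "text": {"value": "Hi"}}]},
    {"role": "assistant", "content": [{"type": "text", "text": {"value": "Hello!"}}]},
]
items = thread_to_input(thread)  # ready to pass as input= to responses.create
```

From there, subsequent turns chain with previous_response_id instead of appending to a server-side thread.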
What This Means for Developers Using Fazm
If you use Fazm to monitor your development workflow, these API changes surface in a few ways. Teams using OpenAI models through their IDE or CI/CD pipelines will see changes in model availability as older Codex models are removed. The GPT-5.4 mini and nano models offer new price/performance options for AI-assisted code review and generation tasks that Fazm can help you track and optimize.
Timeline of April 2026 OpenAI API Changes
| Date | Change |
|---|---|
| March 18 | GPT-5.4 mini and nano released |
| April 1 | Server-side compaction available in Responses API |
| April 1 | Skills support launched in Responses API |
| April 7 | Older Codex models removed from model picker |
| April 7 | Sora API expanded with character references, 20s clips, Batch API |
| April 7 | GPT Image Batch API support added |
| April 7 | gpt-realtime-1.5 released |
| April 14 | Older Codex models removed from Codex for ChatGPT sign-in |
| August 26 | Assistants API sunset (confirmed) |
| September 24 | Sora 2 / Videos API shutdown |
Key Takeaways
The April 2026 updates push developers toward two clear directions: the Responses API as the single unified surface for all OpenAI interactions, and smaller, cheaper models for production workloads. GPT-5.4 mini at $0.75/$4.50 per million tokens with near-flagship performance makes it the default choice for most production use cases. Server-side compaction removes the complexity of manual context management. And the Assistants API sunset clock is now under five months away, making migration planning urgent for any team still on threads and runs.