ScreenCaptureKit Demo App: Build a Working Screen Capture Tool on macOS

Matthew Diakonov · 14 min read


ScreenCaptureKit is Apple's framework for capturing screen content on macOS 12.3 and later. It replaced the old CGWindowListCreateImage approach with a modern, async, permission-aware API that gives you per-frame control over what gets captured and how. This guide walks you through building a complete demo app from scratch, with every code block ready to compile and run.

Why ScreenCaptureKit Instead of the Old APIs

Before macOS 12.3, capturing the screen meant calling CGWindowListCreateImage or using AVCaptureScreenInput. Both worked, but both had problems. CGWindowListCreateImage is synchronous, blocks the calling thread, and gives you a single snapshot with no streaming support. AVCaptureScreenInput was deprecated in macOS 13, and Apple has been steering developers away from it since 2022.

ScreenCaptureKit solves these issues:

| Feature | CGWindowListCreateImage | AVCaptureScreenInput | ScreenCaptureKit |
|---|---|---|---|
| Streaming support | No (single frame) | Yes | Yes |
| Async/await | No | No | Yes |
| Per-window filtering | No | No | Yes |
| Frame rate control | N/A | Limited | Fine-grained (1-60 fps) |
| Pixel format options | BGRA only | Limited | BGRA, YUV 420, l10r |
| macOS requirement | 10.5+ | 10.7-12 (deprecated) | 12.3+ |
| Screen recording permission | Required (10.15+) | Required | Required |

ScreenCaptureKit is the only option Apple actively maintains. If you are building anything that captures screen content in 2025 or 2026, this is the API you should use.

Prerequisites

Before writing any code, make sure you have:

  • macOS 12.3 or later (macOS 14+ recommended for the latest APIs)
  • Xcode 14 or later
  • A project with the ScreenCaptureKit framework linked
  • Screen Recording permission, granted by the user at the system prompt (managed in System Settings > Privacy & Security)

ScreenCaptureKit requires the user to grant screen recording permission the first time your app requests shareable content. The system shows the prompt automatically. The permission itself lives in System Settings > Privacy & Security > Screen Recording and is managed by TCC, not by a code-signing entitlement.

Project Setup

Create a new macOS App project in Xcode (SwiftUI lifecycle). Then add the ScreenCaptureKit framework:

  1. Select your target in Xcode
  2. Go to "Frameworks, Libraries, and Embedded Content"
  3. Click "+" and add ScreenCaptureKit.framework

There is no dedicated screen-capture key in the App Sandbox entitlements; the TCC prompt covers screen recording. If your app uses the App Sandbox, the standard sandbox entitlement in your .entitlements file is all you need:

<key>com.apple.security.app-sandbox</key>
<true/>

Sandboxed or not (unsandboxed is common during development), the framework prompts for permission automatically on first use.
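If you want to check the permission state before touching ScreenCaptureKit at all, CoreGraphics exposes a preflight/request pair (macOS 10.15+). A minimal sketch:

```swift
import CoreGraphics

/// Returns true if the app already has (or just obtained) Screen Recording permission.
/// CGRequestScreenCaptureAccess shows the system prompt at most once per install;
/// after a denial the user must flip the toggle in System Settings manually.
func ensureScreenRecordingPermission() -> Bool {
    if CGPreflightScreenCaptureAccess() {
        return true // already granted, no prompt shown
    }
    return CGRequestScreenCaptureAccess() // triggers the prompt if possible
}
```

Calling this at launch lets you show your own explanatory UI before the system dialog appears, instead of letting the first SCShareableContent call surprise the user.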

Step 1: Discover Available Content

ScreenCaptureKit organizes capturable content into three types: displays (SCDisplay), windows (SCWindow), and applications (SCRunningApplication). Before you can capture anything, you need to query what is available.

import ScreenCaptureKit

func discoverContent() async throws {
    let availableContent = try await SCShareableContent.excludingDesktopWindows(
        false,
        onScreenWindowsOnly: true
    )

    print("Displays: \(availableContent.displays.count)")
    print("Windows: \(availableContent.windows.count)")
    print("Applications: \(availableContent.applications.count)")

    for display in availableContent.displays {
        print("  Display \(display.displayID): \(display.width)x\(display.height)")
    }

    for window in availableContent.windows.prefix(10) {
        let title = window.title ?? "(no title)"
        let app = window.owningApplication?.applicationName ?? "(unknown)"
        print("  Window: \(title) [\(app)]")
    }
}

Note

The first call to SCShareableContent triggers the screen recording permission dialog. If the user denies it, the call throws an error. Handle this gracefully in production apps.
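A sketch of that graceful handling — this assumes the denial surfaces as an SCStreamError with code .userDeclined, which is how the framework reports it:

```swift
import ScreenCaptureKit

/// Fetches shareable content, returning nil (with a user-facing hint) on failure.
func loadShareableContent() async -> SCShareableContent? {
    do {
        return try await SCShareableContent.excludingDesktopWindows(
            false, onScreenWindowsOnly: true
        )
    } catch {
        // .userDeclined means the user denied the prompt; there is no API to
        // re-trigger it, so point them at System Settings instead.
        if let scError = error as? SCStreamError, scError.code == .userDeclined {
            print("Enable Screen Recording in System Settings > Privacy & Security")
        } else {
            print("Content discovery failed: \(error)")
        }
        return nil
    }
}
```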

Step 2: Build a Content Filter

A content filter (SCContentFilter) tells ScreenCaptureKit exactly what to capture. You can capture an entire display, a single window, or a display minus specific apps.

// Capture the main display
let mainDisplay = availableContent.displays.first!
let filter = SCContentFilter(
    display: mainDisplay,
    excludingApplications: [],
    exceptingWindows: []
)

For single window capture:

// Capture a specific window
// (force unwrap for brevity — guard against a missing window in real code)
let targetWindow = availableContent.windows.first {
    $0.owningApplication?.bundleIdentifier == "com.apple.Safari"
}!
let windowFilter = SCContentFilter(desktopIndependentWindow: targetWindow)

For capturing everything except your own app (useful for screen recording tools):

let selfApp = availableContent.applications.first {
    $0.bundleIdentifier == Bundle.main.bundleIdentifier
}
let excludeSelfFilter = SCContentFilter(
    display: mainDisplay,
    excludingApplications: selfApp.map { [$0] } ?? [],
    exceptingWindows: []
)
Figure: the ScreenCaptureKit capture pipeline — SCShareableContent → SCContentFilter + SCStreamConfiguration → SCStream (running) → SCStreamOutput (delegate) → CMSampleBuffer (frames).

Step 3: Configure the Stream

SCStreamConfiguration controls resolution, frame rate, pixel format, and other capture parameters. Getting this right matters for both quality and performance.

let config = SCStreamConfiguration()

// Resolution: SCDisplay reports logical points, so multiply by the display's
// backing scale factor (2 on most Retina panels) for pixel-perfect output
config.width = mainDisplay.width * 2
config.height = mainDisplay.height * 2

// Frame rate
config.minimumFrameInterval = CMTime(value: 1, timescale: 30) // 30 fps

// Pixel format
config.pixelFormat = kCVPixelFormatType_32BGRA

// Show the cursor in captures
config.showsCursor = true

// Queue depth: how many frames can buffer before dropping
config.queueDepth = 5

Pixel Format Choices

| Format | Constant | Use case | CPU cost |
|---|---|---|---|
| BGRA 8-bit | kCVPixelFormatType_32BGRA | UI rendering, image processing | Low |
| YUV 420v | kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange | Video encoding (H.264/HEVC) | Lowest |
| YUV 420f | kCVPixelFormatType_420YpCbCr8BiPlanarFullRange | Video encoding (full range) | Lowest |
| l10r | kCVPixelFormatType_ARGB2101010LEPacked | HDR/wide-gamut capture | Higher |

For a demo app, BGRA is the simplest choice. If you plan to encode to video, switch to YUV 420 and let VideoToolbox handle the encoding without a color space conversion step.
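If you take the encoding route, the only change from the BGRA setup is the pixel format. A sketch (the 1920x1080 output size is illustrative):

```swift
import ScreenCaptureKit
import CoreMedia

// Configuration aimed at an H.264/HEVC encoding pipeline: biplanar YUV 420
// hands VideoToolbox frames it can encode without a BGRA-to-YUV conversion.
let videoConfig = SCStreamConfiguration()
videoConfig.width = 1920          // illustrative output size
videoConfig.height = 1080
videoConfig.minimumFrameInterval = CMTime(value: 1, timescale: 30)
videoConfig.pixelFormat = kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange // '420v'
videoConfig.queueDepth = 5
```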

Step 4: Implement the Stream Output Delegate

SCStreamOutput is where your app receives frames. Each frame arrives as a CMSampleBuffer on a dispatch queue you specify.

import ScreenCaptureKit
import CoreMedia
import CoreImage

class CaptureEngine: NSObject, SCStreamOutput {
    private var stream: SCStream?
    private let captureQueue = DispatchQueue(label: "com.demo.capture", qos: .userInteractive)

    var onFrameCaptured: ((CIImage) -> Void)?

    func startCapture(filter: SCContentFilter, config: SCStreamConfiguration) async throws {
        stream = SCStream(filter: filter, configuration: config, delegate: nil)
        try stream?.addStreamOutput(self, type: .screen, sampleHandlerQueue: captureQueue)
        try await stream?.startCapture()
    }

    func stopCapture() async throws {
        try await stream?.stopCapture()
        stream = nil
    }

    func stream(
        _ stream: SCStream,
        didOutputSampleBuffer sampleBuffer: CMSampleBuffer,
        of type: SCStreamOutputType
    ) {
        guard type == .screen else { return }
        guard let pixelBuffer = sampleBuffer.imageBuffer else { return }

        let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
        onFrameCaptured?(ciImage)
    }
}

Warning

The delegate callback fires on the dispatch queue you provided, not on the main thread. If you update UI from here, dispatch back to the main queue. Doing heavy work directly in the callback will cause frame drops.

Step 5: Display Frames in SwiftUI

Now connect the capture engine to a SwiftUI view that shows the live capture:

import SwiftUI

@MainActor
class ScreenCaptureViewModel: ObservableObject {
    @Published var latestFrame: NSImage?
    private let engine = CaptureEngine()
    private let ciContext = CIContext()

    func startCapture() async {
        do {
            let content = try await SCShareableContent.excludingDesktopWindows(
                false, onScreenWindowsOnly: true
            )

            guard let display = content.displays.first else {
                print("No displays found")
                return
            }

            let filter = SCContentFilter(
                display: display,
                excludingApplications: [],
                exceptingWindows: []
            )

            let config = SCStreamConfiguration()
            config.width = Int(display.width)
            config.height = Int(display.height)
            config.minimumFrameInterval = CMTime(value: 1, timescale: 10) // 10 fps for demo
            config.pixelFormat = kCVPixelFormatType_32BGRA
            config.showsCursor = true

            engine.onFrameCaptured = { [weak self] ciImage in
                guard let self else { return }
                let cgImage = self.ciContext.createCGImage(ciImage, from: ciImage.extent)
                let nsImage = cgImage.map { NSImage(cgImage: $0, size: NSSize(width: $0.width, height: $0.height)) }
                Task { @MainActor in
                    self.latestFrame = nsImage
                }
            }

            try await engine.startCapture(filter: filter, config: config)
        } catch {
            print("Capture failed: \(error)")
        }
    }

    func stopCapture() async {
        try? await engine.stopCapture()
    }
}

struct ContentView: View {
    @StateObject private var viewModel = ScreenCaptureViewModel()
    @State private var isCapturing = false

    var body: some View {
        VStack(spacing: 16) {
            if let frame = viewModel.latestFrame {
                Image(nsImage: frame)
                    .resizable()
                    .aspectRatio(contentMode: .fit)
                    .frame(maxWidth: 800, maxHeight: 500)
                    .clipShape(RoundedRectangle(cornerRadius: 12))
            } else {
                Text("No capture running")
                    .foregroundColor(.secondary)
                    .frame(maxWidth: 800, maxHeight: 500)
            }

            Button(isCapturing ? "Stop Capture" : "Start Capture") {
                Task {
                    if isCapturing {
                        await viewModel.stopCapture()
                    } else {
                        await viewModel.startCapture()
                    }
                    isCapturing.toggle()
                }
            }
            .buttonStyle(.borderedProminent)
        }
        .padding(24)
        .frame(minWidth: 600, minHeight: 400)
    }
}

This gives you a live preview of your screen inside the app window at 10 fps. Bump the frame rate to 30 or 60 for smoother output, but be aware that higher rates increase CPU usage.
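You do not need to tear the stream down to change the rate; SCStream supports live updates. A sketch, assuming you kept references to the running stream and its configuration:

```swift
import ScreenCaptureKit
import CoreMedia

/// Raises (or lowers) the capture rate on an already-running stream.
/// updateConfiguration(_:) applies the new settings without restarting capture.
func setFrameRate(_ fps: Int32, on stream: SCStream,
                  config: SCStreamConfiguration) async throws {
    config.minimumFrameInterval = CMTime(value: 1, timescale: fps)
    try await stream.updateConfiguration(config)
}
```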

Step 6: Add Window Filtering

One of ScreenCaptureKit's best features is the ability to capture specific windows. Here is how to add a window picker to the demo:

func captureWindow(bundleID: String) async throws {
    let content = try await SCShareableContent.excludingDesktopWindows(
        false, onScreenWindowsOnly: true
    )

    guard let window = content.windows.first(where: {
        $0.owningApplication?.bundleIdentifier == bundleID && $0.isOnScreen
    }) else {
        throw CaptureError.windowNotFound
    }

    let filter = SCContentFilter(desktopIndependentWindow: window)

    let config = SCStreamConfiguration()
    config.width = Int(window.frame.width) * 2
    config.height = Int(window.frame.height) * 2
    config.minimumFrameInterval = CMTime(value: 1, timescale: 30)
    config.pixelFormat = kCVPixelFormatType_32BGRA
    config.showsCursor = true

    try await engine.startCapture(filter: filter, config: config)
}

enum CaptureError: Error {
    case windowNotFound
}

Common Pitfalls

  • Forgetting to handle the permission denial. If the user clicks "Don't Allow" on the screen recording prompt, SCShareableContent throws SCStreamError.userDeclined. Your app needs to show a message explaining how to enable the permission in System Settings > Privacy & Security > Screen Recording. There is no way to re-trigger the prompt programmatically.

  • Retina resolution mismatch. On Retina displays, SCDisplay.width and SCDisplay.height return logical points, not pixels. If you want pixel-perfect capture, multiply by the display's scale factor (usually 2). If you set the stream configuration to the logical size, you get a half-resolution capture.

  • Frame drops from slow processing. The stream delivers frames at the rate you requested. If your processing (color conversion, encoding, UI updates) takes longer than the frame interval, frames queue up to queueDepth and then drop. ScreenCaptureKit has no dropped-frame callback — the symptom is simply a lower effective frame rate — so keep the output handler cheap, offload heavy work, or reduce the requested rate.

  • Not cleaning up the stream. If you deallocate the SCStream without calling stopCapture(), the screen recording indicator in the menu bar stays visible until your app exits. Always stop explicitly.

  • Capturing audio without configuring it. ScreenCaptureKit can capture audio on macOS 13 and later, but you must set capturesAudio = true on the stream configuration and add a separate stream output with type .audio. The screen output only delivers video frames.
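Related to the frame pitfalls above: each delivered buffer carries an SCStreamFrameInfo attachment, and only frames with status .complete contain fresh pixels. A defensive check for the top of the output callback (sketch):

```swift
import ScreenCaptureKit
import CoreMedia

/// Returns true only for frames that carry complete pixel data.
/// Idle, blank, and suspended frames can be skipped to save processing time.
func isCompleteFrame(_ sampleBuffer: CMSampleBuffer) -> Bool {
    guard let attachments = CMSampleBufferGetSampleAttachmentsArray(
              sampleBuffer, createIfNecessary: false) as? [[SCStreamFrameInfo: Any]],
          let rawStatus = attachments.first?[.status] as? Int,
          let status = SCFrameStatus(rawValue: rawStatus)
    else { return false }
    return status == .complete
}
```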

Minimal Working Example

Here is the complete CaptureEngine and ContentView in a single file you can drop into a new Xcode project:

// ScreenCaptureDemoApp.swift
import SwiftUI
import ScreenCaptureKit
import CoreMedia
import CoreImage

@main
struct ScreenCaptureDemoApp: App {
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}

class CaptureEngine: NSObject, SCStreamOutput {
    private var stream: SCStream?
    private let queue = DispatchQueue(label: "capture", qos: .userInteractive)
    var onFrame: ((CIImage) -> Void)?

    func start(filter: SCContentFilter, config: SCStreamConfiguration) async throws {
        stream = SCStream(filter: filter, configuration: config, delegate: nil)
        try stream?.addStreamOutput(self, type: .screen, sampleHandlerQueue: queue)
        try await stream?.startCapture()
    }

    func stop() async throws {
        try await stream?.stopCapture()
        stream = nil
    }

    func stream(_ stream: SCStream, didOutputSampleBuffer buf: CMSampleBuffer, of type: SCStreamOutputType) {
        guard type == .screen, let pb = buf.imageBuffer else { return }
        onFrame?(CIImage(cvPixelBuffer: pb))
    }
}

@MainActor
class VM: ObservableObject {
    @Published var frame: NSImage?
    let engine = CaptureEngine()
    let ctx = CIContext()

    func start() async {
        let content = try? await SCShareableContent.excludingDesktopWindows(false, onScreenWindowsOnly: true)
        guard let display = content?.displays.first else { return }

        let filter = SCContentFilter(display: display, excludingApplications: [], exceptingWindows: [])
        let config = SCStreamConfiguration()
        config.width = Int(display.width) * 2
        config.height = Int(display.height) * 2
        config.minimumFrameInterval = CMTime(value: 1, timescale: 15)
        config.pixelFormat = kCVPixelFormatType_32BGRA
        config.showsCursor = true

        engine.onFrame = { [weak self] ci in
            guard let self, let cg = self.ctx.createCGImage(ci, from: ci.extent) else { return }
            let ns = NSImage(cgImage: cg, size: .init(width: cg.width, height: cg.height))
            Task { @MainActor in self.frame = ns }
        }
        try? await engine.start(filter: filter, config: config)
    }

    func stop() async { try? await engine.stop() }
}

struct ContentView: View {
    @StateObject var vm = VM()
    @State var on = false
    var body: some View {
        VStack(spacing: 16) {
            if let f = vm.frame {
                Image(nsImage: f).resizable().aspectRatio(contentMode: .fit)
                    .frame(maxWidth: 800, maxHeight: 500)
                    .clipShape(RoundedRectangle(cornerRadius: 12))
            } else {
                Text("Press Start").foregroundColor(.secondary)
                    .frame(maxWidth: 800, maxHeight: 500)
            }
            Button(on ? "Stop" : "Start") {
                Task { on ? await vm.stop() : await vm.start(); on.toggle() }
            }.buttonStyle(.borderedProminent)
        }.padding(24).frame(minWidth: 600, minHeight: 400)
    }
}

Build and run. Click "Start", approve the screen recording permission, and you will see your own screen rendered inside the app window. That is the full ScreenCaptureKit pipeline in under 60 lines of business logic.

Going Further

Once the basic capture is working, there are several directions to extend the demo:

  • Save frames as images. Render the CIImage to a CGImage, wrap it in an NSBitmapImageRep, and write PNG data via representation(using: .png, properties: [:]).
  • Record to video. Pipe CMSampleBuffer frames into an AVAssetWriter with an H.264 or HEVC codec. Use YUV 420 pixel format to skip the color conversion.
  • Add audio capture. Create a second stream output with .audio type and write the audio samples alongside video in your asset writer.
  • Filter specific apps. Use the excludingApplications parameter to hide your own app, or capture just one app by using desktopIndependentWindow.
  • Respond to stream changes. Provide an SCStreamDelegate and implement stream(_:didStopWithError:) to handle failures such as display disconnects; call updateConfiguration(_:) and updateContentFilter(_:) to adapt to resolution switches without restarting the stream.
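The first extension is small enough to sketch here — writing a captured CGImage to disk as a PNG (the error choice is illustrative):

```swift
import AppKit

/// Writes a captured CGImage to disk as a PNG file.
func savePNG(_ image: CGImage, to url: URL) throws {
    let rep = NSBitmapImageRep(cgImage: image)
    guard let data = rep.representation(using: .png, properties: [:]) else {
        throw CocoaError(.fileWriteUnknown) // PNG encoding failed
    }
    try data.write(to: url)
}
```

In the demo's frame handler, you would pass the CGImage produced by CIContext.createCGImage into this helper.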

Wrapping Up

ScreenCaptureKit gives you async, filtered, high-performance screen capture with about 50 lines of setup code. The key steps are: discover content, build a filter, configure the stream, implement the output delegate, and start capturing. The demo app above covers the entire pipeline and runs as a standalone Xcode project.

Fazm is an open source macOS AI agent that uses ScreenCaptureKit for real-time screen understanding; the source is on GitHub.
