whisper.cpp / ggml models

There is no ggml-base.bin.zip

People search for that filename all the time, so let me save you the digging: it is not in the repo. The whisper base model ships uncompressed. The only zip sitting next to it is something else entirely, and you probably do not need it.

Matthew Diakonov, Written with AI

Published June 19, 2026 · 7 min read

Direct answer · verified 2026-06-19

No file named ggml-base.bin.zip exists in ggerganov/whisper.cpp. The model is the plain ggml-base.bin at 147,951,465 bytes (about 148 MB), shipped uncompressed. The only .zip beside it is the optional Apple Core ML encoder ggml-base-encoder.mlmodelc.zip (~38 MB). The model loads as a raw .bin; there is no unzip step for the model itself.

Source of truth: huggingface.co/ggerganov/whisper.cpp/tree/main (Files and versions tab). Re-checked 2026-06-19.
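If you would rather check this yourself than take my word for it, a HEAD request against the repo's download path is enough. This is a minimal sketch, assuming curl is installed and you have network access; hf_url and hf_status are names I made up for illustration, and the resolve/main path is the standard Hugging Face download pattern for this repo.

```shell
#!/bin/sh
# Build the direct-download URL for a file in the Hugging Face repo.
hf_url() {
    printf 'https://huggingface.co/ggerganov/whisper.cpp/resolve/main/%s\n' "$1"
}

# Print the final HTTP status code for a file (needs network access and curl).
# -I sends a HEAD request, -L follows redirects, -o /dev/null drops the headers,
# -w prints only the status code of the last response.
hf_status() {
    curl -sIL -o /dev/null -w '%{http_code}\n' "$(hf_url "$1")"
}

# Usage (nothing is fetched until you call these yourself):
#   hf_status ggml-base.bin.zip
#   hf_status ggml-base.bin
```

A 200 means the file exists and a 404 means it does not, so running both calls side by side shows the difference between the filename people search for and the one that is actually hosted.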

What you pictured vs what the repo holds

The mental model behind the search is "the model is a zip I download and extract." That is true for plenty of ML toolchains, but not for ggml. A ggml file is already one packed binary blob, so it ships as-is. Here is the gap between the two listings.

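ggml-base.bin.zip vs the actual files

What you pictured:

models/
└── ggml-base.bin.zip      # the file you searched for
                           # 404: it is not in the repo

What the repo holds (the two base files discussed here):

models/
├── ggml-base.bin                    # 147,951,465 bytes, uncompressed
└── ggml-base-encoder.mlmodelc.zip   # ~38 MB, optional Core ML encoder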

The exact byte count for the multilingual base model is 147,951,465 bytes. The whisper.cpp README rounds it to "142 MB" because it counts in MiB; Hugging Face reports it as roughly 148 MB. Same file, two ways of writing the size. If you want the literal download URL for the bare model, the ggml-base.bin resolve/main guide has it.
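The two size figures are the same byte count divided by different units, which is easy to reproduce. A small sketch, assuming awk is available; to_mb and to_mib are illustrative helper names, not anything from whisper.cpp.

```shell
#!/bin/sh
# Convert a byte count to decimal megabytes (1 MB = 1,000,000 bytes).
to_mb() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1000000 }'
}

# Convert a byte count to binary mebibytes (1 MiB = 1,048,576 bytes).
to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

echo "$(to_mb 147951465) MB"    # prints "148.0 MB"
echo "$(to_mib 147951465) MiB"  # prints "141.1 MiB"
```

The binary result lands close to the README's rounded figure rather than exactly on it, so when you need to verify a download, compare against the byte count, not either rounded number.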

Why one file is zipped and the other is not

The reason comes down to file shape, not compression policy. A .bin is a single file, so Hugging Face can host it directly and whisper.cpp can mmap it straight off disk. Zipping it would only force an extra extraction step for no size win.

The Core ML encoder is the opposite. A compiled Core ML model is a .mlmodelc directory: weights, a manifest, and metadata in separate files inside a folder. A folder cannot be a single download, so the repo archives it as ggml-base-encoder.mlmodelc.zip. You expand that zip back into the .mlmodelc folder before whisper.cpp can use it. That is the entire reason a .zip shows up in a directory full of .bin files, and why it confuses people into looking for a zipped model.

Third source of confusion: the SourceForge mirror also publishes whisper-bin-x64.zip and whisper-bin-Win32.zip (about 4 MB each). Those are the prebuilt Windows binaries, not the model and not the Core ML encoder. Three different zips in this ecosystem, and none of them is a zipped ggml-base.bin.
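If a filename leaves you unsure which of these you actually downloaded, the first four bytes settle it. This is a rough sketch: the zip signature is standard, while the ggml signature is my assumption that the model starts with the magic number 0x67676d6c written little-endian (so it appears on disk as "lmgg"). file_kind is a hypothetical helper name.

```shell
#!/bin/sh
# Classify a file by its first four bytes.
file_kind() {
    # Read 4 bytes and render them as a plain hex string, e.g. "504b0304".
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \t\n')
    case "$magic" in
        504b0304) echo "zip" ;;      # "PK\003\004": zip local file header
        6c6d6767) echo "ggml" ;;     # "lmgg": assumed ggml magic, little-endian
        *)        echo "unknown" ;;
    esac
}

# Usage: file_kind models/ggml-base.bin
```

A file that reports zip is one of the archives described above, never the model itself.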

If you do want the Core ML encoder (the legit zip)

On a Mac the Core ML encoder is worth it for long jobs: it runs the encoder pass on the Apple Neural Engine. The model .bin stays separate and never gets zipped. The encoder is the only thing you unzip.

setup-coreml.sh
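A minimal sketch of what such a script can look like, not a definitive version. It assumes you run it from the root of a whisper.cpp checkout on a Mac, that curl, unzip and cmake are installed, and that the archive expands to a top-level .mlmodelc folder. The function names and the HF_BASE variable are mine, for illustration only.

```shell
#!/bin/sh
# Sketch of a Core ML setup; run from the root of a whisper.cpp checkout.
# Assumes curl, unzip and cmake are installed. Names here are illustrative.

HF_BASE="https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

# Name of the encoder archive for a model such as "base" or "base.en".
encoder_zip() {
    printf 'ggml-%s-encoder.mlmodelc.zip\n' "$1"
}

# Download the model and the encoder, unpack the encoder, build with Core ML.
# The body runs in a subshell so "set -eu" does not leak into your session.
setup_coreml() (
    set -eu
    model="$1"
    zip="$(encoder_zip "$model")"

    # The model itself is a raw .bin: download only, no unzip step.
    curl -L -o "models/ggml-$model.bin" "$HF_BASE/ggml-$model.bin"

    # The encoder is the one archive you expand, next to the model.
    curl -L -o "models/$zip" "$HF_BASE/$zip"
    unzip -o "models/$zip" -d models

    # Without this flag the encoder on disk is ignored.
    cmake -B build -DWHISPER_COREML=1
    cmake --build build --config Release
)

# Usage (nothing runs until you call it): setup_coreml base
```

Keeping the steps inside a function means you can read or source the file without triggering any downloads, and swap base for base.en in the one place it matters.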

Two gotchas worth knowing. First, the build flag is -DWHISPER_COREML=1; without it the encoder zip on disk is simply ignored. Second, the first transcription after install is slow because the Neural Engine compiles the model on the fly. Do not benchmark on run one. If you would rather generate the encoder yourself instead of downloading it, the models README documents ./models/generate-coreml-model.sh base.en, which produces the same .mlmodelc folder locally.

When you are doing all this just to talk to your Mac

Plenty of people land on ggml-base.bin because they want voice input on macOS and whisper.cpp is the local, hackable route. That is a real and good path if you want to own the pipeline. It is also a fair amount of plumbing: the model, maybe the Core ML zip, a build, and then the wiring from microphone to text to whatever you actually wanted to do. I build fazm, a native macOS agent, so here is the honest trade-off.

Wiring whisper yourself vs a voice-first agent

Full control of the model, fully local, hackable end to end. You assemble it: download ggml-base.bin, decide on the Core ML zip, build with the right flags, then write the glue from audio to action.

Pick exactly which ggml model and quantization you run
Everything stays on-device, no API
You own the audio-to-text-to-action wiring
Setup, builds, and updates are on you

Neither is "better." If the point is to own a local whisper stack, keep going with the .bin and skip the rest of this site. If the point was just to stop typing and have something act on what you said, that is the gap fazm fills.

Going voice-first on macOS without wrangling model files?

Tell me what you are trying to drive by voice and I will tell you honestly whether whisper.cpp or fazm fits better.

Questions people actually ask about this

Frequently asked questions

Is there really no ggml-base.bin.zip file?

Correct. The official ggerganov/whisper.cpp repository on Hugging Face does not contain a file named ggml-base.bin.zip. The model is shipped as a plain ggml-base.bin (147,951,465 bytes, about 148 MB), uncompressed. A ggml file is already a single packed binary, so there is nothing to gain by zipping it, and whisper.cpp loads the .bin directly with no unzip step.

Then what is the .zip I see next to ggml-base.bin?

It is the Apple Core ML encoder: ggml-base-encoder.mlmodelc.zip (about 37.9 MB) and the English-only ggml-base.en-encoder.mlmodelc.zip (about 38 MB). These are zipped because a .mlmodelc is a compiled Core ML model directory, not a single file, and a folder cannot be hosted as one downloadable object. So Hugging Face stores it as a .zip that you expand into a .mlmodelc folder.

Do I need the Core ML encoder zip to run whisper.cpp?

No. The .bin alone runs on CPU (and Metal on Apple Silicon). The Core ML encoder is purely an optional accelerator: on a Mac it moves the encoder pass onto the Apple Neural Engine, which makes long transcriptions faster after the first run. If you only have the .bin, whisper.cpp works fine; it just will not touch the ANE.

How do I install the Core ML encoder once I have the zip?

Unzip it in the same folder as the model so you have ggml-base.en-encoder.mlmodelc/ sitting next to ggml-base.en.bin, then build whisper.cpp with cmake -B build -DWHISPER_COREML=1. At runtime whisper.cpp finds the encoder by name and loads it automatically. The first run is slower because the Apple Neural Engine compiles the model; subsequent runs are fast.
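Since the encoder is found by name, a misplaced or misnamed folder just goes unused, so it is worth checking the layout before you build. A small sketch, assuming the naming rule described above (the model path with .bin swapped for -encoder.mlmodelc); it does not try to handle quantized model names, and both function names are hypothetical.

```shell
#!/bin/sh
# Expected encoder folder for a model path: swap ".bin" for "-encoder.mlmodelc".
encoder_dir_for() {
    printf '%s-encoder.mlmodelc\n' "${1%.bin}"
}

# Report whether that folder sits next to the model.
check_encoder() {
    dir="$(encoder_dir_for "$1")"
    if [ -d "$dir" ]; then
        echo "found: $dir"
    else
        echo "missing: $dir"
    fi
}

# Usage: check_encoder models/ggml-base.en.bin
```

A "missing" result usually means the zip was expanded somewhere else or the base and base.en names got mixed up.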

Is whisper-bin-x64.zip the model?

No. whisper-bin-x64.zip and whisper-bin-Win32.zip on the SourceForge mirror are the prebuilt Windows command-line binaries (around 4 MB), not a model. You still download a ggml-*.bin model separately. Mixing these two up is one of the main reasons people end up searching for a ggml-base.bin.zip that does not exist.

base or base.en, and how big is base really?

ggml-base.bin is multilingual; ggml-base.en.bin is English-only and slightly more accurate on English audio. Both are about 148 MB (the README rounds to 142 MB using MiB; Hugging Face shows the byte count 147,951,465). If you only transcribe English, take base.en. There are also quantized variants like ggml-base-q5_1.bin if you want a smaller file at a small accuracy cost.

Where is the authoritative file list?

The Files and versions tab at huggingface.co/ggerganov/whisper.cpp/tree/main is the source of truth. Every model .bin, every quantized variant, and every *-encoder.mlmodelc.zip Core ML bundle is listed there with its real size. If a file is not in that list, it is not an official whisper.cpp artifact.
