There is no ggml-tiny.bin.zip
If you went looking for a file called ggml-tiny.bin.zip, you will not find one in the place everyone downloads Whisper models from. The tiny model ships uncompressed. The only zip sitting next to it is a different file entirely, and confusing the two is the most common way people waste an hour on this. Here is the straight answer, the exact byte sizes, the one-line download, and a note from building voice input into a Mac agent on why we ship neither file.
ggml-tiny.bin.zip does not exist. The ggerganov/whisper.cpp repository hosts the tiny model uncompressed as ggml-tiny.bin (77,691,713 bytes, about 77.7 MB). The only zip beside it is ggml-tiny-encoder.mlmodelc.zip (15,037,446 bytes), the optional Apple Core ML encoder. The model file is never zipped.
Direct download for the real file:
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin
Source of truth for what exists in the repo: the file tree on Hugging Face.
Every file with "tiny" in the name, and its real size
This is the full set of tiny-related files in ggerganov/whisper.cpp. Sizes are exact byte counts from the Hugging Face model API, read on 2026-06-21. Note that exactly one of these is a zip, and it is not the model.
| File | Size | Bytes | What it is |
|---|---|---|---|
| ggml-tiny.bin | 77.7 MB | 77,691,713 | The full tiny model, multilingual. This is the file most people mean. |
| ggml-tiny.en.bin | 77.7 MB | 77,704,715 | English-only tiny. Slightly better on English, useless for other languages. |
| ggml-tiny-q8_0.bin | 43.5 MB | 43,537,433 | 8-bit quantized tiny. Smaller and faster, a small accuracy hit. |
| ggml-tiny-q5_1.bin | 32.2 MB | 32,152,673 | 5-bit quantized tiny. The smallest practical tiny weight. |
| ggml-tiny-encoder.mlmodelc.zip | 15.0 MB | 15,037,446 | The only zip beside tiny. A compiled Core ML encoder, not the model itself. |
There are also quantized and English-only variants of the other model sizes (base, small, medium, large), each with its own -encoder.mlmodelc.zip. None of them is a zipped weight file either. See the models README for the full matrix.
Why people type .zip when they want .bin
Three things collide. First, a lot of older tutorials and course bundles redistributed Whisper weights inside their own archives, so people remember a zip step that the official repo never had. Second, the GGML weight is a single binary, but the Core ML encoder is a .mlmodelc package, which is really a directory, so Hugging Face has to store it as a zip. The eye sees a zip in the file list and assumes the model is in there. Third, browsers and search boxes love to autocomplete .bin into .bin.zip because so many model files elsewhere are archived.
The practical rule: download the .bin directly and never unzip it. Only touch the zip if you specifically want the Apple Neural Engine path, and then you unzip the encoder, not the model.
Getting the real files, end to end
The official helper script fetches the plain weight. If you also want the Core ML encoder for the ANE speedup, you grab that separately and unzip it next to the model.
You can also pull the weight from its blob view or the encoder zip directly. The script just wraps the same Hugging Face URLs.
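For readers who would rather script the download than run the helper, here is a minimal Python sketch of the same flow. It assumes the direct-download URL pattern and the byte counts quoted in this article (read on 2026-06-21, so they may have changed). The function names are illustrative and are not part of whisper.cpp.

```python
import urllib.request
import zipfile
from pathlib import Path

# Base URL taken from the direct-download link quoted in this article.
BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

# Byte counts as listed in this article; treat them as a snapshot, not a contract.
EXPECTED_BYTES = {
    "ggml-tiny.bin": 77_691_713,
    "ggml-tiny-encoder.mlmodelc.zip": 15_037_446,
}


def file_url(name: str) -> str:
    """Build the direct-download URL for one file in the repo."""
    return f"{BASE_URL}/{name}"


def size_matches(path: Path, name: str) -> bool:
    """Return True if the file on disk has the byte count listed above."""
    return path.stat().st_size == EXPECTED_BYTES[name]


def fetch(name: str, dest_dir: Path) -> Path:
    """Download one file into dest_dir and return its local path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / name
    urllib.request.urlretrieve(file_url(name), target)
    return target


def unpack_encoder(zip_path: Path, dest_dir: Path) -> None:
    """Extract the Core ML encoder zip into dest_dir, next to the .bin."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
```

A typical run would call `fetch("ggml-tiny.bin", Path("models"))`, check the result with `size_matches`, and only call `fetch` and `unpack_encoder` for the encoder zip if the Core ML path is wanted. The weight itself is never passed to `unpack_encoder`.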
A field note: why our voice agent ships neither file
I build Fazm, a native Mac agent with voice-first input: hold a hotkey, talk, and the same agent loop runs with no typing. The obvious move for a local-first tool is to bundle a small Whisper model, and tiny is the smallest. We tried that path and did not take it. We do not ship ggml-tiny.bin or the Core ML encoder. Here is the honest reasoning, not a pitch.
Live dictation is a different problem from transcribing a finished clip. You need partial results as the user speaks, correct punctuation, and graceful behavior on silence. The tiny model is fast but it is the least accurate checkpoint, and at 39M parameters it hallucinates on low-energy audio in ways that are very visible when the words are being typed straight into a field. So our transcription path streams audio to a hosted streaming model (Deepgram's Nova-3) over a WebSocket instead. The concrete shape of that pipeline, straight from Desktop/Sources/TranscriptionService.swift:
Fazm voice input pipeline (no local Whisper weight)
- Mic capture: 16 kHz, linear16 PCM, mono for push-to-talk
- 100 ms buffer: audioBufferSize = 3200 bytes, flushed as it fills
- Stream to Nova-3: WebSocket with keepalive every 8s and a 60s stale-connection watchdog
- Clean the text: spoken-form rules turn "dot com" into ".com", "dot ts" into ".ts"
- Drop hallucinations: isRepeatedTokenHallucination filters the loop-on-silence failure mode
- Into the agent: final transcript becomes the prompt, no typing required
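The 3200-byte figure in the buffer step follows directly from the capture format. A short Python check of the arithmetic, assuming linear16 means two bytes per sample (the constant names are illustrative, not taken from the Swift source):

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz capture, from the pipeline above
BYTES_PER_SAMPLE = 2      # linear16 = 16-bit signed PCM (assumed 2 bytes)
CHANNELS = 1              # mono for push-to-talk
BUFFER_MS = 100           # buffer duration, from the pipeline above


def buffer_bytes(sample_rate_hz: int, bytes_per_sample: int,
                 channels: int, duration_ms: int) -> int:
    """Bytes of raw PCM needed to hold duration_ms of audio."""
    return sample_rate_hz * bytes_per_sample * channels * duration_ms // 1000


audio_buffer_size = buffer_bytes(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS, BUFFER_MS)
print(audio_buffer_size)  # 3200
```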
Two details in there are the kind of thing you only learn after running a small ASR model against real microphones. The isRepeatedTokenHallucination guard exists because small Whisper-class decoders, when fed near-silent audio, latch onto one token and loop it four, five, ten times. The fix is mechanical: if the segment is four or more tokens and every token is identical, drop it. The domain replacement table exists because developers dictate URLs and file extensions constantly, and no tiny model reliably writes ".com" instead of "dot com" on its own. Both problems get worse, not better, with a smaller local model, which is the core reason tiny did not make the cut for live input.
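Both guards are simple enough to sketch. The following is a hypothetical Python rendering of the rules as described above, not the Swift code in TranscriptionService.swift. It assumes tokens are whitespace-separated words compared case-insensitively, and it includes only the two replacements named in this article.

```python
import re
from typing import Optional

MIN_REPEAT_TOKENS = 4  # "four or more tokens", per the rule described above

# Only the two spoken forms named in this article; a real table would be longer.
SPOKEN_FORMS = {"com": ".com", "ts": ".ts"}
_SPOKEN_RE = re.compile(
    r"\s*\bdot (" + "|".join(SPOKEN_FORMS) + r")\b", re.IGNORECASE
)


def is_repeated_token_hallucination(segment: str) -> bool:
    """True if the segment is 4+ tokens and every token is identical."""
    tokens = segment.lower().split()  # assumption: whitespace tokens, case-folded
    return len(tokens) >= MIN_REPEAT_TOKENS and len(set(tokens)) == 1


def apply_spoken_forms(text: str) -> str:
    """Rewrite 'example dot com' as 'example.com' using the table above."""
    return _SPOKEN_RE.sub(lambda m: SPOKEN_FORMS[m.group(1).lower()], text)


def clean_segment(segment: str) -> Optional[str]:
    """Drop looped segments; otherwise return the text with spoken forms fixed."""
    if is_repeated_token_hallucination(segment):
        return None
    return apply_spoken_forms(segment)
```

With these definitions, `clean_segment("open example dot com")` returns `"open example.com"`, and a segment such as `"you you you you"` is dropped by returning `None`.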
When ggml-tiny.bin is genuinely the right call
None of this means tiny is useless. It is the right tool in a few clear cases:
- You are fully offline or air-gapped and a hosted streaming service is not an option. A 77.7 MB local model that runs anywhere beats no transcription.
- You are batch-processing short, clean clips where latency does not matter and you can re-run anything that comes out wrong.
- You are prototyping the plumbing and want the fastest possible decode while you wire up audio capture, then swap in a larger model later.
- You are on Apple silicon and pair the weight with ggml-tiny-encoder.mlmodelc.zip so the encoder pass runs on the Neural Engine. That is the one time the zip earns its place.
Frequently asked questions
Where do I download ggml-tiny.bin.zip?
You cannot, because it does not exist. The ggerganov/whisper.cpp repository on Hugging Face hosts the tiny model uncompressed as ggml-tiny.bin (77,691,713 bytes). There is no .bin.zip anywhere in that repo. If a third-party mirror, torrent, or course bundle hands you a ggml-tiny.bin.zip, it is just someone who zipped the plain .bin themselves. Prefer the official file so the checksum matches what whisper.cpp expects.
Then what is the .zip file I keep seeing next to the tiny model?
It is ggml-tiny-encoder.mlmodelc.zip (15,037,446 bytes). That is a compiled Apple Core ML model package (.mlmodelc) that has been zipped because a .mlmodelc is actually a directory, not a single file. It is the optional Core ML encoder that lets whisper.cpp run the encoder pass on the Apple Neural Engine. You only need it if you built whisper.cpp with WHISPER_COREML and want the ANE speedup. It is not a substitute for ggml-tiny.bin; you use both together.
How big is ggml-tiny.bin exactly?
As of 2026-06-21 it is 77,691,713 bytes, which display tools round to about 77.7 MB (74.1 MiB). The English-only ggml-tiny.en.bin is 77,704,715 bytes. The 5-bit quantized ggml-tiny-q5_1.bin is 32,152,673 bytes.
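The MB and MiB figures differ only in the divisor: 10^6 for MB, 2^20 for MiB. A quick Python check using the byte counts quoted above:

```python
# Byte counts as quoted in this article (read on 2026-06-21).
SIZES_BYTES = {
    "ggml-tiny.bin": 77_691_713,
    "ggml-tiny.en.bin": 77_704_715,
    "ggml-tiny-q5_1.bin": 32_152_673,
}


def to_mb(n_bytes: int) -> float:
    """Decimal megabytes: 1 MB = 1,000,000 bytes."""
    return n_bytes / 1_000_000


def to_mib(n_bytes: int) -> float:
    """Binary mebibytes: 1 MiB = 2**20 = 1,048,576 bytes."""
    return n_bytes / 2**20


for name, n_bytes in SIZES_BYTES.items():
    print(f"{name}: {to_mb(n_bytes):.1f} MB, {to_mib(n_bytes):.1f} MiB")
```

For ggml-tiny.bin this prints 77.7 MB and 74.1 MiB, which is why some tools show a number in the seventies and others a slightly smaller one for the same file.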
Do I unzip ggml-tiny.bin after downloading?
No. ggml-tiny.bin is the raw GGML weight file. You point the whisper.cpp binary at it directly with -m models/ggml-tiny.bin. The only file you ever unzip in this flow is ggml-tiny-encoder.mlmodelc.zip, which expands into a ggml-tiny-encoder.mlmodelc directory that lives next to the .bin.
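If you are unsure what a downloaded file actually is, for example because a mirror renamed it, you can check its contents rather than trust its name. A small Python sketch using the standard library's zip detection; the helper name is illustrative, and the check is a heuristic rather than a format validator:

```python
import zipfile
from pathlib import Path


def needs_unzip(path: Path) -> bool:
    """True only if the file's contents look like a zip archive.

    The decision ignores the file extension, so a plain weight that
    someone renamed to .zip is still reported as not needing extraction.
    """
    return zipfile.is_zipfile(path)
```

The official ggml-tiny.bin should report `False`, and ggml-tiny-encoder.mlmodelc.zip should report `True`.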
Is tiny good enough for real-time dictation?
For short, clean, single-speaker clips, often yes. For live dictation where you expect punctuation, domains like .com, code tokens, and no hallucinated loops on silence, tiny struggles. When we built voice input for our Mac agent we did not ship any local Whisper model; we stream to a hosted streaming model instead, because the tiny model's quality and the latency of running it locally did not clear the bar for live typing.
What is the difference between the GGML tiny model and Core ML or WhisperKit?
ggml-tiny.bin is the weight in GGML format for whisper.cpp. The .mlmodelc.zip is an Apple Core ML compilation of the encoder only, used to offload that pass to the Neural Engine while whisper.cpp still drives decoding. WhisperKit is a separate Swift package that ships its own Core ML model bundles and does not use ggml-tiny.bin at all. Three different packagings of the same underlying Whisper checkpoint.