There is no ggml-tiny.bin.zip

Matthew Diakonov

If you went looking for a file called ggml-tiny.bin.zip, you will not find one in the place everyone downloads Whisper models from. The tiny model ships uncompressed. The only zip sitting next to it is a different file entirely, and confusing the two is the most common way people waste an hour on this. Here is the straight answer, the exact byte sizes, the one-line download, and a note from building voice input into a Mac agent on why we ship neither file.

Direct answer (verified 2026-06-21)

ggml-tiny.bin.zip does not exist. The ggerganov/whisper.cpp repository hosts the tiny model uncompressed as ggml-tiny.bin (77,691,713 bytes, about 77.7 MB). The only zip beside it is ggml-tiny-encoder.mlmodelc.zip (15,037,446 bytes), the optional Apple Core ML encoder. The model file is never zipped.

Direct download for the real file:

https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin

Source of truth for what exists in the repo: the file tree on Hugging Face.

77,691,713 bytes in ggml-tiny.bin

15,037,446 bytes in the Core ML encoder zip

0 files named ggml-tiny.bin.zip

Every file with "tiny" in the name, and its real size

This is the full set of tiny-related files in ggerganov/whisper.cpp. Sizes are exact byte counts from the Hugging Face model API, read on 2026-06-21. Note that exactly one of these is a zip, and it is not the model.

| File | Bytes | What it is |
| --- | --- | --- |
| ggml-tiny.bin | 77,691,713 (77.7 MB) | The full tiny model, multilingual. This is the file most people mean. |
| ggml-tiny.en.bin | 77,704,715 (77.7 MB) | English-only tiny. Slightly better on English, useless for other languages. |
| ggml-tiny-q8_0.bin | 43,537,433 (43.5 MB) | 8-bit quantized tiny. Smaller and faster, a small accuracy hit. |
| ggml-tiny-q5_1.bin | 32,152,673 (32.2 MB) | 5-bit quantized tiny. The smallest practical tiny weight. |
| ggml-tiny-encoder.mlmodelc.zip | 15,037,446 (15.0 MB) | The only zip beside tiny. A compiled Core ML encoder, not the model itself. |

There are also quantized and English-only variants of the other model sizes (base, small, medium, large), each with its own -encoder.mlmodelc.zip. None of them is a zipped weight file either. See the models README for the full matrix.
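
If you want to confirm that a download is one of the files listed above, comparing its byte count is a quick first check. The sketch below is only an illustration: the dictionary copies the sizes quoted in this article (read on 2026-06-21, so they may change if the repo is updated), and `size_matches` is a hypothetical helper name, not part of whisper.cpp.

```python
# Exact byte counts as quoted in this article (read on 2026-06-21).
# These are a snapshot; the repo could change after that date.
EXPECTED_BYTES = {
    "ggml-tiny.bin": 77_691_713,
    "ggml-tiny.en.bin": 77_704_715,
    "ggml-tiny-q8_0.bin": 43_537_433,
    "ggml-tiny-q5_1.bin": 32_152_673,
    "ggml-tiny-encoder.mlmodelc.zip": 15_037_446,
}


def size_matches(filename: str, actual_bytes: int) -> bool:
    """Return True if actual_bytes equals the size listed for filename."""
    if filename not in EXPECTED_BYTES:
        # Includes ggml-tiny.bin.zip, which is not a file in the repo.
        raise KeyError(f"{filename} is not one of the tiny files listed")
    return EXPECTED_BYTES[filename] == actual_bytes
```

In practice you would pass `os.path.getsize(path)` as the second argument. A size match does not prove integrity the way a hash does, but a mismatch is a strong hint that you have a different or truncated file.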

Why people type .zip when they want .bin

Three things collide. First, a lot of older tutorials and course bundles redistributed Whisper weights inside their own archives, so people remember a zip step that the official repo never had. Second, the GGML weight is a single binary, but the Core ML encoder is a .mlmodelc package, which is really a directory, so Hugging Face has to store it as a zip. The eye sees a zip in the file list and assumes the model is in there. Third, browsers and search boxes love to autocomplete .bin into .bin.zip because so many model files elsewhere are archived.

The practical rule: download the .bin directly and never unzip it. Only touch the zip if you specifically want the Apple Neural Engine path, and then you unzip the encoder, not the model.

Getting the real files, end to end

The official helper script fetches the plain weight. If you also want the Core ML encoder for the ANE speedup, you grab that separately and unzip it next to the model.

whisper.cpp model setup

You can also pull the weight from its blob view or the encoder zip directly. The script just wraps the same Hugging Face URLs.
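
For anyone who would rather skip the script, here is a minimal sketch of the same fetch in Python. It assumes every file in the repo follows the `resolve/main/<filename>` pattern of the direct link given earlier; the encoder zip URL is inferred from that pattern rather than quoted from the repo. `model_url` and `download` are hypothetical helper names.

```python
import urllib.request
from pathlib import Path

# Base of the direct-download link quoted earlier in this article.
BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/"


def model_url(filename: str) -> str:
    """Build the direct-download URL for one file in the repo.

    Assumption: every file uses the same resolve/main/ pattern.
    """
    return BASE_URL + filename


def download(filename: str, dest_dir: str = "models") -> Path:
    """Fetch one file into dest_dir and return its local path.

    Needs network access; nothing here unzips or modifies the file.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / filename
    urllib.request.urlretrieve(model_url(filename), str(target))
    return target
```

Calling `download("ggml-tiny.bin")` would leave the plain weight at `models/ggml-tiny.bin`, ready to use as is. Fetch `ggml-tiny-encoder.mlmodelc.zip` the same way only if you want the Core ML path.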

A field note: why our voice agent ships neither file

I build Fazm, a native Mac agent with voice-first input: hold a hotkey, talk, and the same agent loop runs with no typing. The obvious move for a local-first tool is to bundle a small Whisper model, and tiny is the smallest. We tried that path and did not take it. We do not ship ggml-tiny.bin or the Core ML encoder. Here is the honest reasoning, not a pitch.

Live dictation is a different problem from transcribing a finished clip. You need partial results as the user speaks, correct punctuation, and graceful behavior on silence. The tiny model is fast but it is the least accurate checkpoint, and at 39M parameters it hallucinates on low-energy audio in ways that are very visible when the words are being typed straight into a field. So our transcription path streams audio to a hosted streaming model (Deepgram's Nova-3) over a WebSocket instead. The concrete shape of that pipeline, straight from Desktop/Sources/TranscriptionService.swift:

Fazm voice input pipeline (no local Whisper weight)

1. Mic capture: 16 kHz, linear16 PCM, mono for push-to-talk
2. 100 ms buffer: audioBufferSize = 3200 bytes, flushed as it fills
3. Stream to Nova-3: WebSocket with keepalive every 8s and a 60s stale-connection watchdog
4. Clean the text: spoken-form rules turn "dot com" into ".com", "dot ts" into ".ts"
5. Drop hallucinations: isRepeatedTokenHallucination filters the loop-on-silence failure mode
6. Into the agent: final transcript becomes the prompt, no typing required
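
The 3200-byte buffer in step 2 follows directly from the capture format in step 1: 16,000 samples per second, 2 bytes per sample, one channel, 100 ms. The sketch below works that out and shows one plausible way to slice captured PCM into buffer-sized chunks. It is an illustration in Python, not the Swift code in TranscriptionService.swift, and `buffer_size_bytes` and `chunk_pcm` are hypothetical names.

```python
SAMPLE_RATE_HZ = 16_000  # 16 kHz capture
BYTES_PER_SAMPLE = 2     # linear16 means 16-bit signed PCM
CHANNELS = 1             # mono for push-to-talk
BUFFER_MS = 100          # one buffer holds 100 ms of audio


def buffer_size_bytes(
    rate: int = SAMPLE_RATE_HZ,
    bytes_per_sample: int = BYTES_PER_SAMPLE,
    channels: int = CHANNELS,
    buffer_ms: int = BUFFER_MS,
) -> int:
    """Bytes needed to hold buffer_ms of raw PCM in the given format."""
    return rate * bytes_per_sample * channels * buffer_ms // 1000


def chunk_pcm(pcm: bytes, size: int) -> list[bytes]:
    """Split raw PCM into size-byte chunks; the last one may be shorter."""
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


print(buffer_size_bytes())  # 3200
```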

Two details in there are the kind of thing you only learn after running a small ASR model against real microphones. The isRepeatedTokenHallucination guard exists because small Whisper-class decoders, when fed near-silent audio, latch onto one token and loop it four, five, ten times. The fix is mechanical: if the segment is four or more tokens and every token is identical, drop it. The domain replacement table exists because developers dictate URLs and file extensions constantly, and no tiny model reliably writes ".com" instead of "dot com" on its own. Both problems get worse, not better, with a smaller local model, which is the core reason tiny did not make the cut for live input.
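
Both guards are small enough to sketch. What follows is a Python approximation under stated assumptions, not the Swift implementation: tokens are taken to be whitespace-separated words compared case-insensitively, the replacement table holds only the two examples mentioned here, and swallowing the space before "dot" is my guess at the intended output.

```python
import re


def is_repeated_token_hallucination(text: str, min_tokens: int = 4) -> bool:
    """True if the segment has min_tokens or more tokens, all identical.

    Assumption: a token is a whitespace-separated word, lowercased.
    """
    tokens = text.lower().split()
    return len(tokens) >= min_tokens and len(set(tokens)) == 1


# Only the two spoken forms named in the article; a real table is longer.
SPOKEN_SUFFIXES = ("com", "ts")
_DOT_PATTERN = re.compile(
    r"\s*\bdot (" + "|".join(SPOKEN_SUFFIXES) + r")\b", re.IGNORECASE
)


def apply_spoken_forms(text: str) -> str:
    """Rewrite 'dot com' as '.com' and 'dot ts' as '.ts', joined to the
    preceding word."""
    return _DOT_PATTERN.sub(lambda m: "." + m.group(1).lower(), text)


def clean_segment(text: str) -> str:
    """Drop looped-token segments, then apply spoken-form replacements."""
    if is_repeated_token_hallucination(text):
        return ""
    return apply_spoken_forms(text)
```

Checking for the loop before rewriting keeps the filter simple: it only ever sees the raw decoder output, so a replacement rule can never mask a repeated token.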

When ggml-tiny.bin is genuinely the right call

None of this means tiny is useless. It is the right tool in a few clear cases:

  • You are fully offline or air-gapped and a hosted streaming service is not an option. A 77.7 MB local model that runs anywhere beats no transcription.
  • You are batch-processing short, clean clips where latency does not matter and you can re-run anything that comes out wrong.
  • You are prototyping the plumbing and want the fastest possible decode while you wire up audio capture, then swap in a larger model later.
  • You are on Apple silicon and pair the weight with ggml-tiny-encoder.mlmodelc.zip so the encoder pass runs on the Neural Engine. That is the one time the zip earns its place.

Frequently asked questions

Where do I download ggml-tiny.bin.zip?

You cannot, because it does not exist. The ggerganov/whisper.cpp repository on Hugging Face hosts the tiny model uncompressed as ggml-tiny.bin (77,691,713 bytes). There is no .bin.zip anywhere in that repo. If a third-party mirror, torrent, or course bundle hands you a ggml-tiny.bin.zip, it is just someone who zipped the plain .bin themselves. Prefer the official file so the checksum matches what whisper.cpp expects.

Then what is the .zip file I keep seeing next to the tiny model?

It is ggml-tiny-encoder.mlmodelc.zip (15,037,446 bytes). That is a compiled Apple Core ML model package (.mlmodelc) that has been zipped because a .mlmodelc is actually a directory, not a single file. It is the optional Core ML encoder that lets whisper.cpp run the encoder pass on the Apple Neural Engine. You only need it if you built whisper.cpp with WHISPER_COREML and want the ANE speedup. It is not a substitute for ggml-tiny.bin; you use both together.

How big is ggml-tiny.bin exactly?

As of 2026-06-21 it is 77,691,713 bytes, which display tools round to about 77.7 MB (74.1 MiB). The English-only ggml-tiny.en.bin is 77,704,715 bytes. The 5-bit quantized ggml-tiny-q5_1.bin is 32,152,673 bytes.
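
The two figures differ only in the divisor: MB divides by 1,000,000 and MiB divides by 1,048,576. A short worked check, using the byte count quoted above:

```python
def to_mb(n_bytes: int) -> float:
    """Decimal megabytes: 1 MB = 1,000,000 bytes."""
    return n_bytes / 1_000_000


def to_mib(n_bytes: int) -> float:
    """Binary mebibytes: 1 MiB = 1,048,576 bytes."""
    return n_bytes / (1024 * 1024)


size = 77_691_713  # ggml-tiny.bin, as quoted in this article
print(f"{to_mb(size):.1f} MB")    # 77.7 MB
print(f"{to_mib(size):.1f} MiB")  # 74.1 MiB
```

Some file managers print the MiB value with an "MB" label, which is why the same file can look like 74 MB in one tool and 78 MB in another.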

Do I unzip ggml-tiny.bin after downloading?

No. ggml-tiny.bin is the raw GGML weight file. You point the whisper.cpp binary at it directly with -m models/ggml-tiny.bin. The only file you ever unzip in this flow is ggml-tiny-encoder.mlmodelc.zip, which expands into a ggml-tiny-encoder.mlmodelc directory that lives next to the .bin.
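
If you script the setup, the unzip step might look like the sketch below. It uses Python's standard `zipfile` module and extracts into the folder that holds the archive, which is where the .bin is expected to live. `unpack_encoder` is a hypothetical helper name, and the guard refuses anything that is not a zip so a .bin never gets fed to it by mistake.

```python
import zipfile
from pathlib import Path


def unpack_encoder(zip_path: str) -> Path:
    """Extract the encoder zip into the directory that contains it.

    Returns that directory. Raises ValueError for non-zip input,
    such as ggml-tiny.bin, which must be left as it is.
    """
    archive = Path(zip_path)
    if not zipfile.is_zipfile(archive):
        raise ValueError(f"{archive} is not a zip archive; do not unzip it")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(archive.parent)
    return archive.parent
```

Running `unpack_encoder("models/ggml-tiny-encoder.mlmodelc.zip")` should leave the `ggml-tiny-encoder.mlmodelc` directory next to `models/ggml-tiny.bin`.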

Is tiny good enough for real-time dictation?

For short, clean, single-speaker clips, often yes. For live dictation where you expect punctuation, domains like .com, code tokens, and no hallucinated loops on silence, tiny struggles. When we built voice input for our Mac agent we did not ship any local Whisper model; we stream to a hosted streaming model instead, because the tiny model's quality and the latency of running it locally did not clear the bar for live typing.

What is the difference between the GGML tiny model and Core ML or WhisperKit?

ggml-tiny.bin is the weight in GGML format for whisper.cpp. The .mlmodelc.zip is an Apple Core ML compilation of the encoder only, used to offload that pass to the Neural Engine while whisper.cpp still drives decoding. WhisperKit is a separate Swift package that ships its own Core ML model bundles and does not use ggml-tiny.bin at all. Three different packagings of the same underlying Whisper checkpoint.
