ggml-base.bin: the file, the size, and the one thing base gets wrong
You searched the exact /resolve/main/ path for ggml-base.bin, so you want one of two things: the actual file, or a straight answer on whether base is the model to build on. This page gives you both. The download is one line. The second half is the part the download guides skip, because I build a voice-first Mac agent and base's real limit is not the one its file size suggests.
ggml-base.bin lives in the ggerganov/whisper.cpp repository on Hugging Face. The direct download is:
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
As of 2026-06-18 the file is 147,951,465 bytes (~148 MB), sha256 60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe. The viewable model page is the blob view in the same repo. Or let the official script fetch it:
```shell
# from inside a cloned whisper.cpp checkout
./models/download-ggml-model.sh base
# -> writes models/ggml-base.bin (~148 MB)

# or fetch it directly (follow the 302 redirect):
curl -L -o ggml-base.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
```
At a glance: ~148 MB on disk, about 74 million parameters, and a 302 redirect from /resolve/ to the CDN.
base in context, with exact byte counts
Everything below lives in the same whisper.cpp model store. The byte counts were read straight from the Hugging Face x-linked-size header on 2026-06-18, so you can sanity-check a download against them. base is the row most projects quietly standardize on.
| File | Bytes | Size | What it is |
|---|---|---|---|
| ggml-tiny.bin | 77,691,713 | ~77.7 MB | Smallest, fastest, least accurate. The cheap first pass. |
| ggml-base.bin | 147,951,465 | ~148 MB | The default most people settle on. Multilingual. Real-time on Apple Silicon. |
| ggml-base.en.bin | 147,964,211 | ~148 MB | English-only base. A touch better on English at the same size. |
| ggml-small.bin | 487,601,967 | ~488 MB | Noticeably more accurate. Where many stop for serious dictation. |
Quantized base variants (the q5_1 and q8_0 files) exist in the same repo. They shrink the file, not the accuracy ceiling. Check the Files and versions tab before you hardcode a filename, and read the official models README for the canonical size ranking.
What ggml-base.bin actually is, and why it is the default
GGML is the on-disk tensor format used by whisper.cpp, the C/C++ port of OpenAI's Whisper. The ggerganov/whisper.cpp repository is a model store of pre-converted binaries, so you do not need Python, PyTorch, or any conversion step. You download ggml-base.bin, point the whisper.cpp binary at it, and it runs.
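A minimal sketch of "point the binary at it", assuming a recent whisper.cpp checkout built with CMake, where the CLI lands at build/bin/whisper-cli (older checkouts name the binary main). The wrapper name transcribe_with_base and the WHISPER_BIN variable are hypothetical; the -m and -f flags are the model and input-file options of the whisper.cpp CLI.

```shell
# Assumed location of the built CLI; override WHISPER_BIN if your build differs.
WHISPER_BIN="${WHISPER_BIN:-./build/bin/whisper-cli}"

# transcribe_with_base AUDIO.wav [MODEL]
# Refuses to run if the model file is missing, then hands off to whisper.cpp.
transcribe_with_base() {
  audio="$1"
  model="${2:-models/ggml-base.bin}"
  if [ ! -f "$model" ]; then
    echo "model not found: $model (run ./models/download-ggml-model.sh base)" >&2
    return 1
  fi
  "$WHISPER_BIN" -m "$model" -f "$audio"
}
```

From the checkout root, `transcribe_with_base samples/jfk.wav` runs base against the sample clip that ships with the repository.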
base is the second of the five standard checkpoints, about 74 million parameters. People gravitate to it for a simple reason: it is the smallest model that stops obviously mangling common words, while still decoding faster than real time on Apple Silicon. tiny is cheaper but error-prone; small is more accurate but more than three times the size (~488 MB against ~148 MB). base is the comfortable middle, which is exactly why so many tutorials default to it without saying why.
The thing base gets wrong: it writes "dot com", not ".com"
Here is the part nobody downloading this file mentions. base, like every vanilla Whisper checkpoint, was trained to transcribe natural speech. Say "open fazm dot ai slash download" and base gives you exactly that string of words. It has no concept that you meant a URL. Dictate a file path, a shell command, or an email address and you get the spoken form, not the written one. Going from base to small does not fix this: a bigger model spells the words better, but it still does not know they were a URL.
This is the layer that matters once you try to turn any Whisper model into a real voice interface, and it sits entirely on top of the model. In Fazm, that layer is a defaultReplacements table in Desktop/Sources/TranscriptionService.swift: a list of spoken-form to written-form rules applied to the raw transcript. A sample:
Spoken form, then what the agent actually types: "dot com" becomes .com, "dot json" becomes .json, "at sign" becomes @, and likewise for .io, .ai, .ts, .py, and .swift.
The full table covers common domains (.com, .org, .net, .io, .ai, .dev, .app, .co, .me, .gg), the email/URL @, and code file extensions (.json, .js, .ts, .py, .swift, .css, .html). It is a flat rewrite list, not a model upgrade.
The point is not that Fazm's table is special. It is that any serious use of base for command or code dictation needs something like it, and reaching for a heavier Whisper model will not give it to you. If you are wiring base into a voice flow, budget for the rewrite layer up front. It is cheaper than the model bump you would otherwise reach for, and it fixes a class of errors model size never touches.
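To make the idea concrete, here is a minimal sketch of a rewrite layer as a POSIX shell function. The name rewrite_spoken and the exact rules are illustrative assumptions, including the "slash" rule; this is not Fazm's defaultReplacements table, which lives in Swift, only the same flat spoken-to-written idea expressed as a few sed rules.

```shell
# rewrite_spoken TEXT
# Applies a flat list of spoken-form -> written-form rules to a raw transcript.
# The rules are hypothetical examples; extend the alternations to suit your vocabulary.
rewrite_spoken() {
  printf '%s\n' "$1" | sed -E \
    -e 's/ dot (com|org|net|io|ai|dev)( |$)/.\1\2/g' \
    -e 's/ dot (json|js|ts|py|swift|css|html)( |$)/.\1\2/g' \
    -e 's/ at sign /@/g' \
    -e 's| slash |/|g'
}
```

With these rules, `rewrite_spoken "open fazm dot ai slash download"` prints `open fazm.ai/download`. The trailing `( |$)` group keeps a rule from firing inside a longer word, so "dot compute" is left alone. A flat list like this is easy to audit, at the cost of occasionally rewriting a phrase that really was meant as words.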
Choosing base, honestly
base is a good default. Here is where it lands relative to the models on either side, and the one place it stops being a model-size question at all.
base is the right call when
You want a single offline model that handles most clean speech without thinking about it. base is the smallest checkpoint that stops mangling common words, and it still decodes faster than real time on an M-series chip. For batch transcription, note-taking, or anything where a half-second of latency is invisible, base is the sane default.
Reach past base when accuracy bites
Names, jargon, and accented speech are where base starts dropping words. The next honest step is ggml-small.bin (~488 MB), not a quantized base. Quantization shrinks the file; it does not buy you accuracy base was never going to have.
Drop below base only under pressure
tiny exists for constrained hardware and wake-word gating. If your machine can hold 148 MB in memory, there is little reason to run tiny for real transcription. base is the floor for output you will actually read.
base alone is not a voice interface
Any vanilla Whisper checkpoint, base included, transcribes speech literally. It has no idea you are dictating a URL, a file path, or a shell command. That is a layer you add on top, not a model size you upgrade to.
Why Fazm does not decode base.bin locally
Fazm is a voice-first agent for macOS: hold a hotkey, talk, and the same Claude Code or Codex agent loop acts on your machine. Voice is the front door, so transcription is on the hot path. You would expect us to ship a local Whisper model like base. We chose not to, for a reason orthogonal to accuracy.
whisper.cpp decodes a window of audio after that window closes. You finish a phrase, the model runs, then text appears. For dictation that pause is fine. For an agent that should start reacting while you are still mid-sentence, the pause is the whole experience. So Fazm streams instead: TranscriptionService.swift opens a WebSocket to a real-time ASR (Deepgram Nova-3) and gets interim transcript segments back mid-utterance, then applies the same defaultReplacements rewrite layer described above. The rewrite layer is the same problem whether you run base locally or stream; the streaming choice is purely about when the words arrive.
That is a real tradeoff, stated plainly: streaming means audio leaves the machine. If your requirement is that it must not, then ggml-base.bin (or a larger local Whisper model) is the correct choice, and this is the page that tells you where to get it and what to bolt on so it reads back ".com" instead of "dot com". We optimized for latency in the agent loop; you may weight privacy higher. Both are defensible.
ggml-base.bin, answered
What is the exact download URL for ggml-base.bin?
The direct file is https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin. The /resolve/main/ path streams the raw binary; the /blob/main/ path of the same name shows the viewable page with the Files and versions tab. As of 2026-06-18 the file is 147,951,465 bytes (about 148 MB).
Why does /resolve/main/ download the file but /blob/main/ gives me a web page?
On Hugging Face, /blob/ is the human-facing viewer for a file (it renders an HTML page around the file metadata), while /resolve/ returns the actual bytes. If you wget or curl a /blob/ URL you get HTML, not the model. Always use the /resolve/main/ form for downloads, which is exactly the path in this query. Note that /resolve/main/ggml-base.bin currently issues a 302 redirect to a Hugging Face CDN host before the bytes start, so follow redirects (curl -L).
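If a /blob/ link is what you have on hand, the fix is a one-token swap in the path. The sketch below shows that swap plus a rough guard against the classic mistake of saving the HTML viewer page as if it were the model. Both function names are hypothetical, and the HTML check is a heuristic on the first bytes of the file, not a format validator.

```shell
# to_resolve_url URL
# Turns a Hugging Face /blob/ viewer URL into the /resolve/ download URL.
# URLs that already use /resolve/ pass through unchanged.
to_resolve_url() {
  printf '%s\n' "$1" | sed 's|/blob/|/resolve/|'
}

# looks_like_html FILE
# Succeeds if the start of the file looks like an HTML page rather than a binary.
looks_like_html() {
  head -c 512 "$1" | grep -Eqi '<!doctype|<html'
}
```

A download script might run `curl -L -o ggml-base.bin "$(to_resolve_url "$url")"` and then bail out if `looks_like_html ggml-base.bin` succeeds.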
How big is ggml-base.bin and how do I verify the download?
147,951,465 bytes, roughly 148 MB on disk. The Hugging Face x-linked-etag header exposes the sha256 as 60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe, so you can shasum -a 256 the file you downloaded and compare. The English-only ggml-base.en.bin is almost the same at 147,964,211 bytes.
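The check can be scripted. This is a sketch under the assumption that either sha256sum (common on Linux) or shasum (shipped with macOS) is on the PATH; verify_model is a hypothetical helper name, and the expected values are the byte count and digest quoted above.

```shell
# verify_model FILE EXPECTED_BYTES EXPECTED_SHA256
# Succeeds only if both the byte count and the sha256 digest match.
verify_model() {
  file="$1"; want_bytes="$2"; want_sha="$3"
  got_bytes=$(wc -c < "$file" | tr -d ' ')
  if command -v sha256sum >/dev/null 2>&1; then
    got_sha=$(sha256sum "$file" | cut -d ' ' -f 1)
  else
    got_sha=$(shasum -a 256 "$file" | cut -d ' ' -f 1)
  fi
  [ "$got_bytes" = "$want_bytes" ] && [ "$got_sha" = "$want_sha" ]
}

# Usage with the values quoted on this page:
# verify_model ggml-base.bin 147951465 \
#   60ed5bc3dd14eea856493d334349b405782ddcaf0028d4b5df4088345fba2efe \
#   && echo "ggml-base.bin verified"
```

Checking the size first is cheap and catches truncated downloads; the digest catches everything else.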
Do I have to download from Hugging Face, or can the script fetch base?
Either works and gives the identical binary. The official whisper.cpp way is ./models/download-ggml-model.sh base, which constructs the same Hugging Face /resolve/main/ URL and writes models/ggml-base.bin. There is no extra conversion step in either path; the .bin is already in GGML format.
Should I use ggml-base.bin or ggml-base.en.bin?
If you only ever transcribe English, prefer base.en. At the same ~148 MB size it does not spend parameters on other languages, so it tends to be a little more accurate on English audio. If you need any other language, or you mix languages, use the multilingual ggml-base.bin.
Is ggml-base.bin good enough to run a voice command interface?
For accuracy on clean speech, base is a reasonable floor. But base, like every vanilla Whisper checkpoint, transcribes literally: say "open fazm dot ai slash download" and you get the words, not the URL. A usable voice interface needs a rewrite layer that maps spoken forms to written forms, plus a domain vocabulary so command words and app names land. That is a layer on top of the model, not a bigger model.
What does Fazm use instead of running ggml-base.bin locally?
Fazm is a voice-first macOS agent, so transcription sits on the hot path. We stream audio to a real-time ASR (Deepgram Nova-3) rather than decoding local windows, because for an agent the latency of when text arrives matters as much as accuracy. On top of the raw transcript we apply a defaultReplacements table in TranscriptionService.swift that rewrites spoken forms to written forms ("dot com" to ".com", "dot json" to ".json", "at sign" to "@", and more). That is the layer base.bin alone does not give you, and it is independent of which Whisper size you would otherwise pick.