download-ggml-model.sh large-v3: How to Download the Full Whisper Large Model

Matthew Diakonov · 10 min read


The download-ggml-model.sh script is the standard way to fetch pre-converted GGML model files for whisper.cpp. Running ./models/download-ggml-model.sh large-v3 downloads the full-size Whisper large-v3 model, which is OpenAI's most accurate speech recognition model in GGML format. At 3.1 GB, it is the largest model available through this script, and it delivers the highest transcription accuracy across all languages.

What the Script Does

The script lives in the models/ directory inside the whisper.cpp repository. When you pass large-v3 as the argument, it does the following:

  1. Validates that large-v3 is a recognized model name
  2. Builds the download URL pointing to huggingface.co/ggerganov/whisper.cpp
  3. Downloads ggml-large-v3.bin using curl (or wget as a fallback)
  4. Saves the file into the models/ directory

The result is a single binary file that whisper.cpp can load directly. No Python, no PyTorch, no conversion step required.
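The four steps above can be sketched in a few lines of shell. This is a simplified illustration of the flow, not the actual script; the Hugging Face URL pattern is the one the repository uses, but verify against the script itself before relying on it:

```shell
#!/bin/sh
# Simplified sketch of the download logic (not the real script).
src="https://huggingface.co/ggerganov/whisper.cpp/resolve/main"
model="large-v3"

# 1. Validate that the model name is recognized
case "$model" in
  tiny|tiny.en|base|base.en|small|small.en|medium|medium.en|large-v3|large-v3-turbo) ;;
  *) echo "Invalid model: $model" >&2; exit 1 ;;
esac

# 2. Build the download URL
url="$src/ggml-$model.bin"
echo "$url"

# 3./4. Fetch into models/ with curl, falling back to wget
# (actual download commented out here -- it pulls 3.1 GB)
# curl -L -o "models/ggml-$model.bin" "$url" || wget -O "models/ggml-$model.bin" "$url"
```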

[Diagram: download-ggml-model.sh large-v3 pipeline — the script fetches pre-converted GGML weights (ggml-large-v3.bin, ~3.1 GB, ready to use) directly from the Hugging Face CDN at ggerganov/whisper.cpp.]

Step-by-Step Setup

# Clone and build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make -j

# Download the large-v3 model
./models/download-ggml-model.sh large-v3

Verify the file downloaded correctly:

ls -lh models/ggml-large-v3.bin
# Expected: approximately 3.1 GB
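If you script your setup, a small size check catches truncated or failed downloads early. This is a hypothetical helper, not part of whisper.cpp; the 3 GB threshold is an assumption based on the expected ~3.1 GB file size:

```shell
#!/bin/sh
# check_model FILE MIN_BYTES: prints "ok" if FILE is at least MIN_BYTES,
# otherwise a warning. Hypothetical helper; threshold is an assumption.
check_model() {
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -ge "$2" ]; then
    echo "ok ($size bytes)"
  else
    echo "too small ($size bytes) - likely a failed download or an HTML error page"
  fi
}

# Example: require roughly 3 GB for the full large-v3 model
# check_model models/ggml-large-v3.bin 3000000000
```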

Test it with the bundled sample:

./main -m models/ggml-large-v3.bin -f samples/jfk.wav

On an M-series Mac, you should see the transcription output within a few seconds.

large-v3 vs large-v3-turbo

The most common question when choosing between these two models is whether the accuracy difference justifies the speed and size trade-off. Here is how they compare on Apple Silicon:

| Property | large-v3 | large-v3-turbo |
|---|---|---|
| File size | 3.1 GB | 1.5 GB |
| Decoder layers | 32 | 4 |
| Speed (M2 Pro) | ~3x real-time | ~10x real-time |
| WER (English) | 4.2% | 4.5% |
| WER (multilingual avg) | 10.1% | 10.8% |
| Memory usage | ~3.3 GB RAM | ~1.6 GB RAM |
| Best for | Maximum accuracy, batch processing | Real-time transcription, interactive use |

The turbo variant was distilled from large-v3 by reducing decoder layers from 32 to 4 while keeping the encoder identical. This means the turbo variant "hears" the audio the same way but decodes tokens faster with some accuracy loss.

When to choose large-v3:

  • You are processing pre-recorded audio where speed does not matter
  • You need the absolute best accuracy for professional transcription
  • You are transcribing low-resource languages where the 0.7% WER difference compounds
  • You are building a batch pipeline that runs overnight

When to choose large-v3-turbo:

  • You need real-time or near-real-time transcription
  • You are building voice input for a desktop application
  • Your hardware has limited memory (16 GB Mac with other apps running)
  • English or major-language transcription where accuracy is already excellent

All Available Model Variants

The download-ggml-model.sh script accepts any of these model names:

| Model | Size | Speed (M2 Pro) | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | 75 MB | ~30x real-time | Low | Quick tests, prototyping |
| tiny.en | 75 MB | ~30x real-time | Low (EN only) | English-only testing |
| base | 142 MB | ~20x real-time | Fair | Lightweight deployments |
| base.en | 142 MB | ~20x real-time | Fair (EN only) | Constrained English use |
| small | 466 MB | ~12x real-time | Good | Mid-range hardware |
| small.en | 466 MB | ~12x real-time | Good (EN only) | English on limited RAM |
| medium | 1.5 GB | ~6x real-time | Very good | Balanced accuracy/speed |
| medium.en | 1.5 GB | ~6x real-time | Very good (EN only) | Dedicated English tasks |
| large-v3 | 3.1 GB | ~3x real-time | Excellent | Maximum accuracy |
| large-v3-turbo | 1.5 GB | ~10x real-time | Near-excellent | Best speed/accuracy ratio |

For batch transcription where you do not need real-time output, large-v3 is the best choice. For anything interactive, large-v3-turbo is usually the better option.
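That rule of thumb can be encoded directly if you script model selection. This is a hypothetical helper; the memory thresholds are rough assumptions derived from the table above, not official guidance:

```shell
#!/bin/sh
# pick_model FREE_GB: suggest a model for the available memory, in GB.
# Hypothetical helper; thresholds are rough assumptions from the table.
pick_model() {
  if [ "$1" -ge 4 ]; then
    echo large-v3          # ~3.3 GB RAM, maximum accuracy
  elif [ "$1" -ge 2 ]; then
    echo large-v3-turbo    # ~1.6 GB RAM, best speed/accuracy ratio
  else
    echo small             # 466 MB file, fits tight memory budgets
  fi
}

# Example: ./models/download-ggml-model.sh "$(pick_model 8)"
```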

Note

The .en variants (like base.en or medium.en) are trained exclusively on English data. For non-English or mixed-language audio, always use the multilingual versions. The large-v3 model handles 99 languages.

Quantization Options for large-v3

The full ggml-large-v3.bin is stored in f16 (16-bit floating point) format. If 3.1 GB is too large for your system, you can quantize it to reduce size and memory usage:

# Download the full model first
./models/download-ggml-model.sh large-v3

# Quantize to Q5_0 (good balance of size and accuracy)
./quantize models/ggml-large-v3.bin models/ggml-large-v3-q5_0.bin q5_0

# Quantize to Q8_0 (minimal accuracy loss)
./quantize models/ggml-large-v3.bin models/ggml-large-v3-q8_0.bin q8_0

| Format | File Size | RAM Usage | Accuracy Impact |
|---|---|---|---|
| f16 (default) | 3.1 GB | ~3.3 GB | Baseline |
| Q8_0 | ~1.6 GB | ~1.8 GB | Negligible |
| Q5_0 | ~1.1 GB | ~1.2 GB | Very slight degradation |
| Q4_0 | ~900 MB | ~1.0 GB | Noticeable on difficult audio |

Quantization is particularly useful if you want large-v3 accuracy but have a Mac with only 8 GB of unified memory. The Q5_0 variant brings memory usage down to roughly the same as the unquantized turbo model while keeping most of the accuracy advantage.
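If you want to compare formats on your own audio, a loop over the quantization levels keeps the naming consistent. This sketch assumes the quantize tool was built alongside whisper.cpp; the actual conversion is commented out since each pass reads the full 3.1 GB file:

```shell
#!/bin/sh
# Produce all three quantized variants of large-v3 in one pass.
# Assumes the quantize tool was built alongside whisper.cpp.
in=models/ggml-large-v3.bin
for q in q8_0 q5_0 q4_0; do
  out="models/ggml-large-v3-$q.bin"
  echo "quantizing $in -> $out"
  # ./quantize "$in" "$out" "$q"   # uncomment to actually convert
done
```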

[Chart: model size comparison, 0-4 GB scale — large-v3 f16 (3.1 GB), large-v3 Q8_0 (1.6 GB), large-v3-turbo (1.5 GB), medium (1.5 GB), large-v3 Q5_0 (1.1 GB).]

Common Pitfalls

  • Running from the wrong directory. Always run the script from the whisper.cpp root directory: ./models/download-ggml-model.sh large-v3. If you cd models first and run bash download-ggml-model.sh large-v3, the file may end up in a nested models/models/ path.

  • Typo in the model name. Writing large_v3 (underscore) or largev3 (no hyphen) will silently download an HTML error page instead of the model. The resulting file will be a few kilobytes instead of 3.1 GB. Always check the file size after downloading.

  • Not enough disk space. The large-v3 model is 3.1 GB. Check available space with df -h . before running the script. If the download fails partway through, delete the partial file and retry.

  • Confusing large-v3 with large-v3-turbo. These are different models. If you meant to download the smaller, faster turbo variant, use ./models/download-ggml-model.sh large-v3-turbo instead.

  • Corporate firewall blocking Hugging Face. If the download times out or refuses to connect, your network may be blocking huggingface.co. Download the file manually from the Hugging Face web interface and place it at models/ggml-large-v3.bin.

  • Confusing GGML with other formats. The script downloads GGML-format binaries specifically for whisper.cpp. PyTorch .pt files, ONNX models, or Core ML .mlmodelc bundles will not work.

Warning

If whisper.cpp reports "invalid model data" after downloading, the file is likely corrupted or is an HTML error page from Hugging Face. Delete models/ggml-large-v3.bin, check your internet connection, and re-run ./models/download-ggml-model.sh large-v3. The file should be approximately 3.1 GB.
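One quick way to tell a real model file from a saved error page is to look at the first byte: an HTML page starts with text such as `<`, while the GGML binary does not. This is a hypothetical convenience check, not part of whisper.cpp:

```shell
#!/bin/sh
# looks_like_html FILE: prints "html" if the file starts with '<', else "binary".
# Hypothetical helper for spotting error pages saved as ggml-large-v3.bin.
looks_like_html() {
  if [ "$(head -c 1 "$1")" = "<" ]; then
    echo html
  else
    echo binary
  fi
}

# looks_like_html models/ggml-large-v3.bin   # "html" means delete and re-download
```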

Using the Model After Download

Once ggml-large-v3.bin is in place, you can use it with any whisper.cpp interface:

# Basic transcription
./main -m models/ggml-large-v3.bin -f recording.wav

# With timestamps
./main -m models/ggml-large-v3.bin -f recording.wav -otxt

# Output as JSON
./main -m models/ggml-large-v3.bin -f recording.wav -oj

# Specify language (skip auto-detection for faster processing)
./main -m models/ggml-large-v3.bin -f recording.wav -l ja

# Real-time microphone input on macOS
ffmpeg -f avfoundation -i ":0" -ar 16000 -ac 1 -f wav - | \
  ./main -m models/ggml-large-v3.bin -f -

The model path works the same way across all whisper.cpp bindings, including Python (pywhispercpp), Swift, Go, and Rust wrappers.

Wrapping Up

Running ./models/download-ggml-model.sh large-v3 is the simplest way to get OpenAI's most accurate Whisper model working with whisper.cpp. The 3.1 GB download gives you the best transcription quality available, supporting 99 languages with the lowest word error rate. For batch processing, professional transcription, or cases where accuracy matters more than speed, it is the right choice. If you need real-time performance for interactive voice input, consider the large-v3-turbo variant instead.

Fazm is an open source macOS AI agent that uses local whisper.cpp for voice input. Open source on GitHub.
