download-ggml-model.sh large-v3: How to Download the Full Whisper Large Model

Matthew Diakonov · 10 min read


The download-ggml-model.sh script is the standard way to fetch pre-converted GGML model files for whisper.cpp. Running ./models/download-ggml-model.sh large-v3 downloads the full-size Whisper large-v3 model, which is OpenAI's most accurate speech recognition model in GGML format. At 3.1 GB, it is the largest model available through this script, and it delivers the highest transcription accuracy across all languages.

What the Script Does

The script lives in the models/ directory inside the whisper.cpp repository. When you pass large-v3 as the argument, it does the following:

  1. Validates that large-v3 is a recognized model name
  2. Builds the download URL pointing to huggingface.co/ggerganov/whisper.cpp
  3. Downloads ggml-large-v3.bin using curl (or wget as a fallback)
  4. Saves the file into the models/ directory

The result is a single binary file that whisper.cpp can load directly. No Python, no PyTorch, no conversion step required.
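The four steps above can be sketched in a few lines of shell. This is a simplified illustration of the flow, not the actual script; the Hugging Face URL pattern is the one the repository uses, but verify against the script itself before relying on it:

```shell
#!/bin/sh
# Simplified sketch of the download logic (not the real script).
src="https://huggingface.co/ggerganov/whisper.cpp/resolve/main"
model="large-v3"

# 1. Validate that the model name is recognized
case "$model" in
  tiny|tiny.en|base|base.en|small|small.en|medium|medium.en|large-v3|large-v3-turbo) ;;
  *) echo "Invalid model: $model" >&2; exit 1 ;;
esac

# 2. Build the download URL
url="$src/ggml-$model.bin"
echo "$url"

# 3./4. Fetch into models/ with curl, falling back to wget
# (actual download commented out here -- it pulls 3.1 GB)
# curl -L -o "models/ggml-$model.bin" "$url" || wget -O "models/ggml-$model.bin" "$url"
```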

[Diagram: download-ggml-model.sh large-v3 pipeline — the script fetches pre-converted GGML weights (ggml-large-v3.bin, ~3.1 GB, ready to use) directly from the Hugging Face CDN at ggerganov/whisper.cpp.]

Step-by-Step Setup

# Clone and build whisper.cpp
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make -j

# Download the large-v3 model
./models/download-ggml-model.sh large-v3

Verify the file downloaded correctly:

ls -lh models/ggml-large-v3.bin
# Expected: approximately 3.1 GB
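If you script your setup, a small size check catches truncated or failed downloads early. This is a hypothetical helper, not part of whisper.cpp; the 3 GB threshold is an assumption based on the expected ~3.1 GB file size:

```shell
#!/bin/sh
# check_model FILE MIN_BYTES: prints "ok" if FILE is at least MIN_BYTES,
# otherwise a warning. Hypothetical helper; threshold is an assumption.
check_model() {
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -ge "$2" ]; then
    echo "ok ($size bytes)"
  else
    echo "too small ($size bytes) - likely a failed download or an HTML error page"
  fi
}

# Example: require roughly 3 GB for the full large-v3 model
# check_model models/ggml-large-v3.bin 3000000000
```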

Test it with the bundled sample:

./main -m models/ggml-large-v3.bin -f samples/jfk.wav

On an M-series Mac, you should see the transcription output within a few seconds.

large-v3 vs large-v3-turbo

The most common question when choosing between these two models is whether the accuracy difference justifies the speed and size trade-off. Here is how they compare on Apple Silicon:

| Property | large-v3 | large-v3-turbo |
|---|---|---|
| File size | 3.1 GB | 1.5 GB |
| Decoder layers | 32 | 4 |
| Speed (M2 Pro) | ~3x real-time | ~10x real-time |
| WER (English) | 4.2% | 4.5% |
| WER (multilingual avg) | 10.1% | 10.8% |
| Memory usage | ~3.3 GB RAM | ~1.6 GB RAM |
| Best for | Maximum accuracy, batch processing | Real-time transcription, interactive use |

The turbo variant was distilled from large-v3 by reducing decoder layers from 32 to 4 while keeping the encoder identical. This means the turbo variant "hears" the audio the same way but decodes tokens faster with some accuracy loss.

When to choose large-v3:

  • You are processing pre-recorded audio where speed does not matter
  • You need the absolute best accuracy for professional transcription
  • You are transcribing low-resource languages where the 0.7% WER difference compounds
  • You are building a batch pipeline that runs overnight

When to choose large-v3-turbo:

  • You need real-time or near-real-time transcription
  • You are building voice input for a desktop application
  • Your hardware has limited memory (16 GB Mac with other apps running)
  • English or major-language transcription where accuracy is already excellent

All Available Model Variants

The download-ggml-model.sh script accepts any of these model names:

| Model | Size | Speed (M2 Pro) | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | 75 MB | ~30x real-time | Low | Quick tests, prototyping |
| tiny.en | 75 MB | ~30x real-time | Low (EN only) | English-only testing |
| base | 142 MB | ~20x real-time | Fair | Lightweight deployments |
| base.en | 142 MB | ~20x real-time | Fair (EN only) | Constrained English use |
| small | 466 MB | ~12x real-time | Good | Mid-range hardware |
| small.en | 466 MB | ~12x real-time | Good (EN only) | English on limited RAM |
| medium | 1.5 GB | ~6x real-time | Very good | Balanced accuracy/speed |
| medium.en | 1.5 GB | ~6x real-time | Very good (EN only) | Dedicated English tasks |
| large-v3 | 3.1 GB | ~3x real-time | Excellent | Maximum accuracy |
| large-v3-turbo | 1.5 GB | ~10x real-time | Near-excellent | Best speed/accuracy ratio |

For batch transcription where you do not need real-time output, large-v3 is the best choice. For anything interactive, large-v3-turbo is usually the better option.
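That rule of thumb can be encoded directly if you script model selection. This is a hypothetical helper; the memory thresholds are rough assumptions derived from the table above, not official guidance:

```shell
#!/bin/sh
# pick_model FREE_GB: suggest a model for the available memory, in GB.
# Hypothetical helper; thresholds are rough assumptions from the table.
pick_model() {
  if [ "$1" -ge 4 ]; then
    echo large-v3          # ~3.3 GB RAM, maximum accuracy
  elif [ "$1" -ge 2 ]; then
    echo large-v3-turbo    # ~1.6 GB RAM, best speed/accuracy ratio
  else
    echo small             # 466 MB file, fits tight memory budgets
  fi
}

# Example: ./models/download-ggml-model.sh "$(pick_model 8)"
```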

Note

The .en variants (like base.en or medium.en) are trained exclusively on English data. For non-English or mixed-language audio, always use the multilingual versions. The large-v3 model handles 99 languages.

Quantization Options for large-v3

The full ggml-large-v3.bin is stored in f16 (16-bit floating point) format. If 3.1 GB is too large for your system, you can quantize it to reduce size and memory usage:

# Download the full model first
./models/download-ggml-model.sh large-v3

# Quantize to Q5_0 (good balance of size and accuracy)
./quantize models/ggml-large-v3.bin models/ggml-large-v3-q5_0.bin q5_0

# Quantize to Q8_0 (minimal accuracy loss)
./quantize models/ggml-large-v3.bin models/ggml-large-v3-q8_0.bin q8_0

| Format | File Size | RAM Usage | Accuracy Impact |
|---|---|---|---|
| f16 (default) | 3.1 GB | ~3.3 GB | Baseline |
| Q8_0 | ~1.6 GB | ~1.8 GB | Negligible |
| Q5_0 | ~1.1 GB | ~1.2 GB | Very slight degradation |
| Q4_0 | ~900 MB | ~1.0 GB | Noticeable on difficult audio |

Quantization is particularly useful if you want large-v3 accuracy but have a Mac with only 8 GB of unified memory. The Q5_0 variant brings memory usage down to roughly the same as the unquantized turbo model while keeping most of the accuracy advantage.
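If you want to compare formats on your own audio, a loop over the quantization levels keeps the naming consistent. This sketch assumes the quantize tool was built alongside whisper.cpp; the actual conversion is commented out since each pass reads the full 3.1 GB file:

```shell
#!/bin/sh
# Produce all three quantized variants of large-v3 in one pass.
# Assumes the quantize tool was built alongside whisper.cpp.
in=models/ggml-large-v3.bin
for q in q8_0 q5_0 q4_0; do
  out="models/ggml-large-v3-$q.bin"
  echo "quantizing $in -> $out"
  # ./quantize "$in" "$out" "$q"   # uncomment to actually convert
done
```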

[Chart: model size comparison, 0-4 GB scale — large-v3 f16 (3.1 GB), large-v3 Q8_0 (1.6 GB), large-v3-turbo (1.5 GB), medium (1.5 GB), large-v3 Q5_0 (1.1 GB).]

Common Pitfalls

  • Running from the wrong directory. Always run the script from the whisper.cpp root directory: ./models/download-ggml-model.sh large-v3. If you cd models first and run bash download-ggml-model.sh large-v3, the file may end up in a nested models/models/ path.

  • Typo in the model name. Writing large_v3 (underscore) or largev3 (no hyphen) will silently download an HTML error page instead of the model. The resulting file will be a few kilobytes instead of 3.1 GB. Always check the file size after downloading.

  • Not enough disk space. The large-v3 model is 3.1 GB. Check available space with df -h . before running the script. If the download fails partway through, delete the partial file and retry.

  • Confusing large-v3 with large-v3-turbo. These are different models. If you meant to download the smaller, faster turbo variant, use ./models/download-ggml-model.sh large-v3-turbo instead.

  • Corporate firewall blocking Hugging Face. If the download times out or refuses to connect, your network may be blocking huggingface.co. Download the file manually from the Hugging Face web interface and place it at models/ggml-large-v3.bin.

  • Confusing GGML with other formats. The script downloads GGML-format binaries specifically for whisper.cpp. PyTorch .pt files, ONNX models, or Core ML .mlmodelc bundles will not work.

Warning

If whisper.cpp reports "invalid model data" after downloading, the file is likely corrupted or is an HTML error page from Hugging Face. Delete models/ggml-large-v3.bin, check your internet connection, and re-run ./models/download-ggml-model.sh large-v3. The file should be approximately 3.1 GB.
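One quick way to tell a real model file from a saved error page is to look at the first byte: an HTML page starts with text such as `<`, while the GGML binary does not. This is a hypothetical convenience check, not part of whisper.cpp:

```shell
#!/bin/sh
# looks_like_html FILE: prints "html" if the file starts with '<', else "binary".
# Hypothetical helper for spotting error pages saved as ggml-large-v3.bin.
looks_like_html() {
  if [ "$(head -c 1 "$1")" = "<" ]; then
    echo html
  else
    echo binary
  fi
}

# looks_like_html models/ggml-large-v3.bin   # "html" means delete and re-download
```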

Using the Model After Download

Once ggml-large-v3.bin is in place, you can use it with any whisper.cpp interface:

# Basic transcription
./main -m models/ggml-large-v3.bin -f recording.wav

# With timestamps
./main -m models/ggml-large-v3.bin -f recording.wav -otxt

# Output as JSON
./main -m models/ggml-large-v3.bin -f recording.wav -oj

# Specify language (skip auto-detection for faster processing)
./main -m models/ggml-large-v3.bin -f recording.wav -l ja

# Real-time microphone input on macOS
ffmpeg -f avfoundation -i ":0" -ar 16000 -ac 1 -f wav - | \
  ./main -m models/ggml-large-v3.bin -f -

The model path works the same way across all whisper.cpp bindings, including Python (pywhispercpp), Swift, Go, and Rust wrappers.

Wrapping Up

Running ./models/download-ggml-model.sh large-v3 is the simplest way to get OpenAI's most accurate Whisper model working with whisper.cpp. The 3.1 GB download gives you the best transcription quality available, supporting 99 languages with the lowest word error rate. For batch processing, professional transcription, or cases where accuracy matters more than speed, it is the right choice. If you need real-time performance for interactive voice input, consider the large-v3-turbo variant instead.

Fazm is an open source macOS AI agent that uses local whisper.cpp for voice input. Open source on GitHub.
