Inference
8 articles about inference.
llama.cpp Releases in April 2026: Tensor Parallelism, 1-Bit Quantization, and More
Every major llama.cpp release in April 2026, from b8607 to b8779. Covers tensor parallelism, Q1_0 quantization, Gemma 4 audio support, and AMD MI350X.
Open Source AI Projects: GitHub Releases and Updates, April 2026
Every major open source AI project GitHub release in April 2026: version numbers, breaking changes, the CrewAI v0.114 security advisory, and migration notes for LLaMA 4, vLLM, Ollama, LangChain, ComfyUI, and 20+ more.
Open Source AI Projects Updates April 2026: Mid-Month Status Tracker
Track every major open source AI project update in April 2026. Covers model patches, framework upgrades, inference engine fixes, and community milestones through mid-April.
Open Source AI Releases April 2026: Every Major Launch This Month
A complete guide to every significant open source AI release in April 2026, covering foundation models, agent frameworks, inference tools, and developer SDKs with benchmarks and hardware requirements.
vLLM 0.8.2 Release Date, Changelog, and Upgrade Guide
vLLM 0.8.2 was released on March 25, 2025. This post covers the full changelog, V1 engine memory fix, FP8 KV cache support, and how to upgrade safely.
vLLM Update April 2026: v0.18, v0.19, Gemma 4, and gRPC Serving
Covers every major vLLM update in April 2026, from v0.18's gRPC serving and GPU speculative decoding to v0.19's Gemma 4 support, async scheduling, and critical security patches.
GTC 2026: Inference Is Eating the World
Inference is a recurring cost, not a one-time expense. Every agent action costs tokens. Minimizing LLM round trips is the key to sustainable agent economics.
Inference Optimization Is a Distraction for AI Agent Builders
Why optimizing API call speed barely matters for AI agents: the real bottleneck is action execution, not model inference.