Speculative Decoding
2 articles about speculative decoding.
vLLM Update April 2026: v0.18, v0.19, Gemma 4, and gRPC Serving
·9 min read
Every major vLLM update in April 2026 covered. From v0.18's gRPC serving and GPU speculative decoding to v0.19's Gemma 4 support, async scheduling, and critical security patches.
vllm · inference · llm-serving · april-2026 · gemma-4 · speculative-decoding · grpc · open-source
First Speculative Decoding Across GPU and Neural Engine on Apple Silicon
·2 min read
Running two models on the same Apple Silicon chip - a 1B draft model on the Neural Engine and a larger model on GPU for faster local inference.
speculative-decoding · apple-silicon · neural-engine · local-ai · performance
Browse by Topic
AI Agents (346) · Automation (240) · Productivity (203) · macOS (192) · AI Agent (182) · Claude Code (163) · Desktop Agent (120) · Open Source (106) · Developer Tools (104) · April 2026 (86) · Reliability (83) · Accessibility API (79) · MCP (78) · Parallel Agents (75) · Desktop Automation (68) · Multi Agent (64) · Claude (56) · AI Coding (56) · Security (54) · LLM (51)