Vol. I · No. 18 · THU, MAY 7, 2026

r/LocalLLaMA

Reddit · COMMUNITY

Last updated May 7, 2026, 3:30 PM

Get faster Qwen 3.6 27B

User achieves 50 tokens/sec with Qwen 3.6 27B on RTX 3090 using MTP speculative decoding at 100k context.
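The speedup comes from speculative decoding: a small drafter proposes several tokens ahead, and the large model verifies them, keeping only the prefix it agrees with, so output is identical to decoding with the large model alone. A toy Python sketch of that accept/reject loop (the next-token functions here are stand-ins, not the actual MTP drafter or Qwen):

```python
def speculative_decode(target, draft, prompt, k=4, max_new=8):
    """Toy speculative decoding loop.

    `draft` proposes k tokens greedily; `target` verifies them and
    keeps the longest agreeing prefix, then emits one token of its
    own. Output always matches greedy decoding with `target` alone.
    """
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # Drafter proposes k tokens ahead.
        ctx = list(seq)
        proposal = []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies: accept the longest agreeing prefix.
        for t in proposal:
            if target(seq) == t:
                seq.append(t)
            else:
                break
        # One correction/continuation token from the target.
        seq.append(target(seq))
    return seq[len(prompt):len(prompt) + max_new]
```

When the drafter agrees with the target, each verification pass yields up to k+1 tokens for one "big model" step, which is where the tokens/sec gain comes from; a bad drafter only costs wasted proposals, never wrong output.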

··

Follow-up: Trying to make NVIDIA GPUs plug-and-play on Macs. Found hidden RDMA symbols Apple doesn't want you to see — zero-copy GPU memory sharing might already work.

**TL;DR:** My last post about testing TinyGPU attracted some interest. This is the follow-up. The Blackwell card is detected and the driver loads, but NVIDIA's GSP firmware fails to boot through TB5 (known issue, I'm working with tinygrad on it). While debugging that, I went down a rabbit hole and discovered that Apple's RDMA subsystem accepts Metal GPU buffers for zero-copy network transfers — something nobody has documented. I also found hidden `ibv_reg_dmabuf_mr` symbols in Apple's libibverbs that suggest GPUDirect RDMA might be possible on macOS without any kernel modification. Here's eve...
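The symbol hunt described in the post is easy to reproduce on any platform: load libibverbs dynamically and probe for `ibv_reg_dmabuf_mr` (the rdma-core entry point for registering a dma-buf as an RDMA memory region). A minimal probe sketch, assuming only that a libibverbs shared library may or may not be present — it checks for the export, it does not perform any registration:

```python
import ctypes
import ctypes.util

def probe_dmabuf_mr():
    """Return the ibv_reg_dmabuf_mr symbol if the system's libibverbs
    exports it, else None. Pure symbol check; no RDMA calls are made."""
    path = ctypes.util.find_library("ibverbs")
    if path is None:
        return None  # no libibverbs on this system
    lib = ctypes.CDLL(path)
    return getattr(lib, "ibv_reg_dmabuf_mr", None)
```

On Linux with a recent rdma-core this returns a callable; the post's finding is that the same symbol shows up in Apple's libibverbs, which is what suggests dma-buf-style GPU memory registration might work without kernel changes.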

··

What do you use Gemma 4 for?

Community discussion comparing Gemma 4 and Qwen 3.6 model suitability across coding, benchmarks, and agentic workloads.

··

Why run local? Count the money

User quantifies cost savings from running local Qwen-397B with Hermes agent vs. API pricing: 200M tokens in 5 days ≈ $250 saved at API rates.
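A back-of-envelope check on those numbers: $250 for 200M tokens implies a blended API rate of about $1.25 per million tokens (the rate is inferred from the post's figures, not stated in it):

```python
def api_cost_usd(tokens, price_per_mtok):
    """Cost of `tokens` tokens at an API rate billed per million tokens."""
    return tokens / 1e6 * price_per_mtok

# Figures from the post: 200M tokens over 5 days. The $1.25/Mtok
# blended rate is back-solved from the ~$250 claim, not quoted.
saved = api_cost_usd(200_000_000, 1.25)
```

At that volume even small per-token rates add up quickly, which is the post's point: local inference amortizes hardware cost in days for agentic workloads.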

··

Gemma 4 MTP released

Google releases Gemma 4 multi-token prediction drafters in 4 quantized sizes for local deployment.

··
50 stories