Most people seem obsessed with token generation speed, but isn’t prefill the real bottleneck? Am I missing something?
A Reddit discussion arguing that prefill latency is underemphasized relative to token generation speed in local LLM benchmarking and optimization.
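The intuition behind the question can be sketched with a simple latency model: total response time splits into prefill (processing the prompt, which sets time-to-first-token) and decode (generating output tokens one at a time). The throughput numbers below are illustrative assumptions, not measured benchmarks; actual values vary widely with hardware and model.

```python
# Hypothetical latency model: prefill vs. decode time for one request.
# prefill_tps and decode_tps are assumed throughputs, not real benchmarks.

def request_latency(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float = 500.0,
                    decode_tps: float = 30.0) -> tuple[float, float]:
    """Return (prefill_seconds, decode_seconds)."""
    prefill = prompt_tokens / prefill_tps   # time to first token (TTFT)
    decode = output_tokens / decode_tps     # token generation time
    return prefill, decode

# Short prompt: decode dominates, so tokens/sec feels like the bottleneck.
p, d = request_latency(prompt_tokens=200, output_tokens=300)
print(f"short prompt: prefill {p:.1f}s, decode {d:.1f}s")  # prefill 0.4s, decode 10.0s

# Long-context prompt (e.g. RAG or pasting a large file): prefill dominates.
p, d = request_latency(prompt_tokens=32000, output_tokens=300)
print(f"long prompt:  prefill {p:.1f}s, decode {d:.1f}s")  # prefill 64.0s, decode 10.0s
```

Under these assumed numbers, a 32k-token prompt spends far longer in prefill than in generation, which is why long-context local use can feel slow even when the reported tokens/sec is good.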