Output performance with different GPUs #1168

aropb · 2025-04-23T08:28:01Z

On the same server, I tested different GPUs: RTX 4090, A100, and H100. The response speed is about the same on all three. The model is Qwen2.5-14B-Q5.

Why might this be?
Is this normal for llama.cpp?
Has anyone made similar comparisons?