So people are always asking whether they should upgrade their CPU or their GPU first. Setting aside that many of these people also lack enough RAM for modern gaming (16GB is my personal minimum), I decided to put it to the test, using examples from an actual community thread that has since been put out to pasture.
In this test, I’m comparing combinations of the following hardware:
- nVidia GTX 750Ti: a common budget-level card. Kept at stock frequencies.
- nVidia GTX 1070: an enthusiast-level card from the newest generation (at time of writing). Not quite the fastest card; the GTX 1080 currently holds that throne. Also kept at stock frequencies.
- Intel Core i7-4790K “Devil’s Canyon”: the overclocker’s wet dream from the Haswell Refresh series, running 4 cores / 8 threads at stock clocks of 4.0-4.4GHz, depending on core load.
- Intel Core i3-4130: a budget-to-mid-level CPU, running 2 cores / 4 threads at a fixed maximum clock of 3.4GHz. I don’t have one of these CPUs, so I emulated it by disabling 2 cores on my 4790K, fixing the clock at 3.4GHz, and limiting the max TDP to 54W to mimic the chip’s power consumption as well.
Note: Both CPUs are Haswell, and therefore operate at similar voltages, with similar (if not identical) memory and PCIe controllers. Any unintended side effects from underclocking the 4790K should be minimal.
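For the curious, here’s roughly what that emulation looks like if you’re on Linux instead. This is my own illustration, not what I actually did (I used Intel XTU on Windows); it assumes the standard kernel CPU-hotplug and cpufreq sysfs paths, needs root to actually apply, and only prints the writes here rather than performing them:

```python
# Sketch: emulate a 2C/4T, 3.4GHz chip on a 4C/8T 4790K via Linux sysfs.
# Assumptions (mine, not from the post): standard cpu-hotplug and cpufreq
# sysfs paths, and that cpu2/cpu3 pair with HT siblings cpu6/cpu7 -- the
# sibling mapping varies, so check thread_siblings_list on your own box.
# Dry run: builds the shell commands instead of writing to /sys.

def emulation_commands(disable_cpus=(2, 3, 6, 7), max_khz=3_400_000, n_cpus=8):
    cmds = []
    for cpu in disable_cpus:  # take two cores (and their HT siblings) offline
        cmds.append(f"echo 0 > /sys/devices/system/cpu/cpu{cpu}/online")
    for cpu in range(n_cpus):  # cap the remaining cores at 3.4GHz
        if cpu in disable_cpus:
            continue
        cmds.append(
            f"echo {max_khz} > /sys/devices/system/cpu/cpu{cpu}"
            "/cpufreq/scaling_max_freq"
        )
    return cmds

for cmd in emulation_commands():
    print(cmd)
```

The one thing sysfs can’t easily mimic is the 54W TDP cap; on Linux that would take the RAPL power-limit interface instead.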
I figured an industry-standard gaming benchmark would be more than sufficient to show the performance gained per component, so I chose 3DMark Fire Strike on default settings as the test across all combinations, with verification left on to confirm valid scores. I tested not only stock clocks and the i3-4130 emulation, but also my CPU and GPU both overclocked, as I usually run them, just to throw another score set in there. Keep in mind that CPU usage NEVER hit 100% on any configuration except during the physics test itself; even during the combined test, CPU usage did not exceed roughly 70%. 3DMark does not track this, and I missed screen-capturing the details from the Intel XTU program.
My baseline score, using stock clocks on the GTX 1070 and the 4790K, ended up at 14,143. It ran an average of 82.34 and 68.62 FPS in graphics tests 1 and 2 respectively, 37.75 FPS in the physics test, and 32.02 in the combined test.
Notice that GPU usage stays at or near 100%, which shows the CPU has no trouble keeping the GPU fed with data. The GPU, not the CPU, is the limiting factor here.
Now, on to comparing “upgrades” from the original post’s configuration, which was a 750Ti alongside an i3-4130. The run using the emulated 4130 and the actual 750Ti gave me a final result of 3,745, with graphics test averages of 20.41 and 17.06 FPS, physics of 15.27, and a combined of (ouch) 7.71.
It’s interesting to note here that there’s a very similar trend in GPU usage vs. overall frame rates as seen in my baseline test. On this configuration, the CPU is more than capable of feeding enough data into the GPU to keep up its utilization. The low score here is simply due to slower hardware, and not a mismatch. As in other tests, the CPU did not hit 100% during graphics or combined test, so there was even room left over.
Upgrade Scenario 1: CPU
There were many comments about the CPU being more of a problem than the GPU, and that the CPU should be upgraded first.
This one made me giggle… the 750Ti gets outperformed (read: ass handed to it) by the mobile Quadro M1000M in my Precision Mobile Workstation, and they’re just about the same chip. Best part is, the Quadro’s drivers really aren’t optimized for DirectX and gaming at all. Seriously. But I digress; here’s where we ended up:
Going all out and jumping straight from a 4130 to a 4790K shows a rather interesting result: a score of 4,021, with graphics test frame rates of 20.45 and 17.02, 37.44 in the physics test, and 7.70 in the combined test.
The difference in frame rates comes to: +0.04 FPS in graphics test 1 and -0.04 FPS in graphics test 2, roughly +22.17 FPS in the physics test (since that’s purely CPU-bound), and, interestingly enough, -0.01 FPS in the combined test. Outside the physics test, we’re talking swings of less than a quarter of a percent in either direction, which is so far inside the margin of error it’s best to just call it a straight tie.
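For the record, here’s the arithmetic behind those deltas as a quick check, with the FPS values copied from the two 750Ti runs above:

```python
# Per-test FPS from the two 750Ti runs quoted in the post.
i3 = {"graphics 1": 20.41, "graphics 2": 17.06, "physics": 15.27, "combined": 7.71}
i7 = {"graphics 1": 20.45, "graphics 2": 17.02, "physics": 37.44, "combined": 7.70}

for test in i3:
    delta = i7[test] - i3[test]          # absolute FPS change from the CPU swap
    pct = 100 * delta / i3[test]         # relative change vs. the i3 run
    print(f"{test:10s}: {delta:+6.2f} FPS ({pct:+7.2f}%)")
```

Only the physics test moves; every other delta is noise-level.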
CPU UPGRADE ONLY: No noticeable gaming difference.
Upgrade Scenario 2: GPU
Beating a CPU upgrade by a slight margin in popularity was a GPU upgrade.
I don’t have a GTX 900-series card to test with, nor a low-end 1000-series card, so it came down to my GTX 1070. While outside the budget of the original post, it’s good enough for this test. Keeping the emulated 4130 specs but swapping the GTX 1070 back in yielded quite an improvement: a score of 10,056, with graphics test frame rates of 91.69 and 77.85, a physics test of 15.08, and a combined test frame rate of 15.97 FPS. Kick back and enjoy the gaming…
Now I know the first objection that’s going to spill out here: “but look at the GPU load, it’s tanking, the CPU is clearly bottlenecking.” That’s true, it was. The combined test ramps up the physics simulation difficulty as it goes, loading the CPU harder and harder throughout. HOWEVER, the combined test frame rate, even with the CPU fully loaded and the controller saturated, was still more than double that of either of the previous runs. That’s the dead giveaway for what you’d expect in a game, if a game is even that demanding, since much of the physics is actually handed off to the GPU, especially with nVidia cards in games that use PhysX and the CUDA compute cores.
Upgrade the GPU, hands down. The performance gain going from a GTX 750Ti to a GTX 1070 lands in the 205-450% range based on these numbers, while upgrading the CPU alone produced, in a practical sense, no gains. From there, completing the upgrade chain with a 4790K on top of the already-upgraded 1070 roughly doubles “practical load” performance again in the combined test, going from 15.97 FPS to 32.02 FPS, mostly because the faster chip can keep up with the physics thanks to considerably higher clock speeds and twice as many cores/threads.
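Those percentages come straight out of the run data. Here’s the quick math, again with the FPS values copied from the runs above:

```python
# GPU-upgrade ratios on the emulated i3-4130 (FPS values from the runs above).
i3_750ti = {"graphics 1": 20.41, "graphics 2": 17.06, "combined": 7.71}
i3_1070  = {"graphics 1": 91.69, "graphics 2": 77.85, "combined": 15.97}

for test in i3_750ti:
    ratio = i3_1070[test] / i3_750ti[test]
    print(f"{test}: {ratio:.2f}x from the GPU upgrade alone")

# Completing the chain with the 4790K roughly doubles the combined test again:
print(f"combined, 1070 + 4790K vs. 1070 + i3: {32.02 / 15.97:.2f}x")
```

The combined test sits at the low end (about 2.07x) and the pure graphics tests at the high end (about 4.5x), which is where the 205-450% range comes from.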