Are Benchmarks a Big Deal? Should You Care?

In most, if not all, phone and computer reviews, there’s a “benchmark moment”. Most publications do it, and we do it as well, because benchmarks provide interesting information about the product you may end up buying. At the same time, it’s good to know what they mean and how to think about them if you want to make the best of your budget.

Benchmarks’ original role

Benchmarks are programs designed either to stress a specific piece of hardware or software, or to simulate a real-world workload (although the simulation is never 100% realistic).

The “stress tests” are synthetic benchmarks because they don’t represent a real-world situation. In fact, they may even create an abnormal situation to push something to its limit, whether it is the hardware or a driver.

For smartphones, Geekbench and most CPU tests fit in this category, and that’s particularly true for multi-threaded (aka multi-core) benchmarks. Their results show a hypothetical but highly unlikely scenario.

Other benchmarks that try to emulate a real-world situation are usually a compilation of tasks (file loading / computing / light graphics / user interface smoothness) that generate individual scores, which are then converted to a global score.

Basemark OS II could be counted among those, and so could PCMark (for Android). They offer a rough reflection of average performance, but on their own they don’t give as much visibility into the performance of individual components as the stress tests do.
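To illustrate how individual scores can be rolled into a global score, here is a minimal Python sketch. It assumes a weighted geometric mean, which is one common way to combine scores, but it is not the actual formula used by Basemark OS II or PCMark. The function name, the sub-test names and all the numbers are made up for illustration.

```python
import math

def global_score(sub_scores, weights=None):
    """Combine sub-scores into one number with a weighted geometric mean.

    Assumption: real benchmark suites use their own formulas;
    this is only an illustration of the idea.
    """
    if weights is None:
        # Default: every sub-test counts equally.
        weights = {name: 1.0 for name in sub_scores}
    total_weight = sum(weights[name] for name in sub_scores)
    log_sum = sum(weights[name] * math.log(score)
                  for name, score in sub_scores.items())
    return math.exp(log_sum / total_weight)

# Hypothetical sub-scores for two phones (made-up numbers).
phone_a = {"file_loading": 1200, "computing": 1500, "graphics": 900, "ui": 1100}
phone_b = {"file_loading": 1150, "computing": 2100, "graphics": 880, "ui": 1080}

score_a = global_score(phone_a)
score_b = global_score(phone_b)
print(f"Phone A: {score_a:.0f}, Phone B: {score_b:.0f}")
```

In this made-up example, the second phone has a much higher computing score, but the global score shrinks that gap because the other sub-scores are nearly identical. This is why a global score alone hides what is happening with individual components.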

With both types of benchmarks, reviewers can piece together an approximate picture of what the performance should look like, and the rest of the review should tell you whether that performance can be perceived (felt) or not.

Difference between Synthetic and Perceived performance

We just explained what synthetic performance is about, but perceived performance is what you’re after. There’s less incentive to pay for additional performance if you cannot see or feel it.

Obviously, perceived performance has “something” to do with synthetic scores. The more powerful the hardware is, the more “potential” there is for perceived performance. For sure, wimpy hardware will hit its limits sooner.

But to take a simple example, the way user interfaces are programmed makes their performance vary significantly on the same hardware. Android used to be plagued with a slow user interface compared to iOS and Windows handsets, which both ran at a smooth 60 FPS.

UI lag was so bad that Google set itself the goal of tweaking the software until the interface was “butter smooth” (this was in 2011). While Google’s interface is fast now, some OEMs may not have done as well with their own UI, despite using the same hardware.
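To make the 60 FPS figure concrete: an interface running at 60 frames per second has about 16.7 ms to produce each frame. The Python sketch below counts how many frames miss that budget, which is roughly what is felt as UI lag. The function names and the frame times are hypothetical and not taken from any real measurement tool.

```python
def frame_budget_ms(target_fps=60):
    """Time available to render one frame at the target frame rate."""
    return 1000.0 / target_fps

def missed_frames(frame_times_ms, target_fps=60):
    """Return the frame times that exceed the per-frame budget."""
    budget = frame_budget_ms(target_fps)
    return [t for t in frame_times_ms if t > budget]

# Made-up frame render times (in milliseconds) for a short scroll.
frame_times = [12.1, 15.8, 16.2, 34.0, 14.9, 22.5, 16.0]

slow = missed_frames(frame_times)
print(f"Budget per frame: {frame_budget_ms():.1f} ms")
print(f"{len(slow)} of {len(frame_times)} frames missed the budget")
```

Two devices with the same hardware can produce very different frame times depending on how the UI software is written, and a synthetic CPU score says nothing about this.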

Poor perceived performance can also show up as slow-loading apps (somewhat measurable with a synthetic I/O test), hang-ups due to network activity, and general slowness (often when too many apps are installed).

It is very important to realize that because synthetic tests are often based on unrealistic scenarios, mild differences in benchmark scores typically aren’t important. The overwhelming majority of people’s interaction with a smart device is via the UI. This is not a heavy computing task, and it doesn’t stress the hardware (or should not!).

When synthetic tests should sway a purchase

There are cases when synthetic or somewhat simulated workloads should affect a purchase: when your real-world computing workload is similar enough to the benchmark that you can clearly see how much you gain by upgrading. This is true for things like gaming (FPS for a specific game), video encoding, heavy photo processing, and other tasks that are repetitive, intensive and well-defined.

Benchmarks are useful… to a point

Benchmarks are great for defining which “class” of performance a device belongs to. To make it simple, you’ve got low-end, mid-range and high-end. These are three classes of performance, and you will “feel” the difference when going from one to another.

Typically, low-end devices are a bit sluggish and don’t support multi-tasking very well because there’s very little memory, but they are OK for basic usage, and they are cheap. Mid-range devices are more comfortable and designed to be the best tradeoff between what the user “wants” and how much they can pay. High-end devices are the best technology can offer, if you don’t mind paying more.

Within those categories, there are mild differences in benchmarks, but differences in computing performance alone are usually less important than other factors such as industrial design (look, weight, etc.), battery size, and specific selling points.

Conclusion

Benchmarks provide a general direction that shows more or less where your device stands from a “computing performance” standpoint. While this is important, it is only one of the many factors that decide if the device is great or not.

If you learn to put what benchmarks are telling you in perspective, and to relate it to how it could improve your user experience, you can pick the best product for the price. If you’re not technically savvy, the solution is simply to look at the scores without taking them too seriously, and to read the rest of the review.
