nbench on PowerPC

As tests and benchmarks are my daily work, I was wondering what the results would be at home on my PowerPC machines. From the original description (see the nbench website): "These are algorithm level tests, benchmarks designed to expose the capabilities of a system's CPU, FPU, and memory system."

nbench is an old benchmark that gives indicative results for various kinds of operations. Comparing CPUs with it is not always fair, because it does not use CPU-specific compilation options that could help some CPUs produce better results. So the results must be analyzed with great care.

It is important to know what goals it was written for and how it was written. The code may not suit every CPU, so a worse result does not necessarily mean that a CPU is slower. With such benchmarks, you can certainly compare compilers, check whether results look right for similar hardware, and so on. At work, we use it to compare two versions of Linux, native and para-virtualized.

It would be interesting to have benchmarks that run one nominal test plus a variant tuned for each target, so that each CPU is shown at its best. For example, PowerPC CPUs are really good at bit manipulation, which may not be visible depending on how the bit-related benchmarks are written.
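
To illustrate how much the way a bit test is written can matter, here is a minimal sketch in C of the same operation (setting a run of bits in a bitmap) written two ways. It is not taken from the nbench sources; the function names `set_run_bitwise` and `set_run_masked` are made up for this example. The second form builds one mask per 32-bit word, which is the kind of code that gives a compiler a chance to use the PowerPC rotate-and-mask instructions (rlwinm, rlwimi), although whether it actually does depends on the compiler and its options.

```c
#include <stddef.h>
#include <stdint.h>

/* Set nbits bits starting at bit index start, one bit at a time. */
void set_run_bitwise(uint32_t *map, size_t start, size_t nbits)
{
    for (size_t i = start; i < start + nbits; i++)
        map[i / 32] |= (uint32_t)1 << (i % 32);
}

/* Same result, but each 32-bit word is touched once with a mask. */
void set_run_masked(uint32_t *map, size_t start, size_t nbits)
{
    size_t end = start + nbits;              /* one past the last bit */

    while (start < end) {
        size_t word = start / 32;
        size_t lo = start % 32;              /* first bit to set in this word */
        size_t left = end - word * 32;       /* bits from word start to end of run */
        size_t hi = left < 32 ? left : 32;   /* one past the last bit in this word */

        /* Bits [0, hi) minus bits [0, lo) gives bits [lo, hi). */
        uint32_t upto_hi = (hi == 32) ? 0xFFFFFFFFu : (((uint32_t)1 << hi) - 1u);
        uint32_t below_lo = ((uint32_t)1 << lo) - 1u;

        map[word] |= upto_hi & ~below_lo;
        start = word * 32 + hi;              /* continue at the next word */
    }
}
```

Both functions leave the bitmap in the same state, so a benchmark built on the first form mostly measures loop and memory overhead, while one built on the second form measures word-level bit handling.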

Files

Results

The signs + and - were added in the columns below to mark the fastest and slowest result for each test.

Processor     :750GX 800 MHz:970FX 1.8 GHz:7447A 1.4 GHz:7410 400 MHz :750 600 MHz  :750FX 800 MHz:440ep 667 MHz:440ep 553 MHz:5200 396 MHz :5200 396 MHz :
Machine       :MicroAOne G3 :iMac G5      :MacMini G4   :Mac G4       :Pegasos 1    :AmigaOne XE  :Sam440ep     :Sam440ep     :Efika (G2_LE):Efika (G2_LE):
System        :AmigaOS 4.1  :Linux 2.6.23 :Linux 2.6.23 :MacOS X 10.2 :MorphOS 1.4.5:AmigaOS 4.x  :             :             :Linux 2.6.19 :MorphOS 2.0  :
              :             :YellowDog 6.1:YellowDog 6.1:             :             :             :             :             :Debian 4.1.1 :             :
Compiler      :GCC 4.0.2    :GCC 4.1.1    :GCC 4.1.1    :GCC 3.2.x    :GCC 2.95.3   :GCC 4.0.2    :GCC 4.0.2    :GCC 4.0.02   :GCC 4.1.2    :GCC 2.95.3   :
L1 cache      :32 KB        :Unknown      :Unknown      :             :32 KB        :32 KB        :32 KB        :32 KB        :             :16 KB / 16 KB:
L2 cache      :1 MB         :512 KB unif. :512 KB unif. :             :512KB/512KB  :512 KB       :             :             :             :None         :
Bogomips      :             :             :             :             :             :             :             :             :       65.53 :             :

NUMERIC SORT  :      427.31 :      355.92 :+      857.2 :      231.12 :      237.12 :      437.13 :      311.67 :      252.32 :-     163.05 :      122.38 :
STRING SORT   :      15.392 :       49.88 :+     64.443 :      57.019 :      28.986 :      15.615 :      10.622 :-     8.5804 :      17.163 :      13.976 :
BITFIELD      :  1.3193e+08 :  8.2181e+07 :+ 2.0421e+08 :- 6.0211e+07 :   8.554e+07 :  1.3372e+08 :  1.0995e+08 :   8.844e+07 :  6.9461e+07 :  4.7622e+07 :
FP EMULATION  :      62.665 :      70.732 :+     155.14 :      20.689 :      31.447 :      63.787 :       33.59 :-     26.981 :      40.704 :      18.795 :
FOURIER       :        5681 :        4971 :        7542 :      4163.5 :+      11625 :      5799.8 :      4264.3 :      3417.2 :-     1818.4 :      6164.6 :
ASSIGNMENT    :      10.188 :      7.8253 :+     16.926 :      3.1977 :      4.9506 :      10.383 :      3.2279 :      2.9283 :-     1.3203 :      1.2802 :
IDEA          :      1360.5 :      1300.7 :+     2704.9 :      802.14 :      1267.4 :      1379.8 :      960.28 :      768.85 :-     578.38 :      612.59 :
HUFFMAN       :      913.43 :      536.57 :+     1393.9 :      356.47 :       444.8 :      926.23 :      507.86 :       406.9 :-     366.09 :      252.77 :
NEURAL NET    :       6.517 :      11.476 :+     13.497 :       4.793 :      6.1968 :       6.608 :      4.8575 :      3.8885 :-     2.8463 :      3.3334 :
LU DECOMP.    :      195.08 :       363.6 :+     468.12 :       156.6 :      135.56 :      201.19 :       77.92 :      69.307 :-     43.211 :      43.009 :

MEMORY INDEX  :       3.699 :       4.281 :+      8.167 :       2.994 :       3.108 :       3.757 :       2.097 :       1.758 :-      1.567 :       1.277 :
INTEGER INDEX :       5.944 :       5.066 :+     11.710 :-      2.676 :       3.543 :       6.046 :       3.720 :       3.543 :       2.696 :       1.911 :
FP INDEX      :       4.886 :       6.945 :+      9.163 :       3.695 :       5.416 :       4.993 :       2.965 :       2.459 :-      1.534 :       2.426 :
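
Since the machines in the table run at very different clock speeds, one possible way to read the indexes is to scale them by clock frequency. The sketch below is only an illustration: the structure `result`, the functions `index_per_100mhz` and `print_normalized`, and the choice of "index per 100 MHz" as a unit are assumptions made for this example, not something nbench provides. The sample values are the INTEGER INDEX figures of four columns of the table.

```c
#include <stddef.h>
#include <stdio.h>

/* One column of the results table (hypothetical structure). */
struct result {
    const char *cpu;
    double mhz;            /* clock frequency in MHz */
    double integer_index;  /* INTEGER INDEX row of the table */
};

/* Four columns copied from the table above. */
static const struct result samples[] = {
    { "750GX",  800.0,  5.944 },
    { "970FX", 1800.0,  5.066 },
    { "7447A", 1400.0, 11.710 },
    { "440ep",  667.0,  3.720 },
};

/* Index value scaled to 100 MHz of clock. */
double index_per_100mhz(double index, double mhz)
{
    return index * 100.0 / mhz;
}

/* Print each CPU with its raw and clock-scaled integer index. */
void print_normalized(const struct result *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%-6s %7.3f  %6.3f per 100 MHz\n", r[i].cpu,
               r[i].integer_index,
               index_per_100mhz(r[i].integer_index, r[i].mhz));
}
```

Calling `print_normalized(samples, sizeof samples / sizeof samples[0])` prints one line per CPU. Such a ratio ignores memory speed, cache sizes and compiler version, so it is at best a rough reading aid.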

Quick comments

First, I must point out again that benchmark results are very hard to analyze. You have to know what the benchmark does and what its goals are. So, my first comments about this large range of hardware are:

Conclusion

Changing some compiler options produced no visible change in the results. The compiler generation, on the other hand, can make a strong difference through its optimization capabilities.

About nbench, even with 10 different cases on the same architecture, it seems very hard to know whether this benchmark is relevant. It is 10 years old, its source code is hard to read and its documentation is really poor. It is difficult to understand what the goal of each test is and which part of the hardware it exercises. For example, "Numeric sort" is labelled "Generic integer performance": is it only integer computing, or does it also use the cache? And how?
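
One way to approach the cache question would be to time an integer sort on arrays of different sizes and see whether the cost changes around the cache sizes of the machine. The following is a minimal sketch under stated assumptions: it uses the C library `qsort` rather than nbench's own sort routine, and the helper names `cmp_int`, `fill_random` and `time_sorts` are invented for this example.

```c
#include <stdlib.h>
#include <time.h>

/* Three-way comparison of two ints for qsort. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Fill v with pseudo-random values from a fixed seed, so runs are repeatable. */
void fill_random(int *v, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        v[i] = rand();
}

/*
 * Sort reps freshly filled arrays of n ints and return the CPU time,
 * in seconds, spent inside qsort only. Returns -1.0 if the array
 * cannot be allocated.
 */
double time_sorts(size_t n, unsigned reps)
{
    int *v = malloc(n * sizeof *v);
    double total = 0.0;

    if (v == NULL)
        return -1.0;

    for (unsigned r = 0; r < reps; r++) {
        fill_random(v, n, r + 1);
        clock_t t0 = clock();
        qsort(v, n, sizeof *v, cmp_int);
        total += (double)(clock() - t0) / CLOCKS_PER_SEC;
    }
    free(v);
    return total;
}
```

A caller could run `time_sorts` with n going from a few KB of data to several MB while keeping n * reps constant, then compare the time per element. Sorting cost per element grows with log n anyway, so only a jump clearly beyond that trend near the L1 or L2 size would suggest that the cache is part of what is measured.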

For me, to measure the capabilities of a CPU, a benchmark should focus on simpler tests. For overall performance, we could use common applications, but the results would then reflect their implementation, the compiler's capabilities, manual optimizations, etc. That would no longer be a CPU benchmark.

To do