{"id":1316,"date":"2012-04-09T16:53:07","date_gmt":"2012-04-09T15:53:07","guid":{"rendered":"http:\/\/roboblog.fatal-fury.de\/?p=1316"},"modified":"2012-04-09T16:56:46","modified_gmt":"2012-04-09T15:56:46","slug":"faster-code-%e2%80%93-part-6-%e2%80%93-sprungvorhersage-again","status":"publish","type":"post","link":"http:\/\/roboblog.fatal-fury.de\/?p=1316","title":{"rendered":"Faster Code \u2013 Part 6 \u2013 Sprungvorhersage again"},"content":{"rendered":"<p>Wie im <a href=\"http:\/\/roboblog.fatal-fury.de\/?p=1286\">letzten Post<\/a> angek\u00fcndigt, gibt es jetzt die genauere Analyse meines Tests.<br \/>\nHier erstmal das Programm:<\/p>\n<pre><code>int mymin(int a, int b) __attribute__ ((noinline));\r\nint mymin(int a, int b)\r\n{\r\n  asm (\"\");\r\n  if (b < a)\r\n    return b;\r\n  return a;\r\n}\r\n\r\nint mymin2(int a, int b) __attribute__ ((noinline));\r\nint mymin2(int a, int b)\r\n{\r\n  asm (\"\");\r\n  return  b ^ ((a ^ b) & -(a < b)); \/\/ min(a, b)\r\n}\r\n\r\nint main()\r\n{\r\n    int a = 0;\r\n    int b = 0;\r\n    int c = 0;\r\n    long sum = 0;\r\n    for(int i = 0; i < 1e7; i++) {\r\n        a = rand();\r\n        b = rand();\r\n        c = std::min(a,b);\r\n\/\/        c = mymin1(a,b);\r\n\/\/        c = mymin2(a,b);\r\n        sum+=c;\r\n\/\/        cout << a << \" \" << b << \" \" << c << \"\\n\";\r\n    }\r\n    return 0;\r\n}<\/code><\/pre>\n<p>Wie man sieht, gibt es zwei Funktion die jeweils die kleinere Zahl bestimmen. Einmal durch eine einfache if() Verzweigung und einmal durch eine Rechnung. Die beiden Funktionen sind mit den Attribute noinline [1] und einem inline Assembler Befehl versehen, damit der Compiler sie nicht weg optimiert. Sonst hat valgrind nichts zum messen ;)<br \/>\nEs gibt drei Durchl\u00e4uft. Einmal mit std::min um die \"normale Version\" zu vermessen.<br \/>\nDann meine min Version mit If. Und zum Schluss die Version mit Rechnung.<br \/>\nCompiliert wird immer mit <\/p>\n<pre> g++ -O2  -g -Wall main.cpp<\/pre>\n<p>Bei Valgrind schalte ich die Cache Simulation ab, um die Ausf\u00fchrungszeit etwas zu verk\u00fcrzen<\/p>\n<pre>valgrind --tool=cachegrind --branch-sim=yes --cache-sim=no .\/a.out<\/pre>\n<p><u><strong>Test 1<\/strong><\/u><br \/>\nMit std::min<\/p>\n<pre>==5210== I   refs:      1,821,485,345\r\n==5210== \r\n==5210== Branches:        230,216,253  (210,213,194 cond + 20,003,059 ind)\r\n==5210== Mispredicts:         657,699  (    657,279 cond +        420 ind)\r\n==5210== Mispred rate:            0.2% (        0.3%     +        0.0%   )\r\n\r\n--------------------------------------------------------------------------------\r\n         Ir          Bc     Bcm         Bi Bim  file:function\r\n--------------------------------------------------------------------------------\r\n880,000,000  80,000,000 645,163          0   0  \/build\/buildd-eglibc_2.11.2-10-i386-GapcyD\/eglibc-2.11.2\/stdlib\/random_r.c:random_r\r\n    353,545      54,035   3,217          0   0  \/build\/buildd-eglibc_2.11.2-10-i386-GapcyD\/eglibc-2.11.2\/elf\/dl-lookup.c:do_lookup_x\r\n    176,933      28,792   2,876      2,144 252  \/build\/buildd-eglibc_2.11.2-10-i386-GapcyD\/eglibc-2.11.2\/elf\/..\/sysdeps\/i386\/dl-machine.h:_dl_relocate_object\r\n    138,235      31,695   1,684          0   0  \/build\/buildd-eglibc_2.11.2-10-i386-GapcyD\/eglibc-2.11.2\/string\/strcmp.c:strcmp\r\n    562,727      63,616   1,366          0   0  \/build\/buildd-eglibc_2.11.2-10-i386-GapcyD\/eglibc-2.11.2\/elf\/dl-lookup.c:_dl_lookup_symbol_x\r\n\r\n--------------------------------------------------------------------------------\r\n-- Auto-annotated source: \/home\/kater\/qtcode\/plaintest\/plaintest\/main.cpp\r\n--------------------------------------------------------------------------------\r\n        Ir         Bc Bcm Bi Bim \r\n\r\n-- line 13 ----------------------------------------\r\n         .          .   .  .   .  \r\n         .          .   .  .   .  int mymin2(const int& a, const int& b) __attribute__ ((noinline));\r\n         .          .   .  .   .  int mymin2(const int& a, const int& b)\r\n         .          .   .  .   .  {\r\n         .          .   .  .   .    return  b ^ ((a ^ b) & -(a < b)); \/\/ min(a, b)\r\n         .          .   .  .   .  }\r\n         .          .   .  .   .  \r\n         .          .   .  .   .  int main()\r\n         7          0   0  0   0  {\r\n         .          .   .  .   .      int a = 0;\r\n         .          .   .  .   .      int b = 0;\r\n         .          .   .  .   .      int c = 0;\r\n         .          .   .  .   .      long sum = 0;\r\n80,000,000 10,000,000   5  0   0      for(int i = 0; i < 1e7; i++) {\r\n10,000,000          0   0  0   0          a = rand();\r\n10,000,000          0   0  0   0          b = rand();\r\n         .          .   .  .   .          c = std::min(a,b);\r\n         .          .   .  .   .          \/\/c = mymin(a,b);\r\n         .          .   .  .   .          sum+=c;\r\n         .          .   .  .   .  \/\/        cout << a << \" \" << b << \" \" << c << \"\\n\";\r\n         .          .   .  .   .      }\r\n         .          .   .  .   .      return 0;\r\n        11          0   0  0   0  }\r\n<\/pre>\n<p>Interessant sind hier nur Conditional branches executed (Bc) and conditional branches mispredicted (Bcm).<br \/>\nWir haben also 657,699 falsch vorhergesagte Spr\u00fcnge und 645,163 gehen f\u00fcr die Random Funktion drauf. Der Rest verteilt sich auf diverse Systemfunktionen. Aber es gibt keine falschen Spr\u00fcnge f\u00fcr std::min. Der Compiler ist gut ;)<br \/>\nF\u00fcnf gehen f\u00fcr die For Schleife drauf. Hier erkennt die Sprungvorhersage schnell, dass es eine Schleife ist und nimmt immer den richtigen Weg.<\/p>\n<p><u><strong>Test 2<\/strong><\/u><br \/>\nMit mymin; if()<\/p>\n<pre>==10996== I   refs:      1,946,482,984\r\n==10996== \r\n==10996== Branches:        240,216,256  (220,213,197 cond + 20,003,059 ind)\r\n==10996== Mispredicts:       5,658,603  (  5,658,183 cond +        420 ind)\r\n==10996== Mispred rate:            2.3% (        2.5%     +        0.0%   )\r\n\r\n--------------------------------------------------------------------------------\r\n         Ir          Bc       Bcm         Bi Bim  file:function\r\n--------------------------------------------------------------------------------\r\n880,000,000  80,000,000   645,163          0   0  \/build\/buildd-eglibc_2.11.2-10-i386-GapcyD\/eglibc-2.11.2\/stdlib\/random_r.c:random_r\r\n520,000,000 120,000,000        15          0   0  \/build\/buildd-eglibc_2.11.2-10-i386-GapcyD\/eglibc-2.11.2\/stdlib\/random.c:random\r\n180,000,000           0         0          0   0  \/build\/buildd-eglibc_2.11.2-10-i386-GapcyD\/eglibc-2.11.2\/stdlib\/rand.c:rand\r\n140,000,015  10,000,000         9          0   0  \/home\/kater\/qtcode\/plaintest\/plaintest\/main.cpp:main\r\n120,001,144           0         0          0   0  \/build\/buildd-eglibc_2.11.2-10-i386-GapcyD\/eglibc-2.11.2\/string\/..\/sysdeps\/i386\/i686\/multiarch\/strcspn.S:???\r\n 84,997,627  10,000,000 5,000,895          0   0  \/home\/kater\/qtcode\/plaintest\/plaintest\/main.cpp:mymin(int, int)\r\n\r\n--------------------------------------------------------------------------------\r\n-- Auto-annotated source: \/home\/kater\/qtcode\/plaintest\/plaintest\/main.cpp\r\n--------------------------------------------------------------------------------\r\n        Ir         Bc       Bcm Bi Bim \r\n\r\n         .          .         .  .   .  #include <iostream>\r\n         .          .         .  .   .  #include <algorithm>\r\n         .          .         .  .   .  using namespace std;\r\n         .          .         .  .   .  \r\n         .          .         .  .   .  \r\n         .          .         .  .   .  int mymin(int a, int b) __attribute__ ((noinline));\r\n         .          .         .  .   .  int mymin(int a, int b)\r\n30,000,000          0         0  0   0  {\r\n34,997,627 10,000,000 5,000,895  0   0    asm (\"\");\r\n         .          .         .  .   .    if (b < a)\r\n         .          .         .  .   .      return b;\r\n         .          .         .  .   .    return a;\r\n20,000,000          0         0  0   0  }\r\n         .          .         .  .   .  \r\n         .          .         .  .   .  int mymin2(int a, int b) __attribute__ ((noinline));\r\n         .          .         .  .   .  int mymin2(int a, int b)\r\n         .          .         .  .   .  {\r\n         .          .         .  .   .    asm (\"\");\r\n         .          .         .  .   .    return  b ^ ((a ^ b) & -(a < b)); \/\/ min(a, b)\r\n         .          .         .  .   .  }\r\n         .          .         .  .   .  \r\n         .          .         .  .   .  int main()\r\n         8          0         0  0   0  {\r\n         .          .         .  .   .      int a = 0;\r\n         .          .         .  .   .      int b = 0;\r\n         .          .         .  .   .      int c = 0;\r\n         .          .         .  .   .      long sum = 0;\r\n80,000,000 10,000,000         9  0   0      for(int i = 0; i < 1e7; i++) {\r\n20,000,000          0         0  0   0          a = rand();\r\n10,000,000          0         0  0   0          b = rand();\r\n         .          .         .  .   .  \/\/        c = std::min(a,b);\r\n30,000,000          0         0  0   0          c = mymin(a,b);\r\n         .          .         .  .   .  \/\/        c = mymin2(a,b);\r\n         .          .         .  .   .          sum+=c;\r\n         .          .         .  .   .  \/\/        cout << a << \" \" << b << \" \" << c << \"\\n\";\r\n         .          .         .  .   .      }\r\n         .          .         .  .   .      return 0;\r\n        12          0         0  0   0  }\r\n<\/pre>\n<p>Jetzt sieht die Sache schon anders aus. Wir haben f\u00fcnf Millionen falsch vorhergesagte Spr\u00fcnge. Das macht 50%. Wie es zu erwarten war. <\/p>\n<p><u><strong>Test 3<\/strong><\/u><br \/>\nMit der Rechnung:<\/p>\n<pre>==11057== I   refs:      1,991,485,357\r\n==11057== \r\n==11057== Branches:        230,216,256  (210,213,197 cond + 20,003,059 ind)\r\n==11057== Mispredicts:         657,699  (    657,279 cond +        420 ind)\r\n==11057== Mispred rate:            0.2% (        0.3%     +        0.0%   )\r\n\r\n--------------------------------------------------------------------------------\r\n         Ir          Bc     Bcm         Bi Bim  file:function\r\n--------------------------------------------------------------------------------\r\n880,000,000  80,000,000 645,163          0   0  \/build\/buildd-eglibc_2.11.2-10-i386-GapcyD\/eglibc-2.11.2\/stdlib\/random_r.c:random_r\r\n520,000,000 120,000,000      11          0   0  \/build\/buildd-eglibc_2.11.2-10-i386-GapcyD\/eglibc-2.11.2\/stdlib\/random.c:random\r\n180,000,000           0       0          0   0  \/build\/buildd-eglibc_2.11.2-10-i386-GapcyD\/eglibc-2.11.2\/stdlib\/rand.c:rand\r\n140,000,015  10,000,000       5          0   0  \/home\/kater\/qtcode\/plaintest\/plaintest\/main.cpp:main\r\n130,000,000           0       0          0   0  \/home\/kater\/qtcode\/plaintest\/plaintest\/main.cpp:mymin2(int, int)\r\n\r\n--------------------------------------------------------------------------------\r\n-- Auto-annotated source: \/home\/kater\/qtcode\/plaintest\/plaintest\/main.cpp\r\n--------------------------------------------------------------------------------\r\n        Ir         Bc Bcm Bi Bim \r\n\r\n-- line 9 ----------------------------------------\r\n         .          .   .  .   .    asm (\"\");\r\n         .          .   .  .   .    if (b < a)\r\n         .          .   .  .   .      return b;\r\n         .          .   .  .   .    return a;\r\n         .          .   .  .   .  }\r\n         .          .   .  .   .  \r\n         .          .   .  .   .  int mymin2(int a, int b) __attribute__ ((noinline));\r\n         .          .   .  .   .  int mymin2(int a, int b)\r\n40,000,000          0   0  0   0  {\r\n70,000,000          0   0  0   0    asm (\"\");\r\n         .          .   .  .   .    return  b ^ ((a ^ b) & -(a < b)); \/\/ min(a, b)\r\n20,000,000          0   0  0   0  }\r\n         .          .   .  .   .  \r\n         .          .   .  .   .  int main()\r\n         8          0   0  0   0  {\r\n         .          .   .  .   .      int a = 0;\r\n         .          .   .  .   .      int b = 0;\r\n         .          .   .  .   .      int c = 0;\r\n         .          .   .  .   .      long sum = 0;\r\n80,000,000 10,000,000   5  0   0      for(int i = 0; i < 1e7; i++) {\r\n20,000,000          0   0  0   0          a = rand();\r\n10,000,000          0   0  0   0          b = rand();\r\n         .          .   .  .   .  \/\/        c = std::min(a,b);\r\n         .          .   .  .   .  \/\/        c = mymin(a,b);\r\n30,000,000          0   0  0   0          c = mymin2(a,b);\r\n         .          .   .  .   .          sum+=c;\r\n         .          .   .  .   .  \/\/        cout << a << \" \" << b << \" \" << c << \"\\n\";\r\n         .          .   .  .   .      }\r\n         .          .   .  .   .      return 0;\r\n        12          0   0  0   0  }\r\n<\/pre>\n<p>Wie zu erwarten war gibt es keine f\u00fcnf Millionen falsch vorhergesagten Spr\u00fcnge. Wir sind so gut wie die Compiler Optimierung. Gebracht hats also nix ;)<br \/>\nMan br\u00e4uchte mal ein realen Anwendungsfall.<br \/>\nAch ein Nachteil hat die Methode zur Berechnung der kleinsten Zahl: Man kann keine Reference auf die kleinste Zahl zur\u00fcck geben. Kein Plan wie der Compiler das macht. It's magic.<\/p>\n<p>[1] <a href=\"http:\/\/gcc.gnu.org\/onlinedocs\/gcc\/Function-Attributes.html#index-g_t_0040code_007bnoinline_007d-function-attribute-2561\">http:\/\/gcc.gnu.org\/onlinedocs\/gcc\/Function-Attributes.html#index-g_t_0040code_007bnoinline_007d-function-attribute-2561<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Wie im letzten Post angek\u00fcndigt, gibt es jetzt die genauere Analyse meines Tests. Hier erstmal das Programm: int mymin(int a, int b) __attribute__ ((noinline)); int mymin(int a, int b) { asm (\"\"); if (b < a) return b; return a; } int mymin2(int a, int b) __attribute__ ((noinline)); int mymin2(int a, int b) { asm [&hellip;]\n<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4],"tags":[14,17,31,18],"class_list":["post-1316","post","type-post","status-publish","format-standard","hentry","category-allgemein","tag-c","tag-cpp","tag-faster-code","tag-linux"],"_links":{"self":[{"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=\/wp\/v2\/posts\/1316","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1316"}],"version-history":[{"count":14,"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=\/wp\/v2\/posts\/1316\/revisions"}],"predecessor-version":[{"id":1330,"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=\/wp\/v2\/posts\/1316\/revisions\/1330"}],"wp:attachment":[{"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1316"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1316"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1316"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}