{"id":3546,"date":"2018-06-29T13:21:42","date_gmt":"2018-06-29T12:21:42","guid":{"rendered":"http:\/\/roboblog.fatal-fury.de\/?page_id=3546"},"modified":"2021-09-18T09:20:23","modified_gmt":"2021-09-18T08:20:23","slug":"c-julia-fortran","status":"publish","type":"page","link":"http:\/\/roboblog.fatal-fury.de\/?page_id=3546","title":{"rendered":"C++ Julia FORTRAN RUST"},"content":{"rendered":"<p><font color=\"red\">Page under construction<\/font><\/p>\n<p>General statements:<br \/>\nCompiler: GNU GCC 8.1<br \/>\nFlags: -O1 --save-temps and -funroll-loops for FORTRAN<\/p>\n<p>The C++ Point2D type is simply a struct that inherits from a SIMDvalarray to benefit from SSE and AVX.<\/p>\n<p>One cannot use something like a Point2D type in FORTRAN. One can program it, but it is not practical to use, because small functions from other modules are not inlined in FORTRAN. Paying for a function call every time is not an option. Hence I use a variable-size array for the two dimensions of Point2D and give the dimension size explicitly to the compiler every time a Point2D is used. This makes loop unrolling possible, since the loop size is now known at compile time, and results in a fair comparison of the assembler code with the C++ version.<\/p>\n<p>Yes, optimization level 2 can make some code better, but not in these trivial examples. No, level 3 doesn't give any better results for these examples either.<\/p>\n<h1>Element Center<\/h1>\n<p><strong>C++<\/strong><br \/>\n as member function<br \/>\n\"mesh\" and \"IE\" are const and can't be modified.<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nauto Mesh::elementCenter(const int IE) const {\r\n  return (XY&#x5B;INE&#x5B;IE]&#x5B;0]] + XY&#x5B;INE&#x5B;IE]&#x5B;1]] + XY&#x5B;INE&#x5B;IE]&#x5B;2]]) \/ 3;\r\n}\r\n<\/pre>\n<p><strong>Julia<\/strong><br \/>\n as free function<br \/>\n<font color=\"red\">constness? 
performance?<br \/>\nPay attention to the dot in front of the division (element-wise division).<\/font><\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nfunction elementCenter(mesh::Mesh, IE::Int32)\r\n  Elem = mesh.INE&#x5B;:,IE]\r\n  tmp =  (mesh.XY&#x5B;Elem&#x5B;1]] + mesh.XY&#x5B;Elem&#x5B;2]] + mesh.XY&#x5B;Elem&#x5B;3]]) .\/ 3.0\r\n  return tmp\r\nend\r\n<\/pre>\n<p><strong>FORTRAN<\/strong><br \/>\n as member function (and with a few small changes as free function)<br \/>\n\"this\" (the mesh) and \"IE\" are const and can't be modified.<br \/>\nDon't use sum() for performance reasons.<br \/>\n<font color=\"red\">The first dimension of the arrays XY and INE is not known at compile time<\/font><\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nfunction elementCenter(this, IE) result(center)\r\n  implicit none\r\n  class(Mesh_t), intent(in) :: this\r\n  integer, intent(in) :: IE\r\n  real(8) :: center(2)\r\n  center = (this%XY(1:2,this%INE(1,IE)) + this%XY(1:2,this%INE(2,IE)) + this%XY(1:2,this%INE(3,IE))) \/ 3\r\nend function\r\n<\/pre>\n<p><strong>Rust<\/strong><br \/>\nIt's my first Rust code. It could be done better.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\npub struct Point2D {\r\n    xy : &#x5B;f64;2]\r\n}\r\n\r\npub struct Mesh {\r\n    xy : Vec&lt;Point2D&gt;,\r\n    INE : Vec&lt;&#x5B;usize;3]&gt;\r\n}\r\n\r\nimpl Mesh {\r\n    pub fn element_center(&amp;self, IE : usize) -&gt; Point2D {\r\n      return Point2D {xy:&#x5B; (self.xy&#x5B;self.INE&#x5B;IE]&#x5B;0]].xy&#x5B;0] + self.xy&#x5B;self.INE&#x5B;IE]&#x5B;1]].xy&#x5B;0] + self.xy&#x5B;self.INE&#x5B;IE]&#x5B;2]].xy&#x5B;0]) \/ 3.0 , (self.xy&#x5B;self.INE&#x5B;IE]&#x5B;0]].xy&#x5B;1] + self.xy&#x5B;self.INE&#x5B;IE]&#x5B;1]].xy&#x5B;1] + self.xy&#x5B;self.INE&#x5B;IE]&#x5B;2]].xy&#x5B;1]) \/ 3.0  ]};\r\n    }\r\n}\r\n<\/pre>\n<p>The assembly output shows the differences. 
There are a lot more assembler instructions for FORTRAN since it doesn't use packed SSE instructions and does some extra offset calculation, I guess.<\/p>\n<table>\n<tr>\n<td>\nC++<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nMesh::elementCenter(int) const:\r\n  movslq %esi, %rsi\r\n  leaq (%rsi,%rsi,2), %rdx\r\n  movq 24(%rdi), %rax\r\n  leaq (%rax,%rdx,4), %rax\r\n  movq (%rdi), %rdx          # calc INE offsets\r\n  movslq (%rax), %rsi\r\n  salq $4, %rsi\r\n  movslq 4(%rax), %rcx\r\n  salq $4, %rcx\r\n  movapd (%rdx,%rsi), %xmm0  # xmm0 = p0\r\n  addpd (%rdx,%rcx), %xmm0   # xmm0 += p1\r\n  movslq 8(%rax), %rax\r\n  salq $4, %rax\r\n  addpd (%rdx,%rax), %xmm0   # xmm0 += p2\r\n  divpd .LC0(%rip), %xmm0    # xmm0 \/= 3\r\n  ret                        # return xmm0 as a new Point2D \r\n.LC0:\r\n  .long 0\r\n  .long 1074266112\r\n  .long 0\r\n  .long 1074266112\r\n<\/pre>\n<\/td>\n<td>\nFORTRAN<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n_testmodule_MOD_elementcenter:\r\n\tpushq\t%rbx\r\n\tmovq\t24(%rdi), %rax\r\n\ttestq\t%rax, %rax\r\n\tmovl\t$1, %r8d\r\n\tcmovne\t%rax, %r8\r\n\tmovq\t(%rdi), %r9\r\n\tmovq\t(%rsi), %rcx\r\n\tmovq\t8(%rsi), %r11\r\n\tmovslq\t(%rdx), %rdx\r\n\timulq\t120(%rsi), %rdx\r\n\taddq\t80(%rsi), %rdx\r\n\tmovq\t72(%rsi), %rbx\r\n\tleaq\t(%rbx,%rdx,4), %r10   # calc INE offsets. 
lot of stuff\r\n\tmovq\t24(%rsi), %rax\r\n\tmovq\t48(%rsi), %rdi\r\n\tmovslq\t4(%r10), %rbx\r\n\timulq\t%rdi, %rbx\r\n\taddq\t%r11, %rbx\r\n\tmovslq\t8(%r10), %rdx\r\n\timulq\t%rdi, %rdx\r\n\taddq\t%r11, %rdx\r\n\tmovslq\t12(%r10), %rsi\r\n\timulq\t%rsi, %rdi\r\n\tleaq\t(%rdi,%r11), %r11\r\n\tleaq\t(%rax,%rbx), %r10\r\n\tleaq\t(%rax,%rdx), %rdi\r\n\tmovsd\t(%rcx,%r10,8), %xmm0   # xmm0 = p0.x\r\n\taddsd\t(%rcx,%rdi,8), %xmm0   # xmm0 += p1.x\r\n\tleaq\t(%rax,%r11), %rsi\r\n\taddsd\t(%rcx,%rsi,8), %xmm0   # xmm0 += p2.x\r\n\tmovsd\t.LC0(%rip), %xmm1      # xmm1 = 3\r\n\tdivsd\t%xmm1, %xmm0           # xmm0 \/= xmm1\r\n\tmovsd\t%xmm0, (%r9)           # write x result back to center array\r\n\taddq\t%rax, %rax\r\n\tleaq\t(%rbx,%rax), %rbx      # calc offset\r\n\taddq\t%rax, %rdx\r\n\tmovsd\t(%rcx,%rbx,8), %xmm2   # xmm2 = p0.y\r\n\taddsd\t(%rcx,%rdx,8), %xmm2   # xmm2 += p1.y\r\n\taddq\t%r11, %rax\r\n\taddsd\t(%rcx,%rax,8), %xmm2   # xmm2 += p2.y\r\n\tdivsd\t%xmm1, %xmm2           # xmm2 \/= xmm1\r\n\tmovsd\t%xmm2, (%r9,%r8,8)     # write y result back to center array\r\n\tpopq\t%rbx\r\n\tret\r\n.LC0:\r\n\t.long\t0\r\n\t.long\t1074266112\r\n<\/pre>\n<\/td>\n<td>\nRust<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nexample::Mesh::element_center:\r\n        pushq   %rax\r\n        movq    %rsi, %rcx\r\n        movq    40(%rdi), %rsi\r\n        cmpq    %rcx, %rsi\r\n        jbe     .LBB0_1\r\n        movq    %rdi, %r8\r\n        movq    16(%rdi), %rsi\r\n        movq    24(%rdi), %rax\r\n        leaq    (%rcx,%rcx,2), %rdx\r\n        movq    (%rax,%rdx,8), %rcx\r\n        cmpq    %rcx, %rsi\r\n        jbe     .LBB0_4\r\n        movq    8(%rax,%rdx,8), %rdi\r\n        cmpq    %rdi, %rsi\r\n        jbe     .LBB0_8\r\n        movq    16(%rax,%rdx,8), %rax\r\n        cmpq    %rax, %rsi\r\n        jbe     .LBB0_9\r\n        movq    (%r8), %rdx\r\n        shlq    $4, %rdi\r\n        shlq    $4, %rcx\r\n        movsd   (%rdx,%rcx), %xmm0\r\n        movsd 
  8(%rdx,%rcx), %xmm1\r\n        addsd   (%rdx,%rdi), %xmm0\r\n        shlq    $4, %rax\r\n        addsd   (%rdx,%rax), %xmm0\r\n        movsd   .LCPI0_0(%rip), %xmm2\r\n        addsd   8(%rdx,%rdi), %xmm1\r\n        addsd   8(%rdx,%rax), %xmm1\r\n        divsd   %xmm2, %xmm0\r\n        divsd   %xmm2, %xmm1\r\n        movq    %xmm0, %rax\r\n        movq    %xmm1, %rdx\r\n        popq    %rcx\r\n        retq\r\n.LBB0_1:\r\n        leaq    .L__unnamed_1(%rip), %rdx\r\n        jmp     .LBB0_2\r\n.LBB0_4:\r\n        leaq    .L__unnamed_2(%rip), %rdx\r\n.LBB0_2:\r\n        movq    %rcx, %rdi\r\n        callq   *core::panicking::panic_bounds_check@GOTPCREL(%rip)\r\n        ud2\r\n.LBB0_8:\r\n        leaq    .L__unnamed_3(%rip), %rdx\r\n        callq   *core::panicking::panic_bounds_check@GOTPCREL(%rip)\r\n        ud2\r\n.LBB0_9:\r\n        leaq    .L__unnamed_4(%rip), %rdx\r\n        movq    %rax, %rdi\r\n        callq   *core::panicking::panic_bounds_check@GOTPCREL(%rip)\r\n        ud2\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/table>\n<h1>Element Area<\/h1>\n<p><strong>C++<\/strong><br \/>\nas member function<br \/>\n\"mesh\" and \"IE\" are const and can't be modified.<br \/>\nThe long version.<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nauto Mesh::elementArea(const int IE) const {\r\n  return 0.5*((XY&#x5B;INE&#x5B;IE]&#x5B;0]].x()-XY&#x5B;INE&#x5B;IE]&#x5B;2]].x()) * (XY&#x5B;INE&#x5B;IE]&#x5B;1]].y()-XY&#x5B;INE&#x5B;IE]&#x5B;0]].y()) + (XY&#x5B;INE&#x5B;IE]&#x5B;1]].x()-XY&#x5B;INE&#x5B;IE]&#x5B;0]].x()) * (XY&#x5B;INE&#x5B;IE]&#x5B;2]].y()-XY&#x5B;INE&#x5B;IE]&#x5B;0]].y()) );\r\n}\r\n<\/pre>\n<p>Shorter version with zero extra costs<\/p>\n<pre class=\"brush: cpp; title: ; notranslate\" title=\"\">\r\nauto Mesh::elementArea(const int IE) const {\r\n  const auto&amp; p0 = XY&#x5B;INE&#x5B;IE]&#x5B;0]];\r\n  const auto&amp; p1 = XY&#x5B;INE&#x5B;IE]&#x5B;1]];\r\n  const auto&amp; p2 = XY&#x5B;INE&#x5B;IE]&#x5B;2]];\r\n  return 0.5*((p0.x()-p2.x()) * 
(p1.y()-p0.y()) + (p1.x()-p0.x()) * (p2.y()-p0.y()) );\r\n}\r\n<\/pre>\n<p><strong>Julia<\/strong><br \/>\nas free function<br \/>\n<font color=\"red\">constness? performance?<\/font><\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nfunction elementArea(mesh::Mesh, IE::Int32)\r\n  Elem = mesh.INE&#x5B;:,IE]\r\n  return  0.5*( (mesh.XY&#x5B;Elem&#x5B;1]].x - mesh.XY&#x5B;Elem&#x5B;3]].x) * (mesh.XY&#x5B;Elem&#x5B;2]].y - mesh.XY&#x5B;Elem&#x5B;1]].y) + (mesh.XY&#x5B;Elem&#x5B;2]].x - mesh.XY&#x5B;Elem&#x5B;1]].x) * (mesh.XY&#x5B;Elem&#x5B;3]].y - mesh.XY&#x5B;Elem&#x5B;1]].y) )\r\nend\r\n<\/pre>\n<p><strong>FORTRAN<\/strong><br \/>\nas member function (and with a few small changes as free function)<br \/>\n\"this\" (the mesh) and \"IE\" are const and can't be modified.<br \/>\n<font color=\"red\">The first dimension of the arrays XY and INE is not known at compile time<\/font><br \/>\nThe long version<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nfunction elementArea(this, IE) result(area)\r\n  implicit none\r\n  type(Mesh_t), intent(in) :: this\r\n  integer, intent(in) :: IE\r\n  real(8) :: area\r\n  area = 0.5*( (this%XY(1,this%INE(1,IE))-this%XY(1,this%INE(3,IE))) * (this%XY(2,this%INE(2,IE))-this%XY(2,this%INE(1,IE))) + (this%XY(1,this%INE(2,IE))-this%XY(1,this%INE(1,IE))) * (this%XY(2,this%INE(3,IE))- this%XY(2,this%INE(1,IE))) )\r\nend function\r\n<\/pre>\n<p>Shorter version<br \/>\n<font color=\"red\">One has to work with pointers to prevent copies. 
Now the bounds checking doesn't work any longer<\/font><\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\nfunction elementArea(this, IE) result(area)\r\n  implicit none\r\n  type(Mesh_t), intent(in) :: this\r\n  integer, intent(in) :: IE\r\n  real(8) :: area\r\n  real(8), pointer :: p1(:), p2(:), p3(:)\r\n  p1 =&gt; this%XY(1:2,this%INE(1,IE))\r\n  p2 =&gt; this%XY(1:2,this%INE(2,IE))\r\n  p3 =&gt; this%XY(1:2,this%INE(3,IE))\r\n  area = 0.5*( (p1(1)-p3(1)) * (p2(2)-p1(2)) + (p2(1)-p1(1)) * (p3(2)- p1(2)) )\r\nend function\r\n<\/pre>\n<p>The assembly output shows the differences. First the two C++ versions. As you can see, they are identical except for the position of one instruction. The short FORTRAN version has two more assembler instructions than the long one, so in FORTRAN the syntactic sugar is not zero-cost.<\/p>\n<table>\n<tr>\n<td>\nC++ Long version<\/p>\n<pre class=\"brush: plain; highlight: [10]; title: ; notranslate\" title=\"\">\r\nMesh::elementArea(int) const:\r\n  movslq %esi, %rsi\r\n  leaq (%rsi,%rsi,2), %rdx\r\n  movq 24(%rdi), %rax\r\n  leaq (%rax,%rdx,4), %rsi\r\n  movq (%rdi), %rdx\r\n  movslq (%rsi), %rcx\r\n  salq $4, %rcx\r\n  addq %rdx, %rcx\r\n  movsd (%rcx), %xmm2\r\n  movslq 8(%rsi), %rax\r\n  salq $4, %rax\r\n  addq %rdx, %rax\r\n  movslq 4(%rsi), %rsi\r\n  salq $4, %rsi\r\n  addq %rsi, %rdx\r\n  movsd 8(%rcx), %xmm3\r\n  movapd %xmm2, %xmm0\r\n  subsd (%rax), %xmm0\r\n  movsd 8(%rdx), %xmm1\r\n  subsd %xmm3, %xmm1\r\n  mulsd %xmm1, %xmm0\r\n  movsd (%rdx), %xmm1\r\n  subsd %xmm2, %xmm1\r\n  movsd 8(%rax), %xmm2\r\n  subsd %xmm3, %xmm2\r\n  mulsd %xmm2, %xmm1\r\n  addsd %xmm1, %xmm0\r\n  mulsd .LC0(%rip), %xmm0\r\n  ret\r\n.LC0:\r\n  .long 0\r\n  .long 1071644672\r\n<\/pre>\n<p>mov 12<br \/>\nlea 2<br \/>\nsal 3<br \/>\nadd 4<br \/>\nsub 4<br \/>\nmul 3\n<\/td>\n<td>\nC++ Short version<\/p>\n<pre class=\"brush: plain; highlight: [16]; title: ; notranslate\" title=\"\">\r\nMesh::elementArea(int) const:\r\n  movslq %esi, %rsi\r\n  leaq 
(%rsi,%rsi,2), %rdx\r\n  movq 24(%rdi), %rax\r\n  leaq (%rax,%rdx,4), %rsi\r\n  movq (%rdi), %rax\r\n  movslq (%rsi), %rcx\r\n  salq $4, %rcx\r\n  addq %rax, %rcx\r\n  movslq 4(%rsi), %rdx\r\n  salq $4, %rdx\r\n  addq %rax, %rdx\r\n  movslq 8(%rsi), %rsi\r\n  salq $4, %rsi\r\n  addq %rsi, %rax\r\n  movsd (%rcx), %xmm2     \r\n  movsd 8(%rcx), %xmm3\r\n  movapd %xmm2, %xmm0\r\n  subsd (%rax), %xmm0\r\n  movsd 8(%rdx), %xmm1\r\n  subsd %xmm3, %xmm1\r\n  mulsd %xmm1, %xmm0\r\n  movsd (%rdx), %xmm1\r\n  subsd %xmm2, %xmm1\r\n  movsd 8(%rax), %xmm2\r\n  subsd %xmm3, %xmm2\r\n  mulsd %xmm2, %xmm1\r\n  addsd %xmm1, %xmm0\r\n  mulsd .LC0(%rip), %xmm0\r\n  ret\r\n.LC0:\r\n  .long 0\r\n  .long 1071644672\r\n<\/pre>\n<p>mov 12<br \/>\nlea 2<br \/>\nsal 3<br \/>\nadd 4<br \/>\nsub 4<br \/>\nmul 3\n<\/td>\n<td>\nFORTRAN long version<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n__testmodule_MOD_elementarea:\r\n\tmovq\t(%rdi), %r8\r\n\tmovq\t8(%rdi), %r9\r\n\tmovslq\t(%rsi), %rax\r\n\timulq\t120(%rdi), %rax\r\n\taddq\t80(%rdi), %rax\r\n\tmovq\t72(%rdi), %rdx\r\n\tleaq\t(%rdx,%rax,4), %r10\r\n\tmovq\t48(%rdi), %rsi\r\n\tmovq\t24(%rdi), %rcx\r\n\tmovslq\t4(%r10), %r11\r\n\timulq\t%rsi, %r11\r\n\taddq\t%r9, %r11\r\n\taddq\t%rcx, %r11\r\n\tmovsd\t(%r8,%r11,8), %xmm1\r\n\tmovslq\t12(%r10), %rdx\r\n\timulq\t%rsi, %rdx\r\n\taddq\t%r9, %rdx\r\n\taddq\t%rcx, %rdx\r\n\tmovslq\t8(%r10), %rdi\r\n\timulq\t%rdi, %rsi\r\n\taddq\t%rsi, %r9\r\n\tleaq\t(%r9,%rcx,2), %r9\r\n\taddq\t%rcx, %r11\r\n\tmovsd\t(%r8,%r11,8), %xmm3\r\n\tmovq\t%r9, %rax\r\n\tsubq\t%rcx, %rax\r\n\tmovsd\t(%r8,%rax,8), %xmm0\r\n\tsubsd\t%xmm1, %xmm0\r\n\taddq\t%rdx, %rcx\r\n\tmovsd\t(%r8,%rcx,8), %xmm2\r\n\tsubsd\t%xmm3, %xmm2\r\n\tmulsd\t%xmm2, %xmm0\r\n\tmovapd\t%xmm1, %xmm4\r\n\tsubsd\t(%r8,%rdx,8), %xmm4\r\n\tmovsd\t(%r8,%r9,8), %xmm5\r\n\tsubsd\t%xmm3, %xmm5\r\n\tmulsd\t%xmm4, %xmm5\r\n\taddsd\t%xmm5, %xmm0\r\n\tmulsd\t.LC0(%rip), 
%xmm0\r\n\tret\r\n.LC0:\r\n\t.long\t0\r\n\t.long\t1071644672\r\n<\/pre>\n<p>mov 16<br \/>\nlea 2<br \/>\nadd 9<br \/>\nsub 5<br \/>\nmul 7\n<\/td>\n<td>\nFORTRAN short version<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n__testmodule_MOD_elementarea:\r\n\tmovslq\t(%rsi), %rax\r\n\timulq\t120(%rdi), %rax\r\n\taddq\t80(%rdi), %rax\r\n\tmovq\t72(%rdi), %rdx\r\n\tleaq\t(%rdx,%rax,4), %r10\r\n\tmovq\t24(%rdi), %rsi\r\n\tmovq\t(%rdi), %r8\r\n\tmovl\t$1, %eax\r\n\tsubq\t32(%rdi), %rax\r\n\timulq\t%rsi, %rax\r\n\tmovq\t56(%rdi), %r9\r\n\tmovq\t48(%rdi), %rcx\r\n\tmovslq\t4(%r10), %rdi\r\n\tsubq\t%r9, %rdi\r\n\timulq\t%rcx, %rdi\r\n\taddq\t%rax, %rdi\r\n\tleaq\t(%r8,%rdi,8), %rdi\r\n\tmovslq\t8(%r10), %r11\r\n\tsubq\t%r9, %r11\r\n\timulq\t%rcx, %r11\r\n\taddq\t%rax, %r11\r\n\tleaq\t(%r8,%r11,8), %r11\r\n\tmovslq\t12(%r10), %rdx\r\n\tsubq\t%r9, %rdx\r\n\timulq\t%rdx, %rcx\r\n\taddq\t%rcx, %rax\r\n\tleaq\t(%r8,%rax,8), %r10\r\n\tmovsd\t(%rdi), %xmm1\r\n\tmovsd\t(%rdi,%rsi,8), %xmm3\r\n\tmovapd\t%xmm1, %xmm4\r\n\tsubsd\t(%r10), %xmm4\r\n\tmovsd\t(%r11,%rsi,8), %xmm2\r\n\tsubsd\t%xmm3, %xmm2\r\n\tmulsd\t%xmm2, %xmm4\r\n\tmovsd\t(%r11), %xmm0\r\n\tsubsd\t%xmm1, %xmm0\r\n\tmovsd\t(%r10,%rsi,8), %xmm6\r\n\tsubsd\t%xmm3, %xmm6\r\n\tmulsd\t%xmm6, %xmm0\r\n\taddsd\t%xmm4, %xmm0\r\n\tmulsd\t.LC0(%rip), %xmm0\r\n\tret\r\n.LC0:\r\n\t.long\t0\r\n\t.long\t1071644672\r\n<\/pre>\n<p>mov 16<br \/>\nlea 4<br \/>\nadd 5<br \/>\nsub 8<br \/>\nmul 8\n<\/td>\n<\/tr>\n<\/table>\n","protected":false},"excerpt":{"rendered":"<p>Page under construction General statements: Compiler: GNU GCC 8.1 Flags: -O1 --save-temps and -funroll-loops for FORTRAN The C++ Point2D Type is simply a struct inherit from a SIMDvalarray to use the benefits of SSE, AVX. One can not use something like a Point2D type in FORTRAN. One can program it, but not use. 
Because it [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-3546","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=\/wp\/v2\/pages\/3546","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3546"}],"version-history":[{"count":29,"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=\/wp\/v2\/pages\/3546\/revisions"}],"predecessor-version":[{"id":4959,"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=\/wp\/v2\/pages\/3546\/revisions\/4959"}],"wp:attachment":[{"href":"http:\/\/roboblog.fatal-fury.de\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3546"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}