This is one wrong way to design a FORTRAN to C++ interface.
But let's start at the beginning. From FORTRAN callable C++ function must be declared with extern "C" to disable name mangling.
The function funcCPP expect a 2D array. Two items a 3 values. Calculate the sum and return it.
Rember: C counts from 0 and the right most index is the fastest index.
#ifdef __cplusplus
extern "C" {
#endif
double funcCPP(double arr[2][3]);
#ifdef __cplusplus
}
#endif
double funcCPP(double arr[2][3]) {
return arr[0][0] + arr[0][1] + arr[0][2] + arr[1][0] + arr[1][1] + arr[1][2];
}
The generated ASM code (GCC -O1) looks straight forward. Almost perfect (except of SIMD). Nothing more to say.
funcCPP:
movsd (%rdi), %xmm0
addsd 8(%rdi), %xmm0
addsd 16(%rdi), %xmm0
addsd 24(%rdi), %xmm0
addsd 32(%rdi), %xmm0
addsd 40(%rdi), %xmm0
ret
Now we implement an FORTRAN interface for that function. Using iso_c_binding module to make life easier. But the compiler didn't to any real check. If you add an parameter in the C function and forget to add it in the FORTRAN interface -> BOOM.
The array is pass as an 1D array.
module CPPinterface_m
use, intrinsic :: iso_c_binding
implicit none
interface
function funcCPP(arr) bind(c, name="funcCPP")
use, intrinsic :: iso_c_binding
implicit none
real(c_double), dimension(*) :: arr
real(c_double) :: funcCPP
end function
end interface
end module
The main code is a subroutine func which get an 2D array with N items a 3 values. The variable idx contains the position of the items we are interested in. This values are passed to funcCPP. But the two items at position idx(1) and idx(2) are not consecutive stored in RAM. So the compiler has to generate code which make a copy of the six floating point values. That's not what we intend to.
module prog_m
use CPPinterface_m
implicit none
integer :: N
contains
subroutine func(arr, idx)
implicit none
real(8), intent(inout) :: arr(3,N)
integer, intent(in) :: idx(2)
real(8) :: res
res = funcCPP(arr(:,idx))
end subroutine
end module
You can see it right here in the assembler code. I commented it for you. The data is passed through stack.
__unrunoff_m_MOD_func:
subq $56, %rsp # allocate 56 bytes on stack
movslq (%rsi), %rax
leaq (%rax,%rax,2), %rax # calculate address of first item
leaq (%rdi,%rax,8), %rax
movsd -24(%rax), %xmm0 # copy arr[0][0]
movsd %xmm0, (%rsp)
movsd -16(%rax), %xmm0 # copy arr[0][1]
movsd %xmm0, 8(%rsp)
movsd -8(%rax), %xmm0 # copy arr[0][2]
movsd %xmm0, 16(%rsp)
movslq 4(%rsi), %rax
leaq (%rax,%rax,2), %rax # calculate address of second item
leaq (%rdi,%rax,8), %rax
movsd -24(%rax), %xmm0 # copy arr[1][0]
movsd %xmm0, 24(%rsp)
movsd -16(%rax), %xmm0 # copy arr[1][1]
movsd %xmm0, 32(%rsp)
movsd -8(%rax), %xmm0 # copy arr[1][2]
movsd %xmm0, 40(%rsp)
movq %rsp, %rdi
call funcCPP@PLT # call the C++ function
addq $56, %rsp # release stack space
ret
We have 6 ASM instruction to process the data and 21 instructions to access the data. Nope. Fail.