Description
In #27 we said that "every call to a Lua function is inlined, always, without exception." But what are these calls inlined into? What is the unit of compilation?
A reasonable first approximation is to say that the JIT compiles each of the innermost loops in the program separately. Innermost loops are any loops that don't themselves contain loops, neither directly in their source code nor indirectly via a function they call. The unit of compilation is then from the start of each inner loop to its end, and all function calls made within the loop are inlined into the generated code.
This inlining means that each time you call a function in your source code the JIT will compile a separate copy of that function. Consider this inner loop that calls the same function multiple times:
function add(a, b) return a + b end
local x = 0
for i = 1, 100 do
x = add(x, 42)
x = add(x, "42")
end
Here we have two calls to the function add
and parameter b
is an integer in the first call but a string in the second call. (This is not an error because Lua automatically coerces strings to numbers.) Is the diversity of types for the b
parameter going to cause problems with speculative optimization? No: the JIT will compile each of the calls separately and specialize the code for each call for separate argument types. This code compiles efficiently because at runtime each of the calls uses self-consistent types.
Let's be advanced and consider the merits of a couple of higher-order functions:
function apply(fn, x)
return fn(x)
end
function fold(fn, val, array)
for i = 1, #array do
val = fn(val, array[i])
end
return val
end
Is this code appropriate for a tracing JIT? The potential problem is that each time we compile a call to these functions the JIT will speculatively inline the body of the function that we pass in as fn
. This is extremely efficient if we are always passing the same value for fn
but otherwise it is expensive. If our intention is to call these library functions many times with different function parameters then we have to consider how those calls will compile.
The answer is that apply
is fine but fold
is problematic. The reason that apply
is fine is that each call will be compiled and optimized separately and so there will not be any interaction between the different fn
values. The reason that fold
is problematic is that it passes many different values of fn
into a loop and calls it there. The loop in fold
will be its own compilation unit, meaning that only one copy of this loop body will be compiled, and this compiled code will see all of the different values of fn
at runtime. This will force the compiler to repeatedly mispredict the value that fn
will have and to generate less-than-optimal code as a result.
The moral of this story is that if you want to optimize the same code for many different uses then you should be careful to make sure each use is compiled separately. Loosely speaking that means to avoid sharing the same inner loop in your source code for multiple uses.
How does that work in practice? In simple terms it means that instead of writing code like this:
local x = 0
x = fold(fn1, x, array)
x = fold(fn2, x, array)
You should simply write two separate loops (for
...end
) in your source code.
local x = 0
for i = 1, #array do x = fn1(x, array[i]) end
for i = 1, #array do x = fn2(x, array[i]) end
Here fn1
and fn2
are called from separate inner loops and so each will be compiled and optimized separately. Simple enough, eh? (This is assuming that these really are inner loops i.e. that neither fn1
nor fn2
contains a loop!)
If you are really determined to write higher-order functions that have attractive performance characteristics then take a look at advanced tricks like luafun/luafun#33.
(Incidentally: These units of compilation are called traces.)