IR Representations and Code Transformations
Mooncake works by transforming Julia's SSA-form Intermediate Representation (IR), so a good working model of that IR is useful when touching the interpreter.
Please note that Julia's SSA-form IR changes slightly across minor versions, because it is not a public language interface. The examples below are representative rather than version-stable.
Before looking at the printed IR, keep three ideas in mind:
- Each SSA statement produces one named value such as
%1or%2. - Control flow is organized into basic blocks with branch or return terminators.
- The compiler stores the statements and the control-flow graph separately, and Mooncake has to keep those two views coherent when transforming code.
Julia's SSA-Form IR
Straight-Line Code
You can find the IR associated to a given signature using Base.code_ircode_by_type:
julia> function foo(x)
y = sin(x)
z = cos(y)
return z
end
foo (generic function with 1 method)
julia> signature = Tuple{typeof(foo), Float64}
Tuple{typeof(foo), Float64}
julia> Base.code_ircode_by_type(signature)[1][1]
2 1 ─ %1 = invoke sin(_2::Float64)::Float64
3 │ %2 = invoke cos(%1::Float64)::Float64
4 └── return %2The statements are associated to SSA names such as %1 and %2. Each statement is associated to a single SSA value, and uses of arguments are written as _n, where _1 is the function itself.
This IR is obtained after type inference and some optimisation passes, so each statement already carries type information. In the example above, %1 and %2 are both known to be Float64.
Control Flow
Control flow is expressed via basic blocks and terminators:
julia> function bar(x)
if x > 0
return x
else
return 5x
end
end
bar (generic function with 1 method)
julia> Base.code_ircode_by_type(Tuple{typeof(bar), Float64})[1][1]
2 1 ─ %1 = intrinsic Base.lt_float(0.0, _2)::Bool
│ %2 = intrinsic Base.or_int(%1, false)::Bool
└── goto #3 if not %2
3 2 ─ return _2
5 3 ─ %5 = intrinsic Base.mul_float(5.0, _2)::Float64
└── return %5The corresponding control-flow graph is stored separately in the cfg field:
julia> Base.code_ircode_by_type(Tuple{typeof(bar), Float64})[1][1].cfg
CFG with 3 blocks:
bb 1 (stmts 1:3) → bb 3, 2
bb 2 (stmt 4)
bb 3 (stmts 5:6)Each basic block is a straight-line region that ends either by falling through, branching, or returning.
Simple Loops and Phi Nodes
Loops introduce phi nodes:
julia> function my_factorial(x::Int)
n = 0
s = 1
while n < x
n += 1
s *= n
end
return s
end
my_factorial (generic function with 1 method)
julia> ir = Base.code_ircode_by_type(Tuple{typeof(my_factorial), Int})[1][1]
1 ─ nothing::Nothing
4 2 ┄ %2 = φ (#1 => 1, #3 => %7)::Int64
│ %3 = φ (#1 => 0, #3 => %6)::Int64
│ %4 = intrinsic Base.slt_int(%3, _2)::Bool
└── goto #4 if not %4
5 3 ─ %6 = intrinsic Base.add_int(%3, 1)::Int64
6 │ %7 = intrinsic Base.mul_int(%2, %6)::Int64
7 └── goto #2
8 4 ─ return %2For example,
%2 = φ (#1 => 1, #3 => %7)means %2 takes value 1 when control arrives from block #1, and the value of %7 when control arrives from block #3.
Julia Compiler's IR Datastructure
The compiler represents inferred IR as Core.Compiler.IRCode. The statements live in the stmts field, which is a Core.Compiler.InstructionStream. An InstructionStream is a bundle of parallel vectors: the statement itself, its inferred type, call info, line data, and flags.
For example:
julia> ir.stmts.stmt
9-element Vector{Any}:
nothing
:(φ (%1 => 1, %3 => %7))
:(φ (%1 => 0, %3 => %6))
:(Base.slt_int(%3, _2))
:(goto %4 if not %4)
:(Base.add_int(%3, 1))
:(Base.mul_int(%2, %6))
:(goto %2)
:(return %2)
julia> ir.stmts.type
9-element Vector{Any}:
Nothing
Int64
Int64
Bool
Any
Int64
Int64
Any
AnyThe control-flow graph is stored separately in ir.cfg, and the argument types are stored in ir.argtypes.
Code Transformations
Mooncake uses two broad styles of transformation:
- Straight-line edits on
IRCode, especially in forward mode. - Reverse-mode assembly through a builder-local CFG in
reverse_mode.jl, followed by a final lowering step back to coherentIRCode.
Replacing Instructions in IRCode
Replacing one statement with another is straightforward:
julia> using Core: SSAValue
julia> const CC = Core.Compiler;
julia> new_ir = Core.Compiler.copy(ir);
julia> old_stmt = new_ir.stmts.stmt[7]
:(Base.mul_int(%2, %6))
julia> new_stmt = Expr(:call, Base.add_int, old_stmt.args[2:end]...)
:((Core.Intrinsics.add_int)(%2, %6))
julia> CC.setindex!(CC.getindex(new_ir, SSAValue(7)), new_stmt, :stmt);
julia> new_ir
1 ─ nothing::Nothing
4 2 ┄ %2 = φ (#1 => 1, #3 => %7)::Int64
│ %3 = φ (#1 => 0, #3 => %6)::Int64
│ %4 = intrinsic Base.slt_int(%3, _2)::Bool
└── goto #4 if not %4
5 3 ─ %6 = intrinsic Base.add_int(%3, 1)::Int64
6 │ %7 = intrinsic (Core.Intrinsics.add_int)(%2, %6)::Int64
7 └── goto #2
8 4 ─ return %2This is the kind of local transformation that forward mode relies on heavily.
Inserting New Instructions in IRCode
Insertion requires a little more care because later SSA names may need to shift. IRCode handles this through insert_node! plus a later compact!:
julia> ni = CC.NewInstruction(Expr(:call, Base.mul_int, SSAValue(3), 2), Int)
Compiler.NewInstruction(:((Core.Intrinsics.mul_int)(%3, 2)), Int64, Compiler.NoCallInfo(), nothing, nothing)
julia> new_ssa = CC.insert_node!(new_ir, SSAValue(6), ni)
:(%10)
julia> stmt = CC.getindex(CC.getindex(new_ir, SSAValue(6)), :stmt)
:(Base.add_int(%3, 1))
julia> stmt.args[2] = new_ssa;
julia> new_ir = CC.compact!(new_ir)
1 ─ nothing::Nothing
4 2 ┄ %2 = φ (#1 => 1, #3 => %8)::Int64
│ %3 = φ (#1 => 0, #3 => %7)::Int64
│ %4 = intrinsic Base.slt_int(%3, _2)::Bool
└── goto #4 if not %4
5 3 ─ %6 = intrinsic (Core.Intrinsics.mul_int)(%3, 2)::Int64
│ %7 = intrinsic Base.add_int(%6, 1)::Int64
6 │ %8 = intrinsic (Core.Intrinsics.add_int)(%2, %7)::Int64
7 └── goto #2
8 4 ─ return %2This is the right tool when the transformation stays local to existing basic blocks.
Reverse-Mode CFG Assembly
Reverse mode needs more than local SSA insertion. It frequently has to:
- create fresh blocks,
- thread predecessor-sensitive phi handling,
- insert reverse-only control flow, and
- preserve a coherent CFG while doing so.
Mooncake now handles that in src/interpreter/reverse_mode.jl using a builder-local CFGBlock representation. The reverse transform first translates the normalized primal IRCode into CFG blocks with stable internal IDs, assembles the forwards and pullback control flow in that representation, and finally lowers the result back to IRCode.
That split is deliberate:
IRCoderemains the source of truth at the compiler boundary.- The builder provides a convenient place to manipulate reverse-mode control flow.
- The final lowering step re-establishes standard compiler IR with a coherent CFG.
This page is mainly about the representations themselves. For the full reverse-mode pipeline, including statement translation, control-flow replay, and forward-to-reverse communication, see reverse_mode_design.md.
Summary
IRCode is the main compiler-facing representation throughout Mooncake. Forward mode mostly performs local statement rewrites on that representation. Reverse mode still starts from normalized IRCode, but assembles its extra control flow in a builder-local CFG before lowering back to IRCode.
If you are modifying interpreter internals, the most important invariants to preserve are:
- SSA uses must stay coherent after insertion and compaction.
ir.stmtsandir.cfgmust agree after lowering.- Phi-node edges and predecessor relationships must stay aligned when blocks are removed or reordered.