Interface

This is the public interface that day-to-day users of AD are expected to interact with if, for some reason, DifferentiationInterface.jl does not suffice. If you have not yet tried using Mooncake.jl via DifferentiationInterface.jl, please do so first; see the Tutorial for more information.

Example

Here's a simple example demonstrating how to use Mooncake.jl's native API:

import Mooncake as MC

struct SimplePair
    x1::Float64
    x2::Float64
end

# Define a simple function
g(x::SimplePair) = x.x1^2 + x.x2^2

# Where to evaluate the derivative
x_eval = SimplePair(1.0, 2.0)
Main.SimplePair(1.0, 2.0)

With friendly_tangents=false (the default), gradients for custom structures use a representation based on Mooncake.Tangent types. See Mooncake.jl's Rule System for more information.

cache = MC.prepare_gradient_cache(g, x_eval)
val, grad = MC.value_and_gradient!!(cache, g, x_eval)
(5.0, (Mooncake.NoTangent(), Mooncake.Tangent{@NamedTuple{x1::Float64, x2::Float64}}((x1 = 2.0, x2 = 4.0))))

This produces a tuple containing the value of the function (here 5.0) and the gradient. The first component of the gradient is the gradient w.r.t. g itself, here NoTangent() since g has no differentiable fields. The second component is the gradient w.r.t. x; for the type SimplePair, the gradient is represented as a @NamedTuple{x1::Float64, x2::Float64} wrapped in a Tangent object. The gradient w.r.t. x1 can, for example, be retrieved with grad[2].fields.x1.
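
Concretely, continuing the example above:

grad[2].fields.x1
2.0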

With friendly_tangents=true, gradients are returned in a more readable form:

cache = MC.prepare_gradient_cache(g, x_eval; config=MC.Config(friendly_tangents=true))
val, grad = MC.value_and_gradient!!(cache, g, x_eval)
(5.0, (Mooncake.NoTangent(), (x1 = 2.0, x2 = 4.0)))

The gradient w.r.t. x is now the NamedTuple (x1 = 2.0, x2 = 4.0).
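
With this representation, the same component is an ordinary NamedTuple field:

grad[2].x1
2.0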

In addition, there is an optional keyword argument args_to_zero: a tuple with one true/false entry per argument (here g and x_eval), which allows tangent zeroing to be skipped on a per-argument basis when an argument's value is constant. Note that the first entry specifies whether to zero the tangent of g itself; zeroing g's tangent is not always necessary, but is sometimes required for non-constant callable objects.

cache = MC.prepare_gradient_cache(g, x_eval; config=MC.Config(friendly_tangents=true))
val, grad = MC.value_and_gradient!!(
    cache,
    g,
    x_eval;
    args_to_zero = (false, true),
)
(5.0, (Mooncake.NoTangent(), (x1 = 2.0, x2 = 4.0)))

Aside: Any performance impact from using friendly_tangents=true should be very minor. If it is noticeable, something is likely wrong; please open an issue.

If you want to use forward mode explicitly, the cache from prepare_derivative_cache can also drive value_and_gradient!! for scalar outputs. Mooncake seeds standard-basis directions internally and evaluates them in chunks:

fcache = MC.prepare_derivative_cache(g, x_eval; config=MC.Config(chunk_size=2))
val, grad = MC.value_and_gradient!!(fcache, g, x_eval)
(5.0, (Mooncake.NoTangent(), Mooncake.Tangent{@NamedTuple{x1::Float64, x2::Float64}}((x1 = 2.0, x2 = 4.0))))

Passing Config(chunk_size=2) builds a width-2 forward FCache, so public forward-cache APIs evaluate derivatives in chunks of two directions at a time. Leaving chunk_size=nothing keeps Mooncake's default width-1 path. Constructing the cache does not itself evaluate any derivatives; show(cache) / repr(cache) display the prepared cache configuration.

When a public cache path dispatches to NfwdMooncake, value_and_gradient!! remains the higher-level Mooncake interface. It may need to bridge richer user-facing inputs, such as custom structs, to the scalar/array/tuple nfwd signatures used internally, and it also does the usual cache checks and tangent zeroing. That extra interface work adds some overhead relative to calling NfwdMooncake.build_rrule(...)(...) directly on a supported nfwd signature over IEEEFloat / Complex{<:IEEEFloat} scalars, dense arrays with those element types, and tuples thereof.

Separately, the Hessian path exposed by prepare_hessian_cache / value_gradient_and_hessian!! uses nested AD, with the nesting strategy controlled by config.second_order_mode: the default :forward_over_reverse takes the nfwd-style forward-mode derivative of a gradient closure, while :reverse_over_forward compiles a reverse-mode rule over NDual inputs so that a single forward+backward pass yields both the gradient and the Hessian-vector product.
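
As a sketch of that API (hedged: h and x_h are illustrative names, and the input is a Vector{Float64}, in line with the restrictions documented in the API reference below):

# Scalar-valued function of a dense vector input.
h(x) = x[1]^2 * x[2]
x_h = [1.0, 2.0]
hcache = MC.prepare_hessian_cache(h, x_h)
val, grad, hess = MC.value_gradient_and_hessian!!(hcache, h, x_h)
# Expected: val == 2.0, grad == [4.0, 1.0], hess == [4.0 2.0; 2.0 0.0].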

Jacobian example

For a vector-valued function of a single dense vector input, value_and_jacobian!! returns the primal output together with a dense Jacobian whose columns correspond to input coordinates.

julia> using Mooncake

julia> f(x) = [x[1]^2 + x[2], x[1] * x[2]]
f (generic function with 1 method)

julia> x = [2.0, 3.0];

julia> cache = Mooncake.prepare_derivative_cache(f, x);

julia> Mooncake.value_and_jacobian!!(cache, f, x)
([7.0, 6.0], [4.0 1.0; 3.0 2.0])

API Reference

Mooncake.Config - Type
Config(;
    debug_mode::Bool=false,
    silence_debug_messages::Bool=false,
    friendly_tangents::Bool=false,
    chunk_size::Union{Nothing,Int}=nothing,
    empty_cache::Bool=false,
    second_order_mode::Symbol=:forward_over_reverse,
)

Configuration struct for use with ADTypes.AutoMooncake.

Keyword Arguments

  • debug_mode::Bool=false: whether or not to run additional type checks when differentiating a function. This has considerable runtime overhead, and should only be switched on if you are trying to debug something that has gone wrong in Mooncake.
  • silence_debug_messages::Bool=false: if this is false and debug_mode is true, Mooncake will display warnings noting that debug mode is enabled, to help prevent you from accidentally leaving it on. If you wish to disable these messages, set this to true.
  • friendly_tangents::Bool=false: if true, Mooncake will represent tangents using the primal type at the interface level: the tangent type of a primal type P will be P when using friendly tangents, and tangent_type(P) otherwise (e.g. the friendly tangent of a custom struct will be of the same type as the struct instead of Mooncake's Tangent type). The tangent is converted from/to the friendly representation at the interface level, so all Mooncake internal computations and rule implementations always use the tangent_type representation.
  • chunk_size::Union{Nothing,Int}=nothing: optional forward chunk width for the public prepare_derivative_cache path and APIs layered on top of it. nothing uses Mooncake's default width-1 path; an explicit integer compiles a width-N forward rule and uses chunked evaluation in value_and_derivative!! / value_and_gradient!!. This does not affect reverse-mode caches.
  • empty_cache::Bool=false: if true, all internal Mooncake caches (compiled OpaqueClosures, CodeInstances, and type-inference results) are cleared before building the new rule. This allows the garbage collector to reclaim memory held by previously compiled rules, and is useful in long-running sessions where many distinct functions have been differentiated. Note that only Julia-level (GC-managed) objects are freed; JIT-compiled native machine code is held permanently by the Julia runtime and cannot be reclaimed.
  • second_order_mode::Symbol=:forward_over_reverse: controls the nesting strategy used by prepare_hvp_cache and prepare_hessian_cache. :forward_over_reverse differentiates a gradient closure with forward-mode AD. :reverse_over_forward compiles a reverse-mode rule over NDual inputs so that a single forward+backward pass yields both the gradient and the Hessian-vector product.
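
A brief, hedged usage sketch: the resulting Config is passed via the config keyword of the prepare_*_cache functions (as in the examples above), or supplied to ADTypes.AutoMooncake. The function f here is illustrative.

f(x) = sum(abs2, x)
# Extra type checks plus primal-typed tangents at the interface.
cfg = Mooncake.Config(; debug_mode=true, friendly_tangents=true)
cache = Mooncake.prepare_gradient_cache(f, [1.0, 2.0]; config=cfg)
val, grad = Mooncake.value_and_gradient!!(cache, f, [1.0, 2.0])
# Expected: val == 5.0, and the gradient w.r.t. the input is [2.0, 4.0].
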
Mooncake.value_and_derivative!! - Function
value_and_derivative!!(rule, f::Dual, x::Dual...)
value_and_derivative!!(rule, (f, df), (x, dx), ...)

Run a forward rule directly, without first constructing an FCache.

The Dual interface returns the rule output directly. The tuple interface returns (y, dy) using the rule's native tangent representation. Specialized rule types may add chunked NTangent support on top of this entrypoint.

value_and_derivative!!(cache::FCache, f::Dual, x::Vararg{Dual,N})

Forward-mode derivative via Dual inputs, returning a Dual output.

value_and_derivative!!(cache::FCache, (f, df), (x, dx), ...)

Forward-mode derivative via tuple inputs, returning (y, dy). Plain tuple tangents represent a single direction. Multi-direction evaluation requires NTangent for every differentiable tuple tangent; mixed chunked/plain differentiable inputs are rejected.
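
A minimal sketch of the tuple interface at the default width of one (hedged: h and xs are illustrative names, and NoTangent() is used as the tangent for the fieldless function h):

h(x) = x[1]^2
xs = [3.0]
fcache = Mooncake.prepare_derivative_cache(h, xs)
y, dy = Mooncake.value_and_derivative!!(fcache, (h, Mooncake.NoTangent()), (xs, [1.0]))
# Expected: y == 9.0 and dy == 6.0 for the single direction dx = [1.0].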

Mooncake.value_and_gradient!! - Method
value_and_gradient!!(cache::Cache, f, x...; args_to_zero=(true, ...))

Computes a 2-tuple. The first element is f(x...), and the second is a tuple containing the gradient of f w.r.t. each argument: the first element is the gradient w.r.t. any differentiable fields of f, the second w.r.t. the first element of x, and so on. If the cache was prepared with config.friendly_tangents=true, the gradient uses the same types as those of f and x. Otherwise, it uses the tangent types associated with f and x.

Assumes that f returns a Union{Float16, Float32, Float64}.

As with all functionality in Mooncake, if f modifies itself or x, value_and_gradient!! will return both to their original state as part of the process of computing the gradient.

Info

cache must be the output of prepare_gradient_cache, and (fields of) f and x must be of the same size and shape as those used to construct the cache. This is to ensure that the gradient can be written to the memory allocated when the cache was built.

Warning

cache owns any mutable state returned by this function, meaning that mutable components of values returned by it will be mutated if you run this function again with different arguments. Therefore, if you need to keep the values returned by this function around over multiple calls to this function with the same cache, you should take a copy (using copy or deepcopy) of them before calling again.

The keyword argument args_to_zero is a tuple of boolean values specifying which cotangents should be reset to zero before differentiation, with one boolean for each element of (f, x...). Setting an entry to false skips that zeroing step; this is a performance optimization that is safe only if you can guarantee that the corresponding cotangent allocated in cache (created by zero_tangent) never needs to be zeroed out again.

Example Usage

f(x, y) = sum(x .* y)
x = [2.0, 2.0]
y = [1.0, 1.0]
cache = prepare_gradient_cache(f, x, y)
value_and_gradient!!(cache, f, x, y)

# output

(4.0, (NoTangent(), [1.0, 1.0], [2.0, 2.0]))
Mooncake.value_and_gradient!! - Method
value_and_gradient!!(rule, f, x...; friendly_tangents=false)

Equivalent to value_and_pullback!!(rule, 1.0, f, x...), and assumes f returns a Union{Float16,Float32,Float64}.

Note: there are lots of subtle ways to misuse value_and_pullback!!, so we generally recommend using Mooncake.value_and_gradient!! (this function) where possible. The docstring for value_and_pullback!! is nonetheless useful for understanding this function.

An example:

f(x, y) = sum(x .* y)
x = [2.0, 2.0]
y = [1.0, 1.0]
rule = build_rrule(f, x, y)
value_and_gradient!!(rule, f, x, y)

# output

(4.0, (NoTangent(), [1.0, 1.0], [2.0, 2.0]))
Mooncake.value_and_jacobian!! - Function
value_and_jacobian!!(cache::FCache, f, x)
value_and_jacobian!!(cache::Cache, f, x)

Using a pre-built cache, compute and return (value, jacobian) for a vector-valued function f of a single vector input.

The current implementation supports a single dense vector input and an AbstractVector output, both with the same IEEEFloat element type. The returned Jacobian is a dense matrix whose columns correspond to input coordinates.

Info

cache must be the output of prepare_derivative_cache or prepare_pullback_cache, and f and x must match the types and shapes used to construct the cache.
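
For instance, the Jacobian example above can also be driven through the reverse-mode path (a hedged sketch; only the cache-preparation call changes):

f(x) = [x[1]^2 + x[2], x[1] * x[2]]
x = [2.0, 3.0]
cache = Mooncake.prepare_pullback_cache(f, x)
Mooncake.value_and_jacobian!!(cache, f, x)
# Expected: ([7.0, 6.0], [4.0 1.0; 3.0 2.0])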

Mooncake.value_and_pullback!! - Method
value_and_pullback!!(cache::Cache, ȳ, f, x...; args_to_zero=(true, ...))
Info

If f(x...) returns a scalar, you should use value_and_gradient!!, not this function.

Computes a 2-tuple. The first element is f(x...), and the second is a tuple containing the pullback of f applied to ȳ. The first element is the component of the pullback associated with any fields of f, the second w.r.t. the first element of x, and so on. If the cache was prepared with config.friendly_tangents=true, the pullback uses the same types as those of f and x. Otherwise, it uses the tangent types associated with f and x.

There are no restrictions on what y = f(x...) is permitted to return. However, ȳ must be an acceptable tangent for y. If the cache was prepared with config.friendly_tangents=false, this means that, for example, it must be true that tangent_type(typeof(y)) == typeof(ȳ). If the cache was prepared with config.friendly_tangents=true, then typeof(y) == typeof(ȳ).

As with all functionality in Mooncake, if f modifies itself or x, value_and_pullback!! will return both to their original state as part of the process of computing the pullback.

Info

cache must be the output of prepare_pullback_cache, and (fields of) f and x must be of the same size and shape as those used to construct the cache. This is to ensure that the gradient can be written to the memory allocated when the cache was built.

Warning

cache owns any mutable state returned by this function, meaning that mutable components of values returned by it will be mutated if you run this function again with different arguments. Therefore, if you need to keep the values returned by this function around over multiple calls to this function with the same cache, you should take a copy (using copy or deepcopy) of them before calling again.

The keyword argument args_to_zero is a tuple of boolean values specifying which cotangents should be reset to zero before differentiation, with one boolean for each element of (f, x...). Setting an entry to false skips that zeroing step; this is a performance optimization that is safe only if you can guarantee that the corresponding cotangent allocated in cache (created by zero_tangent) never needs to be zeroed out again.

Example Usage

f(x, y) = sum(x .* y)
x = [2.0, 2.0]
y = [1.0, 1.0]
cache = Mooncake.prepare_pullback_cache(f, x, y)
Mooncake.value_and_pullback!!(cache, 1.0, f, x, y)

# output

(4.0, (NoTangent(), [1.0, 1.0], [2.0, 2.0]))
Mooncake.prepare_gradient_cache - Function
prepare_gradient_cache(f, x...; config=Mooncake.Config())

Returns a cache used with value_and_gradient!!. See that function for more info.

The API guarantees that tangents are initialized to zero before the first autodiff pass.

Note

Calls f(x...) once during cache preparation.

Mooncake.prepare_pullback_cache - Function
prepare_pullback_cache(f, x...; config=Mooncake.Config())

Returns a cache used with value_and_pullback!!. See that function for more info.

The API guarantees that tangents are initialized to zero before the first autodiff pass.

Note

Calls f(x...) once during cache preparation.

Mooncake.prepare_hvp_cache - Function
prepare_hvp_cache(f, x...; config=Mooncake.Config())

Prepare a cache for Hessian-vector products and Hessian evaluation.

The nesting strategy is controlled by config.second_order_mode:

  • :forward_over_reverse (default): forward-mode derivative of a gradient closure.
  • :reverse_over_forward: reverse-mode rule over NDual{T,1} inputs.

Only Vector{<:IEEEFloat} inputs are supported.

Mutation semantics

Preparation never calls f(x...) directly. Both modes discover the output type by running a compiled AD rule and immediately unwinding it, so mutations are restored and side effects are not leaked. Callers may rely on x being unmodified after prepare_hvp_cache returns.

Mooncake.value_and_hvp!! - Function
value_and_hvp!!(cache, f, v, x...)

Compute the value, gradient, and Hessian-vector product of scalar-valued f at x along direction v.

For single-argument f(x::Vector), returns (value, gradient, hvp). For multi-argument f(x1, x2, ...), returns (value, (g1, g2, ...), (Hv1, Hv2, ...)).

v must provide one direction vector per differentiable input, matching the shape of x. Only Vector{<:IEEEFloat} inputs are supported.
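
A hedged usage sketch for a single Vector input (f, x, and v are illustrative names; the expected values follow from differentiating f(x) = x[1]^2 * x[2] by hand):

f(x) = x[1]^2 * x[2]
x = [1.0, 2.0]
v = [1.0, 0.0]  # direction for the Hessian-vector product
cache = Mooncake.prepare_hvp_cache(f, x)
val, grad, hvp = Mooncake.value_and_hvp!!(cache, f, v, x)
# Expected: val == 2.0, grad == [4.0, 1.0], hvp == [4.0, 2.0].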

Mooncake.value_gradient_and_hessian!! - Function
value_gradient_and_hessian!!(cache, f, x...)

Compute the value, gradient, and Hessian of f at x.

For single-argument f(x::Vector), returns (value, gradient, hessian_matrix). For multi-argument f(x1, x2, ...), returns (value, (g1, g2, ...), ((H11, H12, ...), (H21, H22, ...), ...)).

Only Vector{<:IEEEFloat} inputs are supported. All inputs must have the same element type.
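
To illustrate the blocked return for multiple arguments, a hedged sketch (for f(x, y) = sum(x .* y), the only nonzero second derivatives sit in the cross blocks):

f(x, y) = sum(x .* y)
x = [1.0, 2.0]
y = [3.0, 4.0]
cache = Mooncake.prepare_hessian_cache(f, x, y)
val, grads, blocks = Mooncake.value_gradient_and_hessian!!(cache, f, x, y)
# Expected: val == 11.0, grads == ([3.0, 4.0], [1.0, 2.0]);
# blocks[1][1] is all zeros and blocks[1][2] is the 2x2 identity.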
