Interface

This is the public interface that day-to-day users of AD are expected to interact with if, for some reason, DifferentiationInterface.jl does not suffice. If you have not yet tried using Mooncake.jl via DifferentiationInterface.jl, please do so first; see the Tutorial for more information.

Example

Here's a simple example demonstrating how to use Mooncake.jl's native API:

import Mooncake as MC

struct SimplePair
    x1::Float64
    x2::Float64
end

# Define a simple function
g(x::SimplePair) = x.x1^2 + x.x2^2

# Where to evaluate the derivative
x_eval = SimplePair(1.0, 2.0)
Main.SimplePair(1.0, 2.0)

With friendly_tangents=false (the default), gradients for custom structures use a representation based on Mooncake.Tangent types. See Mooncake.jl's Rule System for more information.

cache = MC.prepare_gradient_cache(g, x_eval)
val, grad = MC.value_and_gradient!!(cache, g, x_eval)
(5.0, (Mooncake.NoTangent(), Mooncake.Tangent{@NamedTuple{x1::Float64, x2::Float64}}((x1 = 2.0, x2 = 4.0))))

This produces a tuple containing the value of the function (here 5.0) and the gradient. The first component of the gradient is the gradient w.r.t. g itself, here NoTangent() since g has no differentiable fields. The second component is the gradient w.r.t. x; for the type SimplePair, the gradient is represented as a @NamedTuple{x1::Float64, x2::Float64} wrapped in a Tangent object. The gradient w.r.t. x1 can, for example, be retrieved with grad[2].fields.x1.
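
Concretely, continuing the example above:

grad[2].fields.x1
2.0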

With friendly_tangents=true, gradients are returned in a more readable form:

cache = MC.prepare_gradient_cache(g, x_eval; config=MC.Config(friendly_tangents=true))
val, grad = MC.value_and_gradient!!(cache, g, x_eval)
(5.0, (Mooncake.NoTangent(), (x1 = 2.0, x2 = 4.0)))

The gradient w.r.t. x is now the NamedTuple (x1 = 2.0, x2 = 4.0).
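
With this representation, the same component is an ordinary NamedTuple field:

grad[2].x1
2.0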

In addition, there is an optional keyword argument args_to_zero: a tuple with one true/false entry per argument (here g and x_eval), which allows tangent zeroing to be skipped on a per-argument basis when an argument's value is constant. Note that the first entry specifies whether to zero the tangent of g itself; zeroing g's tangent is not always necessary, but is sometimes required for non-constant callable objects.

cache = MC.prepare_gradient_cache(g, x_eval; config=MC.Config(friendly_tangents=true))
val, grad = MC.value_and_gradient!!(
    cache,
    g,
    x_eval;
    args_to_zero = (false, true),
)
(5.0, (Mooncake.NoTangent(), (x1 = 2.0, x2 = 4.0)))

Aside: Any performance impact from using friendly_tangents=true should be very minor. If it is noticeable, something is likely wrong; please open an issue.

If you want to use forward mode explicitly, the cache from prepare_derivative_cache can also drive value_and_gradient!! for scalar outputs. Mooncake seeds standard-basis directions internally and evaluates them in chunks:

fcache = MC.prepare_derivative_cache(g, x_eval; config=MC.Config(chunk_size=2))
val, grad = MC.value_and_gradient!!(fcache, g, x_eval)
(5.0, (Mooncake.NoTangent(), Mooncake.Tangent{@NamedTuple{x1::Float64, x2::Float64}}((x1 = 2.0, x2 = 4.0))))

Passing Config(chunk_size=2) builds a width-2 forward FCache, so public forward-cache APIs evaluate derivatives in chunks of two directions at a time. Leaving chunk_size=nothing keeps Mooncake's default width-1 path. Constructing the cache does not itself evaluate any derivatives; show(cache) / repr(cache) display the prepared cache configuration.

When a public cache path dispatches to NfwdMooncake, value_and_gradient!! remains the higher-level Mooncake interface. It may need to bridge richer user-facing inputs, such as custom structs, to the scalar/array/tuple nfwd signatures used internally, and it also does the usual cache checks and tangent zeroing. That extra interface work adds some overhead relative to calling NfwdMooncake.build_rrule(...)(...) directly on a supported nfwd signature over IEEEFloat / Complex{<:IEEEFloat} scalars, dense arrays with those element types, and tuples thereof.

Separately, the Hessian path exposed by prepare_hessian_cache / value_gradient_and_hessian!! uses nested AD, with the nesting strategy controlled by config.second_order_mode: the default :forward_over_reverse takes the nfwd-style forward-mode derivative of a gradient closure, while :reverse_over_forward compiles a reverse-mode rule over NDual inputs so that a single forward+backward pass yields both the gradient and the Hessian-vector product.
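
As a sketch of that API (hedged: h and x_h are illustrative names, and the input is a Vector{Float64}, in line with the restrictions documented in the API reference below):

# Scalar-valued function of a dense vector input.
h(x) = x[1]^2 * x[2]
x_h = [1.0, 2.0]
hcache = MC.prepare_hessian_cache(h, x_h)
val, grad, hess = MC.value_gradient_and_hessian!!(hcache, h, x_h)
# Expected: val == 2.0, grad == [4.0, 1.0], hess == [4.0 2.0; 2.0 0.0].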

Jacobian example

For a vector-valued function of a single dense vector input, value_and_jacobian!! returns the primal output together with a dense Jacobian whose columns correspond to input coordinates.

julia> using Mooncake

julia> f(x) = [x[1]^2 + x[2], x[1] * x[2]]
f (generic function with 1 method)

julia> x = [2.0, 3.0];

julia> cache = Mooncake.prepare_derivative_cache(f, x);

julia> Mooncake.value_and_jacobian!!(cache, f, x)
([7.0, 6.0], [4.0 1.0; 3.0 2.0])

API Reference

Mooncake.Config - Type
Config(;
    debug_mode::Bool=false,
    silence_debug_messages::Bool=false,
    friendly_tangents::Bool=false,
    chunk_size::Union{Nothing,Int}=nothing,
    empty_cache::Bool=false,
    second_order_mode::Symbol=:forward_over_reverse,
)

Configuration struct for use with ADTypes.AutoMooncake.

Keyword Arguments

  • debug_mode::Bool=false: whether or not to run additional type checks when differentiating a function. This has considerable runtime overhead, and should only be switched on if you are trying to debug something that has gone wrong in Mooncake.
  • silence_debug_messages::Bool=false: if this is false and debug_mode is true, Mooncake will display warnings noting that debug mode is enabled, to help prevent you from accidentally leaving it on. If you wish to disable these messages, set this to true.
  • friendly_tangents::Bool=false: if true, Mooncake will represent tangents using the primal type at the interface level: the tangent type of a primal type P will be P when using friendly tangents, and tangent_type(P) otherwise (e.g. the friendly tangent of a custom struct will be of the same type as the struct instead of Mooncake's Tangent type). The tangent is converted from/to the friendly representation at the interface level, so all Mooncake internal computations and rule implementations always use the tangent_type representation.
  • chunk_size::Union{Nothing,Int}=nothing: optional forward chunk width for the public prepare_derivative_cache path and APIs layered on top of it. nothing uses Mooncake's default width-1 path; an explicit integer compiles a width-N forward rule and uses chunked evaluation in value_and_derivative!! / value_and_gradient!!. This does not affect reverse-mode caches.
  • empty_cache::Bool=false: if true, all internal Mooncake caches (compiled OpaqueClosures, CodeInstances, and type-inference results) are cleared before building the new rule. This allows the garbage collector to reclaim memory held by previously compiled rules, and is useful in long-running sessions where many distinct functions have been differentiated. Note that only Julia-level (GC-managed) objects are freed; JIT-compiled native machine code is held permanently by the Julia runtime and cannot be reclaimed.
  • second_order_mode::Symbol=:forward_over_reverse: controls the nesting strategy used by prepare_hvp_cache and prepare_hessian_cache. :forward_over_reverse differentiates a gradient closure with forward-mode AD. :reverse_over_forward compiles a reverse-mode rule over NDual inputs so that a single forward+backward pass yields both the gradient and the Hessian-vector product.
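
A brief, hedged usage sketch: the resulting Config is passed via the config keyword of the prepare_*_cache functions (as in the examples above), or supplied to ADTypes.AutoMooncake. The function f here is illustrative.

f(x) = sum(abs2, x)
# Extra type checks plus primal-typed tangents at the interface.
cfg = Mooncake.Config(; debug_mode=true, friendly_tangents=true)
cache = Mooncake.prepare_gradient_cache(f, [1.0, 2.0]; config=cfg)
val, grad = Mooncake.value_and_gradient!!(cache, f, [1.0, 2.0])
# Expected: val == 5.0, and the gradient w.r.t. the input is [2.0, 4.0].
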
Mooncake.value_and_derivative!! - Function
value_and_derivative!!(rule, f::Dual, x::Dual...)
value_and_derivative!!(rule, (f, df), (x, dx), ...)

Run a forward rule directly, without first constructing an FCache.

The Dual interface returns the rule output directly. The tuple interface returns (y, dy) using the rule's native tangent representation. Specialized rule types may add chunked NTangent support on top of this entrypoint.

value_and_derivative!!(cache::FCache, f::Dual, x::Vararg{Dual,N})

Forward-mode derivative via Dual inputs, returning a Dual output.

value_and_derivative!!(cache::FCache, (f, df), (x, dx), ...)

Forward-mode derivative via tuple inputs, returning (y, dy). Plain tuple tangents represent a single direction. Multi-direction evaluation requires NTangent for every differentiable tuple tangent; mixed chunked/plain differentiable inputs are rejected.
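
A minimal sketch of the tuple interface at the default width of one (hedged: h and xs are illustrative names, and NoTangent() is used as the tangent for the fieldless function h):

h(x) = x[1]^2
xs = [3.0]
fcache = Mooncake.prepare_derivative_cache(h, xs)
y, dy = Mooncake.value_and_derivative!!(fcache, (h, Mooncake.NoTangent()), (xs, [1.0]))
# Expected: y == 9.0 and dy == 6.0 for the single direction dx = [1.0].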

Mooncake.value_and_gradient!! - Method
value_and_gradient!!(cache::Cache, f, x...; args_to_zero=(true, ...))

Computes a 2-tuple. The first element is f(x...), and the second is a tuple containing the gradient of f w.r.t. each argument: the first element is the gradient w.r.t. any differentiable fields of f, the second w.r.t. the first element of x, and so on. If the cache was prepared with config.friendly_tangents=true, the gradient uses the same types as those of f and x. Otherwise, it uses the tangent types associated with f and x.

Assumes that f returns a Union{Float16, Float32, Float64}.

As with all functionality in Mooncake, if f modifies itself or x, value_and_gradient!! will return both to their original state as part of the process of computing the gradient.

Info

cache must be the output of prepare_gradient_cache, and (fields of) f and x must be of the same size and shape as those used to construct the cache. This is to ensure that the gradient can be written to the memory allocated when the cache was built.

Warning

cache owns any mutable state returned by this function, meaning that mutable components of values returned by it will be mutated if you run this function again with different arguments. Therefore, if you need to keep the values returned by this function around over multiple calls to this function with the same cache, you should take a copy (using copy or deepcopy) of them before calling again.

The keyword argument args_to_zero is a tuple of boolean values specifying which cotangents should be reset to zero before differentiation, with one boolean for each element of (f, x...). Setting an entry to false skips that zeroing step; this is a performance optimization that is safe only if you can guarantee that the corresponding cotangent allocated in cache (created by zero_tangent) never needs to be zeroed out again.

Example Usage

f(x, y) = sum(x .* y)
x = [2.0, 2.0]
y = [1.0, 1.0]
cache = prepare_gradient_cache(f, x, y)
value_and_gradient!!(cache, f, x, y)

# output

(4.0, (NoTangent(), [1.0, 1.0], [2.0, 2.0]))
Mooncake.value_and_gradient!! - Method
value_and_gradient!!(rule, f, x...; friendly_tangents=false)

Equivalent to value_and_pullback!!(rule, 1.0, f, x...), and assumes f returns a Union{Float16,Float32,Float64}.

Note: there are lots of subtle ways to misuse value_and_pullback!!, so we generally recommend using Mooncake.value_and_gradient!! (this function) where possible. The docstring for value_and_pullback!! is nonetheless useful for understanding this function.

An example:

f(x, y) = sum(x .* y)
x = [2.0, 2.0]
y = [1.0, 1.0]
rule = build_rrule(f, x, y)
value_and_gradient!!(rule, f, x, y)

# output

(4.0, (NoTangent(), [1.0, 1.0], [2.0, 2.0]))
Mooncake.value_and_jacobian!! - Function
value_and_jacobian!!(cache::FCache, f, x)
value_and_jacobian!!(cache::Cache, f, x)

Using a pre-built cache, compute and return (value, jacobian) for a vector-valued function f of a single vector input.

The current implementation supports a single dense vector input and an AbstractVector output, both with the same IEEEFloat element type. The returned Jacobian is a dense matrix whose columns correspond to input coordinates.

Info

cache must be the output of prepare_derivative_cache or prepare_pullback_cache, and f and x must match the types and shapes used to construct the cache.
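
For instance, the Jacobian example above can also be driven through the reverse-mode path (a hedged sketch; only the cache-preparation call changes):

f(x) = [x[1]^2 + x[2], x[1] * x[2]]
x = [2.0, 3.0]
cache = Mooncake.prepare_pullback_cache(f, x)
Mooncake.value_and_jacobian!!(cache, f, x)
# Expected: ([7.0, 6.0], [4.0 1.0; 3.0 2.0])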

Mooncake.value_and_pullback!! - Method
value_and_pullback!!(cache::Cache, ȳ, f, x...; args_to_zero=(true, ...))
Info

If f(x...) returns a scalar, you should use value_and_gradient!!, not this function.

Computes a 2-tuple. The first element is f(x...), and the second is a tuple containing the pullback of f applied to ȳ. The first element is the component of the pullback associated with any fields of f, the second w.r.t. the first element of x, and so on. If the cache was prepared with config.friendly_tangents=true, the pullback uses the same types as those of f and x. Otherwise, it uses the tangent types associated with f and x.

There are no restrictions on what y = f(x...) is permitted to return. However, ȳ must be an acceptable tangent for y. If the cache was prepared with config.friendly_tangents=false, this means that, for example, it must be true that tangent_type(typeof(y)) == typeof(ȳ). If the cache was prepared with config.friendly_tangents=true, then typeof(y) == typeof(ȳ).

As with all functionality in Mooncake, if f modifies itself or x, value_and_pullback!! will return both to their original state as part of the process of computing the pullback.

Info

cache must be the output of prepare_pullback_cache, and (fields of) f and x must be of the same size and shape as those used to construct the cache. This is to ensure that the gradient can be written to the memory allocated when the cache was built.

Warning

cache owns any mutable state returned by this function, meaning that mutable components of values returned by it will be mutated if you run this function again with different arguments. Therefore, if you need to keep the values returned by this function around over multiple calls to this function with the same cache, you should take a copy (using copy or deepcopy) of them before calling again.

The keyword argument args_to_zero is a tuple of boolean values specifying which cotangents should be reset to zero before differentiation, with one boolean for each element of (f, x...). Setting an entry to false skips that zeroing step; this is a performance optimization that is safe only if you can guarantee that the corresponding cotangent allocated in cache (created by zero_tangent) never needs to be zeroed out again.

Example Usage

f(x, y) = sum(x .* y)
x = [2.0, 2.0]
y = [1.0, 1.0]
cache = Mooncake.prepare_pullback_cache(f, x, y)
Mooncake.value_and_pullback!!(cache, 1.0, f, x, y)

# output

(4.0, (NoTangent(), [1.0, 1.0], [2.0, 2.0]))
Mooncake.prepare_gradient_cache - Function
prepare_gradient_cache(f, x...; config=Mooncake.Config())

Returns a cache used with value_and_gradient!!. See that function for more info.

The API guarantees that tangents are initialized to zero before the first autodiff pass.

Note

Calls f(x...) once during cache preparation.

Mooncake.prepare_pullback_cache - Function
prepare_pullback_cache(f, x...; config=Mooncake.Config())

Returns a cache used with value_and_pullback!!. See that function for more info.

The API guarantees that tangents are initialized to zero before the first autodiff pass.

Note

Calls f(x...) once during cache preparation.

Mooncake.prepare_hvp_cache - Function
prepare_hvp_cache(f, x...; config=Mooncake.Config())

Prepare a cache for Hessian-vector products and Hessian evaluation.

The nesting strategy is controlled by config.second_order_mode:

  • :forward_over_reverse (default): forward-mode derivative of a gradient closure.
  • :reverse_over_forward: reverse-mode rule over NDual{T,1} inputs.

Only Vector{<:IEEEFloat} inputs are supported.

Mutation semantics

Preparation never calls f(x...) directly. Both modes discover the output type by running a compiled AD rule and immediately unwinding it, so mutations are restored and side effects are not leaked. Callers may rely on x being unmodified after prepare_hvp_cache returns.

Mooncake.value_and_hvp!! - Function
value_and_hvp!!(cache, f, v, x...)

Compute the value, gradient, and Hessian-vector product of scalar-valued f at x along direction v.

For single-argument f(x::Vector), returns (value, gradient, hvp). For multi-argument f(x1, x2, ...), returns (value, (g1, g2, ...), (Hv1, Hv2, ...)).

v must provide one direction vector per differentiable input, matching the shape of x. Only Vector{<:IEEEFloat} inputs are supported.
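
A hedged usage sketch for a single Vector input (f, x, and v are illustrative names; the expected values follow from differentiating f(x) = x[1]^2 * x[2] by hand):

f(x) = x[1]^2 * x[2]
x = [1.0, 2.0]
v = [1.0, 0.0]  # direction for the Hessian-vector product
cache = Mooncake.prepare_hvp_cache(f, x)
val, grad, hvp = Mooncake.value_and_hvp!!(cache, f, v, x)
# Expected: val == 2.0, grad == [4.0, 1.0], hvp == [4.0, 2.0].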

Mooncake.value_gradient_and_hessian!! - Function
value_gradient_and_hessian!!(cache, f, x...)

Compute the value, gradient, and Hessian of f at x.

For single-argument f(x::Vector), returns (value, gradient, hessian_matrix). For multi-argument f(x1, x2, ...), returns (value, (g1, g2, ...), ((H11, H12, ...), (H21, H22, ...), ...)).

Only Vector{<:IEEEFloat} inputs are supported. All inputs must have the same element type.
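
To illustrate the blocked return for multiple arguments, a hedged sketch (for f(x, y) = sum(x .* y), the only nonzero second derivatives sit in the cross blocks):

f(x, y) = sum(x .* y)
x = [1.0, 2.0]
y = [3.0, 4.0]
cache = Mooncake.prepare_hessian_cache(f, x, y)
val, grads, blocks = Mooncake.value_gradient_and_hessian!!(cache, f, x, y)
# Expected: val == 11.0, grads == ([3.0, 4.0], [1.0, 2.0]);
# blocks[1][1] is all zeros and blocks[1][2] is the 2x2 identity.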
