Interface
This is the public interface that day-to-day users of AD are expected to interact with if, for some reason, DifferentiationInterface.jl does not suffice. If you have not yet tried using Mooncake.jl via DifferentiationInterface.jl, please do so first. See the Tutorial for more info.
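For reference, a minimal sketch of that route, assuming DifferentiationInterface.jl's value_and_gradient and the AutoMooncake backend type re-exported from ADTypes.jl:

using DifferentiationInterface
import Mooncake
f(x) = sum(abs2, x)
backend = AutoMooncake(; config=nothing)         # nothing → Mooncake's default configuration
value_and_gradient(f, backend, [1.0, 2.0, 3.0])  # (14.0, [2.0, 4.0, 6.0])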
Mooncake.Config — Type
Config(; debug_mode::Bool=false, silence_debug_messages::Bool=false)

Configuration struct for use with ADTypes.AutoMooncake.
Keyword Arguments
- debug_mode::Bool=false: whether or not to run additional type checks when differentiating a function. This has considerable runtime overhead, and should only be switched on if you are trying to debug something that has gone wrong in Mooncake.
- silence_debug_messages::Bool=false: if false and debug_mode is true, Mooncake will display some warnings that debug mode is enabled, in order to help prevent accidentally leaving debug mode on. If you wish to disable these messages, set this to true.
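For example, a sketch of passing a debugging configuration to ADTypes.AutoMooncake (whose config keyword is assumed here from ADTypes.jl):

using ADTypes, Mooncake
# Run Mooncake's additional (slow) type checks while hunting a bug, and
# suppress the warnings that remind you debug mode is switched on:
backend = ADTypes.AutoMooncake(; config=Mooncake.Config(debug_mode=true, silence_debug_messages=true))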
Mooncake.value_and_derivative!! — Function
value_and_derivative!!(rule::R, f::Dual, x::Vararg{Dual,N})

Returns a Dual containing the result of applying forward-mode AD to compute the (Fréchet) derivative of primal(f) at the primal values in x, in the direction of the tangent values in f and x.
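A minimal sketch of how this fits together, assuming Mooncake.Dual(primal, tangent) bundles a primal value with a tangent, and that NoTangent() is the appropriate tangent for a function with no differentiable fields:

using Mooncake
f(x) = sin(x)
cache = Mooncake.prepare_derivative_cache(f, 2.0)
df = Mooncake.Dual(f, Mooncake.NoTangent())     # f has no differentiable fields
dx = Mooncake.Dual(2.0, 1.0)                    # primal 2.0, direction 1.0
Mooncake.value_and_derivative!!(cache, df, dx)  # a Dual carrying sin(2.0) and cos(2.0)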
Mooncake.value_and_gradient!! — Method
value_and_gradient!!(cache::Cache, f, x...; args_to_zero=(true, ...))

Computes a 2-tuple. The first element is f(x...), and the second is a tuple containing the gradient of f w.r.t. each argument: its first element is the gradient w.r.t. any differentiable fields of f, its second the gradient w.r.t. the first element of x, and so on.
Assumes that f returns a Union{Float16, Float32, Float64}.
As with all functionality in Mooncake, if f modifies itself or x, value_and_gradient!! will return both to their original state as part of the process of computing the gradient.
cache must be the output of prepare_gradient_cache, and (fields of) f and x must be of the same size and shape as those used to construct the cache. This is to ensure that the gradient can be written to the memory allocated when the cache was built.
cache owns any mutable state returned by this function, meaning that mutable components of returned values will be mutated if you run this function again with different arguments. Therefore, if you need to keep the returned values around across multiple calls with the same cache, take a copy (using copy or deepcopy) of them before calling again.
The keyword argument args_to_zero is a tuple of boolean values, one for each element of (f, x...), specifying which cotangents should be reset to zero before differentiation. It exists as a performance optimization: set an entry to false only if you can guarantee that the corresponding cotangent allocated in cache (created by zero_tangent) does not need to be zeroed out again.
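The following sketch ties the three preceding points together, using the same setup as the Example Usage below (the zero-initialization guarantee it relies on is documented under prepare_gradient_cache further down):

using Mooncake
f(x, y) = sum(x .* y)
x, y = [2.0, 2.0], [1.0, 1.0]
cache = Mooncake.prepare_gradient_cache(f, x, y)
# First call with a fresh cache: its tangents are guaranteed to start at zero,
# so the initial zeroing can be skipped, one flag per element of (f, x, y).
_, grads = Mooncake.value_and_gradient!!(cache, f, x, y; args_to_zero=(false, false, false))
kept = deepcopy(grads)            # cache owns grads' memory, so copy anything you keep
x2 = [3.0, 4.0]                   # same size and shape as x, so the cache is reusable
Mooncake.value_and_gradient!!(cache, f, x2, y)  # may mutate grads, but not kept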
Example Usage
f(x, y) = sum(x .* y)
x = [2.0, 2.0]
y = [1.0, 1.0]
cache = prepare_gradient_cache(f, x, y)
value_and_gradient!!(cache, f, x, y)
# output
(4.0, (NoTangent(), [1.0, 1.0], [2.0, 2.0]))

Mooncake.value_and_pullback!! — Method
value_and_pullback!!(cache::Cache, ȳ, f, x...; args_to_zero=(true, ...))

If f(x...) returns a scalar, you should use value_and_gradient!!, not this function.
Computes a 2-tuple. The first element is f(x...), and the second is a tuple containing the pullback of f applied to ȳ: its first element is the component of the pullback associated with any fields of f, its second the component w.r.t. the first element of x, and so on.
There are no restrictions on what y = f(x...) is permitted to return. However, ȳ must be an acceptable tangent for y. This means that, for example, it must be true that tangent_type(typeof(y)) == typeof(ȳ).
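For example (a sketch): if y is a Vector{Float64}, Mooncake's tangent type for it is again Vector{Float64}, so ȳ should be a Vector{Float64} of the same length:

using Mooncake
y = [1.0, 4.0]                    # suppose f(x...) returned this
Mooncake.tangent_type(typeof(y))  # Vector{Float64}
ȳ = [1.0, 0.0]                    # an acceptable tangent: pulls back the first component of y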
As with all functionality in Mooncake, if f modifies itself or x, value_and_pullback!! will return both to their original state as part of the process of computing the pullback.
cache must be the output of prepare_pullback_cache, and (fields of) f and x must be of the same size and shape as those used to construct the cache. This is to ensure that the pullback can be written to the memory allocated when the cache was built.
cache owns any mutable state returned by this function, meaning that mutable components of returned values will be mutated if you run this function again with different arguments. Therefore, if you need to keep the returned values around across multiple calls with the same cache, take a copy (using copy or deepcopy) of them before calling again.
The keyword argument args_to_zero is a tuple of boolean values, one for each element of (f, x...), specifying which cotangents should be reset to zero before differentiation. It exists as a performance optimization: set an entry to false only if you can guarantee that the corresponding cotangent allocated in cache (created by zero_tangent) does not need to be zeroed out again.
Example Usage
f(x, y) = sum(x .* y)
x = [2.0, 2.0]
y = [1.0, 1.0]
cache = Mooncake.prepare_pullback_cache(f, x, y)
Mooncake.value_and_pullback!!(cache, 1.0, f, x, y)
# output
(4.0, (NoTangent(), [1.0, 1.0], [2.0, 2.0]))

Mooncake.prepare_derivative_cache — Function
prepare_derivative_cache(fx...; kwargs...)

Returns a cache used with value_and_derivative!!. See that function for more info.
Mooncake.prepare_gradient_cache — Function
prepare_gradient_cache(f, x...)

Returns a cache used with value_and_gradient!!. See that function for more info.
The API guarantees that tangents are initialized at zero before the first autodiff pass.
Mooncake.prepare_pullback_cache — Function
prepare_pullback_cache(f, x...)

Returns a cache used with value_and_pullback!!. See that function for more info.
The API guarantees that tangents are initialized at zero before the first autodiff pass.