How to write X in parallel.

using FLoops

In-place mutation

Mutable containers can be allocated in the init expressions (zeros(3) in the example below):

@floop for x in 1:10
    xs = [x, 2x, 3x]
    @reduce() do (ys = zeros(3); xs)
        ys .+= xs
    end
end
ys

3-element Vector{Float64}:
  55.0
 110.0
 165.0

Mutating objects allocated in the init expressions is not data race because each basecase "owns" such mutable objects. However, it is incorrect to mutate objects created outside init expressions.

Note

Technically, it is correct to mutate objects in the loop body if the objects are protected by a lock. However, it means that the code block protected by the lock can only be executed by a single task. For efficient data parallel loops, it is highly recommended to use non-thread-safe data collection (i.e., no lock) and construct the @reduce block that efficiently merge two mutable objects.

INCORRECT EXAMPLE

This example has data race because the array ys0 is shared across all base cases and mutated in parallel.

ys0 = zeros(3)
@floop for x in 1:10
    xs = [x, 2x, 3x]
    @reduce() do (ys = ys0; xs)
        ys .+= xs
    end
end

Data race-free reuse of mutable objects using private variables

To avoid allocation for each iteration, it is useful to pre-allocate mutable objects and reuse them. We can use @init macro to do this in a data race-free ("thread-safe") manner:

@floop for x in 1:10
    @init xs = Vector{typeof(x)}(undef, 3)
    xs .= (x, 2x, 3x)
    @reduce() do (ys = zeros(3); xs)
        ys .+= xs
    end
end
ys

3-element Vector{Float64}:
  55.0
 110.0
 165.0

This page was generated using Literate.jl.