Julia Set
In this example, we will compute an image of the Julia set in parallel. We will explore the schedule
and nchunks
options that can be used to get load balancing.
The value of a single pixel of the Julia set, which corresponds to a point in the complex number plane, can be computed by the following iteration procedure.
function _compute_pixel(i, j, n; max_iter = 255, c = -0.79 + 0.15 * im)
x = -2.0 + (j - 1) * 4.0 / (n - 1)
y = -2.0 + (i - 1) * 4.0 / (n - 1)
z = x + y * im
iter = max_iter
for k in 1:max_iter
if abs2(z) > 4.0
iter = k - 1
break
end
z = z^2 + c
end
return iter
end
_compute_pixel (generic function with 1 method)
Note that the value of the pixel is the number of performed iterations for the corresponding complex input number. Hence, the computational workload is non-uniform.
Sequential computation
In our naive implementation, we just loop over the dimensions of the image matrix and call the pixel kernel above.
function compute_juliaset_sequential!(img)
N = size(img, 1)
for j in 1:N
for i in 1:N
img[i, j] = _compute_pixel(i, j, N)
end
end
return img
end
N = 2000
img = zeros(Int, N, N)
compute_juliaset_sequential!(img);
Let's look at the result
using Plots
p = heatmap(img)
Parallelization
The Julia set computation above is a map!
operation: We apply some function to each element of the array. Hence, we can use tmap!
for parallelization. We use CartesianIndices
to map between linear and two-dimensional cartesian indices.
using OhMyThreads: tmap!
function compute_juliaset_parallel!(img; kwargs...)
N = size(img, 1)
cart = CartesianIndices(img)
tmap!(img, eachindex(img); kwargs...) do idx
c = cart[idx]
_compute_pixel(c[1], c[2], N)
end
return img
end
# or alternatively
#
# function compute_juliaset_parallel!(img; kwargs...)
# N = size(img, 1)
# cart = CartesianIndices(img)
# @tasks for idx in eachindex(img)
# c = cart[idx]
# img[idx] = _compute_pixel(c[1], c[2], N)
# end
# return img
# end
N = 2000
img = zeros(Int, N, N)
compute_juliaset_parallel!(img);
p = heatmap(img)
Benchmark
Let's benchmark the variants above.
using BenchmarkTools
using Base.Threads: nthreads
N = 2000
img = zeros(Int, N, N)
@show nthreads()
@btime compute_juliaset_sequential!($img) samples=10 evals=3;
@btime compute_juliaset_parallel!($img) samples=10 evals=3;
nthreads() = 10
131.295 ms (0 allocations: 0 bytes)
31.422 ms (68 allocations: 6.09 KiB)
As hoped, the parallel implementation is much faster!
Dynamic vs static scheduling
As stated above, the per-pixel computation is non-uniform. Hence, we do benefit from the load balancing of the default dynamic scheduler. The latter divides the overall workload into tasks that can then be dynamically distributed among threads to adjust the per-thread load. We can try to fine tune and improve the load balancing further by increasing the ntasks
parameter of the scheduler, that is, creating more tasks with smaller per-task workload.
using OhMyThreads: DynamicScheduler
@btime compute_juliaset_parallel!($img; ntasks=N, scheduler=:dynamic) samples=10 evals=3;
17.438 ms (12018 allocations: 1.11 MiB)
Note that while this turns out to be a bit faster, it comes at the expense of much more allocations.
To quantify the impact of load balancing we can opt out of dynamic scheduling and use the StaticScheduler
instead. The latter doesn't provide any form of load balancing.
using OhMyThreads: StaticScheduler
@btime compute_juliaset_parallel!($img; scheduler=:static) samples=10 evals=3;
30.097 ms (73 allocations: 6.23 KiB)
This page was generated using Literate.jl.