Trapezoidal Integration
In this example, we want to parallelize the computation of a simple numerical integral via the trapezoidal rule. The latter is given by
\[\int_{a}^{b}f(x)\,dx \approx h \sum_{i=1}^{N}\frac{f(x_{i-1})+f(x_{i})}{2},\]
where $h = (b - a)/N$ is the panel width and $x_i = a + ih$ are the equispaced grid points.
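Note that every interior grid point is shared by two adjacent panels, so the sum can be rearranged as
\[h \sum_{i=1}^{N}\frac{f(x_{i-1})+f(x_{i})}{2} = h\left(\frac{f(a)+f(b)}{2} + \sum_{i=1}^{N-1} f(x_{i})\right),\]
which is exactly the form that the sequential implementation below computes.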
The function to be integrated is the following.
f(x) = 4 * √(1 - x^2)
f (generic function with 1 method)
The analytic result of the definite integral (from 0 to 1) is known to be $\pi$.
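Indeed, $y = \sqrt{1 - x^2}$ on $[0, 1]$ traces a quarter of the unit circle, whose area is $\pi/4$, so
\[\int_{0}^{1} 4\sqrt{1 - x^{2}}\,dx = 4 \cdot \frac{\pi}{4} = \pi.\]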
Sequential
Naturally, we implement the trapezoidal rule as a straightforward, sequential for loop.
function trapezoidal(a, b, n; h = (b - a) / n)
    y = (f(a) + f(b)) / 2.0 # the endpoints enter the sum with weight 1/2
    for i in 1:(n - 1)      # the interior points enter with weight 1
        x = a + i * h
        y = y + f(x)
    end
    return y * h
end
trapezoidal (generic function with 1 method)
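Since the trapezoidal rule is only exact up to discretization error, it can be instructive to watch the error shrink as the resolution increases. A minimal sketch (the loop below and its choice of resolutions are ours, not part of the original example):
# quick sanity check (not part of the original example):
# the error should decrease as the number of panels grows
for n in (100, 10_000, 1_000_000)
    err = abs(trapezoidal(0, 1, n) - π)
    println("n = $n: |error| ≈ $err")
end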
Let's compute the integral of f above and see if we get the expected result. For simplicity, we choose N, the number of panels used to discretize the integration interval, as a multiple of the number of available Julia threads.
using Base.Threads: nthreads
N = nthreads() * 1_000_000
10000000
Calling trapezoidal, we do indeed find the (approximate) value of $\pi$.
trapezoidal(0, 1, N) ≈ π
true
Parallel
Our strategy is the following: Divide the integration interval among the available Julia threads. On each thread, use the sequential trapezoidal rule to compute the partial integral. It is straightforward to implement this strategy with tmapreduce. The map part is, essentially, the application of trapezoidal, and the reduction operator is chosen to be + to sum up the local integrals.
using OhMyThreads
function trapezoidal_parallel(a, b, N)
    n = N ÷ nthreads()
    h = (b - a) / N
    return tmapreduce(+, 1:nthreads()) do i
        local α = a + (i - 1) * n * h # the local keywords aren't necessary but good practice
        local β = α + n * h
        trapezoidal(α, β, n; h)
    end
end
# or equivalently
#
# function trapezoidal_parallel(a, b, N)
#     n = N ÷ nthreads()
#     h = (b - a) / N
#     @tasks for i in 1:nthreads()
#         @set reducer=+
#         local α = a + (i - 1) * n * h
#         local β = α + n * h
#         trapezoidal(α, β, n; h)
#     end
# end
trapezoidal_parallel (generic function with 1 method)
First, we check the correctness of our parallel implementation.
trapezoidal_parallel(0, 1, N) ≈ π
true
Then, we benchmark and compare the performance of the sequential and parallel versions.
using BenchmarkTools
@btime trapezoidal(0, 1, $N);
@btime trapezoidal_parallel(0, 1, $N);
24.348 ms (0 allocations: 0 bytes)
2.457 ms (69 allocations: 6.05 KiB)
Because the problem is trivially parallel - all threads do the same thing and don't need to communicate - we expect an ideal speedup of (close to) the number of available threads. Indeed, the measured speedup, 24.348 ms / 2.457 ms ≈ 9.9, essentially matches it:
nthreads()
10
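Note that trapezoidal_parallel silently drops N % nthreads() panels whenever N is not a multiple of nthreads(), which is why we chose N accordingly above. Below is a minimal sketch of a variant without that restriction, in which the last task simply absorbs the leftover panels (the name trapezoidal_parallel_any and this chunking scheme are our assumptions, not part of the original example).
# sketch (not from the original example): works for any N ≥ nthreads()
function trapezoidal_parallel_any(a, b, N)
    nt = nthreads()
    q, r = divrem(N, nt)  # q panels per task, r leftover panels
    h = (b - a) / N
    return tmapreduce(+, 1:nt) do i
        n = q + (i == nt ? r : 0)     # the last task takes the remainder
        local α = a + (i - 1) * q * h # left edge of this task's chunk
        local β = α + n * h
        trapezoidal(α, β, n; h)
    end
end
For N that is a multiple of nthreads() this reduces to the implementation above; for any other N it still covers all N panels.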
This page was generated using Literate.jl.