Useful patterns
This page includes some useful patterns using Transducers.jl.
using Transducers
Flattening nested objects using MapCat
Simple MapCat usage
Consider a vector of "objects" (here just NamedTuples) which in turn contain a vector of objects:
nested_objects = [
    (a = 1, b = [(c = 2, d = 3), (c = 4, d = 5)]),
    (a = 10, b = [(c = 20, d = 30), (c = 40, d = 50)]),
];
We can flatten this into a table by using Map inside MapCat:
using TypedTables
astable(xs) = copy(Table, xs) # using `TypedTables` for a nice display
table1 = nested_objects |> MapCat() do x
    x.b |> Map() do b  # not `MapCat`
        (a = x.a, b...)
    end
end |> astable
Table with 3 columns and 4 rows:
     a   c   d
   ┌───────────
 1 │ 1   2   3
 2 │ 1   4   5
 3 │ 10  20  30
 4 │ 10  40  50
(Note that the transducer used inside MapCat is Map, not MapCat: each inner item becomes exactly one row, so there is nothing further to concatenate.)
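One way to see why Map (and not MapCat) belongs inside is to read MapCat(f) as "map each item with f, then flatten the resulting iterators". The sketch below assumes that MapCat(f) behaves like opcompose(Map(f), Cat()) and checks that the two pipelines agree on nested_objects. The helper rows_of is a hypothetical name introduced only for this illustration.

```julia
using Transducers

nested_objects = [
    (a = 1, b = [(c = 2, d = 3), (c = 4, d = 5)]),
    (a = 10, b = [(c = 20, d = 30), (c = 40, d = 50)]),
]

# Hypothetical helper: turn one outer object into an iterator of flat rows.
# Map (not MapCat) is enough because each inner item yields exactly one row.
rows_of(x) = x.b |> Map(b -> (a = x.a, b...))

# MapCat applies rows_of and concatenates the resulting iterators...
flat1 = nested_objects |> MapCat(rows_of) |> collect

# ...which we expect to match an explicit "map, then concatenate" pipeline.
flat2 = nested_objects |> opcompose(Map(rows_of), Cat()) |> collect
```

Both forms should produce the same four rows as table1 above.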
Nested MapCat
This pattern can handle more nested objects:
more_nested_objects = [
    (a = 1, b = [(c = 2, d = [(e = 3, f = 4), (e = 4, f = 5)]),
                 (c = 6, d = [])]),
    (a = 10, b = [(c = 20, d = [(e = 30, f = 40), (e = 40, f = 50)])]),
];
By using nested MapCat (except for the innermost processing, which uses Map since there is nothing to concatenate):
table3 =
    more_nested_objects |> MapCat() do x
        x.b |> MapCat() do b
            b.d |> Map() do d
                (a = x.a, c = b.c, d...)
            end
        end
    end |> astable
Table with 4 columns and 4 rows:
     a   c   e   f
   ┌───────────────
 1 │ 1   2   3   4
 2 │ 1   2   4   5
 3 │ 10  20  30  40
 4 │ 10  20  40  50
Comparison with iterator comprehension
As a comparison, here is how to do it with an iterator comprehension:
rows = (
    (a = x.a, c = b.c, d...)
    for x in more_nested_objects
    for b in x.b
    for d in b.d
)
@assert Table(collect(rows)) == table3
For simple flattening and mapping, an iterator comprehension like the one above is perhaps the simplest solution.
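Comprehensions also accept a trailing if clause, so simple filtering fits the same style. The following plain-Julia sketch (no Transducers.jl needed) keeps only the rows whose e field exceeds 3; the threshold is an arbitrary choice for illustration.

```julia
more_nested_objects = [
    (a = 1, b = [(c = 2, d = [(e = 3, f = 4), (e = 4, f = 5)]),
                 (c = 6, d = [])]),
    (a = 10, b = [(c = 20, d = [(e = 30, f = 40), (e = 40, f = 50)])]),
]

# Flatten, map, and filter in one generator; the `if` clause applies to
# the innermost loop variable `d`.
filtered = (
    (a = x.a, c = b.c, d...)
    for x in more_nested_objects
    for b in x.b
    for d in b.d
    if d.e > 3
)

# Nothing is computed until the generator is consumed.
filtered_rows = collect(filtered)
```

The row with e = 3 is dropped, leaving three rows.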
Note that Transducers.jl works well with iterator comprehensions. Transducers.jl-specific entry points like foldxl convert iterator comprehensions to transducers internally, and eduction can be used to do this conversion explicitly:
@assert astable(eduction(rows)) == table3
Complex MapCat example
For more complex processing that requires intermediate variables, the iterator comprehension does not work well. Fortunately, it is easy to use intermediate variables with transducers:
more_nested_objects |>
MapCat() do x
    a2 = x.a * 2
    x.b |> MapCat() do b
        a2_plus_c = a2 + b.c
        b.d |> Map() do d
            c_plus_e = b.c + d.e
            c_plus_f = b.c + d.f
            (a2_plus_c = a2_plus_c, c_plus_e = c_plus_e, c_plus_f = c_plus_f)
        end
    end
end |>
astable
Table with 3 columns and 4 rows:
     a2_plus_c  c_plus_e  c_plus_f
   ┌──────────────────────────────
 1 │ 4          5         6
 2 │ 4          6         7
 3 │ 40         50        60
 4 │ 40         60        70
MapCat with zip
Note also that MapCat can be combined with arbitrary iterator combinators such as zip:
[(a = 1:3, b = 'x':'z'), (a = 1:4, b = 'i':'l')] |>
MapCat() do x
    zip(x.a, x.b)
end |>
MapSplat((a, b) -> (a = a, b = b)) |>
astable
Table with 2 columns and 7 rows:
     a  b
   ┌─────
 1 │ 1  x
 2 │ 2  y
 3 │ 3  z
 4 │ 1  i
 5 │ 2  j
 6 │ 3  k
 7 │ 4  l
MapCat with Iterators.product
... and Iterators.product:
[(a = 1:3, b = 'x':'z'), (a = 1:4, b = 'i':'l')] |>
MapCat() do x
    Iterators.product(x.a, x.b)
end |>
Enumerate() |>
Filter(x -> x[1] % 5 == 0) |>  # keep only every fifth item
MapSplat((n, (a, b)) -> (n = n, a = a, b = b)) |>
astable
Table with 3 columns and 5 rows:
     n   a  b
   ┌─────────
 1 │ 5   2  y
 2 │ 10  1  i
 3 │ 15  2  j
 4 │ 20  3  k
 5 │ 25  4  l
"Missing value" handling with KeepSomething
Transducers.jl provides generic filtering transducers such as Filter as well as type-based ones such as NotA and OfType. These transducers can be used to filter out "missing values" represented as missing or nothing.
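A minimal sketch of the three filtering styles on a small, made-up vector that mixes integers with both kinds of markers:

```julia
using Transducers

xs = [1, missing, 2, nothing, 3]

# NotA(T) drops items of type T; chain two of them to drop both markers.
no_markers = xs |> NotA(Missing) |> NotA(Nothing) |> collect

# OfType(T) keeps only items of type T, which has the same effect here.
only_ints = xs |> OfType(Int) |> collect

# A generic Filter works too, at the cost of spelling out the predicate.
filtered = xs |> Filter(x -> !(ismissing(x) || isnothing(x))) |> collect
```

The type-based transducers state the intent directly, while Filter is the fallback when the condition is not about types.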
KeepSomething is a transducer that is useful for working on Union{Nothing,Some{T}}. It filters out nothing and yields the remaining items after applying something to them.
[nothing, 1, Some(nothing), 2, 3] |> KeepSomething(identity) |> collect
4-element Vector{Union{Nothing, Int64}}:
 1
 nothing
 2
 3
Only the bare nothing is dropped; Some(nothing) is kept and unwrapped to nothing. Thus, KeepSomething works well with any tools that operate on Union{Nothing,Some{T}}. Here is an example of using it with Maybe.jl. Consider a vector of heterogeneous dictionaries with varying sets of keys:
heterogeneous_objects = [
    Dict(:a => 1, :b => Dict(:c => 2)),
    Dict(:a => 1),  # missing key :b
    Dict(:a => 1, :b => Dict()),  # missing key :c
    Dict(:b => Dict(:c => 2)),  # missing key :a
    Dict(:a => 10, :b => Dict(:ccc => 20)),  # alternative key name
];
Using the @something and @? macros from Maybe.jl, we can convert this to a regular table quite easily:
using Maybe
using Maybe: @something
heterogeneous_objects |>
KeepSomething() do x
    c = @something {  # (1)
        @? x[:b][:c];  # (2)
        @? x[:b][:ccc];  # (3)
        return;  # (4)
    }
    @? (a = x[:a], c = c)  # (5)
end |>
astable
Table with 2 columns and 2 rows:
     a   c
   ┌───────
 1 │ 1   2
 2 │ 10  20
In this example, for each dictionary x, the body of the do block works as follows:
- (1) Try to extract the item c.
  - (2) First, try to get it from x[:b][:c].
  - (3) If x[:b][:c] doesn't exist, try x[:b][:ccc] next.
  - (4) If neither x[:b][:c] nor x[:b][:ccc] exists, return nothing. KeepSomething will filter out this entry.
- (5) Try to extract the item a from x[:a].
  - If this does not exist, the whole expression wrapped by @? evaluates to nothing. This, in turn, will be filtered out by KeepSomething.
  - If x[:a] exists, @? (a = x[:a], c = c) evaluates to Some((a = value_of_a, c = value_of_c)). The Some wrapper is unwrapped by something, which is called by KeepSomething.
For more information, see the tutorial in the Maybe.jl documentation.
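If adding Maybe.jl as a dependency is not desirable, the same extraction logic can be sketched with plain get lookups from Base. The helper extract_row below is hypothetical (not part of any package): it returns Some(row) when every key is found and nothing otherwise, following the Union{Nothing,Some{T}} convention that KeepSomething expects.

```julia
using Transducers

heterogeneous_objects = [
    Dict(:a => 1, :b => Dict(:c => 2)),
    Dict(:a => 1),                            # missing key :b
    Dict(:a => 1, :b => Dict()),              # missing key :c
    Dict(:b => Dict(:c => 2)),                # missing key :a
    Dict(:a => 10, :b => Dict(:ccc => 20)),   # alternative key name
]

# Hypothetical helper mirroring steps (1) to (5) with explicit lookups.
function extract_row(x)
    b = get(x, :b, nothing)
    b === nothing && return nothing
    c = get(b, :c, nothing)
    if c === nothing
        c = get(b, :ccc, nothing)  # fall back to the alternative key name
    end
    c === nothing && return nothing      # neither :c nor :ccc exists
    a = get(x, :a, nothing)
    a === nothing && return nothing
    return Some((a = a, c = c))          # unwrapped later by `something`
end

rows = heterogeneous_objects |> KeepSomething(extract_row) |> collect
```

This is more verbose than the @something/@? version, but it makes each fallback explicit.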
Multiple outputs
Usually, reducers like sum and collect have one output. However, we can use TeeRF and similar reducing function combinators to "fan out" input items to multiple outputs.
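As a minimal sketch of fan-out on a small, made-up input, the reducing functions +, min, and max below each see every item, and foldxl returns one result per reducing function, in the order they were given. The per-branch init tuple follows the convention used in the nested TeeRF example on this page.

```julia
using Transducers

xs = [3, 1, 4, 1, 5]

# One pass over xs feeds every item to all three reducing functions.
total, smallest, largest = foldxl(
    TeeRF(+, min, max),
    xs;
    init = (0, typemax(Int), typemin(Int)),  # one initial value per reducer
)
```

This computes the sum and the extrema in a single traversal instead of three.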
Multiple output vectors
Here is an example of creating two output vectors of integers and symbols in one go:
ints, symbols =
    [1, :two, missing, 3, 4, :five, 6] |>
    Filter(!isequal(6)) |>
    foldxl(TeeRF(
        OfType(Int)'(push!!),  # push integers to a vector
        OfType(Symbol)'(push!!),  # push symbols to a vector
    ))
([1, 3, 4], [:two, :five])
Here, we use TeeRF(rf₁, rf₂, ..., rfₙ) to fan out input items to multiple reducing functions. To build each reducing function, we use the OfType transducer as a reducing function transformation xf'(rf), which filters items before they reach rf. Items that match neither branch, such as missing, are simply dropped.
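push!! is only one possible choice for each branch. The sketch below assumes each branch may use any two-argument reducing function: it sums the integers and counts the symbols in one pass, with an explicit init per branch.

```julia
using Transducers

xs = [1, :two, missing, 3, 4, :five, 6]

int_sum, symbol_count = foldxl(
    TeeRF(
        OfType(Int)'(+),                   # add up the integers
        OfType(Symbol)'((n, _) -> n + 1),  # count the symbols
    ),
    xs;
    init = (0, 0),
)
```

As before, missing matches neither branch and does not affect either result.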
Handling empty results
Note that foldxl with push!! throws when there is nothing to reduce, i.e., when the input is empty or all items are filtered out. To obtain an empty vector in that case, we need to specify init. MicroCollections.jl includes a library of collections useful as init. Here, we can use EmptyVector:
using MicroCollections
ints, strings =
    [1, :two, missing, 3, 4, :five, 6] |>
    Filter(!isequal(6)) |>
    foldxl(TeeRF(
        OfType(Int)'(push!!),  # push integers to a vector
        OfType(String)'(push!!),  # push strings to a vector (but there is no string)
    ); init = EmptyVector())
([1, 3, 4], Union{}[])
Composed transducers with TeeRF
Each reducing function passed to TeeRF can use arbitrarily complex transducers. Here is an example that keeps only the symbols and then maps them to strings:
ints, strings =
    [1, :two, missing, 3, 4, :five, 6] |>
    Filter(!isequal(6)) |>
    foldxl(
        TeeRF(
            OfType(Int)'(push!!),
            opcompose(OfType(Symbol), Map(String))'(push!!),  # filter _then_ map
        );
        init = EmptyVector(),
    )
([1, 3, 4], ["two", "five"])
Nested TeeRF
Each reducing function passed to TeeRF can itself be composed using TeeRF (or other reducing function combinators, e.g., ProductRF). Here is an example of computing the extrema of the integers:
(imax, imin), strings =
    [1, :two, missing, 3, 4, :five, 6] |>
    Filter(!isequal(6)) |>
    foldxl(
        TeeRF(
            OfType(Int)'(TeeRF(max, min)),  # extrema on integers, returned as (max, min)
            opcompose(OfType(Symbol), Map(String))'(push!!),  # filter _then_ map
        );
        init = ((typemin(Int), typemax(Int)), EmptyVector()),
    )
((4, 1), ["two", "five"])
When input is a tuple: ProductRF
ProductRF is like TeeRF, but it expects each input to already be a tuple and feeds its i-th component to the i-th reducing function:
ints, io =
    [(1:3, 'x':'z'), nothing, (1:4, 'i':'l')] |>
    NotA(Nothing) |>
    foldxl(
        ProductRF(
            opcompose(Cat(), Filter(isodd))'(push!!),  # process 1:3 etc.
            Cat()'((io, char) -> (write(io, char); io)),  # process 'x':'z' etc.
        );
        init = (EmptyVector(), IOBuffer()),
    );
String(take!(io))
"xyzijkl"
ints
4-element Vector{Int64}:
 1
 3
 1
 3
When input is a row: DataTools.oncol
oncol from DataTools.jl is like ProductRF but acts on NamedTuples (as well as any possibly nested objects compatible with Accessors.jl).
using DataTools
foldxl(oncol(a = +, b = *), [(a = 1, b = 2), (a = 3, b = 4)])
(a = 4, b = 8)
This page was generated using Literate.jl.