I have been working on a software package RandomTree.jl that can efficiently generate very large (up to $10^8$ nodes) random trees, in particular Conditional Galton-Watson trees. The package is written in Julia, a language designed for high-performance computing. It is a high-level language that is as easy to use as Python, but its performance is on the same order as C (if you use it correctly).
I experimented with implementing this package in Python + Cython and in Python + Numba. Both are quite impressive in terms of efficiency. But writing Cython code feels so much like working in C that you lose the reason to use Python in the first place. Numba keeps things closer to Python, but unfortunately it is rather slow at generating exponential random variables. Eventually I settled on Julia, and I have to say it is perhaps the best language for simulations and numerical computation if you do not want to use C or C++. It is definitely worth the time to learn if you care about the efficiency of your code.
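As a quick sanity check of this point (a minimal sketch of my own, not part of RandomTree.jl), you can time exponential random number generation in Julia yourself:
using BenchmarkTools, Random
@btime randexp(10^6);  # one million Exp(1) samples; typically just a few milliseconds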
How to use RandomTree.jl
Load the packages
using RandomTree
using BenchmarkTools # For testing the speed
using Statistics     # For mean and std
using Plots          # For drawing histograms
using StatsBase      # For sample moments
Generate random trees
We generate a Catalan tree (a uniformly random plane tree) of size 5:
tree = CatalanTree(5)
degseq = degrees(tree)
println(degseq)
The tree is represented by its degree sequence in depth-first-search order, i.e., the $i$-th entry is the number of children of the $i$-th node visited in a depth-first search. It corresponds to the following tree:
drawtree(degseq, true)
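To make the encoding concrete, here is a small helper (a sketch of my own, not part of RandomTree.jl's API) that reconstructs each node's parent from a degree sequence given in depth-first-search order:
# Reconstruct parent indices from a degree sequence in DFS order.
# parents[1] == 0 marks the root.
function parents_from_degrees(degseq)
    parents = zeros(Int, length(degseq))
    stack = Tuple{Int,Int}[]               # (node, number of children still to attach)
    for (v, d) in enumerate(degseq)
        if !isempty(stack)
            (p, remaining) = stack[end]
            parents[v] = p                 # the current node is a child of the stack top
            remaining == 1 ? pop!(stack) : (stack[end] = (p, remaining - 1))
        end
        d > 0 && push!(stack, (v, d))
    end
    return parents
end

parents_from_degrees([1, 1, 2, 0, 0])      # gives [0, 1, 2, 3, 3]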
The algorithm for generating the degree sequence is very fast. For example, for trees of size $10^6$, it takes about 30 ms:
tree = CatalanTree(10^6)
@btime degrees(tree);
For Cayley trees and binary trees (0 or 2 children), it is even faster:
tree = CayleyTree(10^6)
@btime degrees(tree);
tree = BinaryTree(10^6)
@btime degrees(tree);
We can also generate fixed (non-random) full $d$-ary trees:
tree = FullDAryTree(4, 2)
drawtree(tree)
Simulations on trees
Several types of simulations on random trees are provided. For example, it is known that the number of leaves in a Cayley tree of size $n$ has expectation about $n e^{-1}$ and a normal limit distribution. (A Cayley tree is a conditional Galton-Watson tree with Poisson(1) offspring, and a node is a leaf precisely when its offspring count is zero, which happens with probability $e^{-1}$.) Let's check this by generating 2000 samples.
tree = CayleyTree(10^6)
sim = LeafSimulator(tree)
samples = simulation(sim, 2000);
The sample mean is
mu = mean(samples)
Compare it with the theoretical expectation:
mu/(10^6*exp(-1))
If we center the samples by their mean, rescale by the standard deviation, and draw a histogram, it does look a bit like a normal distribution.
sigma = std(samples)
samples_rescaled = @. (samples-mu)/sigma;
histogram(samples_rescaled)
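For a more direct comparison (assuming you also have the Distributions package installed; this snippet is my own addition), we can normalize the histogram to a density and overlay the standard normal density:
using Distributions
histogram(samples_rescaled, normalize = :pdf, label = "rescaled samples")
plot!(x -> pdf(Normal(0, 1), x), -4, 4, linewidth = 2, label = "N(0,1) density")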
The moments also suggest a normal distribution.
print([round(moment(samples_rescaled, r), digits=3) for r in 1:7])
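For reference, the corresponding moments of the standard normal distribution are 0 for odd orders and $(r-1)!! = 1, 3, 15, \dots$ for even orders $r$, which we can print for comparison:
# Moments of the standard normal: odd moments vanish, the r-th even moment is (r-1)!!.
println([isodd(r) ? 0 : prod(1:2:r-1) for r in 1:7])   # [0, 1, 0, 3, 0, 15, 0]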
Drawing trees
To draw trees, you must have the graph drawing software Graphviz and this Python package installed.
If you want to draw a tree from a degree sequence:
drawtree([1, 1, 2, 0, 0], true)
If you want to draw some random trees:
tree = CayleyTree(6)
drawtree(tree)
Conditional Galton-Watson trees are known to be tall and skinny: their height typically grows on the order of $\sqrt{n}$, where $n$ is the size of the tree. Let's see this with our own eyes.
tree = CayleyTree(1000)
drawtree(tree)
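We can also check the $\sqrt{n}$ height scaling numerically. The sketch below is my own code (reusing the parents_from_degrees helper from above rather than any built-in height function); the ratio of the height to $\sqrt{n}$ should stay roughly constant as $n$ grows:
# Compute the height of a tree from its DFS-order degree sequence.
function height_from_degrees(degseq)
    parents = parents_from_degrees(degseq)
    depth = zeros(Int, length(degseq))
    for v in 2:length(degseq)
        depth[v] = depth[parents[v]] + 1   # in DFS order a parent appears before its children
    end
    return maximum(depth)
end

[height_from_degrees(degrees(CayleyTree(n))) / sqrt(n) for n in (10^3, 10^4, 10^5)]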
But random recursive trees are much shorter; their height only grows logarithmically in $n$:
tree = RandomRecursiveTree(1000)
drawtree(tree)