*This is part 1 of a series explaining what Harmonic numbers are and their connection with Zeta
functions. The post is written as a Jupyter notebook with a Julia kernel. You can download it
here.*


## Harmonic number

If you took some discrete math courses, you have probably seen this sum $$ \sum_{i=1}^{m} \frac{1}{i} $$ This is called the $m$-th Harmonic number and is often written as $H_m$.

Why should we care? Well, for one thing, if there are $m$ different coupons, and the type you get is uniform at random whenever you buy one, then the average number of purchases needed to collect all of them is $$ \sum_{i=0}^{m-1} \frac{m}{m-i} = m H_{m} $$ This is the famous coupon collector's problem 🎉️.
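We can sanity-check this claim with a quick simulation sketch (the function name `simulate_coupons` is my own, not from any package):

```julia
# Buy coupons uniformly at random until all n types have been seen,
# and return the number of purchases it took.
function simulate_coupons(n)
    seen = falses(n)
    purchases = 0
    while !all(seen)
        seen[rand(1:n)] = true
        purchases += 1
    end
    return purchases
end

# Average over many runs; for n = 10 this should be close to
# 10 * H_10 ≈ 29.29.
n = 10
avg = sum(simulate_coupons(n) for _ in 1:10_000) / 10_000
```

With 10,000 trials the average lands very near $10 H_{10}$, as the formula predicts.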

Let's see how we can compute and approximate this in Julia.
Julia itself is not a CAS (computer algebra system), but fortunately we have `SymPy.jl`.
It allows us to use `SymPy`, a CAS written entirely in Python. A good source for learning `SymPy.jl` is Symbolic math with julia.

Of course, you will need to install Python and `SymPy` itself first. And to run this notebook, you also need to add the following Julia packages.

```
using Pkg;
Pkg.add("SymPy")
Pkg.add("Plots")
Pkg.add("BenchmarkTools")
Pkg.add("LaTeXStrings")
Pkg.add("FastRationals")
```

```
using SymPy
@vars i m;
harmonic = summation(1/i, (i, 1, m))
```

What we get here is a symbolic object (representing a math symbol) of type `Sym`.

```
typeof(harmonic)
```

To compute $H_m$, we just need to replace $m$ in the expression `harmonic` by an integer.

```
subs(harmonic, m, 10)
```

But there is a shortcut for this substitution. We can use `harmonic` just as if it were a Julia function.

```
h10 = harmonic(10)
```

The first ten harmonic numbers are

```
transpose(map(harmonic, 1:10))
```

## The speed

How fast can `SymPy` compute the Harmonic numbers? We can find out with BenchmarkTools.jl.
We load it and use the macro `@benchmark` to test how fast we can compute `harmonic(10000)`.
The macro runs the computation many times to get an average performance.
It will take a while. Make a tea 🍵️ for yourself.

```
using BenchmarkTools
```

```
m0 = 10^4
@benchmark harmonic(m0)
```

As you can see, the longest time it takes to compute `harmonic(10^4)` is about 270 times slower than the median time.
Jeffery Sarnoff pointed out to me that

> As far as I know, when the `maximum time` reported by `@benchmark` is much larger than the median time (as usually is the case), you are seeing overhead unrelated to the function evaluation itself (e.g. the OS is handling other things). It is important for realtime systems and mostly irrelevant otherwise.

Though I did check the code of `SymPy` and found that it does some memoization here -- after the first time $H_m$ is computed, `SymPy` just remembers it. This may also explain the difference.

## Three Julia implementations

Of course you can also do this purely in Julia. Don't forget to use `//` instead of `/`: the first gives a rational number, the second gives a float.
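A quick illustration of the difference between the two operators:

```julia
# Rational arithmetic is exact:
1//10 + 2//10 == 3//10    # true
# Floating-point arithmetic rounds in binary, so the same sum misses:
1/10 + 2/10 == 3/10       # false
```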

```
harmonic_j(m) = sum(i->1//big(i), 1:m)
```

```
harmonic_j(m0) == harmonic(m0)
```

```
@benchmark harmonic_j(m0)
```

This is a bit slow.
It turns out that `Rational{BigInt}` arithmetic is just not very fast.
But there is a package `FastRationals.jl` which addresses this problem.

```
using FastRationals
```

```
harmonic_fast(m) = sum(i->FastQBig(1, i), 1:m)
```

```
harmonic_fast(m0) == N(harmonic(m0))
```

```
@benchmark harmonic_fast(m0)
```

Almost 20 times speed up. Not bad. But why don't we try `SymPy`

's idea of memoization?

```
const harmonic_numbers = Array{FastRational{BigInt}, 1}()

# Compute H_m, memoizing every harmonic number along the way.
function harmonic_cached(m)
    m1 = length(harmonic_numbers)
    if m <= m1
        return harmonic_numbers[m]
    end
    if m1 > 0
        ret = harmonic_numbers[m1]
    else
        ret = 0
    end
    for i in m1+1:m
        ret += FastQBig(1, i)
        push!(harmonic_numbers, ret)
    end
    return ret
end
```

Let's make sure we have not made any mistakes

```
all([harmonic_cached(m) == harmonic_fast(m) for m in 1:100])
```

How fast is this version?

```
@benchmark harmonic_cached(m0)
```

As you can see, this is 200 times faster! And much, much faster than calling `SymPy`.

## Approximations

But for most practical purposes, e.g., deciding how many coupons to buy, we don't really need the exact value of the harmonic number. A good approximation is enough.

You probably remember that $H_m \sim \log m$. This is to say that when $m$ is really large, the two numbers are quite close. Let's draw a picture.

```
using Plots; gr();
```

We load `LaTeXStrings` so we can use LaTeX in the plots.

```
using LaTeXStrings
```

```
xpos = 1:30:1001;
harmonic_values = map(m->BigFloat(harmonic_cached(m)), xpos);
scatter(xpos, harmonic_values, label=L"H_m", legend=:bottomright)
plot!(log, 1:1001, label=L"\log(m)")
```

It looks like $\log(m)$ is indeed close to $H_m$, but there seems to be a gap.
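We can measure that gap numerically. Here is a quick sketch with plain floating-point sums (the helper `H` is defined ad hoc here, just for this check):

```julia
# Naive floating-point harmonic sum, good enough to see the trend.
H(n) = sum(1/i for i in 1:n)

# The gap H_m - log(m) shrinks toward a constant ≈ 0.5772.
gaps = [H(n) - log(n) for n in (10, 100, 1000)]
```

The gap approaches the Euler constant $\gamma \approx 0.5772$, which is exactly the constant appearing in the expansion below.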

In fact, we can do much better. We know that $$ H_m \sim \log m + \gamma + \frac{1}{2 m} - \sum_{k=1}^\infty \frac{B_{2 k}}{2 k m^{2 k}} $$ where $\gamma$ is Euler's constant and $B_{k}$ are the Bernoulli numbers. (A good way to become immortal is to find a useful constant; people will name it after you. 😅)

So the more terms we keep in the above sum, the better the approximation of $H_m$. Let's try $k=0$ and $k=1$.

```
# Approximate H_m by the asymptotic expansion, keeping k0 terms
# of the Bernoulli sum.
function harmonic_number_approx(m0, k0)
    approx = log(m0) + sympy.EulerGamma + 1/(2*m0)
    if k0 >= 1
        approx -= sum([sympy.bernoulli(2*k)/(2*k*m0^(2*k)) for k in 1:k0])
    end
    return approx
end
```

```
harmonic_approx0 = harmonic_number_approx(m, 0)
harmonic_approx0
```

```
harmonic_approx1 = harmonic_number_approx(m, 1)
harmonic_approx1
```

Let's see how this works.

```
approx0_values = map(m->harmonic_approx0(m).evalf(), xpos);
approx1_values = map(m->harmonic_approx1(m).evalf(), xpos);
```

```
scatter(xpos, harmonic_values, label=L"H_m", legend=:bottomright)
plot!(log, 1:1001, label=L"\log(m)")
plot!(xpos, approx0_values, label=latexstring(sympy.latex(harmonic_approx0)))
plot!(xpos, approx1_values, label=latexstring(sympy.latex(harmonic_approx1)))
```

The two new approximations look much better than $\log(m)$. But we cannot really tell which of them is better by looking at the picture. So let's plot the error instead. To see anything, we have to use a log-scale plot because the errors are so small.

```
scatter(xpos, abs.((approx0_values./harmonic_values) .- 1), label=latexstring(sympy.latex(harmonic_approx0)), yscale=:log10)
scatter!(xpos, abs.((approx1_values./harmonic_values) .- 1), label=latexstring(sympy.latex(harmonic_approx1)), yscale=:log10)
```

As you can see, the relative error of the second approximation is quite good. So for large $m$ we can safely use it instead of computing the exact value of the harmonic number. This is much faster! For example, it takes a bit of time (about 2 seconds) to compute

```
@time harmonic_cached(3*10^4);
```

But for the approximation

```
@time harmonic_approx1(3*10^4).evalf()
```

Life is short. Let's move fast!