The Prime Numbers' Music, Staircase and Riemann Hypothesis

For DSP, mathematics is a fundamental tool for understanding the different techniques. Hence, I'm also interested in other areas of mathematics, and what could be "purer" mathematics than number theory and prime numbers? In this regard, understanding at least the basics of the still unsolved Riemann Hypothesis, for which a prize of $1,000,000 awaits whoever proves or disproves it, has been on my mind for quite some time. Despite the wealth of material about the hypothesis available online, it was the book by Barry Mazur and William Stein entitled "Prime Numbers and the Riemann Hypothesis" that finally gave me at least a superficial understanding of it.

The nice thing about this book is its very simple approach, requiring no more than high-school math. Each page contains graphs and illustrations of the analyzed functions, which makes it quite easy to follow. Accordingly, it's a very nice book for getting to know a bit about the Riemann Hypothesis (RH). Admittedly, though, the mathematical treatment is intentionally quite superficial.

In this notebook, I do not want to talk about the RH directly. Instead, the notebook arose when I tried to replicate some calculations from the book. In particular, I was completely astonished when I saw that the prime-counting function $\pi(x)$ can be written as an infinite sum of continuous functions:

$$\text{Number of primes below $x$}=\pi(x) = R_0(x)+\sum_{k=1}^{\infty} T_k(x)$$

where $R_0(x)$ is a smooth function and the $T_k(x)$ are oscillating correction terms which together create the prime staircase.

As the numerics seemed straightforward, I sat down and tried to replicate this calculation. However, it didn't work out directly, so some more research was necessary. In this notebook, I want to describe my calculations. As a bonus, I give the title of Marcus du Sautoy's The Music of the Primes (another book well worth reading) a direct meaning, because we will actually hear how these correction functions sound.

As a teaser, here are some sound samples. Scroll down to find out what they actually represent!

 
Tk(t), k=1 (audio sample)
Tk(t), k=100 (audio sample)

So, let's get started. For the calculations, I will make use of the mpmath library, which already has a lot of number-theoretic functions implemented. In particular, Riemann's $R(x)$ function is already there. Please also don't care too much about mathematical exactness; we will just play around with the functions a bit.

import numpy as np
import matplotlib.pyplot as plt
import scipy.signal
import mpmath
from IPython.display import Audio, display

Let's first calculate the first few prime numbers (to recap: a prime number is a number that is only divisible by 1 and itself):

N = 1000000
primes = np.array(mpmath.mp.list_primes(N))
print("First 20 prime numbers:")
print(primes[:20])
First 20 prime numbers:
[ 2  3  5  7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71]
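In case `list_primes` is not available in your mpmath version, the same list can be produced with a plain sieve of Eratosthenes. A minimal, self-contained sketch (the function name `sieve_primes` is my own):

```python
import numpy as np

def sieve_primes(limit):
    """Sieve of Eratosthenes: all primes <= limit as a numpy array."""
    is_prime = np.ones(limit + 1, dtype=bool)
    is_prime[:2] = False                      # 0 and 1 are not prime
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = False        # cross out multiples of p
    return np.flatnonzero(is_prime)

primes = sieve_primes(1_000_000)
print(primes[:20])
print(len(primes))    # 78498 primes below one million
```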

Now, let's mark each prime number on the number line:

prime_diracs = np.zeros(N)
prime_diracs[primes] = 1
plt.plot(prime_diracs[:100])

Nothing special here. Let's understand this number line as a time-domain function, which we can directly convert to sound. For comparison, we will also play white noise to see if the "music" is actually different from noise:

print("White noise:")
display(Audio(data=np.random.randn(len(prime_diracs)), rate=44100))
print("Primes as Diracs:")
display(Audio(data=prime_diracs, rate=44100))
White noise:
Primes as Diracs:

Ah, the primes as Diracs sound kind of like noise, but there is definitely a tone to be heard! So, there seems to be some regularity in the prime numbers; they are more than randomly distributed. Let's see if we can spot this tone in the spectrogram:

f, t, Sxx = scipy.signal.spectrogram(prime_diracs)
plt.pcolormesh(t, f, Sxx); 

Yes, there are two lines visible, which correspond to the tones we hear. Interesting...

The Prime-counting Function $\pi(x)$

Anyway, let's go on and move to the actual prime staircase, the prime-counting function $\pi(x)$. This function returns the number of primes less than or equal to a given number $x$. We can implement it as follows:

def primepi(x, prime_list=None):
    if prime_list is None:
        return np.vectorize(mpmath.fp.primepi)(x)
    else:
        def _single(n):
            # side='right' counts primes <= n; the default side='left'
            # would miss n itself whenever n is prime
            return np.searchsorted(prime_list, n, side='right')
        return np.vectorize(_single)(x)

Here, I made use of the precalculated list of primes to speed up the calculation. It still takes some time and is definitely not optimal, but we don't care for now. Let's visualize the function:

N2 = 100000
n = np.arange(0, N2, 0.2)
P = primepi(n, prime_list=primes)   # this calculation takes some time

plt.figure(figsize=(8,3))
plt.subplot(121);  plt.plot(n, P); plt.grid(True); plt.xlabel('$x$'); plt.ylabel(r'$\pi(x)$');
plt.subplot(122); plt.plot(n, P); plt.xlim((0,20)); plt.ylim((0,10)); plt.grid(True); plt.xlabel('$x$'); plt.ylabel(r'$\pi(x)$');

Looking at the full scale (left figure), $\pi(x)$ looks very smooth. However, if we zoom in, we see that the function is actually a staircase. For example, $\pi(5)=3$, as the primes 2, 3, 5 are less than or equal to 5. Similarly, $\pi(15)=6$, as 2, 3, 5, 7, 11, 13 are less than or equal to 15.
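These examples double as a quick sanity check of the `searchsorted` approach. A self-contained sketch with a small hand-written prime table (so it runs independently of the cells above):

```python
import numpy as np

# small hand-written prime table, enough for the examples below
primes = np.array([2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47])

def primepi(x):
    # side='right' counts the primes <= x; side='left' would
    # miss x itself whenever x is prime
    return np.searchsorted(primes, x, side='right')

print(primepi(5), primepi(15))   # -> 3 6
```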

Now, the question of how $\pi(x)$ behaves towards infinity is fundamental. Gauss already conjectured that $\pi(x)$ roughly follows the logarithmic integral $li(x)=\int_0^{x}\frac{dz}{\log(z)}$, and the celebrated (and proven) Prime Number Theorem ensures that $li(x)/\pi(x)\rightarrow 1$ for $x\rightarrow \infty$. Let's look at how $li(x)$ behaves compared to $\pi(x)$:

li = np.vectorize(mpmath.fp.li, otypes=[float])
L = li(n[::10])  # make calculation a bit faster by calculating less samples

plt.subplot(121)
plt.plot(n, P, label='$\\pi(x)$')
plt.plot(n[::10], L, label='$li(x)$')

plt.subplot(122)
plt.plot(n, P, label='$\\pi(x)$')
plt.plot(n[::10], L, label='$li(x)$')
plt.xlim((100, 200)); plt.ylim((0,70))

Indeed, $li(x)$ and $\pi(x)$ are very close together. But, we can do even better!

Riemann's approximation to $\pi(x)$

In his landmark 1859 paper, Riemann came up with a function $R(x)$ that approximates $\pi(x)$ even more closely; we will call it the Riemann $R$ function here. Its definition looks a bit strange at first:

$$ R(x)=\sum_{n=1}^{\infty}\frac{\mu(n)}{n}li(x^{\frac{1}{n}}),$$

i.e., it's a sum of more and more stretched $li$ functions, weighted by $\frac{1}{n}$ and the Möbius function $\mu(n)$. Let's not worry too much about it, but instead plot it. Here, mpmath offers a direct evaluation of $R$:

riemannr = np.vectorize(mpmath.fp.riemannr, otypes=[complex])
R = riemannr(n).real  # this calculation takes some time
plt.subplot(121);
plt.plot(n, R, label='Riemann\'s R function $R(x)$');
plt.plot(n, P, label='$\\pi(x)$'); 

plt.subplot(122);
plt.plot(n, R, label='Riemann\'s R function $R(x)$');
plt.plot(n, P, label='$\\pi(x)$'); 
plt.plot(n[::10], L, label='$li(x)$')
plt.xlim((100,200)); plt.ylim((0,70))

See how well the Riemann function approximates $\pi(x)$! Now, one formulation of the Riemann Hypothesis is that $R(x)$ (but also $li(x)$) approximates $\pi(x)$ very well towards infinity; more precisely, that the approximation is square-root accurate (meaning the absolute error grows only with the square root of the function value, or equivalently, the relative error quickly drops to zero).
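As a cross-check of the Möbius-sum definition above, we can also evaluate a truncated version of the series ourselves, using that $li(x)=\mathrm{Ei}(\log x)$ with scipy's `expi`. A sketch under the assumption that 50 terms suffice for small $x$ (the helper names `mobius` and `riemann_r` are my own):

```python
import numpy as np
from scipy.special import expi

def mobius(n):
    """Moebius function mu(n) by trial factorization (fine for small n)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # n has a squared prime factor
            result = -result
        p += 1
    return -result if n > 1 else result

def li(x):
    # logarithmic integral via the exponential integral: li(x) = Ei(log(x))
    return expi(np.log(x))

def riemann_r(x, terms=50):
    # truncated version of R(x) = sum_n mu(n)/n * li(x**(1/n))
    return sum(mobius(n) / n * li(x ** (1.0 / n))
               for n in range(1, terms + 1) if mobius(n) != 0)

print(riemann_r(100.0))   # roughly 25.7 -- compare pi(100) = 25
```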

Anyway, we are not quite done yet, as Riemann found a way to express $\pi(x)$ exactly by an infinite sum. Let us first examine the error made by the $R$ function:

plt.plot(n, R-P)  # plot error between pi(x) and R(x)
plt.grid(True); plt.xlabel('$x$'); plt.ylabel('$R(x)-\\pi(x)$');

OK, the error is shaky and looks like a waveform. So, let's listen to it:

Audio(data=(R-P), rate=44100)

Again, it sounds mostly like noise, but there are still some tones contained in it. I interpret these as the "non-randomness" of the prime numbers. Let's see if we find these tones in the spectrogram:

f, t, Sxx = scipy.signal.spectrogram(R-P)
plt.pcolormesh(t, f, 10*np.log10(Sxx));

Yes! In the spectrogram, we see some lines corresponding to the tones. I leave the interpretation of the frequencies of these tones to others, though. Also, one would need to check whether the tones change when we go to higher and higher numbers.

The spectrum of the prime numbersΒΆ

In "Prime Numbers and the Riemann Hypothesis", there is a nice exercise. Though it appears somewhat artificial at first, the authors construct the following function:

$$F_{\leq C}(t) = \sum_{p^n\leq C}\frac{\log(p)}{p^{n/2}}\cos(t\log(p^n)).$$

Here, the sum runs over all prime powers $p^n$ ($p$ prime) with $p^n\leq C$. Let's implement this function:

def F(t, C, prime_list):
    use_primes = np.array([p for p in prime_list if p <= C])
    result = 0
    for p in use_primes:
        p_power = p
        while p_power <= C:
            result = result + np.log(p)/np.sqrt(p_power)*np.cos(t*np.log(p_power))
            p_power = p_power * p
    return -result  # sign flipped relative to the formula above, so that the peaks point upwards
def showFC(C):
    t = np.linspace(10, 100, 10001)
    plt.plot(t, F(t, C, prime_list=primes))

Let's plot this function for different limits $C$, and only look at the positive part:

showFC(5); plt.ylim((0,2));

For $C=5$, we already see some peaks occurring. Let's move on to higher $C$:

showFC(50); plt.ylim((0,5)); plt.title('C=50'); plt.tight_layout()
showFC(500); plt.ylim((0,8)); plt.title('C=500'); plt.tight_layout();

OK, we see that with increasing $C$ the peaks become sharper, but their positions remain the same. The first peak appears at roughly $t_1=14$, the second at $t_2=21$, the third at $t_3=25$. We don't know yet what this exercise tells us, but keep the three values 14, 21, 25 in mind, and let us call these peaks the spectrum of the primes.

Approximating the stair-case structure of $\pi(x)$ with oscillating correction terms

Let us now focus on numerically following what Riemann found out and what is described in the book by Mazur and Stein. To this end, Mazur and Stein give the following equation for the approximation of $\pi(x)$:

$$\pi(x)=R_0(x) - \sum_{k=1}^{\infty}\left(R(x^{\frac{1}{2}+i\theta_k})+R(x^{\frac{1}{2}-i\theta_k})\right)$$

where the $\theta_k$ are the imaginary parts of the non-trivial zeros of the Riemann zeta function. It is known that this complex-valued function has infinitely many zeros, and another formulation of the Riemann Hypothesis states that all non-trivial (i.e. interesting) zeros have real part $\frac{1}{2}$. Hence, all these zeros are located on a line, called the critical line, given by $z=\frac{1}{2}+it$. Let's have a look at the zeta function on the critical line. Again, the mpmath library already has an implementation of the zeta function:

zeta = np.vectorize(mpmath.fp.zeta, otypes=[complex])
im = np.linspace(0, 100, 1000)
Z = zeta(0.5+1j*im)  # Calculate the values of the zeta function on the critical line
plt.plot(im, np.real(Z), label='Re')
plt.plot(im, np.imag(Z), label='Im')
plt.plot(im, abs(Z), label='abs')

Indeed, there are some zeros on this line. The locations of the first few trillion zeros are known, and the first few are:

zeta_zeros = [
    14.134725142, 21.022039639, 25.010857580, 30.424876126, 32.935061588,
    37.586178159, 40.918719012, 43.327073281, 48.005150881, 49.773832478,
    52.970321478, 56.446247697]

Do you remember 14, 21 and 25 from the exercise above? They are precisely the first three zeros of the zeta function! Do they match the positions of the peaks by coincidence? We cannot be sure (as the RH is still unproven), but if the RH is true, the spectrum of the primes matches the zeros of the zeta function. I find this a fascinating connection.
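We can make this match quantitative. The following self-contained sketch re-implements $F_{\leq C}(t)$ (with its own small sieve, and with the same sign flip as above so the peaks point upwards) and locates the local maximum near each of the first three zeros; the window size and tolerances are my own choices:

```python
import numpy as np

def sieve_primes(limit):
    is_prime = np.ones(limit + 1, dtype=bool)
    is_prime[:2] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = False
    return np.flatnonzero(is_prime)

def F(t, C):
    """-sum over prime powers p^n <= C of log(p)/p^(n/2)*cos(t*log(p^n));
    the sign flip makes the peaks point upwards, as in the plots above."""
    total = np.zeros_like(t)
    for p in sieve_primes(C):
        q = p
        while q <= C:
            total += np.log(p) / np.sqrt(q) * np.cos(t * np.log(q))
            q *= p
    return -total

t = np.linspace(10, 30, 20001)
y = F(t, 500)
for zero in (14.134725, 21.022040, 25.010858):
    window = np.abs(t - zero) < 1.0      # local maximum near each known zero
    print(round(t[window][np.argmax(y[window])], 2))
```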

Let's verify these zeros with our zeta plot:

im = np.linspace(0, 100, 1000)

plt.plot(im, abs(Z), label='abs')
for z in zeta_zeros:
    plt.axvline(z, color='r')

Yes, the zeta plot and the zeros overlap. So, let's calculate the first 100 zeros using the mpmath library and check that they match the known zeros:

zeta_zeros = [mpmath.fp.zetazero(n).imag for n in range(1, 101)]
zeta_zeros[:5]
[14.134725141734693,
 21.02203963877155,
 25.01085758014569,
 30.424876125859512,
 32.93506158773919]

We can now proceed and try to replicate the calculation proposed in Mazur's and Stein's book. We start out by defining a shorter range, as the calculations would otherwise take too long:

n_short = np.arange(10, 1000, 0.2)
P_short = primepi(n_short, prime_list=primes)
R_short = riemannr(n_short).real

Let's plot again $\pi(x)$ and $R(x)$ in the smaller range.

plt.plot(n_short, P_short, label='$\\pi(x)$');
plt.plot(n_short, R_short, label='$R(x)$');

Remember, our aim was to replicate $\pi(x)$ exactly by adding correction terms to $R(x)$. In the referenced book, the equation for $\pi(x)$ is given as (see pp. 116-117):

$$\pi(x) = R(x) + \sum_{k=1}^{\infty}C_k(x) $$

where $C_k(x)$ is defined as

$$ C_k(x) = -R(x^{\frac{1}{2}+i\theta_k})-R(x^{\frac{1}{2}-i\theta_k}) $$

where $\theta_k$ is the imaginary part of the $k$th non-trivial zero of the zeta function (actually, in the above equation for $\pi(x)$ it's not $R(x)$ but $R_0(x)$, a slightly modified version of $R(x)$, which we silently ignore here). Let's go ahead and implement this:

def Ck(x, theta):
    return np.real(-(riemannr(x**(0.5+1j*theta))+riemannr(x**(0.5-1j*theta))))

Note that we can write the argument as

$$x^{\frac{1}{2}+i\theta_k}=e^{\log(x)(\frac{1}{2}+i\theta_k)}=\sqrt{x}e^{i\theta_k\log(x)},$$

i.e. the argument to the $R$ function is a complex number whose magnitude grows with $\sqrt{x}$ and which rotates with decaying frequency, since $\log(x)$ grows very slowly. This is already one hint at the name "The Music of the Primes", as these oscillations can be understood as sound waves. We'll get to hear them later on.
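This chirp behaviour is easy to verify numerically: the magnitude of $x^{\frac{1}{2}+i\theta_k}$ is exactly $\sqrt{x}$, and the instantaneous frequency of the rotation, $\theta_k/(2\pi x)$, decays like $1/x$. A small sketch:

```python
import numpy as np

theta = 14.134725                      # imaginary part of the first zeta zero
x = np.linspace(10, 1000, 100000)
z = x ** (0.5 + 1j * theta)            # = sqrt(x) * exp(1j*theta*log(x))

# the magnitude grows exactly like sqrt(x) ...
print(np.allclose(np.abs(z), np.sqrt(x)))          # True

# ... while the phase theta*log(x) advances ever more slowly: the
# instantaneous frequency theta/(2*pi*x) decays like 1/x -- a down-chirp
phase = np.unwrap(np.angle(z))
inst_freq = np.gradient(phase, x) / (2 * np.pi)
print(inst_freq[0] > inst_freq[-1])                # True
```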

For now, let us just evaluate these functions and their sum. We plot $C_k(x)$ for different $k$ and track their sum as the correction term:

plt.subplot(211)
Csum = 0
for k in range(5):
    C = Ck(n_short, zeta_zeros[k])
    Csum = Csum + C
    plt.plot(n_short, C, label='$C_%d(x)$' % (k+1))
plt.subplot(212)
plt.plot(n_short, Csum, label='$\\sum_kC_k(x)$', lw=3);

Let's now add this correction term to $R$ and see, if it matches $\pi(x)$:

plt.plot(n_short, R_short+Csum)
plt.plot(n_short, P_short);

Whoa, unfortunately not! The "corrected" function is much worse and does not match $\pi(x)$ at all. Why is that, I thought? I had followed the equations from the book exactly. Where's the error? I don't know; it is either an erroneous implementation on my side, an inexact formulation in the book, or a problem of mpmath's implementation of $R$ with complex arguments. As I couldn't find an immediate solution, I consulted a paper by Hans Riesel and Gunnar Gohl which is referred to in the book. There, the authors state (see their eq. (17)) that

$$ \pi(x) = R_0(x)+\sum_{k=1}^{\infty}T_k(x) $$

where $R_0(x)$ is again the slightly modified $R(x)$ (which we don't care about for now) and the $T_k(x)$ are given by

$$ T_k(x)=-\sum_{n=1}^{\infty} \frac{\mu(n)}{n} \left(li(x^{\rho_k/n})+li(x^{\rho_k^*/n})\right), $$

where $\rho_k=\frac{1}{2}+i\theta_k$ is the $k$th non-trivial zero of the zeta function. Examining this closely, we come to the conclusion that it matches

$$ T_k(x)=-(R(x^{\frac{1}{2}+i\theta_k})+R(x^{\frac{1}{2}-i\theta_k})), $$

so we might conclude that either my or the mpmath implementation is wrong. Anyway, Riesel and Gohl give the approximation $li(x^\rho)+li(x^{\rho^*})\approx 2\Re\left\{\frac{\exp(\rho \log(x))}{\rho \log(x)}\right\}$, which we implement directly as follows:

def myli(x, rho):  # approximate -(li(x**rho)+li(x**rho.conj()))
    X = rho*np.log(x)
    return -2*np.real(np.exp(X)/X)

We are now able to replicate Fig. 1 from Riesel/Gohl's paper:

L = 5
for k in range(L):
    plt.subplot(L, 1, k+1)
    plt.plot(n_short, myli(n_short, 0.5+1j*zeta_zeros[k]), label='$C_%d(x)$' % (k+1)); 
    plt.xlim((10,100)); plt.grid(True); plt.ylim((-0.3, 0.3))
    plt.yticks((-0.2, 0, 0.2))

This figure just shows the behaviour of the leading components of the $T_k$ and is not interesting by itself. However, for calibration purposes, it's always good to replicate the fundamental figures as well. Now, let's proceed and define the $T_k$ function:

def Tk(x, rho):
    N = 20  # truncate the infinite Moebius sum; the first terms dominate
    return sum(float(mpmath.mp.moebius(n))/n * myli(x, rho/n) for n in range(1, N))

Again, we are now able to reproduce Fig. 2 from Riesel/Gohl, showing the behaviour of the different correction terms, which will eventually lead to an exact expression for $\pi(x)$:

L = 5
for k in range(L):
    plt.subplot(L+1, 1, k+1)
    plt.plot(n_short, Tk(n_short, 0.5+1j*zeta_zeros[k]), label='$T_%d(x)$' % (k+1))
plt.subplot(L+1, 1, L+1)
plt.plot(n_short, Tk(n_short, 0.5+1j*zeta_zeros[-1]), label='$T_{%d}(x)$' % len(zeta_zeros)); 

Some observations about these graphs:

  • the functions show interesting behaviour in the beginning, but become simple oscillations for higher $x$;
  • the higher $k$, the smaller the oscillation amplitude;
  • the higher $k$, the faster the oscillation;
  • the higher $k$, the longer the interesting, non-sinusoidal part of the correction function extends.
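The second observation can be made quantitative with the Riesel/Gohl approximation: its envelope is $2\sqrt{x}/(|\rho_k|\log x)$, so the amplitude of $T_k$ shrinks roughly like $1/\theta_k$. A small self-contained sketch (the value 236.524 for the 100th zero is taken from tables):

```python
import numpy as np

def myli(x, rho):                      # the Riesel/Gohl approximation
    X = rho * np.log(x)
    return -2 * np.real(np.exp(X) / X)

x = np.linspace(10, 1000, 100000)
theta_1 = 14.134725                    # first zeta zero
theta_100 = 236.524225                 # ~100th zeta zero (from tables)
a1 = np.abs(myli(x, 0.5 + 1j * theta_1)).max()
a100 = np.abs(myli(x, 0.5 + 1j * theta_100)).max()
print(a1 / a100)   # roughly theta_100/theta_1, as the envelope predicts
```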

Now we are finally able to replicate the prime-counting function $\pi(x)$ by a summation of the smooth $T_k(x)$ functions, which are parameterized by the zeros of the zeta function (or, the spectrum of the primes):

correction = 0
for k in range(len(zeta_zeros)):
    correction = correction + Tk(n_short, 0.5+1j*zeta_zeros[k])  # sum up the correction terms
def show_RPRs():
    plt.plot(n_short, P_short)
    plt.plot(n_short, R_short)
    plt.plot(n_short, R_short+correction);
    
plt.subplot(211); show_RPRs()
plt.subplot(223); show_RPRs(); plt.xlim((10, 20)); plt.ylim((2, 10));
plt.subplot(224); show_RPRs(); plt.xlim((80, 90)); plt.ylim((20, 25));

Isn't that astonishing? The staircase is nicely recreated by the summation of the smooth, oscillating correction functions. (Note the constant offset between the red and blue curves in the bottom-left figure? It comes from using $R(x)$ rather than $R_0(x)$ as the base function.) Isn't it fascinating how the zeta zeros dictate the frequencies of the oscillations such that the staircase is reproduced exactly? We also see that the approximation is more accurate for smaller numbers; this is expected, since the interesting behaviour of the $T_k(x)$ is concentrated towards small $x$.
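To convince ourselves numerically that the correction really helps, here is a condensed, self-contained version of the whole experiment: $R(x)$ from the truncated Möbius/li series, $T_k(x)$ from the Riesel/Gohl approximation, and the first 12 zeros hardcoded. The truncation depths are my own choices; the mean absolute deviation from the true $\pi(x)$ should drop once the correction is added.

```python
import numpy as np
from scipy.special import expi

# first 12 zeta zeros, hardcoded from tables
zeros = [14.134725142, 21.022039639, 25.010857580, 30.424876126,
         32.935061588, 37.586178159, 40.918719012, 43.327073281,
         48.005150881, 49.773832478, 52.970321478, 56.446247697]

def mobius(n):
    """Moebius function by trial factorization (fine for small n)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def riemann_r(x, terms=30):
    # truncated Moebius/li series for R(x), with li(x) = Ei(log(x))
    return sum(mobius(n) / n * expi(np.log(x) / n)
               for n in range(1, terms + 1) if mobius(n))

def myli(x, rho):
    # Riesel/Gohl approximation of -(li(x**rho) + li(x**rho.conj()))
    X = rho * np.log(x)
    return -2 * np.real(np.exp(X) / X)

def Tk(x, rho, terms=20):
    return sum(mobius(n) / n * myli(x, rho / n)
               for n in range(1, terms + 1) if mobius(n))

x = np.arange(10, 1000, 0.5)
primes = [p for p in range(2, 1000)
          if all(p % d for d in range(2, int(p ** 0.5) + 1))]
pi_x = np.searchsorted(primes, x, side='right')     # exact prime count

R = riemann_r(x)
correction = sum(Tk(x, 0.5 + 1j * th) for th in zeros)

err_plain = np.mean(np.abs(pi_x - R))
err_corrected = np.mean(np.abs(pi_x - (R + correction)))
print(err_plain, err_corrected)   # the corrected curve deviates less from pi(x)
```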

We could now play around with the code and incorporate more and more zeros to obtain ever more exact representations of $\pi(x)$, but that would not be very interesting. Instead, let us look at what the correction function looks like overall:

plt.subplot(121); plt.plot(n_short, correction); 
plt.subplot(122); plt.plot(n_short, correction); plt.xlim((20,50));
for p in primes[primes <= 50]:  # mark the primes with black lines
    plt.axvline(p, color='k', lw=0.5)

In the big picture (left figure), we see that the behaviour gradually changes from a fast-changing to a slowly changing function. This is clear, as only zeros with larger imaginary parts create significant behaviour at larger $x$. If we zoom in (right figure), we see that the function resembles a sawtooth, and each jump occurs at a prime (we have marked the primes with black lines). Hence, all information about the primes is fully contained in the correction term alone.

The music of the correction terms

Now that we have seen the correction terms and know they are essentially oscillating functions, we can listen to them. This will not lead to new insights, but it is nice to hear anyway. First, we calculate the correction terms over a somewhat longer range:

t = np.arange(2, 10000, 0.1)  # this calculation takes time
tks = []
for zero in zeta_zeros:
    tk = Tk(t, 0.5+1j*zero)
    tks.append(tk)

Let's first play the correction terms for the first and the highest calculated zeta zero:

print("First correction term")
display(Audio(data=tks[0], rate=44100))
print("Last correction term")
display(Audio(data=tks[-1], rate=44100))
First correction term
Last correction term

This sounds as expected, especially the second sample: essentially, the correction terms are down-chirps (i.e. waveforms with decreasing instantaneous frequency). However, their frequency quickly drops below the audible range. Hence, let's move the sound into a more audible range by modulating it with a sine wave:

def createAudio(data, basefreq=1000):
    data = data.real
    rate = 44100
    t = np.arange(len(data)) / rate
    c = np.sin(2*np.pi*basefreq*t)
    display(Audio(data=c*data, rate=rate))
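The modulation simply shifts the slow oscillation up to sidebands around `basefreq`. A quick self-contained check with a 5 Hz tone (the particular numbers are just for illustration):

```python
import numpy as np

rate = 44100
t = np.arange(rate) / rate                 # one second of samples
slow = np.cos(2 * np.pi * 5 * t)           # 5 Hz -- far below hearing
shifted = slow * np.sin(2 * np.pi * 1000 * t)

spectrum = np.abs(np.fft.rfft(shifted))
freqs = np.fft.rfftfreq(len(shifted), 1 / rate)
print(freqs[spectrum.argmax()])   # energy now sits in sidebands near 1000 Hz
```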

Now, let's hear all the correction waves and also the sum of all of them:

for k in range(10):
    print("Tk(t), k=%d" % k)
    createAudio(tks[k])
print("Tk(t), k=%d" % len(zeta_zeros))
createAudio(tks[-1])

print("Sum of all Tk")
createAudio(sum(tks))
Tk(t), k=0
Tk(t), k=1
Tk(t), k=2
Tk(t), k=3
Tk(t), k=4
Tk(t), k=5
Tk(t), k=6
Tk(t), k=7
Tk(t), k=8
Tk(t), k=9
Tk(t), k=100
Sum of all Tk

Don't they sound mystical, as mystical as the primes themselves?

Do you have questions or comments? Let's discuss below!

