
Why ratios want geometric means

2020-08-28

You may have heard that when taking the average of a set of ratios, it’s best to use the geometric mean instead of the normal arithmetic mean. For years, I took this as received wisdom, but recently I decided to think about it more carefully. It makes sense to me now—here’s my thought process.

Speed test

Suppose that you have two computer programs, Program A and Program B. You want to compare how fast the two programs run on a variety of test cases. You time them, and observe the following results:

  • Trial 1: Program A is 2× faster.
  • Trial 2: Program A is 4× faster.
  • Trial 3: Program A is 0.2× as fast.

Call these ratios $a_1 = 2.0$, $a_2 = 4.0$, and $a_3 = 0.2$. Program A is faster than Program B on trial $i$ if $a_i > 1$. We can see that Program A is sometimes faster, but Program B is sometimes faster, too. Which program is faster “on average”?

With a simple arithmetic mean, we could compute:

$$\bar a = \frac{1}{3} (a_1 + a_2 + a_3) \approx 2.1.$$

So, “on average”, it looks like Program A is $2.1$ times faster.

But something’s not quite right here. Suppose that we had looked at the exact same timing data from the perspective of Program B instead. We would have observed these results:

  • Trial 1: Program B is 0.5× as fast.
  • Trial 2: Program B is 0.25× as fast.
  • Trial 3: Program B is 5× faster.

That is, we have $b_1 = 0.5$, $b_2 = 0.25$, and $b_3 = 5.0$, where each $b_i = 1 / a_i$. Taking an arithmetic mean again, we find:

$$\bar b = \frac{1}{3} (b_1 + b_2 + b_3) \approx 1.9.$$

But then we are forced to conclude that, “on average”, Program A is $2.1$ times faster than Program B, and Program B is also $1.9$ times faster than Program A! Surely any framework that leads us to such a conclusion must be flawed. An arithmetic mean must not be the right tool: let’s return to the drawing board.
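The contradiction is easy to reproduce. The following is a minimal Python sketch; the list names `a` and `b` simply mirror the ratios in the text, and the rest is illustrative.

```python
from statistics import mean

# Speed ratios from the three trials, from Program A's perspective.
a = [2.0, 4.0, 0.2]
# The same timing data from Program B's perspective: b_i = 1 / a_i.
b = [1 / x for x in a]

mean_a = mean(a)
mean_b = mean(b)

print(f"arithmetic mean of a: {mean_a:.2f}")
print(f"arithmetic mean of b: {mean_b:.2f}")

# Both means exceed 1, so each program looks "faster on average".
both_look_faster = mean_a > 1 and mean_b > 1
print(both_look_faster)
```

If the arithmetic mean were a sensible summary of ratios, the two results would be reciprocals of each other, and at most one of them could exceed 1.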


Consider for a moment what the arithmetic mean means. It says, “combine your three quantities by adding them together, then scale them down by a factor of three to counteract the change in magnitude”. If we have three quantities of distance, or of mass, or of time, this makes sense. If you travel $3$ meters and then travel $2$ meters, you’ve traveled $3 + 2 = 5$ meters.

But that’s not how ratios compose. If you make a program $1.2$ times faster, and then make it $1.2$ times faster again, you haven’t made it $1.2 + 1.2 = 2.4$ times faster; you’ve made it $1.2 \times 1.2 = 1.44$ times faster. Thus, to take an average in the same spirit as the arithmetic mean, we should combine our three ratios by multiplying them together, and then scale them down by a third root to counteract the change in magnitude. And this is precisely the geometric mean.

Returning to our speed comparison example, we can reëvaluate how much faster Program A is “on average”…

$$\tilde a = (a_1 \cdot a_2 \cdot a_3)^{1/3} \approx 1.17,$$

…and Program B:

$$\tilde b = (b_1 \cdot b_2 \cdot b_3)^{1/3} \approx 0.85.$$

Now our results are consistent: Program A is faster, and Program B is slower. Moreover, note that $\tilde a = 1 / \tilde b$, which makes sense. This is another property that carries over in spirit from arithmetic means. If you travel, on average, $3$ meters farther than me, then I travel $-3$ meters farther than you; and just as ratios compose with multiplication rather than addition, they invert with reciprocation rather than negation.
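The same comparison can be sketched in Python. The helper name `geometric_mean` below is an illustrative assumption rather than anything from the post; the standard library's `statistics.geometric_mean` could be used instead.

```python
import math

def geometric_mean(ratios):
    """Multiply the ratios together, then take the n-th root."""
    return math.prod(ratios) ** (1 / len(ratios))

# Speed ratios from Program A's perspective, and their reciprocals.
a = [2.0, 4.0, 0.2]
b = [1 / x for x in a]

gm_a = geometric_mean(a)
gm_b = geometric_mean(b)

print(f"geometric mean of a: {gm_a:.2f}")
print(f"geometric mean of b: {gm_b:.2f}")

# The two summaries are reciprocals, up to floating-point rounding.
print(math.isclose(gm_a * gm_b, 1.0))
```

Whichever program's perspective we start from, the two summaries now agree on which one is faster.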

The logarithmic connection

Look more closely at these analogies that hold “in spirit” between ratios and the more general real numbers:

  • Ratios live in $\mathbb{R}^+$, and reals live in $\mathbb{R}$.
  • Ratios combine with multiplication, and reals with addition.
  • Ratios scale down with $n$th roots, and reals with division.
  • Ratios invert with reciprocation, and reals with negation.

The common structure underlying these mappings is the logarithm:

  • The image of $\mathbb{R}^+$ under the logarithm is $\mathbb{R}$.
  • For ratios $a$ and $b$, we have $\log(a \cdot b) = \log(a) + \log(b)$.
  • For a ratio $a$ and positive scalar $c$, we have $\log(a^{1/c}) = \log(a) / c$.
  • For a ratio $a$, we have $\log(1 / a) = -\log(a)$.

Because the logarithm preserves all the structure that we care about, we say that it is a homomorphism (“same shape”) from the ratios to the reals. Moreover, since the logarithm is invertible (via the exponential function), and its inverse likewise preserves this structure, it is an isomorphism, which means that these two spaces are really just different characterizations of the same thing. Thus, another way to think about the geometric mean is that it uses the logarithm to bring the ratios into “linear space”, takes their arithmetic mean, and then returns them to their natural habitat:

$$\exp(\overline{\log z_1, \dotsc, \log z_n}) = \exp\Biggl( \frac{1}{n} \sum_{i=1}^{n} \log z_i \Biggr) = \Biggl( \prod_{i=1}^{n} z_i \Biggr)^{1/n} = \widetilde{z_1, \dotsc, z_n}.$$
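As a rough numerical check of this identity, the sketch below computes the geometric mean both ways. The function names are hypothetical, chosen only for this illustration.

```python
import math

def geometric_mean_via_logs(ratios):
    """Map to log space, take the arithmetic mean, then map back with exp."""
    logs = [math.log(z) for z in ratios]
    return math.exp(sum(logs) / len(logs))

def geometric_mean_direct(ratios):
    """Take the n-th root of the product of the ratios."""
    return math.prod(ratios) ** (1 / len(ratios))

z = [2.0, 4.0, 0.2]
via_logs = geometric_mean_via_logs(z)
direct = geometric_mean_direct(z)

# The two routes agree up to floating-point rounding.
print(math.isclose(via_logs, direct))
```

In practice the log-space route is often the safer choice for long lists, since a running product of many ratios can overflow or underflow a float where a running sum of logarithms would not.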

This also suggests that if we wanted to, say, summarize the distribution of a random variable representing ratios, we should probably not take its expectation. We would get a more meaningful result by taking the exponential of the expectation of its logarithm, $\exp(\mathbb{E}[\log Z])$. And this in turn brings to mind Jensen’s inequality, with which we immediately obtain a nice proof of the AM–GM inequality by taking $Z$ to be a uniform random variable over a set $z_1, \dotsc, z_n$:

$$\widetilde{z_1, \dotsc, z_n} = \exp(\mathbb{E}[\log Z]) \leq \exp(\log(\mathbb{E}[Z])) = \mathbb{E}[Z] = \overline{z_1, \dotsc, z_n},$$

with the second step due to Jensen’s inequality, the concavity of the logarithm, and the monotonicity of the exponential. All the pieces fit nicely together… one connection at a time.
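The inequality can also be spot-checked numerically. This is only a sanity check on random positive samples, not a proof, and the sample ranges, seed, and tolerance are arbitrary assumptions made for the sketch.

```python
import math
import random

def arithmetic_mean(zs):
    """E[Z] for Z uniform over zs."""
    return sum(zs) / len(zs)

def geometric_mean(zs):
    """exp(E[log Z]) for Z uniform over zs."""
    return math.exp(sum(math.log(z) for z in zs) / len(zs))

random.seed(0)  # fixed seed so the run is repeatable
violations = 0
for _ in range(1000):
    n = random.randint(1, 10)
    zs = [random.uniform(0.01, 100.0) for _ in range(n)]
    # Allow a tiny relative tolerance for floating-point rounding.
    if geometric_mean(zs) > arithmetic_mean(zs) * (1 + 1e-12):
        violations += 1

print(violations)
```

A count of zero is what AM–GM predicts; the two means coincide only when all the sampled values are equal, which includes every sample of size one.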
