Combining two test batches to get a better estimate

TL;DR

The merge_summary.md utility combines two ULTRA experiment summaries into a single consolidated XML report.

This page describes the math behind the merge.

Premise

We assume the two batches were produced by the same ULTRA version with the same parameters, so they can be treated as two independent samples from the same underlying distribution.

Math

We have two sets, indexed by \(i \in \{1,2\}\). Let \(n_i\), \(M_i\) and \(s_i\) be the size, mean and sample standard deviation of the \(i\)-th set.

The \(j\)-th element from the \(i\)-th set is written as \(x_{ij}\).

Combined mean

This one is easy: the combined mean is the weighted mean of the batch means:

\[\bar M = \frac{n_1M_1 + n_2M_2}{n_1 + n_2}\]

This is reasonable, since each batch contributes in proportion to the number of runs it contains. If you aren't persuaded, expand the definition:

\[\bar M = \frac{1}{n_1 + n_2} \left(\sum_{j=1}^{n_1}x_{1j} + \sum_{j=1}^{n_2}x_{2j}\right) = \frac{1}{n_1 + n_2} (n_1M_1 + n_2M_2) = \frac{n_1M_1 + n_2M_2}{n_1 + n_2}\]
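The identity can also be checked numerically. The following is a minimal sketch using made-up values (`batch1` and `batch2` are hypothetical, not real ULTRA results): it computes the weighted mean from the per-batch summaries and compares it with the mean of the raw observations.

```python
import math
from statistics import fmean

# Hypothetical observations for two batches (illustrative values only).
batch1 = [2.0, 4.0, 6.0]
batch2 = [1.0, 3.0]

n1, m1 = len(batch1), fmean(batch1)
n2, m2 = len(batch2), fmean(batch2)

# Weighted mean computed from the per-batch summaries alone.
weighted = (n1 * m1 + n2 * m2) / (n1 + n2)

# Mean computed directly from all raw observations.
direct = fmean(batch1 + batch2)

assert math.isclose(weighted, direct)
```

The point of the formula is that the raw observations are not needed: the sizes and means of the two batches are enough.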

Combined variance

By definition, the sample variance of all observations is:

\[{\bar s}^2 = \frac{1}{n_1 + n_2 - 1} \sum_{i=1}^2\sum_{j=1}^{n_i}(x_{ij} - \bar M)^2\]

The trick is decomposing the inner term:

\[x_{ij} - \bar M = (x_{ij} - M_i) + (M_i - \bar M)\]

and squaring it:

\[(x_{ij} - \bar M)^2 = (x_{ij} - M_i)^2 + 2(x_{ij} - M_i)(M_i - \bar M) + (M_i - \bar M)^2\]

Now we sum over \(j\), noting that \(M_i - \bar M\) does not depend on \(j\):

\[\sum_{j=1}^{n_i} (x_{ij} - \bar M)^2 = \sum_{j=1}^{n_i} (x_{ij} - M_i)^2 + 2(M_i - \bar M) \sum_{j=1}^{n_i} (x_{ij} - M_i) + n_i(M_i - \bar M)^2\]

but by the definition of the mean:

\[\sum_{j=1}^{n_i} (x_{ij} - M_i) = \sum_{j=1}^{n_i} x_{ij} - n_iM_i = n_iM_i - n_iM_i = 0\]

so the middle term vanishes. Thus:

\[\sum_{j=1}^{n_i} (x_{ij} - \bar M)^2 = \sum_{j=1}^{n_i} (x_{ij} - M_i)^2 + n_i(M_i - \bar M)^2\]

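The first term can also be simplified. By the definition of the sample variance, \(s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i}(x_{ij} - M_i)^2\), so: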

\[\sum_{j=1}^{n_i} (x_{ij} - M_i)^2 = (n_i - 1)s_i^2\]

therefore

\[\sum_{j=1}^{n_i} (x_{ij} - \bar M)^2 = (n_i - 1)s_i^2 + n_i(M_i - \bar M)^2\]
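As a sanity check, the per-batch identity above can be verified on a small example. This is only a sketch: the data are made up and the helper `sum_of_squares_both_ways` is a hypothetical name introduced for illustration.

```python
import math
from statistics import fmean, variance

# Hypothetical observations for two batches (illustrative values only).
batch1 = [2.0, 4.0, 6.0]
batch2 = [1.0, 3.0]

# Mean of all observations, written \bar M in the text.
grand_mean = fmean(batch1 + batch2)


def sum_of_squares_both_ways(batch, grand_mean):
    """Return (lhs, rhs) of the per-batch identity (hypothetical helper)."""
    n = len(batch)
    m = fmean(batch)
    s2 = variance(batch)  # sample variance, n - 1 denominator
    # Left-hand side: squared deviations from the combined mean.
    lhs = sum((x - grand_mean) ** 2 for x in batch)
    # Right-hand side: within-batch part plus the shift of the batch mean.
    rhs = (n - 1) * s2 + n * (m - grand_mean) ** 2
    return lhs, rhs


lhs1, rhs1 = sum_of_squares_both_ways(batch1, grand_mean)
lhs2, rhs2 = sum_of_squares_both_ways(batch2, grand_mean)

assert math.isclose(lhs1, rhs1)
assert math.isclose(lhs2, rhs2)
```

Note that `statistics.variance` needs at least two observations, so the check only applies to batches with \(n_i \ge 2\).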

Finally, returning to the variance definition:

\[\begin{aligned}
{\bar s}^2 &= \frac{1}{n_1 + n_2 - 1} \sum_{i=1}^2\sum_{j=1}^{n_i}(x_{ij} - \bar M)^2 \\
&= \frac{1}{n_1 + n_2 - 1} \sum_{i=1}^2 \left( (n_i - 1)s_i^2 + n_i(M_i - \bar M)^2 \right) \\
&= \frac{(n_1 - 1)s_1^2 + n_1(M_1 - \bar M)^2 + (n_2 - 1)s_2^2 + n_2(M_2 - \bar M)^2}{n_1 + n_2 - 1}
\end{aligned}\]

Q.E.D.

Python to the rescue

def combine_mean(m1, n1, m2, n2):
    """Mean of the two batches combined, weighted by batch size."""
    return (n1 * m1 + n2 * m2) / (n1 + n2)


def combine_std(m1, s1, n1, m2, s2, n2):
    """Sample standard deviation (n - 1 denominator) of the two batches combined."""
    M = combine_mean(m1, n1, m2, n2)
    num = (
        (n1 - 1) * s1 * s1 +
        (n2 - 1) * s2 * s2 +
        n1 * (m1 - M)**2 +
        n2 * (m2 - M)**2
    )
    return (num / (n1 + n2 - 1)) ** 0.5

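The code is a direct transcription of the two formulas above. It assumes that s1 and s2 are sample standard deviations (computed with the n - 1 denominator) and that n1 + n2 is at least 2.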
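A quick way to gain confidence in the two functions is to compare them with the standard library on the concatenated data. The sketch below repeats the two functions so that it runs on its own; the batches are made-up values, not real ULTRA output.

```python
import math
from statistics import fmean, stdev


def combine_mean(m1, n1, m2, n2):
    """Mean of the two batches combined, weighted by batch size."""
    return (n1 * m1 + n2 * m2) / (n1 + n2)


def combine_std(m1, s1, n1, m2, s2, n2):
    """Sample standard deviation (n - 1 denominator) of the combined batches."""
    M = combine_mean(m1, n1, m2, n2)
    num = (
        (n1 - 1) * s1 * s1 +
        (n2 - 1) * s2 * s2 +
        n1 * (m1 - M) ** 2 +
        n2 * (m2 - M) ** 2
    )
    return (num / (n1 + n2 - 1)) ** 0.5


# Hypothetical observations for two batches (illustrative values only).
batch1 = [2.0, 4.0, 6.0]
batch2 = [1.0, 3.0]

# Per-batch summaries: the only inputs the merge needs.
n1, m1, s1 = len(batch1), fmean(batch1), stdev(batch1)
n2, m2, s2 = len(batch2), fmean(batch2), stdev(batch2)

merged_mean = combine_mean(m1, n1, m2, n2)
merged_std = combine_std(m1, s1, n1, m2, s2, n2)

# Reference values computed from the raw observations.
all_runs = batch1 + batch2
assert math.isclose(merged_mean, fmean(all_runs))
assert math.isclose(merged_std, stdev(all_runs))
```

Both assertions pass because the formulas are exact identities, not approximations; any difference from the reference values is floating-point rounding only.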
