Marcus' blog

My name is Marcus P. da Silva, and this is my blog. I am a Principal Researcher in the Quantum System group at Microsoft. The views expressed here are my own.

View My GitHub Profile

13 December 2019

How long should your quantum benchmarking experiments be?

by Shelby Kimmel (Assistant Professor, Middlebury College) and Marcus Silva (Principal Researcher, Quantum Systems, Microsoft)

TLDR We give a simple, intuitive, semi-rigorous argument for how to choose the length of experiments used to benchmark quantum computers. If you think your results will look like for , then you should choose to be around .

This post was written as part of the Q# Advent calendar for 2019.

See more of Prof. Kimmel’s work here


Quantum computers are complex devices that will one day unlock tremendous computational power – but only once they are sufficiently large and error rates are sufficiently low. Towards that end, many benchmarking protocols have been devised to help experimentalists assess the quality of currently available prototypical devices, and guide their development towards more powerful machines.

One of the most popular benchmarking protocols is called randomized benchmarking , (RB) [Knill et al., 2007, Magesan et. al, 2011]. There are many variants of RB that benchmark different aspects of quantum computers, including Google’s cross-entropy benchmarking (referred to as XEB), which was recently used to help demonstrate the power of a 53 qubit quantum computer [Arute et. al, 2019].

These RB protocols, as well as older protocols used to asses the life-time of quantum states, all rely on fitting experimental data to exponential decays. The fitting function will be of the form \begin{equation} f(k) = A p^k + B, \end{equation} where is an integer corresponding to the length of experimental sequences that are sampled, is the quantity we really care about – it generaly corresponds to a measure of how good the operations in the quantum computer are, and the higher is, the better the quantum computer – and is the expectation of binomial random variable that we can take a sample of by running a single experiment.

What do these parameters mean?

The decay parameter can, in extreme cases, be negative, but in most cases it is between and . (A value of means no errors have occurred; in recent experiments is not uncommon, and some experiments have demonstrated , signalling extremely small error rates).

and provide information about how well we can initialize the quantum computer, and how well we can read out the information in the quantum computer.

For the purposes of this discussion, we will take and to be known, as they can be estimated largely independently and in a way that is independent of experiment length ( see the post script below).

Estimating without fitting

One intuitive approach to estimating is to estimate for some , and then invert the relationship between and (recall we assume we know what and are). More explicitly, our estimate of will be given by \begin{equation} p = \sqrt[k]{\frac{f(k)-B}{A}}, \end{equation} which is straightforward to compute.

Since corresponds to the expectation value of a random distribution of measurement outcomes, we can improve this estimate by taking more samples from this distribution (i.e., running the experiment more times).

It is known that RB (and related experiments) can yield an estimate of that has very small uncertainties. This has been verified empirically in too many experiments to mention, but this is also a result that has also been proven mathematically [Epstein et. al, 2013, Granade et. al, 2014].

The details of these proofs may be somewhat opaque for those not familiar with theoretical statistics, so here we give an argument that requires minimal knowledge of statistics, and only cursory familiarity with uncertainty propagation.

Quantifying uncertainty, and how it propagates

First, let’s look at how we can calculate the uncertainty in our final estimate of .

One way to estimate uncertainty in an estimate of a random quantity is to compute its variance. Recall that is the mean of a random distribution, and we are only able to sample from that distribution, so our estimate of will also be random.

Although we can empirically measure the uncertainty in our estimate of , as well as the uncertainty in our estimate of , we are more interested in predicting the uncertainty in based on , so we can choose before running an experiment.

We do that be assuming the uncertainty in is small, and linearizing around to estimate what the resulting would be. This is known as uncertainty propagation [Wikipedia], and gives us the following relationship between and : \begin{equation} \sigma_p^2 \approx \left| \frac{\partial p}{\partial f(k)} \right|^2~{\sigma_f^2} . \end{equation} Using this equation requires computing the derivative of with respect to : \begin{equation} \frac{\partial p}{\partial f(k)} = \frac{1}{A} \frac{1}{k} p^{1-k}. \end{equation}

Our goal is to minimize with respect to the sequence length .

The optimal sequence length

Note that also depends on , and taking that dependence on into account makes things a bit messy as we look for the minimal variance sequence length. Instead, we use as an upper bound on , which is independent , allowing us to ignore the contribution from the survival probability, and reducing the minimization of to the minimization of .

If we look at , we find \begin{equation} \left|\frac{\partial p}{\partial f(k)}\right|^2 = \frac{p^2}{A^2} \frac{1}{k^2 p^{2k}}, \end{equation} which can be minimized by maximizing .

After a derivative and some light algebra we find the optimal sequence length to be \begin{equation} k_* = - \frac{1}{\log p} \approx \frac{1}{1-p}, \end{equation} where the second approximation is in the limit. This conclusion is remarkably similar to the rigorous result for RB [Granade, Ferrie, and Cory], but also applicable to more general decay experiments.

What do we gain?

It is instructive to look at how much are we gaining by looking at these longer sequences. We can estimate this by comparing the variance of our estimate using versus the variance of taking the shortest sequence . In this comparison we will include the dependence of the uncertainty in our estimate of the survival probability – we neglected it in the derivation of , but it is good to make a consistency check.

In the neighborhood of (relevant for single qubit RB), the estimate obtained from the long experiments with sequence length is approximately times smaller than the variance obtained from the shortest sequences. Recall that , which can be quite large for high quality devices.

For example, in the regime where , this corresponds to a factor of improvement in accuracy (the standard deviation) with the same effort! Trying to obtain such an improvement by increasing the number of samples would require the number of samples to be increased by a factor of !

But we don’t know what is!

In practice we don’t know what is, so we cannot choose the optimal .

One approach to get around this is to choose adaptively, starting from some initial estimate of , and modifying as we get better estimates. This can be a bit awkward to automate.

What we can do instead is simply choose multiple of different orders of magnitude. It turns out that the uncertainty is not strongly dependent on , so a single choice of can estimate a wide range of competently. For example, choosing , should allow for good estimates over the full range , but also beyond it.

Given a set of experimental data for such a range of , one can then obtain independent estimates for each point, and choose the one with smallest uncertainty. Alternatively, one can use non-linear least squares fitting through all the points, or other more sophisticated estimation methods that use all the data for the estimate, and expect improvements in accuracy over the independent estimates.


Although we have swept some details under the rug (e.g. bias of the estimates), we hope this gives a intuitive illustration of why RB and related protocols with long sequence lengths can yield such high accuracy estimates of error rates in quantum computers.

Estimating A and B

In this post, we mentioned that it is straightforward to esimate and . To do this, one can run a zero-length RB sequence and an infinite length RB sequence:

The last sequence may seem experimentally daunting, but we can also address that by replacing an infinitely long sequences with the preparation of the maximally mixed state.

tags: randomized-benchmarking - rb - cross-entropy-benchmarking - xeb - uncertainty