What are quantum superpositions?

TL;DR: A clearly motivated geometric understanding of quantum superpositions (no, a quantum system can't be in multiple states at once)

One of the biggest offenders in the quantum nonsense category is the claim that a particle can be in a “state of superposition,” in which it exists in multiple states at once until it is measured. This type of mischaracterization stems from a lack of understanding of why we have the math we have and what it represents physically. So, let’s clarify what a superposition in quantum mechanics actually is, why it is not something strange and why we use the math we use.

Let’s start with a familiar example. Suppose we have a magnet and we want to characterize the direction of its magnetization. We can imagine using polar and azimuthal angles \(\theta\) and \(\varphi\), or the components \(v^x\), \(v^y\) and \(v^z\) of a normalized vector. The second choice is more convenient because we can express a generic direction as a linear combination \(v^x e_x + v^y e_y + v^z e_z\), and the magnetic field acts linearly with respect to that representation when calculating the energy: \(U = - B \cdot (v^x e_x + v^y e_y + v^z e_z) = - ( v^x B_x + v^y B_y + v^z B_z )\). The state in which the magnet is on the diagonal between \(x\) and \(z\) can therefore be expressed as \((\sqrt{2}/2) e_x + (\sqrt{2}/2) e_z\), but no one in their right mind would say that the magnet exists in both the horizontal state and the vertical state. The magnet exists in a single state: the magnetization is just oriented in a diagonal direction.
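To make the linearity concrete, here is a minimal Python sketch. The field values in `B` and the helper name `energy` are made up for illustration; the only point is that the energy of the diagonal direction is the same linear combination of the energies along \(x\) and \(z\), while the direction itself remains a single unit vector.

```python
import math

def energy(v, B):
    """U = -B . v for a unit direction v and a field B (both 3-tuples)."""
    return -sum(b * c for b, c in zip(B, v))

e_x = (1.0, 0.0, 0.0)
e_z = (0.0, 0.0, 1.0)
s = math.sqrt(2) / 2

# The diagonal direction, written as a linear combination of e_x and e_z
diagonal = tuple(s * a + s * b for a, b in zip(e_x, e_z))

B = (0.3, 0.0, 1.2)  # hypothetical field components, arbitrary units

u_diag = energy(diagonal, B)                        # energy of the diagonal direction
u_combo = s * energy(e_x, B) + s * energy(e_z, B)   # same combination of the energies
norm = math.sqrt(sum(c * c for c in diagonal))      # still one unit vector

print(u_diag, u_combo, norm)
```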

Also note that the choice of axes is arbitrary. There is no sense in which some directions are linear combinations and some are not. Given any direction, we can choose it to be one of the basis vectors or not. Whether a particular direction is a linear combination or not depends on the choice of the basis vectors and is not an intrinsic physical feature.
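A quick sketch of this basis dependence, assuming a hypothetical second orthonormal basis `f1`, `f2`, `f3` whose first axis happens to lie along the diagonal. The same physical direction has two nonzero components in one basis and a single nonzero component in the other.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

s = math.sqrt(2) / 2

# The diagonal direction, with components given in the original (x, y, z) basis
v = (s, 0.0, s)

# A hypothetical rotated orthonormal basis whose first axis lies along v
f1 = (s, 0.0, s)
f2 = (0.0, 1.0, 0.0)
f3 = (-s, 0.0, s)

components_old = v                                        # a "linear combination" here
components_new = (dot(v, f1), dot(v, f2), dot(v, f3))     # a single basis vector here

print(components_old)
print(components_new)
```

Nothing about the magnet changed between the two printouts; only the description did.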

Quantum mechanics works in exactly the same way. I can’t stress this enough: there is absolutely no difference in quality. Every state is represented by a state vector, and a linear combination (i.e. a superposition) of state vectors just gives you a different state vector. The system is always in a single, well-defined state. It is never in two states at once. Like before, every state vector can be represented as a superposition of other vectors, which means there are no “superposition states” and “non-superposition states.” It depends on what basis you pick. If we have a spin 1/2 system, this is effectively a little magnet and its state is fully determined by its orientation. The state space is a sphere, exactly like directions in space. Moreover, every two-state system, every qubit, is described in the same way. So, the intuition we have from directions in space carries over.

There is, however, a difference. Even though directions in 3D and quantum states for a qubit are in one-to-one correspondence, we represent them in a different way. This is the part that can be confusing because it is generally unmotivated. But, as usual, if you understand the physics, you understand why things are done in a certain way. The key is that we need something different because the quantities that we are interested in, those on which the physics is linear, are different.

For directions, we are interested in distances. So, if we have a direction on the \((x,z)\) plane, the components \(v^x\) and \(v^z\) are the physically interesting quantities, but the constraint they satisfy is quadratic, \((v^x)^2 + (v^z)^2=1\), because the vector is normalized. Note that \(v^x = \sin \theta\) while \(v^z = \cos \theta\), so the constraint is the trig identity \(\sin^2 \theta + \cos^2 \theta = 1\). In quantum mechanics we are not interested in distances but in probabilities. So we have to understand how the linearity in probability works.

For a two-state system, a qubit, there are only two mutually exclusive, perfectly distinguishable configurations at a time. You choose one direction, and then either you prepare the system along that direction or in the opposite direction. This is true for measurements as well: you pick a direction for your measurement, say \(z\), and you will find the spin of your particle aligned or anti-aligned with that direction, \(z^+\) or \(z^-\). If we prepare the state along another direction, the measurement will have to change the state to one of the possible outcomes, but the average will have to correspond to the vertical component of the originally prepared state. Therefore, the probabilities \(p_{z^+}\) and \(p_{z^-}\) will have to depend linearly on the vertical component.

The quantities that need to be represented linearly, then, are \(p_{z^+}\) and \(p_{z^-}\), and the constraint is \(p_{z^+} + p_{z^-} = 1\). Note the difference: for directions we are assigning numbers to perpendicular directions, \(x\) and \(z\), and the constraint is quadratic. In quantum mechanics, we are assigning numbers to opposite directions, \(z^+\) and \(z^-\), and the constraint is linear. This is why we use a different representation. So we need a way to represent all directions in the \((x,z)\) plane as components on \(z^+\) and \(z^-\). How do we do that?

Note that we are still representing directions, so the components must be expressible in terms of angles and trigonometric functions. When \(\theta=0\), we must have \(p_{z^+}=1\) and \(p_{z^-}=0\), while when \(\theta=180\) degrees, we must have \(p_{z^+}=0\) and \(p_{z^-}=1\). We also must have \(p_{z^+} + p_{z^-}=1\) and, if we assign the values \(+1\) and \(-1\) to the two outcomes, the average \(p_{z^+} - p_{z^-}\) must equal the vertical component \(\cos \theta\). One finds that the correct expression is \(p_{z^+} = \cos^2 (\theta/2)\) and \(p_{z^-} = \sin^2 (\theta/2)\). This allows us to represent each direction as \(\sqrt{p_{z^+}} \vert z^+ \rangle + \sqrt{p_{z^-}} \vert z^- \rangle = \cos (\theta/2) \vert z^+ \rangle + \sin (\theta/2) \vert z^- \rangle\). That is, we turned the linearity of probability into the more usual linearity of components in terms of square roots of probabilities.
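These requirements can be checked numerically. The sketch below is only an illustration, with function names of my own choosing (`probabilities`, `amplitudes`). It assumes the outcomes \(z^+\) and \(z^-\) are assigned the values \(+1\) and \(-1\), so that the average is \(p_{z^+} - p_{z^-}\), and compares that average with the vertical component \(\cos \theta\).

```python
import math

def probabilities(theta):
    """Return (p_plus, p_minus) for a direction at angle theta (radians) from z."""
    p_plus = math.cos(theta / 2) ** 2
    p_minus = math.sin(theta / 2) ** 2
    return p_plus, p_minus

def amplitudes(theta):
    """Square roots of the probabilities: components on |z+> and |z->."""
    return math.cos(theta / 2), math.sin(theta / 2)

for deg in (0, 45, 90, 135, 180):
    theta = math.radians(deg)
    p_plus, p_minus = probabilities(theta)
    average = p_plus - p_minus          # outcomes valued +1 and -1
    print(deg, round(p_plus, 4), round(p_minus, 4),
          round(average, 4), round(math.cos(theta), 4))
```

The last two columns agree for every angle, which is the half-angle identity \(\cos^2 (\theta/2) - \sin^2 (\theta/2) = \cos \theta\) at work.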

There is one issue: the factor of two. This means that each state is represented twice: \(\theta/2\) spans a full 360 degrees only when \(\theta\) spans 720, so every direction is visited twice. In other words, the square root of the probability may be negative, but it is the square that counts. So each state vector \(\vert \psi \rangle\) and its opposite \(-\vert \psi \rangle\) actually represent the same state. A better way to say it is that the state is really represented by a line that passes through the origin, not just a direction. While you need a full turn to rotate a vector back to its original position, it will take only half a turn to rotate a line through the origin back to its original position. This is what is meant when people say that spin is a double cover of rotations. The point is that, while this vector notation is very convenient for calculation, we lose something. The factor of two in the angle also tells us why orthogonal states in the vector space correspond to opposite states on the Bloch sphere.
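Both consequences of the factor of two are easy to see in a small sketch restricted to the \((x,z)\) plane with real components. The helper names (`state`, `probs`, `inner`) and the 60 degree angle are arbitrary choices for illustration.

```python
import math

def state(theta):
    """Real components on |z+> and |z-> for a direction at angle theta from z."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def probs(psi):
    return tuple(a * a for a in psi)

def inner(a, b):
    """Inner product of two real state vectors."""
    return sum(x * y for x, y in zip(a, b))

theta = math.radians(60)
psi = state(theta)

# A full turn in space brings us to the same direction, but flips the vector's sign
psi_full_turn = state(theta + 2 * math.pi)

# The opposite direction in space is an orthogonal vector
psi_opposite = state(theta + math.pi)

print(psi, psi_full_turn)                   # components differ by an overall sign
print(probs(psi), probs(psi_full_turn))     # probabilities agree
print(inner(psi, psi_opposite))             # numerically zero
```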

So, where do the complex numbers come from? From the \(y\) direction. All that we discussed for \((x,z)\) will have to be valid on the \((y,z)\) plane as well. If we allow the components to be complex numbers, we can represent any direction as \(\cos (\theta/2) e^{-\imath \varphi /2} \vert z^+ \rangle + \sin (\theta/2) e^{\imath \varphi /2} \vert z^- \rangle\), using the phase to keep track of the angle on the \((x,y)\) plane. If, wherever we used squares, we now use the norm squared, then the results for the \((x,z)\) plane will work the same for any other \((n^x e_x + n^y e_y, z)\) plane. So, we use the norm to fix the height at which to cut the sphere, which will give us a circle, and then the phase to identify the point along the circle.
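As a sanity check, here is a sketch that goes from the angles to the two complex components and back to a direction in space. The recovery formulas in `bloch` are the standard expectation values of the Pauli matrices, which this post does not derive, so treat them as an assumption brought in from outside; the function names and the sample angles are also just for illustration.

```python
import cmath
import math

def qubit(theta, phi):
    """Complex components on |z+> and |z-> for polar angle theta, azimuth phi."""
    a = math.cos(theta / 2) * cmath.exp(-1j * phi / 2)
    b = math.sin(theta / 2) * cmath.exp(1j * phi / 2)
    return a, b

def bloch(a, b):
    """Direction in space recovered from the two complex components."""
    cross = a.conjugate() * b
    x = 2 * cross.real               # depends on the phase difference
    y = 2 * cross.imag               # depends on the phase difference
    z = abs(a) ** 2 - abs(b) ** 2    # depends only on the norms: the "height"
    return x, y, z

theta, phi = 1.1, 0.7                # arbitrary sample angles, in radians
a, b = qubit(theta, phi)
direction = bloch(a, b)
expected = (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

# Multiplying both components by a common phase leaves the direction unchanged
g = cmath.exp(0.3j)
direction_shifted = bloch(g * a, g * b)

print(direction)
print(expected)
print(direction_shifted)
```

The norms alone fix \(z\), the height of the cut, while only the phase difference between the two components enters \(x\) and \(y\), the position along the circle.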

To sum up, superpositions are exactly like linear combinations of directions in physical space. In the same way that nothing can be oriented in two directions at once, no quantum system is in two states at once. We need a different notion of linearity because, for directions, what behaves linearly are distances while in quantum mechanics, what behaves linearly are probabilities. We can represent a direction on a sphere using two complex numbers, one assigned to the top point and one to the bottom point, whose norms squared sum to one. The norms represent probability and identify the height at which we cut the sphere to get a circle, and the phase difference between the two complex numbers represents the angle along the circle. There is nothing mysterious. Just lots of interesting geometry!