Why do quantum superpositions exist?
TL;DR: Quantum mechanics allows superpositions because it allows the same ensemble to be decomposed into pure states in more than one way.
Quantum mechanics allows superpositions (i.e. linear combinations) of states in a way that classical mechanics doesn’t, though characterizing the difference is not as easy as one may think. In classical mechanics one can, for example, also take linear combinations of position and momentum. Why exactly are these different from quantum superpositions? What is it that quantum superpositions do that is allowed by quantum mechanics and not by classical mechanics?
While we can take classical point particle states \((q^1, p_1)\) and \((q^2, p_2)\) and simply sum the values of position and momentum, \((q^1 + q^2, p_1 + p_2)\), the solutions of the equations of motion won’t sum. That is, if \(q^1(t)\) and \(q^2(t)\) describe the trajectories in space for the two states, \(q^1(t) + q^2(t)\) is not, in general, a solution of the equations of motion. In quantum mechanics, instead, superpositions are preserved by time evolution: the Schrödinger equation is linear. This is why superpositions are inherently more important there. But this is just a mathematical argument: what is the physics underlying this mathematical fact? Could we have non-linear equations in quantum mechanics, or is this linearity something that characterizes a physical property that cannot be broken?
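The classical half of this claim can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes a unit-mass particle, uses a pendulum as the non-linear example and a harmonic oscillator as the linear one, and integrates with a velocity Verlet scheme. The function names, initial states and step size are all illustrative choices.

```python
import math

def evolve(q, p, force, t_end=2.0, dt=1e-3):
    """Integrate dq/dt = p, dp/dt = force(q) (unit mass) with velocity Verlet."""
    steps = round(t_end / dt)
    for _ in range(steps):
        p += 0.5 * dt * force(q)
        q += dt * p
        p += 0.5 * dt * force(q)
    return q, p

def linear_force(q):
    """Harmonic oscillator: the equation of motion is linear."""
    return -q

def pendulum_force(q):
    """Pendulum: the equation of motion is non-linear."""
    return -math.sin(q)

def superposition_gap(force, s1, s2):
    """Distance between 'evolve the summed state' and 'sum the evolved states'."""
    qa, _ = evolve(s1[0], s1[1], force)
    qb, _ = evolve(s2[0], s2[1], force)
    qs, _ = evolve(s1[0] + s2[0], s1[1] + s2[1], force)
    return abs(qs - (qa + qb))

# Two illustrative initial states (q, p); the values are arbitrary.
s1, s2 = (1.0, 0.0), (0.5, 0.3)

gap_linear = superposition_gap(linear_force, s1, s2)
gap_pendulum = superposition_gap(pendulum_force, s1, s2)

print("linear gap:  ", gap_linear)    # zero up to rounding
print("pendulum gap:", gap_pendulum)  # clearly non-zero
```

For the harmonic oscillator the gap vanishes up to rounding, because that special classical system happens to be linear. For the pendulum the summed trajectories visibly fail to track the trajectory of the summed state, which is the generic classical situation.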
The problem here is that superpositions are a property of the mathematical representation in terms of vector spaces. We need to find an alternative characterization of superposition in terms of something that is physically better defined. We can find one such characterization in terms of ensembles and mixtures of ensembles.
A statistical ensemble in quantum mechanics is represented by a density matrix. Given two ensembles \(\rho_1\) and \(\rho_2\), we can create a statistical mixture \(p\rho_1 + (1-p) \rho_2\) where we take the first ensemble with probability \(p\) and the second with the remaining probability \(1-p\). This works exactly like in classical mechanics, but the spaces of classical and quantum ensembles differ in one significant way. In classical mechanics, every statistical ensemble can be decomposed uniquely as a probability distribution over pure states, that is, states that cannot be understood as a mixture of other states. For example, every ensemble for a classical particle can be expressed as a probability distribution over all possible combinations of position and momentum. In quantum mechanics, this is not true. For a spin 1/2 system, an equal mixture of spin up and spin down is indistinguishable from an equal mixture of spin left and spin right. Both, in fact, give the maximally mixed state, and the statistics over all measurements are the same. This gives a different shape to the spaces of possible statistical ensembles and makes classical probability and quantum probability incompatible: quantum states cannot be described using classical probability and vice versa.
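The spin 1/2 example is small enough to verify directly. Here is a minimal sketch using only the Python standard library. It assumes the usual convention in which spin up and down are the basis vectors, and it takes spin left and right to be the two eigenstates of \(\sigma_x\); which of the two is called "left" is a labeling assumption. The helpers `projector`, `mix` and `close` are illustrative names.

```python
import math

def projector(v):
    """Return |v><v| as a 2x2 nested list for a two-component state v."""
    return [[a * b.conjugate() for b in v] for a in v]

def mix(weighted_states):
    """Density matrix sum_i p_i |v_i><v_i| built from (p_i, v_i) pairs."""
    rho = [[0j, 0j], [0j, 0j]]
    for weight, v in weighted_states:
        proj = projector(v)
        for i in range(2):
            for j in range(2):
                rho[i][j] += weight * proj[i][j]
    return rho

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

s = 1 / math.sqrt(2)
up, down = [1 + 0j, 0j], [0j, 1 + 0j]
# Eigenstates of sigma_x; the left/right labels are an assumed convention.
left, right = [s + 0j, -s + 0j], [s + 0j, s + 0j]

rho_up_down = mix([(0.5, up), (0.5, down)])
rho_left_right = mix([(0.5, left), (0.5, right)])
maximally_mixed = [[0.5, 0], [0, 0.5]]

print(close(rho_up_down, rho_left_right))
print(close(rho_up_down, maximally_mixed))
```

Both mixtures produce the same density matrix, so no measurement statistics can tell the two preparations apart.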
It turns out that we can express superpositions in terms of multiple decompositions and vice versa. Suppose that a state \(\phi\) can be expressed as a superposition of two states \(\psi_1\) and \(\psi_2\); then we can find an ensemble \(\rho\) that can be expressed both as a mixture of \(\psi_1\) and \(\psi_2\) and as a mixture of \(\phi\) and another state \(\hat{\phi}\). For the full calculation, see here. If the states are assumed normalized, with \(\psi_1\) and \(\psi_2\) orthogonal so that \(\vert c_1 \vert^2 + \vert c_2 \vert^2 = 1\), then \(\vert \phi \rangle = c_1 \vert \psi_1 \rangle + c_2 \vert \psi_2 \rangle\) if and only if there exists \(\rho = \vert c_1 \vert^2 \vert \psi_1 \rangle \langle \psi_1 \vert + \vert c_2 \vert^2 \vert \psi_2 \rangle \langle \psi_2 \vert = \frac{1}{2} \vert \phi \rangle \langle \phi \vert + \frac{1}{2} \vert \hat{\phi} \rangle \langle \hat{\phi} \vert\) with \(\vert \hat{\phi} \rangle = c_1 \vert \psi_1 \rangle - c_2 \vert \psi_2 \rangle\). The cross terms between \(\psi_1\) and \(\psi_2\) appear with opposite signs in \(\vert \phi \rangle \langle \phi \vert\) and \(\vert \hat{\phi} \rangle \langle \hat{\phi} \vert\), so they cancel in the equal mixture. Therefore quantum superpositions effectively represent the multiple decompositions of statistical ensembles in terms of pure states.
Now, time evolution must preserve statistical mixtures. If \(\rho = p\rho_1 + (1-p) \rho_2\) at the beginning, then \(\rho(t) = p\rho_1(t) + (1-p) \rho_2(t)\) at all times. That is, time evolution must be a linear operation over ensembles. This must be true in classical mechanics, quantum mechanics and any other physical theory, and in fact it is. But this means that multiple decompositions must be preserved as well. That is, if \(\rho = \vert c_1 \vert^2 \vert \psi_1 \rangle \langle \psi_1 \vert + \vert c_2 \vert^2 \vert \psi_2 \rangle \langle \psi_2 \vert = \frac{1}{2} \vert \phi \rangle \langle \phi \vert + \frac{1}{2} \vert \hat{\phi} \rangle \langle \hat{\phi} \vert\) at the beginning, the same equality must hold between the evolved states for the whole evolution. Therefore, time evolution must preserve superpositions.
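As a sanity check on the last two paragraphs, the sketch below builds both decompositions for a two-level system and evolves every pure state with the same unitary. Everything specific in it is an assumption made for illustration: \(\psi_1\) and \(\psi_2\) are taken to be the orthonormal basis vectors, the coefficients are \(c_1 = 0.6\) and \(c_2 = 0.8i\), and the unitary is \(e^{-i t \sigma_x}\) with an arbitrary \(t\). The mixture of \(\psi_1\) and \(\psi_2\) is weighted by \(\vert c_1 \vert^2\) and \(\vert c_2 \vert^2\), and the mixture of \(\phi\) and \(\hat{\phi}\) uses equal weights.

```python
import math

def projector(v):
    """Return |v><v| as a 2x2 nested list for a two-component state v."""
    return [[a * b.conjugate() for b in v] for a in v]

def mix(weighted_states):
    """Density matrix sum_i p_i |v_i><v_i| built from (p_i, v_i) pairs."""
    rho = [[0j, 0j], [0j, 0j]]
    for weight, v in weighted_states:
        proj = projector(v)
        for i in range(2):
            for j in range(2):
                rho[i][j] += weight * proj[i][j]
    return rho

def apply(U, v):
    """Matrix-vector product U v for a 2x2 matrix and a 2-vector."""
    return [sum(U[i][k] * v[k] for k in range(2)) for i in range(2)]

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

# Assumed example: orthonormal psi1, psi2 and |c1|^2 + |c2|^2 = 1.
psi1, psi2 = [1 + 0j, 0j], [0j, 1 + 0j]
c1, c2 = 0.6 + 0j, 0.8j
phi = [c1 * a + c2 * b for a, b in zip(psi1, psi2)]
phi_hat = [c1 * a - c2 * b for a, b in zip(psi1, psi2)]

def decompositions(s1, s2, s, s_hat):
    """The same ensemble written as a mixture of (s1, s2) and of (s, s_hat)."""
    rho_psi = mix([(abs(c1) ** 2, s1), (abs(c2) ** 2, s2)])
    rho_phi = mix([(0.5, s), (0.5, s_hat)])
    return rho_psi, rho_phi

# Illustrative unitary exp(-i t sigma_x); t is an arbitrary choice.
t = 0.7
U = [[math.cos(t), -1j * math.sin(t)], [-1j * math.sin(t), math.cos(t)]]

before = decompositions(psi1, psi2, phi, phi_hat)
after = decompositions(*(apply(U, v) for v in (psi1, psi2, phi, phi_hat)))

agree_before = close(before[0], before[1])
agree_after = close(after[0], after[1])
print(agree_before, agree_after)
```

The two decompositions agree before and after the evolution, and the evolved \(\phi\) is still the same linear combination of the evolved \(\psi_1\) and \(\psi_2\). The check only illustrates the statement for one linear evolution; it does not by itself show that a non-linear evolution is impossible.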
The physical reason why we have quantum superpositions, and why time evolution has to be linear with respect to those superpositions, is exactly that quantum ensembles can be decomposed into pure states in more than one way. This is a property that classical mechanics does not have, and it is at the core of a true understanding of quantum mechanics.