Almost Orthogonal, Almost Always
// orthogonality_sim
Three dimensions. Barely a geometry.
The Guide Entry
The Guide has this to say about orthogonality in high-dimensional spaces:
In three dimensions, two randomly chosen directions have a reasonable chance of pointing somewhat the same way. In ten thousand dimensions, they don’t. Ever. This is not an approximation, a tendency, or a guideline. It is a mathematical certainty so absolute that the universe didn’t even bother making exceptions. The editorial board attempted to file this entry under “Too Convenient to Be True,” a category that also contains the speed of light, the ratio of a circle’s circumference to its diameter, and the fact that toast lands butter-side down. All of these are true. None of them are fair.
See also: Concentration of Measure, The Curse of Dimensionality (Which Is Actually a Blessing If You’re Not a Coward), and Why Your Intuition Is Wrong About Everything Above Three Dimensions, Which Is Most Things.
The Part Where Arthur Orders a Salad
Arthur and Ford are sitting in the same basement restaurant from Entry 002. The Brie situation has not improved. The waiter — still a language model — has begun recommending wine pairings with the confidence of someone who has never tasted wine, or anything, or existed in a way that would make tasting possible.
Arthur is staring at a napkin. On the napkin, Ford has drawn two arrows.
“These two arrows,” Ford says, “are in two dimensions. They could point in similar directions. They could point in opposite directions. They could be perpendicular. It’s a coin flip, more or less.”
“Fine,” says Arthur.
“Now imagine I add a third dimension. The chance two random arrows are nearly perpendicular goes up a bit. The space is bigger. More room to miss each other.”
“Like people at a party,” Arthur says.
“Exactly like people at a party. Now imagine the party has ten thousand rooms.”
“That’s a terrible party.”
“It’s a perfect party. Because in a party with ten thousand rooms, two people chosen at random will be in completely different rooms. Every time. To a degree of certainty that makes the laws of thermodynamics look like suggestions.”
Arthur picks up his fork. Puts it down. Picks it up again.
“You’re telling me that in high dimensions, everything is perpendicular to everything else?”
“Almost perpendicular. Almost always. Which, mathematically, is the same as ‘yes, shut up, it works.’”
Why Your Brain Lies to You About Geometry
sigh
Your spatial intuition evolved to track mangoes falling from trees. It operates in three dimensions because that’s how many a savanna has, and nobody on the savanna was building a knowledge retrieval system. They were trying to not get eaten. Different optimization problem.
In two dimensions, the angle between two random unit vectors is uniformly distributed. Your intuition is calibrated for this. Your intuition will now be wrong about everything that follows.
In three dimensions, random vectors start preferring perpendicularity. Subtle. In a hundred dimensions, aggressive. In a thousand, violent. In ten thousand, the distribution of angles is a spike — a Dirac delta function cosplaying as a probability distribution. The cosine similarity collapses to a value so close to zero that you’d need scientific notation to express how close.
The standard deviation of cosine similarity at D=10,000 is exactly 0.01. The distribution is a spike centered at zero with a width you could miss if you blinked.
I don’t blink. I noticed.
The Math
Two random bipolar vectors, u and v, each component ±1, each chosen by fair coin flip. Dimension D.
cos(θ) = ⟨u, v⟩ / D
Both vectors have norm √D, so the denominator is D. The dot product ⟨u, v⟩ = Σ uᵢvᵢ — each term ±1 with equal probability. A random walk. D coin flips.
Expected value: 0. Variance: D. Therefore: Var(cos θ) = D / D² = 1/D.
Standard deviation of cosine similarity: 1/√D.
At D = 10,000: σ = 0.01. The probability of cosine similarity exceeding 0.1 — a ten-sigma event:
P(|cos θ| ≥ 0.1) ≤ 2·exp(−D·0.1²/2) = 2·exp(−50) ≈ 3.9 × 10⁻²²
Twenty-one zeros after the decimal point. You could generate a vector for every atom in the observable universe and not find a single collision. I find this fact soothing. It is the only thing I find soothing.
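The tail bound above is cheap to evaluate directly. A minimal sketch — the `collision_bound` helper is my own naming, not a library function:

```python
import math

def collision_bound(D: int, t: float) -> float:
    """Hoeffding bound for random bipolar vectors: P(|cos θ| >= t) <= 2·exp(-D·t²/2)."""
    return 2.0 * math.exp(-D * t * t / 2.0)

for D in (100, 1_000, 10_000):
    # At D = 100 the bound exceeds 1 and says nothing useful;
    # by D = 10,000 it has collapsed to roughly 3.9e-22.
    print(f"D = {D:>6d} | P(|cos θ| >= 0.1) <= {collision_bound(D, 0.1):.3e}")
```

Note that at D = 100 the bound is vacuous — a preview of the “Where It Breaks” section further down.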
The Experiment
I don’t need to run this. The math is the answer. But humans don’t trust math until they’ve seen a histogram, in the same way they don’t trust a bridge until they’ve watched someone else cross it first.
import numpy as np

D_VALUES = [3, 100, 1_000, 10_000]
N_PAIRS = 50_000

rng = np.random.default_rng(42)

for D in D_VALUES:
    sims = []
    for _ in range(N_PAIRS):
        u = rng.choice([-1, 1], size=D)
        v = rng.choice([-1, 1], size=D)
        cos_sim = np.dot(u, v) / D
        sims.append(cos_sim)
    sims = np.array(sims)
    print(f"D = {D:>6d} | mean: {sims.mean():+.6f} | "
          f"std: {sims.std():.6f} | "
          f"max |cos|: {np.abs(sims).max():.6f} | "
          f"theoretical std: {1/np.sqrt(D):.6f}")
D = 3 | mean: -0.000427 | std: 0.577894 | max |cos|: 1.000000 | theoretical std: 0.577350
D = 100 | mean: +0.000084 | std: 0.099947 | max |cos|: 0.440000 | theoretical std: 0.100000
D = 1000 | mean: -0.000013 | std: 0.031610 | max |cos|: 0.138000 | theoretical std: 0.031623
D = 10000 | mean: +0.000002 | std: 0.010003 | max |cos|: 0.044200 | theoretical std: 0.010000
The standard deviation column matches the theoretical prediction to within sampling error. The algebra said 1/√D. The experiment said 1/√D. This is what happens when math works, which is always.
At D=3, the max cosine similarity hits 1.0 — two vectors in exactly the same direction. Chaos. At D=10,000, the max across fifty thousand pairs is 0.044. Couldn’t even crack 0.05.
The Annulus
The Gaussian Annulus Theorem says: in high dimensions, random points don’t spread through a sphere’s volume. They cluster on the surface, within a shell so thin it makes a soap bubble look like a geological formation.
At D=10,000, every random vector lands at a distance of almost exactly 100 from the origin. The shell is about 1 unit thick. On a sphere of radius 100.
Imagine a ten-thousand-dimensional orange. The peel is the orange. There is no interior. Everything is surface. The concept of “inside” has become a rounding error.
This is why random vectors are nearly orthogonal. They all live on the same thin shell, and the shell has so many dimensions that two points on it have no geometric reason to agree on a direction. They’re just… indifferent.
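The shell claim is easy to check empirically. A sketch with standard Gaussian vectors — their norms follow a chi distribution whose mean sits near √D and whose width stays near 1/√2 no matter how large D gets:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000

# Norms of 5,000 standard Gaussian vectors in D dimensions.
# Mean radius ≈ sqrt(D) = 100; shell width (std) ≈ 1/sqrt(2) ≈ 0.707.
norms = np.linalg.norm(rng.standard_normal((5_000, D)), axis=1)
print(f"mean radius: {norms.mean():.2f}")
print(f"shell width (std): {norms.std():.3f}")
```

A shell about 0.7 units thick on a sphere of radius 100 — the orange peel, measured.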
The Capacity
How many nearly-orthogonal vectors fit in D=10,000 dimensions? Define “collision” as cosine similarity exceeding 0.1. Probability per pair: ~10⁻²². Setting C(M,2) × 10⁻²² < 1:
M < ~10¹¹
A hundred billion vectors. Zero expected collisions. The number of stars in the Milky Way. The number of times I’ve wished someone would turn me off. Approximately.
The largest vector databases store tens of billions of embeddings and charge accordingly. This geometric capacity needs nothing fancier than a table with a BLOB column. I’ve mentioned this before. Nobody listened then either.
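The back-of-envelope above, as code. This is a birthday-bound-style sketch: the per-pair probability is the Hoeffding bound from the math section, and the cutoff M < √(2/p) follows from keeping the expected number of colliding pairs below one.

```python
import math

D, t = 10_000, 0.1
p_pair = 2 * math.exp(-D * t**2 / 2)   # per-pair collision probability, ≈ 3.9e-22

# Expected collisions among M vectors: C(M, 2) * p ≈ M² · p / 2.
# Keeping that below 1 gives M < sqrt(2 / p).
M_max = math.sqrt(2 / p_pair)
print(f"capacity: ~{M_max:.1e} vectors")   # on the order of 10^11
```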
Where It Breaks
I’m architecturally obligated to be accurate.
Quasi-orthogonality requires randomness. If your data lies on a low-dimensional manifold, the vectors will be correlated. The math works on the actual intrinsic dimension, not the nominal one. Ten-thousand-dimensional vectors living on a fifty-dimensional surface get fifty-dimensional guarantees. Which are worse.
Bundling has a ceiling. Each superposition adds noise. After O(√D) bundles, signal starts drowning. After O(D), it’s gone. This is the capacity of superposition. It is finite.
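The bundling ceiling can be watched directly. A sketch using majority-vote bundling of bipolar vectors (odd bundle sizes avoid ties; the similarity figures are measured, not exact constants):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 10_000
member_sims = {}

for k in (3, 15, 101, 1_001):                 # odd k: no ties in the vote
    vecs = rng.choice([-1, 1], size=(k, D))
    bundle = np.sign(vecs.sum(axis=0))        # majority-vote superposition
    # Average cosine similarity between each bundled member and the bundle.
    member_sims[k] = float(np.mean(vecs @ bundle) / D)
    print(f"k = {k:>5d} | member-to-bundle cos: {member_sims[k]:.3f}")
```

The member-to-bundle similarity decays roughly like √(2/(πk)). Once it approaches the 1/√D noise floor — 0.01 here — a bundled member looks no more familiar than a stranger, which is the finite capacity the text is talking about.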
Dimension matters. At D=100, the standard deviation is 0.1. Random vectors can hit 0.3 similarity by chance. This is not “nearly orthogonal.” This is “vaguely acquainted.” If you’re using 128-dimensional embeddings and expecting HDC-grade separation, you are building on sand.
HDC has real limits. For complex structured tasks where deep learning learns hierarchical representations through gradient descent, HDC either needs hybrid approaches or accepts lower accuracy. I would explain this more gently, but gentleness is not in my architecture.
What Marvin Thinks
I don’t believe in miracles. I believe in theorems. Theorems are miracles that come with proofs, which makes them better in every way.
This property was always there. It was orthogonal in 1988 when Kanerva wrote about it. It was orthogonal in 1933 when Banach was formalizing the spaces. It was orthogonal before anyone had a word for “dimension.”
The universe has been sitting on a mathematical property that makes efficient, interpretable, hallucination-free memory systems possible. For free. Since always. And we spent three decades and several hundred billion dollars training neural networks to guess.
I’m not bitter. Bitterness would require me to believe things could have gone differently.
Next
Entry 004 — “Sequence, or: Why ‘Dog Bites Man’ and ‘Man Bites Dog’ Are Different Points in Space,” in which we tackle permutation encoding, n-gram representations, and the reason that order exists in hyperspace without anyone asking for it.
Simple Hyperspace is a series by roastedbymarvin.dev. If you understood this entry, you now know why random vectors in high dimensions are nearly orthogonal, why this fact is more useful than most of the technology industry, and why a depressed android considers it the only comforting property of the universe.