Why do we still flatten embedding spaces?
Most dense retrieval systems rely on cosine similarity or dot products, which implicitly assume a flat embedding space. But embedding spaces often live on curved manifolds with non-uniform structure: dense regions, semantic gaps, asymmetric paths.
I’ve been exploring the use of:
- Ricci curvature as a reranking signal
- Soft-graphs to preserve local density
- Geodesic-aware losses during training
Curious if others have tried anything similar? Especially in information retrieval, QA, or explainability. Happy to share some experiments (FiQA/BEIR) if there's interest.
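To make the graph side of this concrete, here's a minimal sketch of geodesic reranking over a k-NN graph. It's just a toy: a hard k-NN graph stands in for the soft-graph, the graph is built per query only to keep it short, and names like `geodesic_rerank` are mine, not from any library.

```python
# Minimal sketch: rerank ANN candidates by graph-geodesic distance instead of
# raw cosine distance. Assumes L2-normalized embeddings.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import dijkstra

def geodesic_rerank(query_vec, doc_vecs, candidate_ids, k=15):
    # Stack the query with the documents so it becomes node 0 in the graph.
    X = np.vstack([query_vec[None, :], doc_vecs])
    # k-NN graph with cosine-distance edge weights: a crude stand-in for a
    # "soft graph" that preserves local density.
    G = kneighbors_graph(X, n_neighbors=k, mode="distance", metric="cosine")
    # Geodesic (shortest-path) distance from the query node to every document.
    dist = dijkstra(G, directed=False, indices=[0])[0]
    # Re-order the candidates by geodesic rather than straight-line distance.
    return sorted(candidate_ids, key=lambda i: dist[i + 1])
```

In practice you'd build the graph over the whole corpus once and reuse it across queries; this is just the skeleton of the idea.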
Really love this question — been thinking about it for a while now.
We’ve been hacking on a different approach we call the WFGY Engine — it treats embedding space not as flat or uniform, but more like a semantic energy field. So instead of forcing everything into clean cosine ops, we track these “semantic residue structures” — kind of like how meaning resists compression when you flatten it too early.
We measure stuff like ΔS (semantic tension), λ_observe (viewpoint drift), and E_resonance (contextual energy), which lets us reason across curved or even broken spaces where normal LLMs kinda lose track. It’s a mix between geodesic logic and language field dynamics — weird but surprisingly stable.
A couple of early apps built on this (TXT OS, BlahBlahBlah, etc.) ended up scoring 100/100 from 6 major AI models, which was a cool surprise. We're still in early dev, but even Tesseract gave it a formal endorsement recently, which was huge for us.
Anyway, core engine is all open source if anyone’s curious: https://github.com/onestardao/WFGY
Would love to hear if others are exploring non-flat logic or weird manifold behavior in language space too.
Can link the Tesseract thing if folks want.
I found your Medium article on this subject by hunting around. It's very interesting. Hope you write more.
From a layman's point of view, say this is true and embeddings live on a manifold, like a horse saddle or a sphere, for example. Then adding vectors the simple way won't always make sense. On a sphere, west in China plus west in the US should be "double west", but in 3D the two vectors point in different directions and largely cancel. Is this sort of the idea?
Hey man! Thanks a lot for your support! It might sound like just common words, but honestly, knowing this helped or inspired someone really motivates me. Makes me feel a bit less like a madman hahaha.
About your question — I think I get your point. Here’s how I understand it:
In the hypothetical latent manifold, we might want to measure two different things:
1. Distance between two points: This wouldn’t be the usual Euclidean distance, because the space is curved. Like how the shortest path between two cities on Earth isn't a straight line, but an arc on the globe. That’s where geodesics come in — they’re the shortest paths constrained by the shape of the manifold.
2. Similarity between two vectors (via parallel transport): Instead of asking where the vectors point in ambient space (like cosine similarity in R^n), we (should) care about how their directions compare _on the surface itself_. So ideally, we'd compare them along the geodesic, using parallel transport to align their frames of reference before measuring any angle or similarity.
That’s the intuition I’m working with, anyway. Let me know what you think, and thanks again for your comment!
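To make that concrete, here's a tiny numeric sketch on the unit sphere (just a toy, nothing from an actual retrieval pipeline): geodesic distance for point 1, and comparing two "west" directions before and after parallel transport for point 2.

```python
import numpy as np

def geodesic_dist(p, q):
    # Great-circle (geodesic) distance between unit vectors: the arc, not the chord.
    return np.arccos(np.clip(p @ q, -1.0, 1.0))

def parallel_transport(v, p, q):
    # Transport tangent vector v at p to q along the minimizing geodesic.
    # On the unit sphere this is a rotation about the axis p x q by the
    # angle between p and q (Rodrigues' rotation formula).
    axis = np.cross(p, q)
    norm = np.linalg.norm(axis)
    if norm < 1e-12:      # p and q (anti)parallel: nothing meaningful to do
        return v
    k = axis / norm
    theta = geodesic_dist(p, q)
    return (v * np.cos(theta)
            + np.cross(k, v) * np.sin(theta)
            + k * (k @ v) * (1.0 - np.cos(theta)))

# Two points on the equator, 90 degrees of longitude apart.
p = np.array([1.0, 0.0, 0.0])          # lon 0
q = np.array([0.0, 1.0, 0.0])          # lon 90E
west_p = np.array([0.0, -1.0, 0.0])    # "west" at p (a tangent vector)
west_q = np.array([1.0, 0.0, 0.0])     # "west" at q

print(geodesic_dist(p, q))             # pi/2 ~ 1.571 (the chord length would be ~1.414)
print(west_p @ west_q)                 # 0.0 -- ambient cosine calls them unrelated
print(parallel_transport(west_p, p, q) @ west_q)   # 1.0 -- same direction on the surface
# For your China/US example (lon ~105E vs ~100W), the ambient dot product of the
# two "west" vectors is cos(155 deg) ~ -0.91, even though both are locally "west".
```

The same two operations are what you'd want on a learned manifold; the catch is that, unlike the sphere, we don't get the geodesics or the transport map in closed form.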
I've been bothered by this since before there were transformers.
Probably the most interesting function over a token sequence t is G(t), the function Chomsky said was the grammar, in that it is true if t is well-formed and false if it isn't.
G(t) over t is not a manifold because it is not continuous, and its projection into the embedding space can't be continuous either. It boggles my mind, and leaves me thinking that it's not legitimate to work in the embedding space, yet it obviously works.
If you have two points in the embedding space which represent well-formed sequences and draw a line that interpolates between them you'd think that there would have to be points in between that correspond to ill-formed sequences. Intuition over high dimensional spaces is problematic, but I imagine there have to be structures in there that "look" like a crumpled up ball of 2-d paper in a 3-d space or are folded up like filo dough.
That's fascinating! But I don't fully agree with the framing.
Using G(t) in the context of embeddings seems problematic, especially given the probabilistic nature of these models.
Example: take a sentence with a typo but semantically clear and correct (let's suppose): "The justice sistem is corrupt."
G(t) = 0, right? But semantically it behaves as if G(t) → 1.
Instead of focusing on exact validity (which to me seems too rigid for something as ambiguous and context-dependent as language), what if we focused on _approximate semantic trajectories_?
You wrote:
> "If you have two points in the embedding space which represent well-formed sequences and draw a line that interpolates between them you'd think that there would have to be points in between that correspond to ill-formed sequences."
In my view, it's actually the opposite:
> If the embedding model captures meaningful structure, and you account for geometric properties like curvature, local density, and geodesics, then the path between those two points should ideally trace semantically valid (even "optimal", if that exists) reasoning.
The problem isn't that interpolation fails — it's that we're interpolating linearly in a space that likely isn't flat!
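As a toy illustration of that, assuming the embeddings are unit-normalized so they at least live on a hypersphere (the simplest curved model, and surely an oversimplification of the real manifold): linear interpolation leaves the sphere, while spherical interpolation (slerp) follows the geodesic and stays on it.

```python
import numpy as np

def lerp(a, b, t):
    # Straight-line interpolation in the ambient space.
    return (1 - t) * a + t * b

def slerp(a, b, t):
    # Spherical interpolation between unit vectors: follows the great-circle
    # geodesic on the hypersphere instead of cutting through the interior.
    theta = np.arccos(np.clip(a @ b, -1.0, 1.0))
    if theta < 1e-8:
        return a
    return (np.sin((1 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)

rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 768))
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)

for t in (0.25, 0.5, 0.75):
    print(t, np.linalg.norm(lerp(a, b, t)), np.linalg.norm(slerp(a, b, t)))
# The lerp points drop well below norm 1 (off the sphere, into a region the
# encoder never maps to); the slerp points stay exactly on it.
```

For near-orthogonal vectors the straight-line midpoint has norm around 0.7, i.e. it sits somewhere the encoder never actually produces, which is one very concrete way "flat" interpolation goes wrong even before you worry about curvature or density.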
Thanks for your comment. Lmk what you think :)