
Authority Is the Degenerate Case: What Else Propagates Through the Link Graph?

For twenty-five years a single quantity has been read off the web's link graph: authority, the standing a page holds, the property PageRank and its descendants measure. It is a real number and a useful one, and search was built on it. It is also, we have come to think, the simplest thing the graph has to say.

Diagram: authority and topic are global signals; character is a third, per-reader property of a web page
Authority and topic are global, the same for every visitor. Character is a third property of a page, and it is only one of several signals the link graph turns out to carry.

A link is more than a vote

The classical reading treats a link as a vote: count the votes well, weighted by the standing of whoever casts them, and you recover importance. That is authority. But a link is also a choice about who to point at, and choices carry structure well beyond how important the target is. Collapse the whole graph to one global score per page and that structure is thrown away. Read the same graph directionally, and sliced by subpopulation rather than averaged over everyone, and it gives up far more than one number.
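To make the "one global score per page" reading concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is an illustration only, not the ranking any search engine runs and not code from our paper; the toy graph `toy_links`, the damping factor of 0.85 and the iteration count are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns one score per page; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outlinks is shared evenly among all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for source, targets in links.items():
            if targets:
                # Each link is a vote, weighted by the standing of the voter.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web, for illustration only.
toy_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
scores = pagerank(toy_links)
for page, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(value, 3))
```

Whatever the graph looks like, the output is a single scalar per page, identical for every reader. Who chose to point at whom survives only as its effect on that one number.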

Two things we found flowing through it

In a paper published this month we report two attributes that propagate through the link graph, neither of which is authority and neither of which is topic.

The first is character: how a page reads, measured across eight axes with its subject and standing held fixed. A domain's character is predicted by its neighbours' character at roughly 0.4 per axis, against a permutation null of zero. Character flows across links the way authority does. Pages link to pages that read like them.
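A rough picture of what "predicted by its neighbours, against a permutation null" can look like in code is sketched below. It is a sketch under stated assumptions, not the procedure from the paper: the graph is synthetic, the single axis is a made-up score, and the choice of Pearson correlation, the function names (`neighbour_correlation`, `permutation_null`) and every parameter are ours for illustration. The toy numbers are not expected to match the figures reported above.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def neighbour_correlation(out_links, score):
    """Correlate each node's score with the mean score of the nodes it links to."""
    own, neigh = [], []
    for node, targets in out_links.items():
        if targets:
            own.append(score[node])
            neigh.append(statistics.fmean(score[t] for t in targets))
    return pearson(own, neigh)


def permutation_null(out_links, score, rounds=200, seed=0):
    """Shuffle scores across nodes, keeping the graph fixed, and recompute."""
    rng = random.Random(seed)
    nodes = list(score)
    values = [score[n] for n in nodes]
    null = []
    for _ in range(rounds):
        rng.shuffle(values)
        null.append(neighbour_correlation(out_links, dict(zip(nodes, values))))
    return null


# Synthetic graph (assumption): each node links to 3 nodes with a similar
# score and 3 nodes chosen at random, so some homophily is built in.
rng = random.Random(42)
n = 300
score = {i: rng.uniform(-1.0, 1.0) for i in range(n)}
ranked = sorted(score, key=score.get)
position = {node: k for k, node in enumerate(ranked)}
out_links = {}
for node in score:
    k = position[node]
    window = [ranked[j] for j in range(max(0, k - 15), min(n, k + 16)) if j != k]
    similar = rng.sample(window, 3)
    anywhere = rng.sample([m for m in score if m != node], 3)
    out_links[node] = similar + anywhere

observed = neighbour_correlation(out_links, score)
null = permutation_null(out_links, score)
print("observed:", round(observed, 3))
print("null mean:", round(statistics.fmean(null), 3), "null max:", round(max(null), 3))
```

Shuffling the scores while leaving the links untouched destroys any relationship between a node and its neighbours but preserves the degree structure, which is why the null sits near zero. An attribute propagates, in this sense, only if the observed value clears that null.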

The second is geographic relevance: which country a site belongs to. The same propagation procedure, with no per-country training, reconstructs national relevance for forty-six European countries from link structure alone. A British institution reads as British from who links to it, not from its domain ending.

These are two unrelated classes of attribute. One is linguistic, a property of how text is written. One is physical, a fact about geography. They are recovered by one and the same mechanism.

Why that reframes the graph

If authority, character and geography all propagate through the same graph by the same method, then authority is not special. It is the degenerate reading: one scalar, global, identical for every visitor. Read the graph as the richer relational object it is and it yields a space rather than a ranking, and authority, character and geography are three of its coordinates. The web has been measured as a hierarchy. It is also a space, and we have so far mapped a handful of its dimensions.

The control that keeps this honest

A reasonable objection is that a propagation method simply amplifies whatever labels you feed it, that it is a loop reading its own input back out. Geography is the answer to that. A site's country is a fixed fact in the physical world that the method was never taught and never labelled, yet the same engine recovers it from links alone. If a procedure can reconstruct a piece of the real world it was not shown, it is reading genuine structure in the graph, not echoing its own assumptions.

So what else propagates?

This is the question we most want to hand on. Authority, character and geography are three node attributes the graph carries. They are very unlikely to be the only three. Other candidates are testable in exactly the same way, read directionally, sliced by subpopulation, measured against a permutation null: temporal recency, credibility, audience composition, topical stance, sentiment, perhaps even commercial intent. We do not yet know which of these propagate, nor how large the family is.

The honest position is narrow and, we think, the more interesting for it. We have shown that the link graph carries at least three distinct latent properties, and demonstrated one method that recovers them. We have not shown where the family ends. If it turns out to be large, then a quarter century of ranking the web on a single coordinate looks less like the finished science and more like the first measurement.

An open invitation

The method is simple to state and the paper is open. The interesting work now, mapping the other dimensions of the graph, does not require us to be the ones who do it. Anyone with a copy of the web graph and a capable model can put the question to any node attribute they care about: does it propagate, against a proper null, when the graph is read directionally? We would genuinely like to know the answers, including the negative ones.

See it, and check us

Read the paper

The full method, the propagation results, the geographic control and the honest open questions, published openly.

