The PageRank algorithm uses incoming links between a graph’s nodes to rank the nodes, giving nodes with more incoming links a higher rank. For example, if the nodes represent web pages, the algorithm estimates the importance of each relative to the others by evaluating the number of links each page gets from other websites. Links from more important nodes count more for a node’s rank than ranks from less important nodes.
The algorithm is called PageRank for two reasons: first, because it was originally developed to rank the importance of web pages, and second, because one of its inventors was Google co-founder Larry Page.
PageRank computes the rank of nodes in a homogeneous graph (or a homogeneous projection of a heterogeneous graph) based on the value of incoming and outgoing links. It measures the importance of nodes by assigning each node a rank, determined by the weights of the edges connecting them, that represents the probability of arriving at a given node. The higher the rank, the more probable it is that the node will be visited.
Before going further, we should add some relevant definitions aimed at understanding homogeneous graphs:
- Bijection: a one-to-one correspondence between the elements of two sets such that each element of one set is paired with exactly one element of a second set, and each element of the second set is paired with exactly one element of the first set and there are no unpaired elements.
- Graph isomorphism: a one-to-one correspondence between two sets such that all binary relationships between elements of the sets are preserved. Since the set of natural numbers can be mapped onto the set of even natural numbers by multiplying each natural number by 2, the relationship between the sets is isomorphic. In graph theory, an isomorphism of two graphs is a bijection between the vertex sets of each graph.
- Induced subgraph: a graph, formed from a subset of the vertices of another graph where all of the edges of the original graph vertices form connecting pairs of vertices in the subgraph.
- Graph automorphism: a form of symmetry in which a permutation of a graph is mapped onto itself while preserving all edge-vertex connectivity.
- Homogeneous graph: a graph is homogeneous if any isomorphism between finite induced subgraphs extends to an automorphism of the graph.
The direction of edges is extremely important, as each node's rank influences the rank of the nodes it points to both in terms of the pooling of rank among nodes (how many nodes point at a given node) and the sharing of rank from one node to another (how many nodes a given node points to). As one node’s importance grows, the nodes it references increase in importance.
The calculation is an iterative process, beginning with each node initially given equal rank before computation. In every iteration, each node divides its current rank equally across its outgoing links, and the new rank value for each node is the sum of the shares sent to it. The algorithm is stopped after a given number of iterations or if the value differences between iterations are less than a predefined value.
It is, of course, the key enabler of search engines to efficiently determine which pages are most relevant when users search for information. While it can be used to determine the probability that a webpage will be visited, it is also used to find objects of importance, such as an influential academic paper or patent.
PageRank is one of the few algorithms that can be run over any kind of graph data, so it is up to the user to ensure their graph contains one kind of node and one type of relationship to ensure realistic results.
The Katana Graph Intelligence Platform is used to solve problems. Speak to a Katana Graph Intelligence Platform expert to learn how we will help your organization.
Learn more about Katana Graph’s High-Performance Graph Analytics Library.
Get to Know Katana Graph
We thrive on testing new and diverse ideas.
Katana Graph was born of cutting-edge research and scientific rigor, and these beginnings have a powerful effect on who we are to this day. We’re devoted to problem solving, and are relentless in our pursuit of more effective and more efficient solutions to real-world challenges. Continuous improvement is the very foundation of our success.