
A beginner's guide to graph embeddings

Graph data underpins a broad array of applications in industries ranging from transportation and telecom to banking and healthcare. As graphs become increasingly pervasive, many organisations seek to apply graph analytics and machine learning to derive insights from their graph data.

Instead of working with the graph data directly, many graph analytics implementations use graph embeddings—compressed representations of the graphs. Such representations enable a range of graph machine learning applications, including link prediction, similarity search, node classification, clustering, and community and anomaly detection.

So what are graph embeddings, exactly?

Embedding is a common machine learning technique for representing complex discrete items, such as English words or the nodes of a graph, as vectors that encode the information contained in the data while greatly reducing its dimensionality.
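To make the dimensionality reduction concrete, here is a minimal sketch. The vocabulary size and embedding width are illustrative choices, not values from any particular model: a one-hot representation needs one dimension per distinct item, while a learned embedding packs the same item into a short, dense vector.

```python
import numpy as np

# A one-hot vector for word 42 in a 10,000-word vocabulary:
# 10,000 dimensions, all but one of them zero.
vocab_size = 10_000
one_hot = np.zeros(vocab_size)
one_hot[42] = 1.0

# A learned embedding represents the same word as a short, dense
# vector -- here 128 dimensions instead of 10,000. (The values would
# normally come from training; random values stand in for them here.)
embedding_dim = 128
embedding = np.random.default_rng(0).normal(size=embedding_dim)

print(one_hot.shape)    # (10000,)
print(embedding.shape)  # (128,)
```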

More specifically, graph embedding is the task of creating a vector representation of each node in a graph such that distances between these vectors predict the presence of edges in the graph. Intuitively, the generated graph embeddings act as "compressed" representations of the nodes in the graph, i.e. feature vectors, for downstream machine learning tasks.
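As a sketch of that intuition: given learned node vectors, a score such as cosine similarity between two vectors can be used to rank candidate edges, which is the core of embedding-based link prediction. The graph, the `score_edge` helper, and the randomly generated embeddings below are stand-ins for illustration; in practice the vectors would come from an embedding algorithm such as those described in the next section.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings: in practice these come from a trained
# embedding model, not from a random generator.
embeddings = {node: rng.normal(size=64) for node in ["a", "b", "c"]}

def score_edge(u, v):
    """Cosine similarity between two node vectors.

    Higher scores suggest an edge between u and v is more likely."""
    x, y = embeddings[u], embeddings[v]
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Rank candidate edges by score.
candidates = [("a", "b"), ("a", "c"), ("b", "c")]
for u, v in sorted(candidates, key=lambda e: score_edge(*e), reverse=True):
    print(u, v, round(score_edge(u, v), 3))
```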

How are graph embeddings generated?

There are multiple graph embedding implementations that rely on different embedding algorithms. The most popular ones include node2vec, GraphSAGE, and PyTorch-BigGraph.

The goal of each of these algorithms is to "learn" a feature representation for each node in a given graph. The choice of algorithm commonly depends on the structure and size of the input graph. PyTorch-BigGraph, for example, can handle multi-entity/multi-relation graphs with billions of nodes and trillions of edges.
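As a rough illustration of the random-walk family these algorithms belong to, the sketch below follows the node2vec recipe in its simplest, unbiased (DeepWalk-like) form: sample walks from a graph, then feed them to a word2vec-style skip-gram model so that nodes appearing in similar neighbourhoods end up with similar vectors. It uses networkx and gensim; the walk count, walk length, and vector size are arbitrary choices, and real node2vec additionally biases the walks with its p and q parameters.

```python
import random

import networkx as nx
from gensim.models import Word2Vec

# A small built-in example graph.
G = nx.karate_club_graph()

def random_walk(graph, start, length):
    """Sample an unbiased random walk as a list of node-id strings."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(node) for node in walk]

# Generate several walks per node; these play the role of "sentences".
walks = [random_walk(G, node, length=10) for node in G.nodes() for _ in range(20)]

# Train a skip-gram model over the walks, exactly as word2vec trains
# over text; each node gets a 64-dimensional vector.
model = Word2Vec(walks, vector_size=64, window=5, min_count=0, sg=1, epochs=5)

# Nodes with similar graph neighbourhoods should now be close in vector space.
print(model.wv.most_similar("0", topn=5))
```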

Bottom line

Graph embeddings are used to build the graph machine learning models that power a growing number of graph analytics and intelligence applications. This makes both the embeddings themselves and the algorithms that generate them, for graphs of different types and varying complexity, increasingly important.

See also

18-Crown-6 Molecule Poster, Ball-and-Stick Model, Stylized, English-Labeled
$19.99

A poster featuring the ball-and-stick model (stylized) of the 18-crown-6 molecule.

Hyperbolic Paraboloid Poster, Solid Surface, English-Labeled
$19.99

A poster featuring a hyperbolic paraboloid.

лимонад IPA Transcription Poster
$14.99

A poster featuring the phonetic transcription of "лимонад" (Russian for "lemonade") in the International Phonetic Alphabet (IPA).

Octahedron Poster, Solid Shape, English-Labeled
$19.99

A poster featuring an octahedron.

Ojibwe Alphabet Poster, English-Labeled
$17.99

The Ojibwe alphabet chart.

SVMs in practice

A primer on support vector machines (SVMs) and their applications.

A technical introduction to OpenAI's GPT-3 language model

An overview of the groundbreaking GPT-3 language model created by OpenAI.

seq2seq Trainer

Train sequence-to-sequence models online.

TensorFlow.js and linear regression

Building and training simple linear regression models in JavaScript using TensorFlow.js.

DALL·E Client

Create images from text using OpenAI's DALL·E.
