The theoretical concept of networks is the basis of many modern approaches to analyzing data. At its simplest, network graph science is a collection of concepts for describing and analyzing entities and their relationships. A simple graph drawing is often used to introduce the idea of entities (nodes) and their relationships (edges), but it is only the beginning. A complete network graph package needs several different types of tools.
Behind the scenes - complex yet straightforward
Graphs can be relatively simple - a set of two or more nodes related together. It gets a little more complicated when you have many nodes, edges between them, and many properties or attributes to map across all of those features. For example, a personal social network may be very dense and include thousands of these kinds of relationships.
But the complexity doesn’t end with these kinds of basic data structures and elementary visualizations. Collections of tools need to be turned into comprehensive systems to be helpful. Consider the issues with data structures, data storage, developer APIs, multi-user access, streaming data ingestion, integration with other systems, ETL processes, and much more.
While many research environments have specialized tools for specific graph-related tasks, their outputs then have to be moved into other systems before they can provide value to other applications.
Types of Network Graph Tools
Network graph data languages and construction
Graph data does not usually fit into the simple rows and columns of the relational structures used in tabular databases. Instead, graphs need their own ways to manage and describe each type of feature meaningfully and consistently. They may keep nodes, edges, and related properties in two or three separate data structures. At a minimum, there must be two lists: node definitions, and edge definitions that link pairs of nodes to one another.
Each feature will have a unique identifier: a name, an ID number, or both. Edges use the node IDs to build the relationship graph that the other types of tools below rely on.
Each feature may also carry other attributes to track, much like fields in a table. For example, a node in a social network graph might use a person’s name or email address as its ID and store their age and location as secondary attributes. Each edge would identify two people by their IDs and add information such as the date they connected or how they are related (spouse, friend, etc.).
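To make the two-list idea concrete, here is a minimal Python sketch, assuming a simple in-memory representation: one list of nodes and one list of edges, each carrying an attribute dictionary. The `Node`, `Edge`, and `dangling_edges` names and the sample people are hypothetical and chosen only for illustration; real tools define their own schemas.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str                                      # unique identifier, e.g. an email address
    attrs: dict = field(default_factory=dict)    # secondary attributes such as age, location

@dataclass
class Edge:
    source: str                                  # ID of the first node
    target: str                                  # ID of the second node
    attrs: dict = field(default_factory=dict)    # e.g. date connected, type of relationship

nodes = [
    Node("alice@example.com", {"age": 34, "location": "Austin"}),
    Node("bob@example.com", {"age": 29, "location": "Denver"}),
    Node("carol@example.com", {"age": 41, "location": "Austin"}),
]
edges = [
    Edge("alice@example.com", "bob@example.com", {"since": "2019-05-01", "relation": "friend"}),
    Edge("alice@example.com", "carol@example.com", {"since": "2012-08-15", "relation": "spouse"}),
]

def dangling_edges(nodes, edges):
    """Return the edges whose endpoints are missing from the node list."""
    known = {n.id for n in nodes}
    if len(known) != len(nodes):
        raise ValueError("node IDs must be unique")
    return [e for e in edges if e.source not in known or e.target not in known]

print(dangling_edges(nodes, edges))  # prints []
```

The check reflects the dependency described above: edges are only meaningful if every ID they reference exists in the node list.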
Some tools also expect data to flow through the graph in a particular direction, e.g., from one person to another but not the other way around. This property is known as directionality: a graph can be directed (flow goes in one direction only) or undirected (flow can go both ways).
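A small sketch of how directionality changes the meaning of the same edge list, assuming edges are plain `(source, target)` tuples. The hypothetical `build_adjacency` helper records each edge once for a directed graph and in both directions for an undirected one.

```python
def build_adjacency(edges, directed):
    """Map each node ID to the set of nodes reachable over a single edge."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)       # flow from src to dst
        back = adj.setdefault(dst, set())         # make sure dst appears as a key
        if not directed:
            back.add(src)                         # undirected: flow also runs dst -> src
    return adj

follows = [("alice", "bob"), ("bob", "carol")]
directed = build_adjacency(follows, directed=True)
undirected = build_adjacency(follows, directed=False)

print("alice" in directed["bob"])    # False: the edge only points from alice to bob
print("alice" in undirected["bob"])  # True: the same edge can be traversed both ways
```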
As you can see, these structures seem relatively simple, but there must be standard ways to build them before other applications can use them. Some systems expect comma-separated values, and some use notations that include arrow symbols. Others use simple statements (e.g., topic-1 builds-on topic-2). Depending on the system, these representations are stored as text files, spreadsheets, database tables, or binary formats.
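The three input styles mentioned above can all be normalized into one edge list. The sketch below assumes simple, hypothetical line formats (`source,target` for CSV, `source -> target` for arrows, and exactly three whitespace-separated tokens for statements); real systems each define their own grammar, quoting, and escaping rules.

```python
import csv
import io
import re

def parse_csv(text):
    """Rows like 'source,target' -> list of (source, target) pairs."""
    return [(row[0].strip(), row[1].strip())
            for row in csv.reader(io.StringIO(text)) if len(row) >= 2]

ARROW = re.compile(r"^\s*(\S+)\s*->\s*(\S+)\s*$")

def parse_arrows(text):
    """Lines like 'source -> target' -> list of (source, target) pairs."""
    pairs = []
    for line in text.splitlines():
        m = ARROW.match(line)
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs

def parse_statements(text):
    """Lines like 'topic-1 builds-on topic-2' -> (source, relation, target) triples."""
    triples = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:          # skip blank or malformed lines
            triples.append(tuple(parts))
    return triples

print(parse_csv("alice,bob\nbob,carol\n"))           # [('alice', 'bob'), ('bob', 'carol')]
print(parse_arrows("alice -> bob\nbob -> carol"))     # [('alice', 'bob'), ('bob', 'carol')]
print(parse_statements("topic-1 builds-on topic-2"))  # [('topic-1', 'builds-on', 'topic-2')]
```

Note that the statement form keeps the relationship name as a third element, which maps naturally onto an edge attribute.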
Network graph analysis tools
Once you have graph data described and stored in a file, on a network, or in a database (more on that in the next post), it is time to use it to build knowledge. The network graph domain has a standard set of analysis concepts, and the techniques often reflect the companies or institutions that did the initial research and testing. For example, social media companies have invested heavily in social network analysis tools. Likewise, financial fraud assessment techniques are often developed or funded by banking institutions.
Some standard analyses include:
- Finding nodes that act as hubs between multiple groups
- Determining how influential one node is to others
- Computing the complexity of relationships between two different graphs
- Ranking each node on how well it connects to others, and more
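Two of these analyses can be sketched in plain Python, assuming an undirected graph stored as a dictionary of neighbor sets. Ranking by degree (the number of neighbors) is one simple way to score how well a node connects; betweenness centrality, computed here with Brandes' algorithm, is one common way to find nodes that sit between groups. The function names and the two-triangle example graph are illustrative only.

```python
from collections import deque

def degree_ranking(adj):
    """Rank nodes by their number of neighbors (degree), highest first."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    A node's score counts the shortest paths between other node pairs that
    pass through it (split evenly when several shortest paths exist).
    """
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths to every node.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:                    # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:         # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's share.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: c / 2 for v, c in score.items()}   # each pair was counted from both ends

# Two triangles (a, b, c) and (d, e, f) joined through a single node h.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"),
         ("d", "f"), ("c", "h"), ("h", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

bc = betweenness(adj)
print(degree_ranking(adj)[:2])   # c and d have the most neighbors (3 each)
print(max(bc, key=bc.get))       # h: every path between the triangles crosses it
```

The example shows why the two measures differ: h has only two neighbors, so it ranks low by degree, yet it is the single broker between the two groups.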
Each routine has a conventional name, but these names mean little to users outside the domain, so there is always a learning curve.
In general, once a network is analyzed, a set of properties is used to describe it. These reflect the analytical processes mentioned above and include concepts such as PageRank, modularity, average distance (path length), degree, graph density, and centrality.
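Several of these summary properties are short computations. This sketch assumes the same dictionary-of-neighbor-sets representation and computes graph density, average shortest-path length (averaged over connected pairs only, which is one of several conventions), and PageRank by power iteration with the customary damping factor of 0.85. Production libraries add convergence checks, edge weights, and further measures such as modularity.

```python
from collections import deque

def density(adj):
    """Fraction of possible undirected edges that are actually present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2   # each edge is stored twice
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def average_path_length(adj):
    """Mean BFS distance over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank; adj maps each node to its out-neighbors."""
    n = len(adj)
    rank = dict.fromkeys(adj, 1.0 / n)
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly.
        dangling = sum(rank[v] for v in adj if not adj[v])
        new = dict.fromkeys(adj, (1.0 - damping) / n + damping * dangling / n)
        for v, nbrs in adj.items():
            for w in nbrs:
                new[w] += damping * rank[v] / len(nbrs)
        rank = new
    return rank

star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(density(star))              # 0.5: 3 of 6 possible edges exist
print(average_path_length(star))  # 1.5
pr = pagerank(star)
print(max(pr, key=pr.get))        # hub
```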
These are common analytic questions, but each toolset exposes and applies the analyses differently. Some ingest a raw data file and produce an output file with the answer. Others send the results only to a visualization tool, and still others return the answer as part of an ad hoc query.
Network graph visualization tools
The output of a network graph analysis routine or tool is often fed into a visualization program to provide more context. For example, nodes and edges may be colored or sized according to computed properties, such as how connected they are to other nodes or how influential they are in the overall system.
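One way to hand analysis results to a visualization tool is to write them into Graphviz's DOT language, which the Graphviz command-line tools can render. The sketch below assumes an undirected graph and a precomputed score per node (degree, in the example) and maps that score to node width and fill color; the `to_dot` function and its scaling choices are hypothetical.

```python
def to_dot(adj, scores, min_width=0.3, max_width=1.5):
    """Emit Graphviz DOT for an undirected graph, sizing and shading nodes by score."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0                         # avoid dividing by zero
    lines = ["graph G {", "  node [shape=circle, style=filled, fixedsize=true];"]
    for v in adj:
        t = (scores[v] - lo) / span                 # normalize score to [0, 1]
        width = min_width + t * (max_width - min_width)
        shade = int(255 * (1 - t))                  # higher score -> deeper red
        color = f"#ff{shade:02x}{shade:02x}"
        lines.append(f'  "{v}" [width={width:.2f}, fillcolor="{color}"];')
    seen = set()
    for v in adj:
        for w in adj[v]:
            if (w, v) not in seen:                  # write each undirected edge once
                seen.add((v, w))
                lines.append(f'  "{v}" -- "{w}";')
    return "\n".join(lines + ["}"])

star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
degree = {v: len(nbrs) for v, nbrs in star.items()}
dot = to_dot(star, degree)
print(dot)
```

Saved to a file, the output could then be rendered with, e.g., `dot -Tpng star.dot -o star.png`; the hub appears larger and red, the leaves small and white.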
There is a wide variety of desktop and web-based visualization tools. Some serve specific domains, like IT network topology, while others are more generic and aimed at research. A broader definition of network graphs also covers areas such as genomic sequencing and geographic cartography.
Once you start looking for connected nodes of information, you can find them in many different places and across wide spans of time. For example, this 1936 map of Budapest by Hivatal was designed to show the city’s centrality in Hungary.
There are even more aspects to understanding network graphs and their applications, which we’ll cover in other posts.
Katana Graph addresses the most significant challenges in the network graph space: combining graph algorithms with distributed storage and accelerated hardware for new performance levels.
We’ll continue the discussion in the next post and describe the role that query languages, databases, and software licensing play in the network graph toolkit space.