In part one of this topic, we looked at how network graph tools do more than just visualize data. How data is stored and represented is essential, as are the kinds of analysis available to different tools. In this post, we continue the discussion and describe the role that query languages, databases, and software licensing play in the network graph toolkit space.
Query languages for network graphs
Graph analytics and visualization are only part of the overall network graph ecosystem. Some tools do only one thing well, while others aim to be platforms delivering a broader set of capabilities.
As with any new type of data structure, being able to extract data back out of a system is essential. Analytic tools may handle extraction implicitly, ingesting a dataset and producing a result.
But interaction with data may also occur through a dedicated query language. Databases that use SQL rely on the language to create new data, group information, and extract it for use by other tools. Graph queries play the same role, though the functions available differ.
Simple queries return a list of all the nodes, edges, and their properties. Data manipulation queries create new nodes and edges or update their properties. And derived information can be generated by queries that extract a subset of nodes or perform an analytical step - for example, computing connectedness values across a graph, with filtering criteria to focus on a given set of nodes.
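As a rough illustration of those three kinds of queries, here is a sketch using Python's networkx library as a stand-in for a real graph query language (the graph, node names, and property names are invented for the example):

```python
import networkx as nx

# Build a small example graph
g = nx.Graph()
g.add_edge("alice", "bob", weight=1)
g.add_edge("bob", "carol", weight=2)

# Simple query: list all nodes and edges with their properties
print(list(g.nodes(data=True)))
print(list(g.edges(data=True)))

# Data manipulation: create a new node and update a property
g.add_node("dave", role="analyst")
g.nodes["alice"]["role"] = "engineer"

# Derived information: a connectedness measure (degree centrality),
# filtered to focus on a given set of nodes
centrality = nx.degree_centrality(g)
focus = {n: c for n, c in centrality.items() if n in {"alice", "bob"}}
print(focus)
```

A graph query language would express these same steps declaratively rather than as imperative library calls, but the categories of operation are the same.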
Query syntax is a differentiator between tools and systems. Some languages are modeled closely on SQL, while others were invented for a specific proprietary product. SQL-like languages are easier for other analysts to adopt, whereas purpose-built query languages can be optimized for efficiency, conciseness, or other graph-specific goals.
Also, unlike SQL, some query languages use an interactive interpreter where the user builds an operating set of nodes and then continues to apply subsequent steps of analysis to it.
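That interactive, stepwise style can be sketched in plain Python with networkx (a hypothetical session, not any particular product's interpreter):

```python
import networkx as nx

# A random example graph to explore interactively
g = nx.erdos_renyi_graph(50, 0.1, seed=42)

# Step 1: build an operating set of nodes (here: high-degree nodes)
working = {n for n in g.nodes if g.degree(n) >= 5}

# Step 2: apply a further analysis step to the same set
triangles = nx.triangles(g)
working = {n for n in working if triangles[n] > 0}

# Step 3: continue analyzing only the surviving subgraph
sub = g.subgraph(working)
print(len(working), nx.density(sub))
```

Each step refines the previous result rather than issuing an independent query, which is the workflow these interpreters are built around.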
These graph-specific languages excel at answering network-related questions quickly and efficiently because they are built around the underlying data structures and concepts. SQL, by contrast, ends up treating everything as a set of rows in tables.
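To make the contrast concrete, here is a hedged sketch of finding two-hop neighbors both ways: as a self-join over an edge table in SQLite, and as a traversal in networkx (the table, column, and node names are invented for illustration):

```python
import sqlite3
import networkx as nx

# Relational style: edges stored as rows, traversal done via self-joins
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("c", "d")])
two_hops = conn.execute("""
    SELECT e1.src, e2.dst
    FROM edges e1 JOIN edges e2 ON e1.dst = e2.src
""").fetchall()
print(two_hops)  # each additional hop requires another join

# Graph style: the structure itself is the data model
g = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "d")])
print(nx.descendants_at_distance(g, "a", 2))
```

In the relational version, every extra hop means another join; in the graph version, depth is just a parameter of the traversal.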
Different tools adopt different standards, and many projects and vendors are converging on new ones such as the Graph Query Language (GQL), which is being standardized through an ISO committee. GQL should not be confused with GraphQL, an API query language unrelated to network graph querying.
Other specialized data platforms, such as those handling RDF data and triplestores, have their own query languages as well (SPARQL being the best-known example).
So far, we’ve been talking very abstractly about data formats and query languages. As we move into the domain of graph databases, things become slightly more complicated. Graph databases tend to operate like platforms with many features - they store data, query data, and apply analytics.
Usually, databases have to integrate with other products to view and interact with data (e.g., business intelligence systems) or produce derived analytics (e.g., OLAP systems). Graph databases may or may not require similar integrations, depending on the platform.
Some systems come with built-in dashboarding to view the resulting graph visualizations. Others deal only with the storage and distribution of data across their IT infrastructure and give you a query language to access it. When evaluating graph databases, it is vital to understand how well a system either supplies a wide range of helpful tooling or integrates with third-party systems to fill those gaps.
Proprietary and open source
Graph databases, like SQL databases, also come in a wide variety of flavors. Cloud service providers may take care of all the underlying infrastructure at the cost of a monthly subscription. Standalone products for an enterprise are also available from vendors. And, naturally, open-source graph databases are also available for running locally, on-prem, or in a cloud environment.
Graph databases began with a small ecosystem of tools for analyzing raw data. Then a broader set of analytics became available as researchers and developers collaborated on ways to share their work. The next step was bringing together leading organizations and products to agree on standards -- this is a recent stage of development.
Open-source options tend to drive standardization, or at least serve as early testbeds for new standards. But both open-source and proprietary options exist at every level of the graph analytics toolchain. When weighing the options, be sure to account for the level of support the project or business use case requires.
While open source can speed up prototyping and reduce long-term licensing costs, customers may expect full-stack support from front to back, which only a vendor can provide.
Limitations of standard approaches
Beyond this current stage, we see several startups and traditional database vendors solidifying their graph-related offerings in more comprehensive ways. Building on standards allows these vendors to integrate proprietary algorithms or open-source approaches and offer them to their customers.
Having customer support for an entire platform is essential for businesses to adopt these new platforms. Support must cover data handling, distribution across a cluster of database nodes for reliability, extracting and integrating with other products, and more. Demanding business environments also want support for operating efficiently and optimizing performance to deliver answers more quickly than before.
In a nutshell, the standard approach of one-off standalone tools and manually moving data files around a network will not meet enterprise business demands. Likewise, shuttling data from one system to another for analysis and then back again is inherently fragile.
Instead, modern enterprises expect a distributed system (accessible across teams) that includes the analytic tools they need, and that will scale out as their business grows. All this is needed while not sacrificing performance or accessibility.
Today we are just starting to deliver that next-generation experience for our customers. Join us if you want to participate in this quest to combine powerful algorithms with supercharged backends on modern hardware.