Graph query is a cornerstone of the broader graph intelligence concept, which combines graph query with graph mining, graph analytics, and graph AI in a single platform designed to turn data into high-value business insights quickly. Graph query is often the first phase in workloads that combine these graph computing domains.
When graph query is the first phase, it is usually used to filter a large heterogeneous graph so that subsequent phases of the workload can run more efficiently on only the most relevant data. In other cases, query is itself the core technology behind a business use case. For example, graph query is an excellent technique for building intrusion detection into network security systems.
Graph query involves identifying specific patterns in a heterogeneous graph whose nodes and edges are annotated with properties. Concretely, a heterogeneous graph has different types of nodes and edges. For example, in a social network graph, edges might model different types of relationships between people, such as family members, co-workers, employers, and college roommates. Those typed relationships may in turn carry property values that capture details such as when the relationship was formed.
Graph query is typically used to answer questions about such properties. In the social network example above, the nodes would be individuals (potentially of different types) with properties such as names, addresses, and preferences, and the edges would indicate their relationships. In this use case, graph query could be used to match a pattern that specifies what the property values should be. A sample query might find all the women who are HVAC specialists in the Northwest, earn more than $70,000 a year, and have done contract work for legal clients.
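To make the sample query concrete, here is a minimal Python sketch that models a tiny heterogeneous graph with typed nodes and edges and evaluates the query as node-property predicates plus one typed edge. The `Node`/`Edge` classes, the property keys (`gender`, `trade`, `region`, `salary`, `industry`), the `CONTRACTED_FOR` edge type, and the toy data are all hypothetical; a real graph query engine would evaluate such a pattern declaratively and at far larger scale.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str                                   # node type, e.g. "Person" or "Client"
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    label: str                                   # edge type, e.g. "CONTRACTED_FOR" (hypothetical)
    props: dict = field(default_factory=dict)


# Hypothetical toy data; names and property keys are illustrative only.
nodes = {
    "p1": Node("p1", "Person", {"name": "Ana", "gender": "F", "trade": "HVAC", "region": "Northwest", "salary": 82000}),
    "p2": Node("p2", "Person", {"name": "Bea", "gender": "F", "trade": "HVAC", "region": "Northwest", "salary": 65000}),
    "p3": Node("p3", "Person", {"name": "Cal", "gender": "M", "trade": "HVAC", "region": "Northwest", "salary": 90000}),
    "p4": Node("p4", "Person", {"name": "Dee", "gender": "F", "trade": "HVAC", "region": "Northwest", "salary": 75000}),
    "c1": Node("c1", "Client", {"name": "Lex LLP", "industry": "legal"}),
    "c2": Node("c2", "Client", {"name": "BuildCo", "industry": "construction"}),
}
edges = [
    Edge("p1", "c1", "CONTRACTED_FOR", {"year": 2021}),
    Edge("p2", "c1", "CONTRACTED_FOR", {"year": 2022}),
    Edge("p3", "c1", "CONTRACTED_FOR", {"year": 2020}),
    Edge("p4", "c2", "CONTRACTED_FOR", {"year": 2023}),
]


def find_hvac_women_with_legal_clients(nodes, edges):
    """Match Person -[CONTRACTED_FOR]-> Client with predicates on both endpoints."""
    def person_ok(n):
        p = n.props
        return (n.label == "Person" and p.get("gender") == "F" and p.get("trade") == "HVAC"
                and p.get("region") == "Northwest" and p.get("salary", 0) > 70_000)

    matches = set()
    for e in edges:
        if e.label != "CONTRACTED_FOR":          # edge-type constraint
            continue
        person, client = nodes[e.src], nodes[e.dst]
        if person_ok(person) and client.label == "Client" and client.props.get("industry") == "legal":
            matches.add(person.props["name"])
    return sorted(matches)


print(find_hvac_women_with_legal_clients(nodes, edges))  # ['Ana']
```

Bea fails the salary predicate, Cal the gender predicate, and Dee the client-industry predicate on the far side of the edge, which is why the query needs both node and edge context.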
Graph query answers such questions on top of a High-Performance Computing (HPC) architecture. That architecture is integral to graph intelligence as a whole, because it is what makes it possible to return query results at the speed and scale the modern enterprise requires. Within that HPC framework, graph query is an extremely powerful tool for data exploration, for driving complex workloads that incorporate other graph intelligence functions, and for directly solving high-value business problems.
Deconstructing Graph Query
The objective of graph query is to find all the sub-graphs of the host graph that match the pattern graph, which represents the pattern sought in order to answer the query. Informally, the system produces a query match when the structure of a sub-graph is the same as the structure of the pattern graph and the properties on its nodes and edges satisfy the predicates in the query. Writing a query is therefore like asking a question that requires finding a specific pattern in the overall host graph. Graph intelligence implements graph query (or path query) by generating, analyzing, and extending intermediate embeddings in parallel on all the hosts in a cluster. An intermediate embedding is a sub-graph of the host graph that matches part of the pattern. Extending an intermediate embedding means traversing its connections to create new, larger intermediate embeddings; after each extension, the platform checks their properties to see whether they still satisfy the constraints of the query pattern. Embeddings that fail the check are discarded, and those that cover the whole pattern graph are the matches that answer the query.
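The extend-and-check loop described above can be sketched as a small backtracking matcher in Python. This is an illustrative, single-machine sketch under simplifying assumptions (undirected edges, one label per node, a fixed matching order), not the platform's implementation: each recursive call extends an intermediate embedding by one pattern node, preferring candidates reached by traversing from an already-matched neighbor, and keeps the extension only if the node predicate and every pattern edge back to matched nodes hold.

```python
def match(pattern_adj, pattern_pred, host_adj, host_label):
    """Enumerate all embeddings of a pattern graph in an undirected host graph.

    An intermediate embedding is a dict mapping pattern nodes to distinct host nodes.
    """
    order = list(pattern_adj)        # fixed matching order (pattern assumed connected)
    results = []

    def candidates(u, emb):
        # Extend by traversing from an already-matched neighbour when one exists.
        for w in pattern_adj[u]:
            if w in emb:
                return host_adj[emb[w]]
        return host_adj.keys()       # first pattern node: scan the whole host graph

    def extend(emb):
        if len(emb) == len(order):   # every pattern node matched: a full match
            results.append(dict(emb))
            return
        u = order[len(emb)]
        used = set(emb.values())
        for v in candidates(u, emb):
            if v in used or not pattern_pred[u](host_label[v]):
                continue             # already used, or property predicate failed
            # Structural check: every pattern edge to a matched node must exist in the host.
            if all(emb[w] in host_adj[v] for w in pattern_adj[u] if w in emb):
                emb[u] = v
                extend(emb)          # recurse on the extended embedding
                del emb[u]

    extend({})
    return results


# Pattern: two people connected to each other and to the same company (a triangle).
pattern_adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
pattern_pred = {
    "a": lambda label: label == "Person",
    "b": lambda label: label == "Person",
    "c": lambda label: label == "Company",
}
host_label = {"alice": "Person", "bob": "Person", "carol": "Person", "acme": "Company"}
host_adj = {v: set() for v in host_label}
for x, y in [("alice", "bob"), ("alice", "acme"), ("bob", "acme"), ("carol", "acme")]:
    host_adj[x].add(y)
    host_adj[y].add(x)

matches = match(pattern_adj, pattern_pred, host_adj, host_label)
print(len(matches))  # 2
```

The triangle is reported twice, once per symmetric assignment of `a` and `b`; carol is pruned because extending from her never reaches another Person. Production matchers typically also break such symmetries and choose the matching order carefully to avoid duplicate work.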
Characterizing the desired pattern as a graph is the most natural way to capture its intricacies. The pattern graph can be annotated with properties that express constraints: values users want to find, or open (free) variables about paths that may capture missing information. Executing a query transforms it into an intermediate representation that the system uses to determine where the query matches the host graph, which typically spans multiple machines in an HPC cluster. The technique has broad applications in many domains. For example, graph queries can help streamline supply chain distribution through route optimization, reduce the number of stops in distribution, or optimize tradeoffs and choices in the presence of geopolitical risks, tariffs, and environmental concerns.
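One way to picture such an intermediate representation is as a linear plan of operators derived from the annotated pattern graph. The Python sketch below is purely illustrative: the encoding (a dict of property constraints per pattern node, where an empty dict marks a free variable), the `SCAN`/`EXPAND`/`CHECK` operator names, and the "most constrained node first" heuristic are all assumptions, not the platform's actual representation. It also assumes the pattern graph is connected.

```python
from collections import deque

# Hypothetical pattern-graph encoding: pattern node -> property constraints.
# An empty dict is a free variable whose binding the query wants to discover.
pattern_nodes = {
    "w": {"label": "Person", "trade": "HVAC", "region": "Northwest"},
    "c": {"label": "Client", "industry": "legal"},
    "x": {},  # free variable: whoever referred the worker
}
# Typed, directed pattern edges: (source, edge type, destination).
pattern_edges = [("w", "CONTRACTED_FOR", "c"), ("x", "REFERRED", "w")]


def compile_plan(nodes, edges):
    """Lower a connected pattern graph into a linear plan of matching operators."""
    # Start from the most constrained node (a crude stand-in for a selectivity estimate).
    start = max(nodes, key=lambda n: len(nodes[n]))
    plan = [("SCAN", start, nodes[start])]
    seen, queue, done = {start}, deque([start]), set()
    while queue:                      # breadth-first walk over the pattern graph
        u = queue.popleft()
        for i, (src, etype, dst) in enumerate(edges):
            if i in done or u not in (src, dst):
                continue
            done.add(i)
            nxt = dst if src == u else src
            direction = "OUT" if src == u else "IN"
            if nxt in seen:
                # Both endpoints already bound: only verify that the edge exists.
                plan.append(("CHECK", u, direction, etype, nxt))
            else:
                plan.append(("EXPAND", u, direction, etype, nxt, nodes[nxt]))
                seen.add(nxt)
                queue.append(nxt)
    return plan


for step in compile_plan(pattern_nodes, pattern_edges):
    print(step)
```

Starting from the most constrained node keeps early intermediate embeddings few, and the free variable `x` is bound last, by an `EXPAND` with no constraints to check.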
Parallelizing and scaling out graph query lets multiple machines coordinate to answer large queries more quickly. The approach partitions the host graph and distributes the pieces to different machines, and using multiple machines in an HPC architecture provides the foundation for extreme speed and scale. Because intermediate embeddings may span different host machines, multiple hosts can be searched simultaneously via parallel computations. Graph intelligence is the only platform using flexible partitioning based on dynamically selected policies. This matters because a large body of research has shown that performance is workload- and graph-dependent, so competing solutions with an inflexible partitioning policy lose speed and scalability. In addition, some of the more popular competing solutions are designed for Online Transaction Processing (OLTP), whereas graph intelligence is based on high-performance graph analytics techniques. This advantage enables graph intelligence’s graph query mechanisms to traverse much larger graphs in less time than the alternatives, which is invaluable for many applications, for example nonstop care in healthcare. In that use case, insurers can issue graph queries to understand the relationships between patients, insurance plans, providers, and a gamut of treatment options including pharmaceuticals and alternative therapies, as well as fundamental claims processing and co-payment rules.
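The observation that no single partitioning policy wins on every graph can be illustrated with a small Python sketch. It compares two simple policies, modulo hashing of node ids and contiguous id blocks, and picks whichever cuts fewer edges for the given graph. The policy names, the use of integer node ids, and cut-edge count as the selection criterion are assumptions made for illustration; a production system would also weigh workload characteristics, and its actual policies may differ.

```python
def hash_policy(nodes, num_hosts):
    # Modulo hashing of integer node ids: spreads nodes evenly, ignores locality.
    return {v: v % num_hosts for v in nodes}


def block_policy(nodes, num_hosts):
    # Contiguous id ranges per host: preserves locality when neighbours have nearby ids.
    ordered = sorted(nodes)
    size = -(-len(ordered) // num_hosts)  # ceiling division
    return {v: i // size for i, v in enumerate(ordered)}


def cut_edges(edges, owner):
    # Edges whose endpoints live on different hosts imply communication during traversal.
    return sum(owner[u] != owner[v] for u, v in edges)


POLICIES = {"hash": hash_policy, "block": block_policy}


def choose_partition(nodes, edges, num_hosts):
    """Evaluate every policy on this graph and keep the one with the fewest cut edges."""
    owners = {name: policy(nodes, num_hosts) for name, policy in POLICIES.items()}
    best = min(owners, key=lambda name: cut_edges(edges, owners[name]))
    return best, owners[best]


nodes = list(range(8))
chain = [(i, i + 1) for i in range(7)]   # neighbours have consecutive ids
stride = [(i, i + 2) for i in range(6)]  # neighbours share id parity
print(choose_partition(nodes, chain, 2)[0])   # block (1 cut edge vs. 7 for hash)
print(choose_partition(nodes, stride, 2)[0])  # hash (0 cut edges vs. 2 for block)
```

The same two policies swap places between two graphs with identical node sets, which is the graph-dependence argument in miniature.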
Graph intelligence relies on a high-performance communication engine for scalable graph computing. A well-designed communication substrate is instrumental to parallel or independent computations across machines when sub-graphs have nodes and edges on different servers. Such computations must communicate intermediate embeddings and their extensions to relay, for example, how much of a sub-graph has already been matched, which parts are on which host, and what state other hosts need to continue computing where one has left off. The Message Passing Interface (MPI), an HPC standard with a lengthy history in the scientific computing community, provides these communication idioms, enabling multiple hosts to coordinate, perform aggregations, implement reductions, and more. Building a graph-specialized communication fabric on a well-optimized MPI subsystem is a foundational approach for fast and efficient graph queries.
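The kind of state exchanged between hosts can be illustrated with a single-process Python simulation. Each simulated host extends only the partial paths (intermediate embeddings) whose last node it owns; when an extension reaches a node owned by another host, the path so far is placed in that host's inbox so it can continue where the sender left off, and the per-host counts are summed at the end. The inbox queues stand in for MPI point-to-point messages and the final sum for a reduction (in mpi4py these would roughly correspond to `comm.send`/`comm.recv` and `comm.allreduce`). The function name, the directed-path workload, and the ownership map are hypothetical.

```python
from collections import deque


def distributed_path_count(adj, owner, num_hosts, length):
    """Count simple directed paths with `length` edges across simulated hosts.

    Returns (total_paths, cross_host_messages).
    """
    inbox = [deque() for _ in range(num_hosts)]  # stand-in for MPI point-to-point messages
    local_counts = [0] * num_hosts
    for v in adj:                                # seed one embedding per node at its owner
        inbox[owner[v]].append((v,))
    messages = 0
    while any(inbox):
        for h in range(num_hosts):               # each host drains its own inbox
            while inbox[h]:
                path = inbox[h].popleft()        # message payload: the matched prefix so far
                if len(path) - 1 == length:
                    local_counts[h] += 1         # embedding is complete
                    continue
                for nbr in adj[path[-1]]:
                    if nbr in path:
                        continue                 # keep paths simple
                    dest = owner[nbr]
                    if dest != h:
                        messages += 1            # extension crosses a host boundary
                    inbox[dest].append(path + (nbr,))
    return sum(local_counts), messages           # the sum plays the role of a reduction


adj = {0: [1, 2], 1: [2], 2: [3], 3: []}
owner = {v: v % 2 for v in adj}
print(distributed_path_count(adj, owner, 2, 2))  # (3, 6)
```

Counting cross-host messages alongside results makes the cost of a poor partition visible: every message here would be real network traffic in a cluster.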
Tangible Business Value
The use cases for graph query generally fall into two groups: using it for data exploration or filtering, which is usually needed before applying other graph intelligence components (such as graph analytics) in end-to-end workflows, and using it as the sole component in a computation. Data exploration is similar to data discovery but enables users to home in on the key facets of their data that inform their use cases. Graph mining, graph analytics, and graph AI use cases almost invariably start with a graph query step. The canonical example is filtering a larger graph to find the proper training data (or specific model features) for machine learning deployments.
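The filtering step can be sketched in Python as taking the sub-graph induced by nodes that satisfy a query predicate and turning it into feature rows for a model. The account properties, the `flagged` label, and the choice of in-subgraph degree as a feature are hypothetical; the sketch only shows the shape of a query-then-learn handoff, not any particular platform API.

```python
def filter_subgraph(node_props, edges, keep):
    """Keep nodes whose properties satisfy `keep`, and only edges between kept nodes."""
    kept = {v for v, props in node_props.items() if keep(props)}
    sub_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, sub_edges


def training_rows(node_props, kept, sub_edges, feature_keys, label_key):
    """Build (node, features, label) rows; degree inside the filtered graph is a feature."""
    degree = {v: 0 for v in kept}
    for u, v in sub_edges:
        degree[u] += 1
        degree[v] += 1
    return [
        (v, [node_props[v][k] for k in feature_keys] + [degree[v]], node_props[v][label_key])
        for v in sorted(kept)
    ]


# Hypothetical account graph: only business accounts are relevant to the model.
node_props = {
    "a": {"type": "business", "balance": 1200.0, "flagged": 0},
    "b": {"type": "business", "balance": 50.0, "flagged": 1},
    "c": {"type": "personal", "balance": 300.0, "flagged": 0},
    "d": {"type": "business", "balance": 800.0, "flagged": 0},
}
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "d")]

kept, sub_edges = filter_subgraph(node_props, edges, lambda p: p["type"] == "business")
rows = training_rows(node_props, kept, sub_edges, ["balance"], "flagged")
for row in rows:
    print(row)
```

Computing structural features such as degree on the filtered graph, rather than the full one, is one reason the query step shapes the downstream model and is not merely a preprocessing convenience.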
The value of graph query in standalone settings is just as compelling as when it is used as the initial step in multi-phase workflows. Numerous verticals and domains are driven by graph query use cases. As mentioned above, these include healthcare, supply chain management, and network security, while others involve financial services and aspects of regulatory compliance. For example, it is possible to express malfeasance as a pattern graph for an Anti-Money Laundering (AML) query. Graph query is critical for finding patterns in the graph that indicate, for instance, a criminal attempting to send money through multiple hops back to themselves, which provides the building blocks for AML compliance.
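A simplified version of this round-trip pattern can be sketched in Python as a depth-first search for chains of transfers that leave an account and return to it, where every hop happens later than the previous one and the chain has at most `max_hops` transfers. The transfer tuple layout, the temporal-ordering rule, the hop limit, and the sample data are illustrative assumptions; real AML patterns usually add constraints such as amount ranges or time windows.

```python
def find_round_trips(transfers, max_hops):
    """Return chains of transfers that start and end at the same account.

    transfers: list of (src, dst, timestamp, amount) tuples.
    """
    out = {}
    for src, dst, ts, amt in transfers:
        out.setdefault(src, []).append((dst, ts, amt))
    cycles = []

    def extend(origin, path, last_ts):
        current = path[-1][1]                       # account holding the money now
        visited = {hop[0] for hop in path} | {current}
        for dst, ts, amt in out.get(current, []):
            if ts <= last_ts:
                continue                            # money must move forward in time
            hop = (current, dst, ts, amt)
            if dst == origin:
                if len(path) + 1 <= max_hops:
                    cycles.append(path + [hop])     # money returned to its sender
            elif dst not in visited and len(path) + 1 < max_hops:
                extend(origin, path + [hop], ts)    # leave room for the closing hop

    for src, dst, ts, amt in transfers:             # every transfer can start a chain
        if src != dst:
            extend(src, [(src, dst, ts, amt)], ts)
    return cycles


transfers = [
    ("A", "B", 1, 9000), ("B", "C", 2, 8900), ("C", "A", 3, 8800),  # 3-hop round trip
    ("E", "F", 5, 100), ("F", "E", 6, 100),                          # 2-hop round trip
    ("G", "H", 1, 50), ("H", "I", 2, 50),                            # chain that never returns
    ("P", "Q", 1, 10), ("Q", "R", 2, 10), ("R", "S", 3, 10), ("S", "P", 4, 10),  # 4 hops
]
for chain in find_round_trips(transfers, max_hops=3):
    print(" -> ".join([chain[0][0]] + [hop[1] for hop in chain]))
# A -> B -> C -> A
# E -> F -> E
```

With `max_hops=3` the four-hop P-Q-R-S loop is not reported; raising the limit to 4 finds it, at the cost of a larger search.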
It is easy to expand this use case to include graph AI, for example, by using Graph Neural Networks to predict links in money laundering cycles that are missing from the graph.
Graph Question Answering
Graph query significantly enhances the overall value proposition of graph intelligence in two ways. It either provides a primitive for identifying and extracting the foundational information on which workloads involving the other graph intelligence components are based, or it works alone to solve an assortment of business problems across many verticals. In both cases, it does so by finding patterns in a large heterogeneous graph. When supported by an HPC architecture for scale-out computation across many hosts, its speed and scale can enable many use cases at the modern pace of business, both today and tomorrow.
This article was originally posted by Chris Rossbach, CTO at Katana Graph, to Medium.