Graph mining is one of the four essential components of graph intelligence, a transformative concept for maximizing the enterprise value of graph data within a single platform. Each graph intelligence component (also including graph query, graph analytics, and graph AI) requires some functionality for recognizing patterns in graphs; graph mining and graph query rely most heavily on that functionality.
Graph mining’s key distinction is in finding, or mining, structures from graph data. Unlike the other pillars of graph intelligence, graph mining identifies patterns that may be incompletely or not precisely defined, which can additionally involve specifying high-level constraints for those patterns.
Graph mining is a broad field with many applications and algorithms. Example techniques include frequent subgraph mining and motif counting. These approaches deliver a range of business value from horizontally applicable, revenue-generating use cases like recommendation engines to more nuanced applications supporting entity resolution in financial services.
Leveraging these approaches at enterprise speed and scale necessitates overcoming a series of challenges requiring significant technical expertise and manpower. With a graph intelligence platform, these challenges are handled transparently by the platform, so users can draw freely from a rich array of techniques, allowing the system to perform efficient computation of the underlying algorithms based on a High-Performance Computing architecture.
This graph intelligence architecture unlocks and enhances the utility of graph mining, making graph mining one of the most powerful mechanisms for consistently delivering enterprise value from unstructured data.
Graph Mining Challenges
Frequent subgraph mining (FSM) is an example that demonstrates graph mining’s power and one that provides a good setting for understanding the types of challenges the graph intelligence platform overcomes. FSM identifies the most frequent subgraph patterns in a graph. It is useful for applications in chemistry and materials science, where molecules can be modeled as graphs. Using FSM to identify the most frequent patterns in molecular data can reveal shared properties between molecules to help scientists engineer new ones, or make better use of known ones. This process might involve users specifying a threshold on the size of the pattern or shape or on the number of instances to find. The graph intelligence platform addresses the following challenges for rapidly using graph mining at enterprise scale:
- Choosing the metric to define ‘frequent’: Traditional approaches to defining subgraph frequency are hard problems without polynomial-time solutions. This means systems must run for a long time to get answers. Graph intelligence, however, supports a range of carefully defined metrics already optimized to run as efficiently as possible, allowing users to express exactly what they need while enjoying better performance than they could with other platforms.
- Designing efficient algorithms for given metrics: Frequent subgraph mining needs sophisticated algorithms to determine potentially frequent patterns. For example, after finding all the frequent patterns of size K (where K is a constant, e.g. 4 or 5), the challenge is finding which patterns of size K+1 are potentially frequent. Graph intelligence’s smart graph mining algorithms avoid computational redundancy in generating these intermediate results. Such algorithms can benefit from parallelism to improve performance, but parallelization must carefully navigate a tradeoff space between degree of parallelism and expansion of intermediate state. The graph intelligence platform can transparently navigate this space and simplify the development effort for the programmer.
- Finding the most efficient combination of platform implementation and algorithm design: Although there are several ways to implement mining algorithms, graph intelligence strives to minimize computational redundancy while maximizing computational throughput. Co-optimizing these objectives requires algorithmic innovation, which in turn requires skilled scientists and engineers with a deep understanding of graph theory and group theory. It also involves compiler expertise for translating one abstraction (the high-level abstraction or algorithm) to another, in this case a lower-level abstraction for the detailed implementation produced by the compiler. One also needs systems and runtime expertise to implement orchestration, as well as insight about hardware internals to achieve maximal performance on GPUs or CPUs.
Motif enumeration is a graph mining application that counts the motifs, or subgraphs, in a graph dataset. Examples of motifs include rectangles, triangles, or patterns containing specific node types (for example, rectangles with three red nodes and one green node). Motif counting is of immense importance for mining graph data because motifs are the building blocks of large-scale graphs. Understanding what motifs are found within graphs and how many of them are present is essential for many higher-level workloads:
- Graph Classification: Classifying graphs is a fundamental problem for determining whether or not graphs are similar to each other. In this instance, the results of motif counting serve as a graph’s signature; when two graphs have similar signatures, they are similar. In banking, for example, one might model transactions as a graph on any given day. If institutions do so a week later and motif counting reveals these graphs are noticeably dissimilar, that could indicate anomalous behavior that the bank should be aware of and act on.
- Graph Machine Learning: With this application, motif enumeration supplements the feature generation process to give machine learning models stronger signals. A social network graph, for example, can contain nodes that model different people, their companies, and other such things. Users can encode the local information for each node and generate a feature for that node. Determining which motifs the nodes participate in creates a stronger signal for machine learning models. The same approach can help with entity resolution, too. When there are two nodes in a graph, augmenting their features as described above can more accurately predict if they represent the same person, which is invaluable for Know Your Customer (KYC) use cases and combating financial crimes.
- Symmetry Breaking: This is a common graph mining problem because symmetry in graph data leads to redundant computations. Graph intelligence optimizes mining by breaking symmetry transparently without affecting the correctness by eliminating redundancy.
Community detection is another valuable graph mining application. It is a popular graph problem that illustrates graph mining’s capacity to identify structures in graphs. Different communities can be considered structures. Since people in the same communities have similar interests or behaviors, effectively grouping nodes into communities is helpful for marketing, advertising, and compelling recommendations. Although community detection is difficult in large graphs, graph intelligence can employ a cluster of machines to significantly improve the performance by scaling up the compute resources used to solve the problem.
A distinguishing feature of graph intelligence is its ability to perform its four types of graph computations (query, analytics, mining, AI) in a single platform with minimal data movement. For example, graph mining can leverage graph query for better performance. Returning to our social network use case, users may want to pinpoint graph structures five hops from a node when that node is an important person in the network. This is accomplished by using graph query to extract the relevant subgraph before running mining algorithms on it, without touching the entire graph. Conversely, graph mining can assist graph query by identifying communities upon which specific queries can be subsequently run.
Mastering Graph Mining
Graph mining strengthens graph intelligence by identifying specific graph structures for an array of use cases, algorithms, and applications. It’s one of the foundational capabilities for using graphs to solve any number of enterprise problems. Graph mining is particularly powerful when coupled with the other elements of graph intelligence — graph query, graph analytics, and graph AI — in a single platform for all workloads that empower users to easily compose efficient and scalable end-to-end applications.
This article was originally published by Chris Rossbach, CTO at Katana Graph, on Medium.