Graph mining is a growing field of study with many real-world applications. It has become increasingly important for transactional databases, where graph-structured data is prevalent. Graph adoption has increased dramatically in major industries, including pharmaceuticals, finance, security, and healthcare.
A significant problem in graph mining is finding frequent subgraphs, that is, subgraphs that occur many times in the data set. This problem is typically narrowed to Frequent Connected Subgraph Mining (FCSM): finding frequent subgraphs that are connected, often restricted to induced subgraphs, meaning a subset of vertices together with every edge of the original graph that joins two of them. The objective is to develop an algorithm that finds such frequent subgraphs in a single graph or in a set of graphs.
Practically speaking, FCSM identifies the most frequent subgraph patterns in a graph. Since the atomic structure of molecules can be easily modeled as graphs, FCSM is useful for a range of problems in chemistry and materials science. Using FCSM to identify the most frequent patterns in molecular data can reveal shared properties between molecules, helping researchers design new molecules or find new applications for known ones.
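To make the idea concrete, here is a minimal brute-force sketch in Python. It is not the Katana Graph implementation or API. It assumes a small set of vertex-labeled, undirected graphs and counts how many of them contain each connected induced subgraph of up to a few vertices. The function names, the graph representation, and the toy "molecules" (heavy-atom skeletons that ignore hydrogens and bond orders) are all illustrative assumptions. The approach enumerates every vertex subset and every vertex ordering, so it is only practical for tiny graphs.

```python
from collections import Counter
from itertools import combinations, permutations

# Assumed representation: a graph is (labels, edges), where labels maps an
# integer vertex id to a label and edges is a set of frozenset({u, v}) pairs.
def make_graph(labels, edge_list):
    return labels, {frozenset(e) for e in edge_list}

def is_connected(vertices, edges):
    """Check that the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in vertices - seen:  # iterates over a snapshot of unseen vertices
            if frozenset((u, v)) in edges:
                seen.add(v)
                stack.append(v)
    return seen == vertices

def canonical_form(vertices, labels, edges):
    """Smallest (labels, adjacency) key over all vertex orderings.

    Isomorphic labeled subgraphs produce the same key, so the key can be
    used to count patterns. The cost is factorial in the subgraph size.
    """
    best = None
    for order in permutations(vertices):
        lab = tuple(labels[v] for v in order)
        adj = tuple(
            frozenset((order[i], order[j])) in edges
            for i in range(len(order))
            for j in range(i + 1, len(order))
        )
        key = (lab, adj)
        if best is None or key < best:
            best = key
    return best

def frequent_connected_induced_subgraphs(graphs, max_size, min_support):
    """Return {pattern: support} for patterns found in >= min_support graphs."""
    support = Counter()
    for labels, edges in graphs:
        patterns = set()
        for k in range(1, max_size + 1):
            for vs in combinations(sorted(labels), k):
                if is_connected(vs, edges):
                    patterns.add(canonical_form(vs, labels, edges))
        support.update(patterns)  # each graph counts once per pattern
    return {p: s for p, s in support.items() if s >= min_support}

# Toy heavy-atom skeletons (illustrative only).
ethanol = make_graph({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
acetic_acid = make_graph({0: "C", 1: "C", 2: "O", 3: "O"}, [(0, 1), (1, 2), (1, 3)])
methanol = make_graph({0: "C", 1: "O"}, [(0, 1)])

frequent = frequent_connected_induced_subgraphs(
    [ethanol, acetic_acid, methanol], max_size=3, min_support=2
)
for pattern, count in sorted(frequent.items()):
    print(pattern, count)
```

In this toy data set the C-O fragment appears in all three graphs, while the O-C-O fragment appears only in the acetic acid skeleton and is filtered out by the support threshold. Practical FCSM algorithms avoid this exhaustive enumeration by growing candidate patterns incrementally and pruning any candidate whose support falls below the threshold.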
Biological systems such as metabolic networks and gene regulatory networks are commonly modeled as graphs whose vertices represent genes and proteins and whose edges represent relationships between them, such as which of the many proteins interact. Vertex classification in protein-protein interaction analysis can involve graphs with millions of vertices.
Literature searches on biological networks, a task vital to medical research, often involve finding co-occurrences of bioentities in large independent literature corpora. Tasks like mapping relevant terms from the six million Wikipedia articles onto the corresponding 30 million PubMed abstracts are generally considered prohibitively expensive to compute because the number of possible subgraphs in such a problem grows explosively. Influence analysis in social networks can easily involve graphs with 100 billion edges.
Early approaches to identifying subgraph frequency often had to run for an impractically long time to produce answers. The Katana Graph Intelligence Platform, however, supports a range of carefully defined metrics already optimized to run as efficiently as possible, allowing users to express business needs precisely while enjoying better performance than they could get on other platforms.
Mining data is beneficial for uncovering knowledge, but managing the process is a significant challenge faced by governments, scientific industries, businesses, and communities. Katana Graph, along with Intel, designed the Katana Graph High-Performance Graph Analytics Library as an easy-to-use library for the benefit of data scientists and the growth of the open core community. Katana Graph was designed to handle trillion-edge graphs without sacrificing performance and has been verified on massive graphs, including the Web Data Commons (WDC12) with 3.5 billion vertices and 128 billion edges.
To learn how the Katana Graph Intelligence Platform can help your organization solve problems like these, schedule a meeting with a Katana Graph Intelligence Platform expert.