Graph processing is an emerging application area and a necessary tool for data scientists working with large datasets. Graphs in practical applications can grow to immense sizes. For example, social networks today can have billions of nodes and edges, so high-performance parallel computing is essential.
Katana Graph, in collaboration with Intel, designed a high-performance, easy-to-use graph analytics Python library that includes:
- Highly optimized, parallel implementations of important graph analytics algorithms
- A high-level Python interface to write custom parallel algorithms on top of the underlying C++ graph engine
- Interoperability with pandas, scikit-learn, Apache Arrow, tools, and libraries in the Intel AI software stack
- Comprehensive support for extraction, transformation, and loading (ETL) from various formats
- A plugin for the Metagraph DNA search, alignment, and assembly library
This Katana Graph library was designed to be a versatile, accessible, and efficient high-performance option for graph analytics. It supports these analytics by offering the algorithms listed below. This series will explain the definition, purpose, and use of some of these algorithms.
- Breadth-first search
- Single-source shortest path
- Connected components
- Betweenness centrality
- Triangle counting
- Louvain community detection
- Subgraph extraction
- Jaccard similarity
- Community detection using label propagation
- Local clustering coefficient functions
To clarify terminology, we will work our way through some key definitions. The first definition we will dig into is breadth-first search.
Breadth-first search is an all-pairs shortest path algorithm typically used as a building block in a number of methods. It explores nodes and edges in layers by visiting the starting node’s neighbors, then the neighbors’ neighbors, and so forth. It is most useful in large, unweighted graphs (that is, graphs that don’t have values assigned to edges as additional data about connections between nodes) but has the drawback of not being able to detect cycles within a graph whose weight values add up to negative numbers, which are known as negative cycles.
Breadth-first search is an algorithm that is used to traverse a graph. It starts at a source node and explores all of the neighboring nodes at the present depth prior to moving on to the nodes at the next depth level. This process continues until all of the neighbor nodes have been explored or until a goal state has been reached.
As opposed to depth-first search, which would visit all of the nodes in one branch before moving on to the next branch, breadth-first search visits all of the children at a given level before moving on to children at the next level outward. The result is stored in a tree of the nodes that were reachable from the source node in the course of the search. Paths in the returned tree are the shortest distance that they were between the corresponding nodes in the original graph.
Breadth-first search can be used for several purposes. One purpose is simply to determine if there exists a path between a starting or source node and another specific node in the graph. It could also be used for constructing pathfinding algorithms, like finding the shortest paths from one node to another or finding optimal paths based on some cost evaluation function. Applications of this classic search algorithm include route mapping, web crawling, and puzzle solving.
In the next part of this series we’ll explore single-source shortest path.
To learn more about how Katana Graph Intelligence Platform is used to solve problems click here to speak to a graph expert.
Get to Know Katana Graph
We thrive on testing new and diverse ideas.
Katana Graph was born of cutting-edge research and scientific rigor, and these beginnings have a powerful effect on who we are to this day. We’re devoted to problem solving, and are relentless in our pursuit of more effective and more efficient solutions to real-world challenges. Continuous improvement is the very foundation of our success.