Many people have heard about some of the more common - or at least most discussed in the early days - applications of graph technology, things like drug discovery and fraud detection. If I wanted to describe Katana Graph’s particular advantages to, say, a security knight interested in intrusion detection, where would I start?
Keshav Pingali: Well, that's a great question. And I would talk about two different aspects. One is scale out and the other is performance. So what does that mean at the most basic level? Well, nobody has less data tomorrow than they have today. So everybody's data set, the graph, the data, that they need to deal with keeps growing all the time. And what that means is that if you have a solution in the space that runs only on a single machine, then there is a problem when your datasets get bigger; and they’re getting bigger exponentially.
The explosion of data is something that we have encountered with many of our customers who are using some of the technologies that are single-machine solutions. So they are looking for a scale-out solution. Katana Graph was designed ground-up, from its start, to be a scale-out solution. So it can deal with very large graphs. We have dealt with graphs that have 128 billion edges and lots of property data. And when you go to graphs of that scale, you really need to use a cluster; you need to use 32, maybe 64, machines. We’ve shown that we can compute effectively on a cluster of that size. In fact, we can scale to 256 machines. So those are the two aspects of the answer to your question: we can deal with large data sets, and then we can compute very effectively on a distributed, scaled-out platform. This all works because we have designed the system ground-up using principles from high-performance computing. In contrast, most of the other systems in this space started off as graph databases, and then added analytics and AI almost as an afterthought.
Thank you. That’s a great intro. Continuing on with the intrusion detection example, which is, as I understand it, where Katana Graph got started, can you describe the application of the technology to a use case without going too deep into the science?
KP: Sure, and, of course, security has many aspects to it, but intrusion detection is one important application within the security area. To understand the problem of intrusion detection, imagine that you're the Pentagon or you're a college campus or corporation, and say you have a computer network with certain users authorized to use it. But every once in a while the bad guys might be breaking in and using your computer network. And you want to catch them as quickly as possible - ideally before they do any damage. There are many ways to do this kind of intrusion detection, but one of the ways is to build what I’ll call an interaction graph.
In an interaction graph, there are vertices in the graph that represent the users of the network, as well as resources like files and IO ports and various other resources. And then every time there is some activity in the network, you either add a node or you add some edges to this graph. For example, if a new person enters the network, you create a node for that person. If I send an email to you, then we add an edge from the node representing me to the node representing you, and so on. So this is an evolving graph. It keeps track, at a high level of abstraction, of the activities in the network.
Then one way that people do intrusion detection is to look for what are called forbidden patterns within this graph. So a simple example would be where every communication between Smith and Jones has to go through me. In other words, Smith is not allowed to communicate directly with Jones. If all of Smith’s communications have to go through me before they get to Jones, then you can say that if, in this graph, there’s a path from Smith’s vertex to Jones’ vertex that does not contain my vertex, we detect a forbidden pattern. That example is obviously a very simple pattern. Imagine that we basically have to keep looking for these patterns in big graphs and discover them instantly when they occur. If we find this pattern, then we sort of raise a red flag. And then we alert a human operator who steps in and sees whether this is really suspicious activity, or whether it's benign - a false alarm.
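To make the Smith and Jones example concrete, here is a minimal sketch in Python of building a small interaction graph and checking for that one forbidden pattern. It is only an illustration and not Katana Graph's implementation or API: the names `add_interaction` and `has_forbidden_path`, the dictionary-of-sets graph representation, and the user names are all assumptions made for this example. The check is an ordinary breadth-first search that refuses to step through the required intermediary.

```python
from collections import deque


def add_interaction(graph, src, dst):
    """Record a directed interaction (e.g. an email) from src to dst.

    `graph` is a plain dict mapping each vertex to the set of vertices
    it has contacted. This representation is an assumption for the sketch.
    """
    graph.setdefault(src, set()).add(dst)
    graph.setdefault(dst, set())  # make sure the receiver exists as a vertex


def has_forbidden_path(graph, source, target, required):
    """Return True if `target` is reachable from `source` along a path
    that never visits `required` (the forbidden pattern in the example)."""
    if required in (source, target):
        return False  # every such path trivially contains `required`
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbor in graph.get(node, ()):
            # Skip the intermediary so only paths that bypass it are explored.
            if neighbor != required and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# Hypothetical activity log: Smith talks to Jones only via the mediator.
network = {}
add_interaction(network, "smith", "mediator")
add_interaction(network, "mediator", "jones")
allowed_only = has_forbidden_path(network, "smith", "jones", "mediator")

# A new interaction creates a route that bypasses the mediator.
add_interaction(network, "smith", "carol")
add_interaction(network, "carol", "jones")
bypass_found = has_forbidden_path(network, "smith", "jones", "mediator")
```

In this sketch `allowed_only` is `False` and `bypass_found` is `True`. A single-machine search like this one is fine for a toy graph; the point made in the interview is that doing the same kind of check continuously on very large, rapidly evolving graphs is what calls for a distributed engine.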
So in terms of what graph technology is required, you basically have to be able to deal with exceptionally large graphs, because a big corporation could have tens of thousands, or even hundreds of thousands of users. There is a lot of activity going on. And to be useful in the context of intrusion detection, you need to be able to compute on this graph very quickly. And this goes back to the two points that I raised earlier. Why Katana? Well, two things: the ability to deal with very large datasets and the ability to compute amazingly fast on these datasets. This application shows you the need for both because you have a very large, rapidly evolving graph, and, to catch the bad guys, you need to find the forbidden patterns fast. Time is of the essence here.
We have a lot of experience in this space because the way that we got started in the graph world was that some national security teams approached us several years back, wanting to explore this particular approach to intrusion detection. They had tried some of the commercial graph systems but found their performance inadequate. When they approached us, we built a solution for them on our graph engine. We worked on the Katana Galois graph engine for several years with DARPA support. Some of the early customers told us that we had some really nice technology and that we should consider doing a startup. That's how we got started.
When you say graph technology, most people think of graph databases. Can you tell me briefly how what you’re doing at Katana Graph relates to the graph database world?
KP: Sure, one example is graph query. Graph query has traditionally been associated with graph databases. You have a graph, and then you issue a query, and then you get back the results of the query much like you have with a relational database. It works, but what we are adding to this is the ability to do graph analytics. That means reading in a big graph, sharding it or partitioning it between the machines of a cluster, and then computing global properties of this graph.
Consider the example of PageRank computation, which all of us use implicitly every day because it's used by search engines to rank web pages. That is an example of computation that requires processing on the entire graph, as opposed to graph querying, where you can use indices to just focus on a small portion of the overall graph, and then return the result. Because we designed Katana Graph from the ground up for scale-out, we do graph analytics very quickly. The graph pattern mining example we spoke of is a great example. You're looking for certain patterns, but you’re looking for them over the entire graph, which can be huge. And that, again, requires very fast, clever algorithms and the ability to compute very quickly on large graphs. At Katana, we have experts on graph algorithms, and we're actually developing new algorithms for some of our customers. And that's something that really requires deep stack expertise. You need algorithm expertise of course, but also people who can implement the graph engine - implement those algorithms on the graph engine. That's another thing that we do very well. A fintech customer came to us with a graph pattern mining problem, and we were able to build a solution for them that's a couple of orders of magnitude faster than anything out there. And we’re adding graph AI and graph machine learning, which, again, requires computing over the entire graph. So there's much more to graph computing than graph databases or graph queries. Analytics, pattern mining and graph AI really have little to do with graph databases at all; they have to do with graph computing and graph intelligence. Intelligence – we like the sound of that.
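As an illustration of why the PageRank computation mentioned above has to touch the whole graph, here is a minimal single-machine sketch in Python using the standard power-iteration formulation. It is not Katana Graph's code: the function name `pagerank`, the adjacency-dictionary input, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for the example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `graph` maps each vertex to a list of vertices it links to.
    Every iteration reads every vertex and every edge, which is why
    this is a whole-graph computation rather than an indexed lookup.
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing links is spread evenly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for u, targets in graph.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "d" links to "a" but nothing links to "d".
web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
scores = pagerank(web)
```

In this toy graph the scores sum to 1, and page "a", which receives links from both "c" and "d", ends up ranked above "d", which receives none. On a cluster, the same loop would run over a partitioned graph, with each machine updating its own share of the vertices and exchanging contributions along edges that cross partitions.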