The Edges of Pandora’s Box

By: Katana Graph

November 11, 2021

The Pandora Papers investigation was coordinated by the International Consortium of Investigative Journalists (ICIJ). The leak earns its name, and we know all too well how the ancient Greek story of Pandora’s Box turned out. Some twelve million leaked documents revealed hidden wealth, tax avoidance and, in some cases, money laundering by some of the world's richest and most powerful people. The files exposed how they - including more than 330 politicians from 90 countries - use secret offshore companies to hide their wealth, uncovering dirty deeds that make honest folk cringe.

The mined data has revealed corruption, tax evasion, scandals, secretly-owned companies, and huge hidden property deals. The journalists behind the Pandora stories will surely compete for Pulitzer Prizes.

What’s in the Box

The sheer magnitude of the data is mind-blowing. Extracting information useful enough to inform decisions requires a vast amount of organizing, tagging, sorting, and categorizing, followed by analysis of an immense number of combinations and permutations. Doing this in a timely manner is beyond what traditional computing approaches handle well. The task is made harder by the unstructured nature of the Pandora Papers and the variety of file types, which include documents, images, emails, spreadsheets, video, audio, presentations, and more.

The leak amounts to 2.94 terabytes of data in about 12 million files, unstructured apart from the various spreadsheet formats. More than half of the files (6.4 million) are text documents, including more than 4 million PDFs, some of which run to more than 10,000 pages. The documents include passports, bank statements, tax declarations, company incorporation records, real estate contracts and due diligence questionnaires. There are also more than 4.1 million images and emails in the leak (ICIJ, BBC).

Katana Graph reports that 80% of the world’s data is unstructured; the Pandora Papers skew even further toward unstructured content.

Extracting meaning from droves of different types of data is particularly challenging when a single document can contain many years’ worth of emails, charts, and attachments. Some providers digitized their records and structured them in spreadsheets, while others kept paper files that were scanned. PDFs made from scanned paper that included spreadsheets are exceptionally difficult to interpret programmatically.

Then there is the matter of languages. The Pandora Papers include documents in English, Spanish, Russian, French, Arabic, Korean and other languages, requiring extensive coordination among ICIJ partners.

Further, there are the usual combinatorial problems of connection mapping. In this case, Pandora analysts faced 27,000 companies and 29,000 so-called ultimate beneficial owners (BBC).
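To give a feel for the scale, here is a back-of-the-envelope calculation in Python. It uses only the two figures cited above; the assumption that any company might be linked to any owner (or any entity to any other) is ours, made purely to illustrate an upper bound on the number of candidate connections.

```python
from math import comb

companies = 27_000          # figure cited in the text (BBC)
beneficial_owners = 29_000  # figure cited in the text (BBC)

# Assumption for illustration: every company could in principle
# be linked to every beneficial owner.
company_owner_pairs = companies * beneficial_owners

# Assumption for illustration: any two entities may be connected,
# so count the unordered pairs among all of them.
all_entities = companies + beneficial_owners
any_pairs = comb(all_entities, 2)

print(f"{company_owner_pairs:,} candidate company-owner links")
print(f"{any_pairs:,} candidate links between any two entities")
```

Even this simple count lands in the hundreds of millions to billions of candidate links, before a single document has been read.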

Sifting through this much data in a reasonable amount of time is beyond unaided human effort. Gathering insights on timely topics could take weeks, months or even years, and would require thousands of people working collaboratively to make links across vastly different types of files from 90 different countries.

Serving Justice

Knowledge graphs represent networks of real-world entities (nodes) - in the case of Pandora, the people, documents, transactions and events described in the “papers” - and the relationships between them (edges). Computing on knowledge graphs is now by far the best option for exploring connections and producing insights fast enough that they are still relevant. It requires that structured and unstructured data be processed jointly to spot money laundering and the webs of relationships and transfers of information, money, and property. Katana Graph's graph engine was designed to cut through exactly this sort of monumental data challenge, saving energy, resources, and, most crucially in this case, time, so that more justice might be served.
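As a minimal sketch of the nodes-and-edges idea, the Python below builds a toy knowledge graph and finds the shortest chain of relationships between a person and a property. Every entity, relationship label, and function name here is hypothetical and invented for illustration; it uses only the Python standard library and is not Katana Graph's API or the ICIJ's actual method.

```python
from collections import defaultdict, deque

# Hypothetical toy data: every name and relationship is invented for illustration.
EDGES = [
    ("Person A", "BENEFICIAL_OWNER_OF", "Shell Co 1"),
    ("Shell Co 1", "SHAREHOLDER_OF", "Shell Co 2"),
    ("Shell Co 2", "OWNS", "Villa X"),
    ("Person B", "DIRECTOR_OF", "Shell Co 2"),
    ("Document 17", "MENTIONS", "Person A"),
    ("Document 17", "MENTIONS", "Shell Co 1"),
]

def build_graph(edges):
    """Index edges by endpoint so connections can be followed in either direction."""
    graph = defaultdict(list)
    for source, relation, target in edges:
        graph[source].append((relation, target))
        graph[target].append((f"inverse {relation}", source))
    return graph

def shortest_connection(graph, start, goal):
    """Breadth-first search: shortest chain of nodes linking start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for _relation, neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

graph = build_graph(EDGES)
path = shortest_connection(graph, "Person A", "Villa X")
print(" -> ".join(path))  # Person A -> Shell Co 1 -> Shell Co 2 -> Villa X
```

The point of the graph representation is that a question such as "how is this person connected to this property?" becomes a path search over edges rather than a manual read through every document that mentions either one.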

