Can high-performance computing and graph technology help to prevent economic disaster? Traditional analytics efforts have historically been afflicted by disconnected data, omissions and duplications, and incompatible data formats. These problems leave most data-based prediction efforts unable to penetrate the data silos that obscure important relationships among the contributing causes of financial debacles.
Fannie Mae, named for the Federal National Mortgage Association (FNMA), is a government-sponsored enterprise (GSE) that buys mortgage loans from banks and credit unions and then guarantees them in the mortgage market for lower-income borrowers. The U.S. Congress founded it in 1938 with the intent of making housing more affordable. It became a private institution in 1968 but has been under the government conservatorship of the Federal Housing Finance Agency (FHFA) since 2008.
Fannie Mae publishes a subset of its data on single-family mortgage loans: 30-year, fixed-rate, fully documented, fully amortizing loans acquired since January 1, 2000. The dataset includes monthly loan-level detail and has the potential to give insight into the credit performance of those loans and any associated properties, borrowers, or sellers.
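To make "monthly loan-level detail" concrete, here is a minimal sketch of reading such data with pandas. The field names and values below are illustrative assumptions, not Fannie Mae's actual file layout; the real files are far larger and use their own field naming.

```python
import io
import pandas as pd

# Hypothetical sample in the spirit of loan-level performance data:
# one row per loan per monthly reporting period.
sample = io.StringIO(
    "loan_id|monthly_period|current_upb|delinquency_status\n"
    "100001|2007-10|210000.00|0\n"
    "100001|2007-11|209500.00|1\n"
    "100002|2007-10|155000.00|0\n"
)

df = pd.read_csv(sample, sep="|", dtype={"delinquency_status": int})

# Aggregate unpaid principal balance (UPB) by reporting period.
print(df.groupby("monthly_period")["current_upb"].sum())
```

With real data, the same groupby pattern scales to tracking portfolio balance and delinquency status month by month.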
Fannie Mae was strongly criticized for its role in the 2008 mortgage crisis, the worst housing crisis since the Great Depression according to the 2011 Financial Crisis Inquiry Commission report. Since then, many have debated the root cause of the crisis; some cite deceit in the finance sector and others blame oversight neglect by government regulators. Effects attributed to the mortgage crisis included a 3.4% contraction in U.S. GDP and the destruction of 3.1 trillion dollars in household wealth. The Bureau of Labor Statistics reported 10% unemployment and employment side-effects lasting nearly a decade.
Regardless of the root cause, a pragmatic concern is whether similar events in the future can be predicted and avoided. We are then driven to ask whether the 2008 crisis could have been recognized before it happened, given the computational resources available today.
The Fannie Mae data shows that during the fourth quarter of 2007, the portfolio contained 340,000 mortgages with a total principal value of 70 billion dollars and a delinquency rate of 19.4%. Prior to this, Fannie Mae’s historical delinquency rate averaged 1.7%. Obvious questions include whether correlations existed between recorded loan acquisition parameters and delinquencies, or whether patterns within the data could provide insights and predictions to reduce future risk.
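The question of correlations between acquisition parameters and delinquencies can be sketched in a few lines of pandas. The acquisition fields shown (loan-to-value, debt-to-income) do exist in this kind of data, but the numbers here are invented for illustration, not drawn from the actual Fannie Mae files.

```python
import pandas as pd

# Hypothetical acquisition-time fields joined with a per-loan
# delinquency flag; values are illustrative only.
loans = pd.DataFrame({
    "ltv":        [60, 70, 80, 90, 95, 97],   # loan-to-value ratio
    "dti":        [20, 25, 30, 38, 42, 45],   # debt-to-income ratio
    "delinquent": [0,  0,  0,  1,  1,  1],    # 1 = became delinquent
})

# A simple check: correlation between each acquisition parameter
# and the binary delinquency outcome.
print(loans[["ltv", "dti"]].corrwith(loans["delinquent"]))
```

Strong correlations in such a screen would only be a starting point; real risk modeling would control for confounders and validate out of sample.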
A 2017 study published in The Journal of Real Estate Research found that traditional analysis approaches in financial forecasting, such as linear and non-linear regression models, probably wouldn’t have provided clear warnings in time for the 2008 mortgage crisis to be averted.
While recent improvements in computational efficiency and data-driven ML techniques might fare better, silos within financial data have long been recognized as an impediment to drawing insights, and the narrow scope of the available Fannie Mae data inherently limits its predictive power. Below is a simplified schema of the Fannie Mae loan dataset, which comprises the primary information available to traditional predictive analytics.
By dissolving barriers between data sources, a knowledge graph approach can provide a basis for better risk modeling. For example, combining the Fannie Mae data with Federal Reserve data, macroeconomic data, monthly unemployment data, and other labor statistics would be much more likely to support meaningful inferences and risk insights. A hypothetical extension of the Fannie Mae dataset to include data and metrics from other sources and silos is shown below.
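The cross-silo linkage described above can be sketched as a small graph using NetworkX. The node and edge names below are illustrative assumptions, and this is not Katana Graph's API; it only shows how connecting loans to regional and labor-statistics nodes makes macroeconomic context reachable from any loan.

```python
import networkx as nx

# A toy knowledge graph combining loan records with context from
# other silos; all identifiers and values are hypothetical.
g = nx.Graph()

# Loan nodes from the mortgage dataset.
g.add_node("loan:100001", type="loan", upb=210000, delinquent=True)
g.add_node("loan:100002", type="loan", upb=155000, delinquent=False)

# Context nodes from other sources: geography and labor statistics.
g.add_node("msa:miami", type="region")
g.add_node("unemp:miami:2007-11", type="unemployment", rate=5.1)

# Edges dissolve the silo boundaries: loans link to regions, and
# regions link to the unemployment series reported for them.
g.add_edge("loan:100001", "msa:miami", relation="secured_in")
g.add_edge("loan:100002", "msa:miami", relation="secured_in")
g.add_edge("msa:miami", "unemp:miami:2007-11", relation="reported")

# A two-hop traversal now reaches labor data from any loan.
hops = nx.single_source_shortest_path_length(g, "loan:100001", cutoff=2)
print(sorted(hops))
```

In a production graph platform, the same idea scales to millions of nodes, and graph algorithms or graph neural networks can then exploit these cross-silo connections for risk modeling.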
Integrating these data sources is complicated by the lack of common data fields across diverse sources and by the time required to process the data. Data scientists usually spend more time conditioning and porting data, performing extensive denormalization of relational tables, and encoding and re-labeling business logic than they do generating insights from that data. With traditional data storage and analysis tools, this sort of integration would require extensive data normalization and manipulation. These barriers are now largely overcome by graph analytics platforms such as the Katana Graph Intelligence Platform, which can ingest both structured and unstructured data and then use high-performance computing and AI to run graph analytics on the combined data.
Whether your data is structured or unstructured, Katana Graph can greatly improve the insights and opportunities uncovered from it. Speak with a Graph Expert.