If you live in the U.S., your credit score has an outsized influence on your financial life. It affects the offers and interest rates you receive when applying for auto or home insurance or when opening a bank account or credit card. It’s used to calculate your mortgage rate and is even used in job applications. Despite this deep impact on consumers’ lives, the providers of conventional credit scoring models, such as Vantage and FICO, aren’t required to disclose much information, and little is known about how accurate the models are at predicting consumer default risk.

FI network member Stefania Albanesi and her co-author Domonkos F. Vamossy became interested in this topic after researching the housing crisis stemming from the Great Recession. “We found that there were a lot of mortgage defaults occurring to high credit-score borrowers,” Albanesi says. “So obviously credit scores, which are supposed to rank consumers based on their probability of default on consumer loans, were not doing a very good job of predicting default on these particular types of borrowers.”

In their recent HCEO working paper, the authors develop an alternative model for credit scoring based on deep learning. Deep learning is a type of machine learning “specifically designed for prediction in environments with high dimensional data and complicated non-linear patterns of interaction among factors affecting the outcome of interest, for which standard regression approaches perform poorly.” Their method uses the same information as conventional credit scoring models, but seeks to improve performance and provide more transparency and accountability.

For the model, the authors used anonymized credit file data from the Experian credit bureau, reported quarterly from the beginning of 2004 to the end of 2015. The data contain more than 200 variables from a panel of one million nationally representative households, drawn randomly from Experian’s customer base. They cover information “on credit cards, bank cards, other revolving credit, auto loans, installment loans, business loans, first and second mortgages, home equity lines of credit, student loans and collections.” They also include each consumer’s outstanding balance, available credit, and monthly payments, whether accounts are 30, 60, 90, or 180 days delinquent, derogatory, or charged off, and each borrower’s credit score. Most demographic information is withheld in an effort to prevent discrimination.

Credit scores aim specifically to “target the probability of a 90 days past due delinquency in the next 24 months,” the paper notes. This is the baseline definition of default used in the paper. In the sample, approximately 34 percent of consumers display such a delinquency. The score itself is based largely on four main factors: “length of credit history, which is stated to explain about 15% of variation in credit scores, and credit utilization, number and variety of the debt products and inquiries, each stated to explain about 25-30% of the variation in credit scores.”
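To make the baseline definition of default concrete, here is a minimal Python sketch of how such a label could be built from a quarterly panel. It is an illustration only, not the authors’ code: the function name `default_labels`, the input format (one “worst days past due” value per quarter), and the choice to leave a quarter unlabeled when fewer than eight future quarters are observed are all assumptions.

```python
from typing import List, Optional

HORIZON_QUARTERS = 8  # 24 months of quarterly data (assumed mapping)
THRESHOLD_DAYS = 90   # "90 days past due" from the definition above


def default_labels(
    days_past_due: List[int],
    horizon: int = HORIZON_QUARTERS,
    threshold: int = THRESHOLD_DAYS,
) -> List[Optional[int]]:
    """Label each quarter 1 if a 90+ day delinquency occurs in the next
    `horizon` quarters, 0 if none occurs, and None if the full forward
    window is not observed (hypothetical convention for this sketch)."""
    labels: List[Optional[int]] = []
    for t in range(len(days_past_due)):
        # Forward-looking window: quarters t+1 through t+horizon.
        window = days_past_due[t + 1 : t + 1 + horizon]
        if len(window) < horizon:
            labels.append(None)
        else:
            labels.append(int(any(d >= threshold for d in window)))
    return labels


# One hypothetical consumer observed for 12 quarters.
history = [0, 0, 30, 0, 0, 0, 0, 0, 0, 90, 0, 0]
labels = default_labels(history)
```

In this example, the first quarter is labeled 0 because the 90-day delinquency falls just outside its 24-month window, while the next three quarters are labeled 1.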

The paper has three main sets of findings. The first set tracks “a variety of performance metrics to basically assess how good the model is at predicting the outcome that you’re targeting,” Albanesi says. “And our model does really, really well pretty much on all performance metrics.” She notes that the analysis is done out of sample, “which means that we sort of put ourselves in the same position that a lender or a credit scoring company would be. When they develop the model they do it to predict people in the future, obviously without having any data for the future period.”
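The out-of-sample idea Albanesi describes can be sketched as a split by time. The Python snippet below is a simplified illustration under stated assumptions, not the paper’s procedure: quarters are numbered with a plain integer index, the function name `out_of_sample_split` is hypothetical, and training rows are restricted to quarters whose 24-month outcome window has already closed by the test quarter, so nothing from the future period leaks into training.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]


def out_of_sample_split(
    rows: List[Row], test_quarter: int, horizon: int = 8
) -> Tuple[List[Row], List[Row]]:
    """Split panel rows into a training set and a later test set.

    A row is usable for training only if its outcome window
    (quarter + horizon) ends on or before the test quarter.
    """
    train = [r for r in rows if r["quarter"] + horizon <= test_quarter]
    test = [r for r in rows if r["quarter"] == test_quarter]
    return train, test


# Hypothetical panel: two consumers observed over quarters 0..11.
rows = [{"consumer": c, "quarter": q} for c in (1, 2) for q in range(12)]
train, test = out_of_sample_split(rows, test_quarter=10)
```

Here the model would be fit on quarters 0 through 2 and evaluated on quarter 10, mimicking a lender who must score borrowers without seeing how the next two years turn out.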

What’s perhaps most interesting is the second set of findings, which compares the alternative model to conventional credit scores. Because the sample includes credit scores from each quarter, the authors can use actual default rates to see if their model does a better job of predicting default than the credit score would have done. Albanesi notes that what matters more for consumers than their actual credit score is which industry bin they are put in, since the bin reflects their assessed default risk. The bins are: Deep Subprime, Subprime, Near Prime, Prime, and Super Prime. In one exercise, Albanesi and Vamossy put the consumers in the appropriate bins based on their credit score. Next, they generated similar bins based on their own model. The authors then compared the bins in which the two approaches placed each borrower.
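The bin comparison can be illustrated with a short Python sketch. Everything specific in it is an assumption for illustration: the score cutoffs (580, 620, 660, 720) are commonly cited industry ranges rather than values taken from the paper, and the model-based bins are formed by ranking consumers on predicted default probability and giving each bin the same number of people as its credit-score counterpart, which may differ from how the authors built theirs.

```python
from bisect import bisect_right
from collections import Counter
from typing import List

# Ordered from riskiest to safest.
BINS = ["Deep Subprime", "Subprime", "Near Prime", "Prime", "Super Prime"]
SCORE_CUTOFFS = [580, 620, 660, 720]  # illustrative assumption, not from the paper


def score_bin(score: int) -> str:
    """Map a credit score to a risk bin using the assumed cutoffs."""
    return BINS[bisect_right(SCORE_CUTOFFS, score)]


def model_bins(pred_default: List[float], score_bins: List[str]) -> List[str]:
    """Assign model-based bins with the same sizes as the score-based bins,
    filling the riskiest bin with the highest predicted probabilities."""
    counts = Counter(score_bins)
    order = sorted(range(len(pred_default)),
                   key=lambda i: pred_default[i], reverse=True)
    assigned = [""] * len(pred_default)
    pos = 0
    for name in BINS:
        for i in order[pos : pos + counts[name]]:
            assigned[i] = name
        pos += counts[name]
    return assigned


def share_reclassified(a: List[str], b: List[str]) -> float:
    """Fraction of consumers placed in different bins by the two rankings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical consumers: credit scores and model-predicted default probabilities.
scores = [550, 600, 640, 700, 760, 780]
preds = [0.60, 0.10, 0.30, 0.05, 0.02, 0.20]
by_score = [score_bin(s) for s in scores]
by_model = model_bins(preds, by_score)
moved = share_reclassified(by_score, by_model)
```

In this toy data, four of the six consumers land in a different bin under the model-based ranking; the figure is an artifact of the made-up inputs and says nothing about the paper’s results.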

“Our model would put about 30 percent of those borrowers in a different risk category than the credit score does, with very large implication for the credit conditions that the borrower would face,” Albanesi says.

The authors also computed the average cost savings for consumers, in terms of interest rates they would be paying, if they were rated using the alternative model. Not all consumers gain, of course, but the net gains are positive. The biggest variation in credit card interest rates occurs across Subprime and Near Prime borrowers in comparison to Prime, with cost savings averaging out to 4-5 percent of total credit card balances of $1,085-$1,426. “The accumulated interest rate cost savings across all consumers in our sample is $723,636,560, which amounts to $40 per capita,” the paper notes.

Lenders also stand to benefit from a more accurate credit scoring model. Borrowers who are about to default tend to increase their borrowing. If a lender had a better sense of when this was about to happen, it could cut the line of credit. “That could either prevent default altogether or if default occurred, it would occur on a smaller balance, so the lender loses less,” Albanesi says.

The authors are also able to predict the probability of default for every individual in their sample whose credit record is not empty. This is important, as 8.3 percent of the U.S. population has a credit record but lacks a credit score. The paper points out that these unscored consumers, who are likely to be young, minority, and/or low-income, may find it difficult to access credit. The authors’ “ability to score every active borrower is partly due to the fact that [their] model does not use any lagged observations.”

The last set of findings looks at macro policy implications, such as predicting variations in aggregate default risk. Because the data set is nationally representative, it can be used to estimate the aggregate default rate of consumer debt in the U.S. economy. The authors used it to plot the predicted default rates from 2004 through 2015. “We are basically able to predict the rise in defaults that correspond to the 2007-09 crisis and recession, and in a very accurate way, which is something that is now part of the macroprudential policy guidelines,” Albanesi says. After the crisis, international banking regulators introduced rules stipulating that depository institutions and financial intermediaries should have a way to track systemic risk, and current monetary policy is much more focused on these kinds of financial risks. “We could predict two or three years ahead, and that could be potentially valuable for policymakers,” she says.

“Given the inaccuracy of credit scores, what it essentially implies at the level of the consumer credit market, is potentially a misallocation, a mispricing of consumer credit for a large swath of the population,” Albanesi says. “This is obviously of interest to economists, but it should also be of interest to policymakers. But there hasn’t been a lot of work on this. We provide a model that suggests that credit scoring models are ripe for an overhaul.”