This post is a follow-up to our previous post, ‘Beam’s Data Science Philosophy’. In this one, Nic discusses Beam’s modelling and data principles.
What is Beam’s approach to machine learning?
Beam’s philosophy on data science and risk modelling centres on the premise that the advent of machine learning, together with the significant increase in data generation and computing power over the last two decades, has set the stage for a fundamental shift away from traditional credit scoring methods.
Is this an industry standard way of modelling data or is your approach different?
I think Beam is different across two key dimensions, the first being modelling. The way we model the data is different and modern. If you look at the traditional credit bureaus, for example (and maybe this is oversimplifying, because we should definitely give them credit), they use high-level, overly simplistic features about a person to infer credit risk. What do those look like? Typically, these are demographic features, credit account history, how long you have had your various bank accounts open, where you live (using a postal code as a proxy), your education status, an estimate of how much money you make, and a few other features.
Further, your typical consumer credit reports and credit scores also use lagged variables, sometimes carrying errors that go unnoticed. For example, your current employment could be out of date or you could have repaid an account that is showing on the credit report as unpaid since it can take up to six months for that information to be reported to a credit bureau.
We also use these features, since they provide great signal (and in many cases we will combine features and models), but primarily what we’re using in Beam is data coming from bank statements. We analyse bank transaction data at a much more granular level in order to get much richer insight. By doing this, we get a much more up-to-date and thorough picture of a person’s financial behaviour.
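To make "granular" a little more concrete, here is a minimal sketch of how raw transactions could be rolled up into cash-flow features. The `Transaction` fields, the feature names and the sample transactions are all illustrative assumptions, not Beam's actual pipeline or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Transaction:
    # Hypothetical record layout; real bank feeds will differ.
    day: date
    amount: float       # positive = money in, negative = money out
    description: str


def monthly_cash_flow_features(transactions):
    """Roll transactions up into a few simple per-account features."""
    if not transactions:
        raise ValueError("need at least one transaction")
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for t in transactions:
        month = (t.day.year, t.day.month)
        if t.amount >= 0:
            inflow[month] += t.amount
        else:
            outflow[month] += -t.amount
    months = sorted(set(inflow) | set(outflow))
    net = [inflow[m] - outflow[m] for m in months]
    return {
        "months_observed": len(months),
        "avg_monthly_inflow": mean(inflow[m] for m in months),
        "avg_monthly_outflow": mean(outflow[m] for m in months),
        "months_with_negative_net": sum(1 for n in net if n < 0),
        # How fresh the picture is: date of the most recent transaction.
        "latest_transaction": max(t.day for t in transactions),
    }


# Made-up transactions, for illustration only.
sample = [
    Transaction(date(2024, 1, 25), 15000.0, "SALARY"),
    Transaction(date(2024, 1, 26), -9000.0, "RENT"),
    Transaction(date(2024, 2, 25), 15000.0, "SALARY"),
    Transaction(date(2024, 2, 27), -16000.0, "CARD PAYMENT"),
]
features = monthly_cash_flow_features(sample)
```

Because the features are recomputed from whatever transactions are available, the same function gives a fresher picture each time new statement data arrives.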
Another reason why Beam is an improvement is that we’ve built our platform to keep our users’ data up to date. We can re-score you every hour, day, week or month. Every time you connect to Beam and share your bank account data, we analyse it in real time, right up to the very last transaction you made, even if it was 30 minutes ago. That’s really the difference.
Is there a proprietary element to this approach where you do this in a way that another company or player wouldn’t have thought to do it or is yours an industry standard approach?
The underlying model architectures we use are completely commoditised. Similar to how GitHub works, where anyone can view, download and use open source code, there’s the equivalent with machine learning models. For most of our data science workflows, the base models and their architectures are open source, and so by definition non-proprietary. I’d say that the three key proprietary pieces of our stack are 1) the particular data sources we use, 2) how we use this data and 3) how we merge our modelling stack with the rest of our product stack.
You’ve spoken a lot about using alternative data to predict the likelihood that someone will pay back a loan. Is the fact that you’re using alternative data as well as traditional data plus applying these specific risk principles and algorithms a “secret sauce”?
The general principle with any kind of machine learning is the more data the better. We’re not just disregarding the traditional credit scoring methods that the current bureaus are using. We’ll combine the two together because the reality is that the traditional scoring methods have actually worked pretty well up to now.
If you look at the actuarial space, we use modelling practices that are 100 years old. At Wits we learned statistical principles that haven’t changed for 200 years. I’ve always believed that we stand on the shoulders of giants, in an academic sense, and we should respect the discoveries they’ve left us. We would be remiss to completely disregard that and think that there’s nothing to be learned from them. I think if you adopt these actuarial risk principles, combined with modern-day compute and modelling technologies, you’ll get the best results.
Now to answer your question: the cash flow data part of what Beam does is quite important when you consider how traditional credit scoring is done by the bureaus. Basically, you need to be “credit active”, in other words have an existing credit product, in order to be scored. As you can imagine, this poses a chicken-and-egg problem for those who are financially excluded: they are declined credit because they can’t be scored (they are deemed “thin file customers”), but without access to a credit product they can’t start building up a repayment history.
What we are doing is learning the mapping between a completely independent data source (bank statement data) and credit outcomes. This is fundamentally more inclusive since ~90% of South Africans have bank accounts. In contrast, some estimates suggest that up to a third of South Africans are “thin file customers” and thus can’t get access to credit.
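As a rough illustration of what "learning the mapping" means, the sketch below fits a plain logistic regression from two cash-flow features to a repaid/defaulted label. Logistic regression is only a stand-in here, not a description of Beam's models, and the feature names and the six training rows are synthetic values invented for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(z) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_default_probability(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Synthetic training data (assumed features, not real customers):
# [net cash flow as a share of income, share of months overdrawn]
rows = [
    [0.30, 0.0], [0.20, 0.1], [0.25, 0.0],    # repaid
    [-0.10, 0.6], [-0.20, 0.8], [0.00, 0.5],  # defaulted
]
labels = [0, 0, 0, 1, 1, 1]  # 1 = defaulted

weights, bias = fit_logistic(rows, labels)
p_healthy = predict_default_probability(weights, bias, [0.30, 0.0])
p_strained = predict_default_probability(weights, bias, [-0.20, 0.8])
```

Nothing in the inputs depends on the applicant already holding a credit product, which is why this kind of model can score someone with a bank account but no credit history.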
That being said, all of the data we get access to is user-permissioned, meaning that customers opt in to sharing their data with us. Most South Africans aren’t aware that their bank transaction data is actually their property, and that they have the power and the right to share this data with third-party service providers like Beam for better and fairer access to financial services. This ties into our core thesis as a business, which is that Open Banking and Open Finance can be a huge enabler for the average person, but more on that in another post.
About Beam
Beam is a fintech company founded in South Africa. Our software solution helps organisations make better credit decisions by seamlessly distilling a broad range of data sources in real time.
We enable risk officers to access all the information they need to analyse their customers in one place, and make it easy to make more informed choices throughout the credit and risk lifecycle. Our state-of-the-art software and API-driven model enable better collaboration and visibility between credit professionals, technical teams and management.