Nicolas Gonatas, Beam’s co-founder and Head of Data Science, talks about Beam’s data science philosophy. This post is a follow-up to our series of articles on Beam’s Data Science Philosophy and Machine Learning Principles.
What kind of data sources and variables do you consider the most critical for training your machine learning models? How do you ensure data quality and relevance?
The way traditional bureaus source data for their reports and scorecards is through the SACCRA Hub. Every lender in South Africa is required by regulation to report its loans and borrower repayment data to the SACCRA Hub. SACCRA generates reports on whether customers are making repayments and how much of their credit limit they have used. However, inaccuracies and errors, known as blemishes, creep into the data and can become problematic. Although bureaus have done a lot of work to fix these issues, the data they use for scorecard building is still not perfect.
Beam, on the other hand, connects directly to an applicant’s bank account, accessing user-generated financial data. By obtaining information directly from the bank, we protect lenders against various types of fraud, such as manual tampering with PDF statements, and we enable real-time data transmission to lenders with 100% transparency. We see this as important because bank statements represent a “truth file” of a user’s spending behaviour and financial position. The user is ultimately giving us permission to analyse their financial position in the most transparent way possible.
What challenges have you encountered in incorporating AI techniques into credit rating and how have you addressed them?
Our biggest challenge over the past six months has been accessing enough data.
The way machine learning works is by learning generalised patterns across large datasets. Algorithms learn these patterns by observing relationships between many input variables and the target variable. These patterns become more refined as the dataset grows, allowing models to make accurate predictions across diverse population groups.
A major risk with limited data is model bias, which can arise from various sources, including the dataset’s size, collection methods and inherent historical biases. For instance, a model trained on data from a specific income group, or LSM (Living Standards Measure) band, would likely develop biases towards that group’s financial behaviours. The result is poor performance when predicting the financial behaviours of a lower LSM group, because the associations the model has learned between inputs and outputs do not carry over.
To combat these challenges, we’ve employed several strategies. Beyond striving for larger datasets, we address data bias and overfitting through techniques such as data augmentation and transfer learning. We also diversify our data sources to ensure our models are trained on datasets that reflect a wide range of demographics and financial behaviours, thereby making our AI models more inclusive and fair.
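As a rough illustration of the kind of group-level check this implies, the sketch below trains a simple repayment-default classifier on synthetic transaction-style features and compares its discriminatory power per hypothetical LSM band. This is illustrative only, not Beam’s actual pipeline; the feature names, LSM banding and labels are all invented.

```python
# Illustrative only: not Beam's pipeline. Train a simple repayment-default model on
# synthetic transaction-style features and compare its AUC per hypothetical LSM band.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5_000

# Invented stand-ins for engineered bank-transaction features.
df = pd.DataFrame({
    "income": rng.lognormal(9, 0.6, n),
    "avg_balance": rng.normal(5_000, 2_000, n),
    "debit_orders": rng.poisson(4, n),
    "lsm_band": rng.choice(["low", "mid", "high"], n, p=[0.5, 0.3, 0.2]),
})
# Synthetic default label loosely tied to average balance.
p_default = 1 / (1 + np.exp(0.0005 * (df["avg_balance"] - 2_000)))
df["default"] = (rng.random(n) < p_default).astype(int)

features = ["income", "avg_balance", "debit_orders"]
train, test = train_test_split(df, test_size=0.3, random_state=0, stratify=df["lsm_band"])
model = GradientBoostingClassifier().fit(train[features], train["default"])

# A large gap in AUC between bands suggests the model has learned patterns that
# favour the better-represented group.
for band, grp in test.groupby("lsm_band"):
    auc = roc_auc_score(grp["default"], model.predict_proba(grp[features])[:, 1])
    print(f"LSM band {band:>4}: AUC = {auc:.3f} (n = {len(grp)})")
```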
How do you balance the need for predictive accuracy with the interpretability of AI models, especially in highly regulated industries?
We ensure both accuracy and interpretability through a battery of tests, which fall broadly into two categories: model-specific and model-agnostic. Model-specific methods are tailored to particular types of models, taking advantage of their internal mechanisms and structures. These methods are designed to provide insight into how a model processes inputs to arrive at its outputs.
Model-agnostic methods, on the other hand, are designed to work with any machine learning model, providing more flexibility. These methods don’t rely on the internal workings of the models, making them more widely applicable and generalisable across the various models we test.
This is important because it allows us to use complex (and therefore not inherently explainable) algorithms, such as neural networks, which give us the best performance, while still understanding how a final decision is made across populations, sub-groups or even individual users. This not only helps steer model development, it also ensures that we are not having a disparate impact on the credit decisions of particular demographic sub-populations. Put simply, it helps us fight bias.
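One simple, model-agnostic way to quantify disparate impact is to compare approval rates across sub-populations. The sketch below is illustrative only, with invented groups and decisions; the 0.8 “four-fifths” threshold is a common heuristic rather than a statement of Beam’s actual policy.

```python
# Illustrative only: a disparate-impact check on hypothetical approval decisions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
decisions = pd.DataFrame({
    "approved": rng.random(2_000) < 0.6,          # hypothetical model decisions
    "group": rng.choice(["A", "B", "C"], 2_000),  # sensitive attribute, used for monitoring only
})

approval_rates = decisions.groupby("group")["approved"].mean()
impact_ratios = approval_rates / approval_rates.max()

print(approval_rates.round(3))
print(impact_ratios.round(3))

# A common heuristic (the "four-fifths rule") flags ratios below 0.8 for review;
# the actual threshold and remediation are policy decisions, not fixed here.
flagged = impact_ratios[impact_ratios < 0.8]
print("Sub-groups needing review:", list(flagged.index) or "none")
```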
We use these interpretability techniques alongside a host of other tools that help us manage the risks that can arise from automated decision-making. Collectively, we call this our “Model Risk Management” framework, and we apply it to all of our machine learning pipelines, ensuring that we get accurate results while following interpretability and broader compliance best practices.
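As one example of a model-agnostic method, the sketch below applies permutation importance, a standard black-box technique, to a synthetic dataset. The feature names are invented, and this is not a description of the specific tests in our framework.

```python
# Illustrative only: permutation importance as a model-agnostic interpretability method.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data; the feature names are invented for readability.
X, y = make_classification(n_samples=3_000, n_features=6, n_informative=4, random_state=0)
feature_names = ["income", "avg_balance", "debit_orders",
                 "salary_regularity", "overdraft_days", "gambling_spend"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Treat the model as a black box: shuffle one feature at a time and measure how much
# the held-out score degrades. Works for any estimator, including neural networks.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
ranked = sorted(zip(feature_names, result.importances_mean, result.importances_std),
                key=lambda t: -t[1])
for name, mean, std in ranked:
    print(f"{name:>18}: {mean:.4f} +/- {std:.4f}")
```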
What role do you envision artificial intelligence and machine learning playing in the future of credit vetting and what innovations do you anticipate in this space?
Since the 1980s, credit scoring at scale has been machine learning-driven. The scorecards bureaus use today are technically machine learning-driven; however, they are largely rule-based and involve a great deal of hand-crafting. Now, with the advent and acceleration of big data, novel algorithms and more compute, we can unlock methodologies that weren’t possible previously: models that are learned end to end.
Ultimately, innovation in the credit vetting space will only keep accelerating and the whole industry will see exponential value being added to our vetting capabilities over the next five to ten years.
How is Beam’s approach to credit vetting different to that of traditional credit vetters?
Traditional credit scoring has been somewhat exclusionary, or rather disempowering, for people. Consumers have had no understanding of, or influence over, their traditional bureau credit score, including whether the data used to evaluate them was accurate or outdated. Beam empowers users to use their own financial data to create a credit profile, whereas before that was entirely in the hands of lenders, banks and bureaus. Providers like us are empowering customers to take control of their own financial destinies, giving them the tools and knowledge they need to improve their financial position in the long run.
Does Beam reduce exposure to bank statement or transaction fraud?
Processing PDF bank statements opens lenders up to various degrees of fraud. In many instances, customers add fake salary lines or alter transaction data. By using Beam’s automated bank linking technology, we can drastically reduce the risk of making credit decisions based on fraudulent information. This is because a fraudster would physically have to swipe their card and transfer money around to create fake transactions, adding a financial cost and an unscalable time cost to committing fraud. Because it is now manual and no longer cost-free, Beam reduces risk for lenders.
Furthermore, there’s a built-in KYC element to using our automated bank linking technology. We’ve implemented a host of best practices in fraud management, including ID verification, anti-money-laundering checks and various other Home Affairs checks. In addition, as we collect more and more data, our machine learning models will be able to identify patterns of fraudulent transactions and learn what an anomalous bank account looks like, ultimately protecting lenders.
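As a rough sketch of the direction described here (not a description of production fraud models), an unsupervised detector such as an isolation forest can flag accounts whose transaction patterns look unusual relative to the population. All feature names below are hypothetical.

```python
# Illustrative only: unsupervised anomaly flagging on per-account features (invented names).
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
accounts = pd.DataFrame({
    "salary_deposits": rng.poisson(1, 1_000),
    "round_amount_credits": rng.poisson(2, 1_000),    # suspiciously "clean" deposit amounts
    "rapid_in_out_transfers": rng.poisson(1, 1_000),  # money moved in and straight back out
    "median_txn_amount": rng.lognormal(5, 1, 1_000),
})

# Isolation forests flag accounts whose feature combinations are unusual relative to
# the population; flagged accounts can then be routed for manual review.
features = accounts.columns.tolist()
detector = IsolationForest(contamination=0.02, random_state=0).fit(accounts[features])
accounts["anomaly_flag"] = detector.predict(accounts[features]) == -1

print(accounts["anomaly_flag"].value_counts())
print(accounts[accounts["anomaly_flag"]].head())
```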
Summary
At Beam, our machine learning and data science approach focuses on analysing granular bank transaction data in real time, providing a more precise way to assess an applicant’s current financial situation. This approach is designed to build on traditional credit scoring methods and to overcome conventional credit bureaus’ reliance on lagged and sometimes inaccurate information.
We adopt non-parametric approaches to modelling, leveraging abundant data, increased compute power and evolving architectures, particularly deep learning. We believe it’s important to provide greater transparency and to access financial data only with the user’s permission, so that individuals are empowered to manage their own financial profiles.
To conclude, our data science philosophy and machine learning principles reflect our commitment to innovation and inclusivity in reshaping credit vetting practices, ultimately empowering consumers and mitigating risks for lenders.
About Beam
Beam is a fintech company founded in South Africa. Our state-of-the-art software solution uses transactional data to give you a real-time affordability analysis of your customer. It makes manual analysis, fragmented data sources, high costs and slow processes a thing of the past, so that you and your team get better data with sharper insight.
Beam makes it easy and seamless to access bank statements from multiple accounts and bureau data, giving you the most up-to-date and precise view of your customer’s financial position so that your organisation can make accelerated credit decisions.
Beam’s API-first solution reduces credit decision-making time from days to seconds while helping you forecast your customer’s income and expenses instantly. Beam Console’s audit-ready reporting dashboard lets your admin, risk and underwriting teams easily and efficiently manage your customer data from one place.