LightAutoML: AutoML Solution for a Large Financial Services Ecosystem

June 11, 2024 16 Min Read
Although AutoML rose to popularity only a few years ago, early work on it dates back to the 1990s, when scientists published the first papers on hyperparameter optimization. AutoML caught the attention of ML developers in 2014, when ICML organized the first AutoML workshop. One of the major focuses of AutoML over the years has been the hyperparameter search problem, in which the system applies an array of optimization methods to find the best-performing hyperparameters in a large hyperparameter space for a particular machine learning model. Another commonly implemented approach is to estimate the probability that a particular hyperparameter setting is optimal for a given machine learning model; this is typically achieved with Bayesian methods that draw on historical data from previously estimated models and other datasets. Beyond hyperparameter optimization, other methods try to select the best models from a space of modeling alternatives. 
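To make the hyperparameter search problem concrete, here is a minimal random-search loop over a toy objective. Everything in it (the parameter names, the search ranges, the `evaluate` stand-in) is illustrative, not taken from LightAutoML:

```python
import random

def evaluate(params):
    # Toy stand-in for a model's validation score: peaks at
    # learning_rate = 0.1 and num_leaves = 31.
    return -(params["learning_rate"] - 0.1) ** 2 \
           - ((params["num_leaves"] - 31) / 100) ** 2

def random_search(n_trials, seed=0):
    """Sample the search space at random and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.01, 0.3),
            "num_leaves": rng.randint(8, 255),
        }
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search(200)
```

Bayesian methods differ from this sketch in that each new trial is proposed based on the scores of previous trials rather than drawn uniformly at random, which usually finds good regions of the space with far fewer evaluations.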

Contents

  • LightAutoML: Methodology and Architecture
  • LightAutoML Tabular Preset
  • Auto-Typing and Data Preprocessing
  • Validation Schemes
  • Feature Selection
  • Hyperparameter Tuning
  • LightAutoML: Experiment and Performance
  • Final Thoughts

In this article, we will cover LightAutoML, an AutoML system developed primarily for a large European company operating in the finance sector and for its ecosystem. The framework is deployed across various applications, and the results demonstrate performance comparable to that of data scientists building high-quality machine learning models by hand. LightAutoML makes the following contributions. First, it was developed specifically for the ecosystem of a large European financial and banking institution. Second, owing to its architecture, it outperforms state-of-the-art AutoML frameworks across several open benchmarks as well as ecosystem applications. Third, its performance has been compared against models tuned manually by data scientists, and the results indicate that LightAutoML comes out ahead. 

This article covers the LightAutoML framework in depth: its mechanism, methodology, and architecture, along with a comparison against state-of-the-art frameworks. So let's get started. 

Although researchers first started working on AutoML in the early-to-mid 1990s, the field has attracted most of its attention over the last few years, with prominent industrial solutions that automatically build machine learning models including Amazon's AutoGluon, DarwinAI, H2O.ai, IBM Watson AI, and Microsoft AzureML. Most of these frameworks implement a general-purpose AutoML solution that develops ML-based models automatically across different classes of applications in financial services, healthcare, education, and more. The key assumption behind this horizontal, generic approach is that the process of developing automatic models is identical across all applications. The LightAutoML framework instead takes a vertical approach: rather than being generic, it caters to the needs of an individual application, in this case a large financial institution, focusing on the requirements and characteristics of its complex ecosystem.

First, LightAutoML provides fast and near-optimal hyperparameter search. Although it does not optimize these hyperparameters directly, it manages to deliver satisfactory results, and it keeps the balance between speed and hyperparameter optimization dynamic, so that the model is optimal on small problems and fast enough on larger ones. Second, LightAutoML purposefully limits the range of machine learning models to only two types, linear models and GBMs (gradient boosted decision trees), instead of assembling large ensembles of different algorithms; this speeds up execution without negatively affecting performance for the given type of problem and data. Third, LightAutoML presents a unique method of choosing preprocessing schemes for the different features used in its models, on the basis of selection rules and meta-statistics. The framework is evaluated on open data sources across a wide range of applications. 


LightAutoML : Methodology and Architecture

The LightAutoML framework consists of modules known as Presets that are dedicated to end-to-end model development for typical machine learning tasks. At present, the framework supports four Preset modules. First, the TabularAutoML Preset focuses on solving classical machine learning problems defined on tabular datasets. Second, the White-Box Preset implements simple interpretable algorithms, such as logistic regression over WoE (Weight of Evidence) encoding and discretized features, to solve binary classification tasks on tabular data; using simple interpretable algorithms is a common practice when scoring applications, owing to the interpretability constraints posed by different factors. Third, the NLP Preset is capable of combining tabular data with NLP (Natural Language Processing) tools, including pre-trained deep learning models and specific feature extractors. Finally, the CV Preset works with image data with the help of some basic tools. It is important to note that although LightAutoML supports all four Presets, only TabularAutoML is used in the production-level system. 

The typical pipeline of the LightAutoML framework is included in the following image. 

Each pipeline contains three components. First, the Reader is an object that receives the task type and raw data as input, performs crucial metadata calculations, cleans the initial data, and figures out which data manipulations to perform before fitting different models. Second, LightAutoML's inner datasets contain CV iterators and metadata that implement validation schemes for the data. The third component is the set of machine learning pipelines, stacked and/or blended to produce a single prediction. A machine learning pipeline within the LightAutoML architecture is one of multiple machine learning models that share a single data validation and preprocessing scheme. The preprocessing step may include up to two feature selection steps and a feature engineering step, or may be empty if no preprocessing is needed. The ML pipelines can be computed independently on the same datasets and then blended together using averaging (or weighted averaging); alternatively, a stacking ensemble scheme can be used to build multi-level ensemble architectures. 
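The blending step described above amounts to a weighted average of the per-pipeline predictions. A minimal sketch (the pipeline names and weights below are made up for illustration; LightAutoML learns its blending weights rather than taking them as fixed):

```python
def blend(predictions, weights=None):
    """Weighted average of several pipelines' predictions (lists of floats)."""
    n = len(predictions)
    if weights is None:
        weights = [1.0 / n] * n  # plain averaging
    total = sum(weights)
    return [
        sum(w * pipe[i] for w, pipe in zip(weights, predictions)) / total
        for i in range(len(predictions[0]))
    ]

# Hypothetical out-of-fold probabilities from two ML pipelines, three rows each:
gbm_preds = [0.9, 0.2, 0.6]
linear_preds = [0.7, 0.4, 0.5]
blended = blend([gbm_preds, linear_preds], weights=[0.7, 0.3])
```

In a stacking scheme, these out-of-fold predictions would instead become input features for a next-level model rather than being averaged directly.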

LightAutoML Tabular Preset

Within the LightAutoML framework, TabularAutoML is the default pipeline, implemented to solve three types of tasks on tabular data: binary classification, regression, and multi-class classification, for a wide array of performance metrics and loss functions. The TabularAutoML component takes as input a table with four kinds of columns: categorical features, numerical features, timestamps, and a single target column with class labels or continuous values. One of the primary objectives behind the design of LightAutoML was to build a tool for fast hypothesis testing, which is a major reason the framework avoids brute-force methods for pipeline optimization and focuses only on efficient techniques and models that work across a wide range of datasets. 


Auto-Typing and Data Preprocessing

To handle different types of features in different ways, the model needs to know each feature's type. When there is a single task with a small dataset, the user can specify each feature type manually, but this is no longer viable when there are hundreds of tasks with datasets containing thousands of features. For the TabularAutoML Preset, LightAutoML needs to map features into three classes: numeric, category, and datetime. One simple and obvious solution is to use the columns' stored data types as the feature types: map float/int columns to numeric features, timestamps (or strings that can be parsed as timestamps) to datetime, and everything else to category. However, this mapping is not the best, because numeric data types frequently occur in columns that are really categorical. 
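A naive version of such an auto-typing heuristic might look like the sketch below. This is a simplification for illustration only; LightAutoML's actual rules rely on richer meta-statistics, and the cardinality threshold here is an arbitrary assumption:

```python
from datetime import datetime

def guess_feature_type(values):
    """Map a column of raw string values to 'numeric', 'datetime', or 'category'."""
    def is_numeric(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    def is_datetime(v):
        try:
            datetime.strptime(v, "%Y-%m-%d")  # assumed date format
            return True
        except ValueError:
            return False

    if all(is_datetime(v) for v in values):
        return "datetime"
    if all(is_numeric(v) for v in values):
        # Very few distinct numeric values often indicate an encoded
        # category -- the pitfall mentioned above. A real system would
        # use a more careful cardinality check.
        if len(set(values)) <= 2:
            return "category"
        return "numeric"
    return "category"
```

For example, `["0", "1"]` would be routed to category despite parsing as numbers, while `["1.5", "2.0", "3.7"]` stays numeric.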

Validation Schemes

Validation schemes are a vital component of AutoML frameworks, since data in the industry changes over time, and this element of change makes the IID (Independent and Identically Distributed) assumption unrealistic when developing a model. AutoML models employ validation schemes to estimate performance, search for hyperparameters, and generate out-of-fold predictions. The TabularAutoML pipeline implements three validation schemes:

  • KFold Cross-Validation: the default validation scheme for the TabularAutoML pipeline, including GroupKFold for behavioral models and stratified KFold for classification tasks. 
  • Holdout Validation: implemented if a holdout set is specified. 
  • Custom Validation Schemes: users can create their own schemes, such as cross-validation with custom folds or time-series splits, depending on their individual requirements. 
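The index bookkeeping behind the default KFold scheme can be written in a few lines. This is a bare-bones sketch for illustration; it omits the shuffling, stratified, and grouped variants mentioned above:

```python
def kfold_indices(n_rows, n_folds):
    """Yield (train_idx, valid_idx) pairs for K-fold cross-validation."""
    indices = list(range(n_rows))
    fold_size, remainder = divmod(n_rows, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = fold_size + (1 if k < remainder else 0)
        folds.append(indices[start:start + size])
        start += size
    for k in range(n_folds):
        valid = folds[k]
        # Training indices are everything outside the k-th fold.
        train = [i for f, fold in enumerate(folds) if f != k for i in fold]
        yield train, valid

splits = list(kfold_indices(10, 5))
```

Each row lands in a validation fold exactly once, which is what makes the resulting out-of-fold predictions usable for the blending and stacking steps described earlier.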

Feature Selection

Although feature selection is a crucial aspect of industrial model development, since it reduces inference and model implementation costs, most AutoML solutions do not focus much on this problem. The TabularAutoML pipeline, by contrast, implements three feature selection strategies: no selection, importance cutoff selection, and importance-based forward selection, with importance cutoff selection being the default. There are two primary ways to estimate feature importance: split-based tree importance, and permutation importance of a GBM (gradient boosted decision tree) model. The primary aim of importance cutoff selection is to reject features that do not help the model, reducing the number of features without negatively impacting performance, which can speed up both training and inference. 
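The importance cutoff strategy boils down to dropping features whose estimated importance falls below a threshold. A schematic version, where the feature names and scores are entirely made up:

```python
def importance_cutoff(importances, threshold=0.0):
    """Keep only the features whose importance exceeds the threshold."""
    return [name for name, score in importances.items() if score > threshold]

# Hypothetical split-based importances from a fitted GBM:
scores = {"age": 120.5, "income": 98.1, "zip_code": 0.0, "noise_col": 0.0}
selected = importance_cutoff(scores)  # zero-importance features are dropped
```

With the default threshold of zero, only features the trees never found useful are rejected, which is why this strategy can shrink the feature set at little to no cost in accuracy.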


The above image compares different selection strategies on binary bank datasets. 

Hyperparameter Tuning

The TabularAutoML pipeline implements different approaches to tune hyperparameters on the basis of what is tuned. 

  • Early Stopping Hyperparameter Tuning selects the number of iterations for all models during the training phase. 
  • Expert System Hyperparameter Tuning is a simple way to set hyperparameters to satisfactory values, guarding the final model against a large drop in score compared to hard-tuned models.
  • Tree-structured Parzen Estimation (TPE) is used for GBM (gradient boosted decision tree) models as part of a mixed tuning strategy that is the default choice in the LightAutoML pipeline: for each GBM framework, LightAutoML trains two models, the first with expert hyperparameters and the second fine-tuned to fit within the time budget. 
  • Grid Search Hyperparameter Tuning is implemented in the TabularAutoML pipeline to fine-tune the regularization parameters of a linear model, alongside early stopping and warm start. 
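The mixed strategy for GBMs (keep an expert-parameter model, and spend the remaining time budget tuning a second one) can be sketched as follows. Note the assumptions: `evaluate` is a toy stand-in for actually training and validating a GBM, the parameter names merely echo common GBM settings, and plain random search stands in here for the TPE sampler LightAutoML actually uses:

```python
import random
import time

EXPERT_PARAMS = {"learning_rate": 0.05, "num_leaves": 128}

def evaluate(params):
    # Stand-in for training a GBM and returning its validation score.
    return -(params["learning_rate"] - 0.1) ** 2 \
           - ((params["num_leaves"] - 64) / 256) ** 2

def tune_within_budget(budget_seconds, seed=0):
    """Return the better of the expert model and a time-budgeted search."""
    rng = random.Random(seed)
    best_params = dict(EXPERT_PARAMS)
    best_score = evaluate(best_params)  # expert model is the fallback
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        candidate = {
            "learning_rate": rng.uniform(0.01, 0.3),
            "num_leaves": rng.randint(16, 255),
        }
        score = evaluate(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

params, score = tune_within_budget(0.05)
```

Because the expert configuration seeds the search, the tuned result can never score worse than the expert baseline, which is exactly the safety property the expert-system strategy provides.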

The model tunes all parameters by maximizing the metric function, either defined by the user or the default for the task being solved. 

LightAutoML : Experiment and Performance

To evaluate performance, the TabularAutoML Preset within LightAutoML is compared against existing open-source solutions across various tasks, cementing the framework's superior performance. First, the comparison is carried out on the OpenML benchmark, which is evaluated on 35 binary and multiclass classification datasets. The following table summarizes the comparison of the LightAutoML framework against existing AutoML systems. 

As can be seen, the LightAutoML framework outperforms all other AutoML systems on 20 datasets within the benchmark. The following table contains the detailed per-dataset comparison, indicating that LightAutoML delivers different performance on different classes of tasks. On some binary classification tasks LightAutoML falls short, whereas on tasks with large amounts of data it delivers superior performance.

The following table compares the performance of the LightAutoML framework against AutoML systems on 15 bank datasets containing a set of various binary classification tasks. As can be observed, LightAutoML outperforms all AutoML solutions on 12 of the 15 datasets, a win rate of 80%. 

Final Thoughts

In this article we have covered LightAutoML, an AutoML system developed primarily for a large European company operating in the finance sector and for its ecosystem. Deployed across various applications, the framework delivers performance comparable to that of data scientists building high-quality machine learning models by hand, outperforms state-of-the-art AutoML frameworks on several open benchmarks as well as ecosystem applications, and comes out ahead of models tuned manually by data scientists. 
