DataCivet: A Powerful AutoML Tool with Optuna

Abstract

Machine learning (ML) has become a vital part of many aspects of daily life. However, building well-performing ML applications requires highly specialized data scientists and domain experts. Automated machine learning (AutoML) aims to reduce the demand for data scientists by enabling domain experts to build ML applications automatically, without extensive knowledge of statistics and machine learning. We introduce DataCivet, an AutoML tool that follows best practices and avoids common mistakes, and we evaluate it on data sets from established AutoML benchmark suites. DataCivet helps users understand their data better and, through its Optuna integration, tune the models built on that data. Artificial intelligence has taken hold in every major sector, and the financial sector is no exception: AI-based tools there are used to streamline operations and deliver more accurate solutions.

Introduction

In recent years, ML has become ever more important: automatic speech recognition, self-driving cars, and predictive maintenance in Industry 4.0 are all built upon ML. ML can now beat human beings at tasks often described as too complex for computers; for example, AlphaGo (Silver et al., 2017) was able to beat the human champion at Go. AutoML aims to improve the current way of building ML applications through automation.


ML experts can profit from AutoML by automating tedious tasks such as hyperparameter optimization (HPO), leading to higher efficiency. Domain experts can build ML pipelines on their own without having to rely on a data scientist.


It is important to note that AutoML is not a new trend. Designing and tuning machine learning systems is a labor- and time-intensive task that requires extensive expertise, and the field of AutoML is focused on automating this task. AutoML tools allow novice users to create useful machine learning models, while experts can use them to free up valuable time for other tasks. In recent years, many approaches have been developed for building and optimizing machine learning pipelines, or for building and optimizing deep neural networks. This paper focuses on the former.


DataCivet is an AutoML tool that can enhance business agility and accelerate the development of production-ready ML models. It aims to make AI more accessible and to help organizations work more effectively and efficiently. DataCivet uses tabular (structured) data to train a machine learning model that makes predictions on new data. Its workflow consists of gathering data, preparing the data, training, evaluating, testing, deploying, and predicting.


DataCivet enables data scientists, analysts, and other professionals to automatically build and deploy advanced ML models on structured data with greater speed and at larger scale. The application has potential uses in industries ranging from healthcare, financial markets, fintech, and banking to the public sector, retail, and manufacturing. It can also be applied across functions such as marketing and finance.


DataCivet can tackle critical and complex tasks such as supply chain management, fraud detection, and lead conversion optimization. As a further step, we have integrated Optuna into DataCivet. Optuna is a software framework for automating hyperparameter optimization. It searches for good hyperparameter values using different samplers, such as grid search, random search, Bayesian optimization, and evolutionary algorithms. Optuna is a state-of-the-art automatic hyperparameter tuning framework written entirely in Python.


Note: Classification datasets were selected from AutoML papers (Thornton et al., 2013), competitions (Isabelle et al., 2016), and machine learning benchmarks (Bischl et al., 2017a), according to a predefined list of criteria.


A New Hope

In this work, we experiment with some of those classification datasets and obtain good results with Optuna. DataCivet is completely open source, and anyone can extend it by adding or updating AutoML systems through pull requests. We apply a feature engineering step before tuning with Optuna.


The Deep Feature Synthesis algorithm automatically aggregates features in a relational database structure. To do so, the algorithm recursively traverses the relationships in the data, applies mathematical functions to the features along the way, and appends the results to a base table. The final output is an expanded base table that represents a much larger portion of the relational data.
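
The recursive traversal described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not DataCivet's implementation: the tables `customers`, `orders`, and `items`, the function `synthesize`, and the choice of SUM, MEAN, MAX, and COUNT as aggregation functions are all hypothetical, and every non-key column in a child table is assumed to be numeric.

```python
from collections import defaultdict
from statistics import mean

# Aggregation functions applied across each parent-child relationship.
PRIMITIVES = {"SUM": sum, "MEAN": mean, "MAX": max}

# Hypothetical relational data: customers -> orders -> items.
TABLES = {
    "customers": [{"id": 1, "age": 34}, {"id": 2, "age": 51}],
    "orders": [
        {"id": 10, "customer_id": 1},
        {"id": 11, "customer_id": 1},
        {"id": 12, "customer_id": 2},
    ],
    "items": [
        {"id": 100, "order_id": 10, "price": 5.0},
        {"id": 101, "order_id": 10, "price": 15.0},
        {"id": 102, "order_id": 11, "price": 30.0},
        {"id": 103, "order_id": 12, "price": 8.0},
    ],
}

# Parent table -> list of (child table, foreign-key column in the child).
RELATIONSHIPS = {
    "customers": [("orders", "customer_id")],
    "orders": [("items", "order_id")],
}


def synthesize(table, tables, relationships, key="id"):
    """Return the rows of `table` extended with features aggregated from its descendants."""
    rows = [dict(row) for row in tables[table]]
    for child, foreign_key in relationships.get(table, []):
        # Recurse first, so the child rows already carry features from their own children.
        child_rows = synthesize(child, tables, relationships, key)
        groups = defaultdict(list)
        for child_row in child_rows:
            groups[child_row[foreign_key]].append(child_row)
        columns = [c for c in (child_rows[0] if child_rows else {}) if c not in (key, foreign_key)]
        for row in rows:
            members = groups.get(row[key], [])
            row[f"COUNT({child})"] = len(members)
            for column in columns:
                values = [member[column] for member in members]
                for name, func in PRIMITIVES.items():
                    # Parents without children get 0 for every aggregate.
                    row[f"{name}({child}.{column})"] = func(values) if values else 0
    return rows


expanded = synthesize("customers", TABLES, RELATIONSHIPS)
for row in expanded:
    print(row["id"], row["COUNT(orders)"], row["MEAN(orders.SUM(items.price))"])
```

Each customer row ends up with stacked features such as `MEAN(orders.SUM(items.price))`, the average order total, which is the kind of expanded base table the paragraph describes.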

Conclusion

We presented an AutoML tool that achieves better results by integrating the Optuna framework: we obtain faster and better results than without Optuna. We tried several algorithms, including Random Forest, SVM, and XGBM, which are well suited to classification problems. We tested both binary and multiclass datasets, and Optuna performs well in both cases.


The purpose of our research was to find out how to improve our tool so that it delivers better solutions. Optuna helped us a great deal with this. We can now use it to find a good solution, and we are satisfied with the accuracy of our results.


For both binary and multiclass classification, the results were highly satisfactory.
