Monday, August 12, 2019

State of the Art of Automated Machine Learning

Here is a good survey paper on the current state of automated machine learning (AutoML).

Humans take too long to analyze, research, craft, iterate, and try out deep learning approaches to business and research problems.  So the machine learning community is creating "AutoML" methods to accelerate the process.

It takes forever to figure out our dirty data, our bizarre distributions, and what to do with insufficient training data.  AutoML to the rescue:


"the distribution of web data can be extremely different from the target dataset, which would increase the difficulty of training the model. A common solution is to fine-tune these web data [66], [67]. Yang et al. [52] [proposed] an iterative algorithm for model training and web data filtering. Additionally, dataset imbalance is also a common problem, because there probably are only a small number of web data for some special classes. To solve this problem, Synthetic Minority Over-Sampling Technique (SMOTE) [68] was proposed to synthesize new minority samples between existing real minority samples instead of up-sampling them or down-sampling the majority samples. Guo et al. [69] propose to combine boosting method with data generation to enhance the generalization and robustness of the model against imbalanced data sets."
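
To make the SMOTE idea in that quote concrete: instead of duplicating minority samples, each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority-class neighbours. Below is a minimal pure-Python sketch, assuming Euclidean distance, a brute-force neighbour search, and a variant that picks the base sample at random; the names `smote` and `k_nearest`, the default `k=5`, and the toy data are illustrative, not taken from the survey.

```python
import math
import random


def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself."""
    by_distance = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return [j for _, j in by_distance[:k]]


def smote(minority, n_new, k=5, seed=None):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # random minority "base" sample
        j = rng.choice(neighbours[i])      # one of its k nearest minority neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nb)])
    return synthetic


# Usage: oversample a tiny, hypothetical 2-D minority class.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6, k=2, seed=0)
print(len(new_points), "synthetic samples")
```

The brute-force neighbour search is O(n²) in the number of minority samples, which is fine for a sketch; on real data a library implementation such as imbalanced-learn's `SMOTE` is the usual choice.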

And the algorithms can auto-tune themselves:


It gets better.  Genetic Algorithms are back!  You can automatically evolve your model for survival and "fitness."  Here is another simple method, called "sequential model-based optimization" (SMBO):
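
The general SMBO loop is: fit a cheap surrogate model to the (configuration, score) pairs observed so far, use an acquisition function on that surrogate to choose the next configuration, evaluate it on the real (expensive) objective, and repeat. Unlike a genetic algorithm, it does not evolve a population; it is a model-based, Bayesian-optimization-style method. Below is a minimal pure-Python sketch, assuming a 1-D search space scanned on a fixed grid, a Gaussian-process surrogate with an RBF kernel, and a lower-confidence-bound (LCB) acquisition. These choices, the toy objective, and the names `fit_gp`, `smbo`, and `kappa` are illustrative assumptions, not the survey's exact algorithm.

```python
import math
import random


def rbf(a, b, length_scale=0.3):
    """Squared-exponential (RBF) kernel between two scalar configurations."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs with Gaussian elimination and partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_gp(xs, ys, noise=1e-6):
    """Fit a zero-mean GP surrogate; return predict(x) -> (mean, std)."""
    n = len(xs)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    # Column c of K^-1 solves K v = e_c; K^-1 is symmetric, so rows equal columns.
    K_inv = [solve(K, [1.0 if r == c else 0.0 for r in range(n)]) for c in range(n)]
    alpha = [sum(K_inv[i][j] * ys[j] for j in range(n)) for i in range(n)]

    def predict(x):
        k_star = [rbf(x, a) for a in xs]
        mean = sum(k * al for k, al in zip(k_star, alpha))
        v = [sum(K_inv[i][j] * k_star[j] for j in range(n)) for i in range(n)]
        var = rbf(x, x) - sum(k * vi for k, vi in zip(k_star, v))
        return mean, math.sqrt(max(var, 1e-12))  # clip tiny negative round-off

    return predict


def smbo(objective, lo, hi, n_init=3, n_iter=12, kappa=2.0, seed=0):
    """Minimise a 1-D objective with SMBO: GP surrogate + LCB acquisition."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]  # initial random design
    ys = [objective(x) for x in xs]
    candidates = [lo + (hi - lo) * i / 200 for i in range(201)]
    for _ in range(n_iter):
        y_mean = sum(ys) / len(ys)
        predict = fit_gp(xs, [y - y_mean for y in ys])  # 1. fit surrogate to history

        def lcb(x):
            mu, sigma = predict(x)
            return mu + y_mean - kappa * sigma  # optimistic estimate for minimisation

        untried = [c for c in candidates if c not in xs]
        x_next = min(untried, key=lcb)  # 2. pick the candidate with the lowest LCB
        xs.append(x_next)
        ys.append(objective(x_next))  # 3. evaluate the true (expensive) objective
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]


# Usage: a cheap stand-in for an expensive validation loss over one hyperparameter.
best_x, best_y = smbo(lambda x: (x - 0.7) ** 2, 0.0, 1.0)
print(f"best x = {best_x:.3f}, loss = {best_y:.5f}")
```

Raising `kappa` weights the surrogate's uncertainty more heavily (more exploration of unvisited regions); lowering it makes the search greedier around the current best guess.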


It is worth skimming the paper to catch up on what is happening.
