Dissertation Defense: Automated Machine Learning: Intelligent Binning Data Preparation and Regularized Regression Classifier

Wednesday, April 5, 2023, 2 p.m. to 4 p.m.

Automated machine learning (AutoML) is a growing trend: it automates the complete pipeline from the raw dataset to the development of a machine learning model. It not only relieves data scientists' workload but also allows non-experts to complete these tasks without a solid understanding of statistical inference and machine learning.

One limitation of AutoML frameworks is that data quality can differ significantly from batch to batch. Consequently, the fitted model quality for some batches can be very poor because the distribution of some numerical predictors shifts. In this dissertation, we develop an intelligent binning method to resolve this problem. In addition, various regularized regression classifiers (RRCs), including Ridge, Lasso, and Elastic Net regression, are tested to further enhance model performance after binning.
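The abstract does not describe how the intelligent binning works, so the following is only a minimal sketch of the general idea, using plain equal-frequency (quantile) binning as a stand-in. The function names `fit_bin_edges` and `assign_bins` and the sample batches are hypothetical. It illustrates why binning can dampen distribution shift: cut points are learned once on a reference batch, and any later value, however extreme, is mapped to one of a fixed set of bin indices.

```python
from bisect import bisect_right
from statistics import quantiles


def fit_bin_edges(values, n_bins=4):
    """Return the n_bins - 1 interior cut points at equal-frequency quantiles.

    Stand-in for the dissertation's intelligent binning, which is not
    specified in the abstract.
    """
    return quantiles(values, n=n_bins)


def assign_bins(values, edges):
    """Map each value to a bin index in 0..len(edges).

    Values below the first edge fall in bin 0 and values at or above the
    last edge fall in the final bin, so out-of-range inputs stay bounded.
    """
    return [bisect_right(edges, v) for v in values]


# Reference batch used to learn the cut points (illustrative data).
train_batch = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
edges = fit_bin_edges(train_batch, n_bins=4)

# A later batch containing a value far outside the reference range.
shifted_batch = [0.5, 4.0, 9.0, 250.0]
shifted_bins = assign_bins(shifted_batch, edges)

print(edges)
print(shifted_bins)
```

The binned column could then be encoded and passed to a regularized classifier. The value 250.0 lands in the same top bin as 9.0, which is the sense in which binning limits the influence of drifted numerical predictors.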

We focus on the binary classification problem and have developed an AutoML framework in Python that handles the entire data preparation process, including data partitioning and intelligent binning. The system has been tested extensively, and the results show that (1) all models perform better with intelligent binning on both balanced and imbalanced binary classification problems; (2) regression-based methods are more sensitive to intelligent binning than tree-based methods, and RRCs can outperform tree-based methods when intelligent binning is used; (3) weighted RRC obtains the best results compared with the other methods; and (4) our framework is an effective and reliable tool for conducting AutoML.

 

Key Words: AutoML, Statistical Learning, Machine Learning, Binning, and Regularized Regression Classifiers.

Outline of Studies: Ph.D. in Big Data Analytics

Educational Career:

Ph.D. in Engineering Mechanics, 2011, University of Nebraska-Lincoln

M.S. in Statistical Computing, Data Mining Track, 2013, University of Central Florida

 

Committee in Charge:

Dr. Chung-Ching Morgan Wang, Chair

Dr. Liqiang Ni

Dr. Rui Xie

Dr. Bruce Cauklins

 

Approved for distribution by Dr. Chung-Ching Morgan Wang, Committee Chair, on March 21, 2023.

Location:

UCF Technology Commons II (TC2): Seminar Room 222

Contact:

College of Graduate Studies, 407-823-2766, editor@ucf.edu

Calendar:

Graduate Thesis and Dissertation

Tags:

Graduate UCF College of Sciences defense UCF Department of Statistics and Data Science