Data Science and Optimization Laboratory
Research
Our research focuses on a range of theoretical and applied problems in machine learning. Specifically, we conduct research on (i) efficient supervised learning methods for complex data structures, (ii) integration of data-driven methods with domain knowledge, (iii) optimization techniques for large-scale problems, and (iv) applications of machine learning techniques to spatio-temporal systems.
-
Efficient Gaussian Process Regression (GPR): GPR is a non-parametric, nonlinear technique that is popular for learning complex functions. However, its application to large datasets is hindered by its computational cost: exact GPR requires factorizing an n-by-n covariance matrix, which takes O(n^3) time for n training points. We develop frequentist and Bayesian techniques that approximate GPR while maintaining high prediction accuracy.
-
Farmanesh, B. and Pourhabib, A., Sparse pseudo-input local Kriging for large non-stationary spatial datasets with exogenous variables, arXiv:1508.01248 [stat.ML], 2015.
-
Pourhabib, A., Liang, F. and Ding, Y., Bayesian site selection for fast Gaussian process regression, IIE Transactions, 46 (5), 543-555, 2014.
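The cubic cost of exact GPR, and why approximations are needed, can be sketched in a few lines. The example below assumes Python with NumPy, a squared-exponential kernel with fixed hyperparameters, and synthetic one-dimensional data. It compares the exact posterior mean with a subset-of-data approximation, a generic baseline that fits the GP to m randomly chosen points at O(m^3) cost. This baseline is not the sparse pseudo-input local Kriging or Bayesian site selection methods listed above; all function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-2, length_scale=1.0):
    """Exact GP posterior mean; factorizing the n-by-n matrix costs O(n^3)."""
    k = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)  # K = L L^T
    # Solve K alpha = y in two steps: L z = y, then L^T alpha = z
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    return rbf_kernel(x_test, x_train, length_scale) @ alpha

def subset_of_data_predict(x_train, y_train, x_test, m, rng, **kwargs):
    """Hypothetical baseline: exact GP on m randomly chosen points, O(m^3) cost."""
    idx = rng.choice(len(x_train), size=m, replace=False)
    return gp_predict(x_train[idx], y_train[idx], x_test, **kwargs)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 2000)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)  # synthetic noisy observations
x_new = np.linspace(0.0, 10.0, 50)
full = gp_predict(x, y, x_new)                                # uses all 2000 points
approx = subset_of_data_predict(x, y, x_new, m=200, rng=rng)  # uses 200 points
```

Shrinking m from 2000 to 200 reduces the factorization cost by roughly a factor of 1000 at some loss of accuracy; research methods aim to choose or construct the reduced representation more cleverly than random sampling.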
-
-
Imbalanced Classification: When the training data in a classification problem are imbalanced, i.e., one class is underrepresented, most classification algorithms fail to characterize the class boundaries correctly. We develop techniques based on embedded synthetic data generation to improve the accuracy of classifiers on imbalanced data.
-
Pourhabib, A., Mallick, B. K. and Ding, Y., Absent data generating classifier for imbalanced class sizes, The Journal of Machine Learning Research, 16, 2695-2724, 2015.
-
Pourhabib, A., Empirical similarity for absent data generation in imbalanced classification, arXiv:1508.01235 [stat.ML], 2015.
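To make the idea of synthetic data generation concrete, the sketch below shows a widely used generic approach, SMOTE-style interpolation: each synthetic minority point is placed on the segment between a minority sample and one of its nearest minority neighbours. This is a stand-in for illustration only, not the absent data generation method in the papers above; it assumes Python with NumPy, and the function name and data are hypothetical.

```python
import numpy as np

def interpolate_minority(x_minority, n_new, k=5, rng=None):
    """SMOTE-style oversampling: each synthetic point lies on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng() if rng is None else rng
    # Pairwise Euclidean distances between minority samples
    diff = x_minority[:, None, :] - x_minority[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(x_minority), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.uniform(0.0, 1.0, size=(n_new, 1))  # position along each segment
    return x_minority[base] + gap * (x_minority[partner] - x_minority[base])

rng = np.random.default_rng(1)
x_min = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))  # 20 minority samples
x_syn = interpolate_minority(x_min, n_new=80, k=5, rng=rng)  # 80 synthetic samples
```

Because every synthetic point is a convex combination of two real minority points, the generated data stay inside the region the minority class already occupies; methods that generate data more deliberately can instead target the uncertain region near the class boundary.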
-
-
Statistical Model Calibration: Computational models are commonly used to represent physical experiments that are costly to run. Such models require calibration, that is, using physical data to improve the model's accuracy. We devise statistical techniques for calibration when physical parameters are interdependent.
-
Pourhabib, A. and Balasundaram, B., Non-isometric curve to surface matching with incomplete data for functional calibration, arXiv:1508.01240 [stat.ML], 2015.
-
Pourhabib, A., Huang, J. Z., Wang, K., Zhang, C., Wang, B. and Ding, Y., Modulus prediction of Buckypaper based on multi-fidelity analysis involving latent variables, IIE Transactions, 47 (2), 141-152, 2015.
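In its simplest form, calibration chooses the unknown parameter of a computational model so that the model's output best matches physical measurements. The sketch below assumes Python with NumPy, a hypothetical exponential-decay simulator, synthetic "physical" data, and a single constant parameter found by least-squares grid search. It deliberately ignores the harder setting described above, where parameters are interdependent or vary functionally.

```python
import numpy as np

def computer_model(x, theta):
    """Hypothetical simulator: output depends on input x and unknown parameter theta."""
    return np.exp(-theta * x)

def calibrate(x_obs, y_obs, theta_grid):
    """Pick the theta on the grid minimizing squared error against physical data."""
    sse = [np.sum((y_obs - computer_model(x_obs, t)) ** 2) for t in theta_grid]
    return theta_grid[int(np.argmin(sse))]

rng = np.random.default_rng(2)
x_obs = np.linspace(0.0, 3.0, 30)
# Synthetic "physical" measurements generated with true theta = 0.7 plus small noise
y_obs = computer_model(x_obs, 0.7) + 0.01 * rng.standard_normal(x_obs.size)
theta_hat = calibrate(x_obs, y_obs, np.linspace(0.1, 2.0, 191))  # grid step 0.01
```

A grid search keeps the example transparent; in practice a continuous optimizer or a Bayesian treatment of theta would usually replace it.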
-
-
Stochastic Optimization: Many stochastic optimization techniques scale poorly when the number of variables is large or when the data contain outliers. We develop efficient stochastic optimization techniques that can be applied to large datasets with many variables.
-
Zhao, C. and Guan, Y., Extended formulations for stochastic lot-sizing problems, Operations Research Letters, 42 (4), 278-283, 2014.
-
Lu, Y., Zhao, C., Watson, J. P., Pan, K. and Guan, Y., Two-stage and multi-stage stochastic unit commitment under wind generation uncertainty, Proceedings of the Industrial and Systems Engineering Research Conference, Montreal, Canada, 2014.
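A standard building block in stochastic optimization is sample average approximation (SAA): replace the expected objective with an average over sampled scenarios and optimize that. The sketch below applies SAA to a textbook newsvendor problem. It assumes Python with NumPy and hypothetical normally distributed demand, prices and costs; it is a generic illustration, not the extended formulations or unit commitment models in the papers above.

```python
import numpy as np

def saa_newsvendor(demand_samples, unit_cost, price):
    """Sample average approximation of a newsvendor problem: pick the order
    quantity with the highest average profit over sampled demand scenarios."""
    candidates = np.unique(demand_samples)  # an optimal order lies at a sampled demand
    sales = np.minimum(candidates[:, None], demand_samples[None, :])
    profit = price * sales - unit_cost * candidates[:, None]
    return candidates[np.argmax(profit.mean(axis=1))]

rng = np.random.default_rng(4)
demand = rng.normal(100.0, 20.0, size=2000)  # hypothetical demand scenarios
q_saa = saa_newsvendor(demand, unit_cost=4.0, price=10.0)
# Closed form for comparison: the optimal order is the (price - cost) / price = 0.6
# quantile of demand, about 105.07 for a normal(100, 20) distribution
```

The candidate-by-scenario profit matrix has one row per candidate and one column per scenario, so memory grows quadratically with the sample size; this is a small example of the scaling issue that motivates more efficient formulations.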
-
-
Robust Optimization: Many optimization problems in science and engineering involve uncertainties. Hedging against these uncertainties while accounting for both system reliability and computational effort is challenging. We develop new robust optimization models and theory for power grid problems under uncertainty, particularly with high penetration of renewable energy, to ensure both cost-effectiveness and system robustness.
-
Zhao, C. and Guan, Y., Unified stochastic and robust unit commitment, IEEE Transactions on Power Systems, 28, 3353-3361, 2013.
-
Zhao, C., Wang, J., Watson, J. P. and Guan, Y., Multi-stage robust unit commitment considering wind and demand response uncertainties, IEEE Transactions on Power Systems, 28, 2708-2717, 2013.
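The difference between hedging on average and hedging against the worst case can be seen in a toy commitment decision. The sketch below assumes Python with NumPy, a hypothetical cost of committed generation plus a penalty on unmet net demand, and five hypothetical net-demand scenarios. A weight blends expected cost (weight 1, stochastic) with worst-case cost (weight 0, robust); it illustrates the trade-off only and is not the unit commitment models in the papers above.

```python
import numpy as np

def dispatch_cost(g, net_demand, gen_cost=1.0, shortfall_penalty=3.0):
    """Hypothetical cost: pay for committed generation g plus a penalty on unmet net demand."""
    return gen_cost * g + shortfall_penalty * np.maximum(net_demand - g, 0.0)

def choose_generation(candidates, scenarios, weight):
    """Minimize weight * expected cost + (1 - weight) * worst-case cost over the scenarios."""
    costs = dispatch_cost(candidates[:, None], scenarios[None, :])
    objective = weight * costs.mean(axis=1) + (1.0 - weight) * costs.max(axis=1)
    return candidates[np.argmin(objective)]

scenarios = np.array([80.0, 90.0, 100.0, 110.0, 150.0])  # hypothetical net demand after wind
candidates = np.linspace(0.0, 200.0, 201)                 # generation levels 0, 1, ..., 200
g_stochastic = choose_generation(candidates, scenarios, weight=1.0)
g_robust = choose_generation(candidates, scenarios, weight=0.0)
```

Here the expected-cost decision commits 110 units and accepts a shortfall in the rare high-demand scenario, while the worst-case decision commits 150 units to cover every scenario at a higher typical cost; intermediate weights trade off the two.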
-
-
Data-Driven Optimization: As data grow at an astounding rate and continue to drive significant innovations each year, transforming data into valuable information and actionable insights that support decision making and planning has become critical for system operators. We develop data-driven optimization approaches that integrate statistics and optimization to obtain reliable and cost-effective decisions.
-
Zhao, C. and Guan, Y., Data-driven stochastic unit commitment for integrating wind generation, IEEE Transactions on Power Systems, 31 (4), 2587-2596, 2015.
-
Zhao, C. and Guan, Y., Data-driven risk-averse two-stage stochastic program with zeta structure probability metrics, Optimization Online.
-
Zhao, C. and Guan, Y., Data-driven risk-averse stochastic optimization with Wasserstein metric, Optimization Online.
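Data-driven risk-averse approaches typically measure how far a candidate distribution may lie from the empirical distribution of the data using a probability metric, such as the Wasserstein metric named in the last title. The sketch below assumes Python with NumPy and computes the Wasserstein-1 distance between two equal-size empirical distributions on the real line, where the optimal transport plan simply matches sorted samples. It is a building block only, not the optimization model of the papers above, and the function name is hypothetical.

```python
import numpy as np

def wasserstein_1d(samples_a, samples_b):
    """Wasserstein-1 distance between two equal-size empirical distributions
    on the real line: in 1-D the optimal transport plan matches sorted samples."""
    a = np.sort(np.asarray(samples_a, dtype=float))
    b = np.sort(np.asarray(samples_b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(a - b)))

shifted = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])   # every point moves by 1
reordered = wasserstein_1d([3.0, 1.0, 2.0], [1.0, 2.0, 3.0])  # same distribution
```

A distributionally robust model can then hedge against every distribution within a chosen Wasserstein radius of the data, with the radius shrinking as more observations become available.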
-
-
Spatio-temporal Systems: We develop predictive models based on high-resolution data from spatio-temporal systems. Specifically, we develop techniques for short-term power prediction in power grids.
-
Pourhabib, A., Huang, J. Z. and Ding, Y., Short-term wind speed forecast using measurements from multiple turbines in a wind farm, Technometrics, 58 (1), 138-147, 2016.
-
Zhao, C., Wang, Q., Wang, J. and Guan, Y., Expected value and chance constrained stochastic unit commitment ensuring wind power utilization, IEEE Transactions on Power Systems, 29 (6), 2696-2705, 2014.
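Why measurements from several locations help short-term prediction can be shown with a toy wind farm. The sketch below assumes Python with NumPy and synthetic data in which a hypothetical downwind turbine sees the upwind turbine's speed one time step later. An ordinary least-squares regression on all turbines' current speeds is compared with the persistence ("no change") forecast. It is a simplified illustration, not the method of the Technometrics paper above.

```python
import numpy as np

def fit_lagged_regression(speeds, target, lag=1):
    """Least squares: regress the target turbine's speed at t + lag on all turbines' speeds at t."""
    design = np.column_stack([np.ones(len(speeds) - lag), speeds[:-lag]])
    response = speeds[lag:, target]
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

rng = np.random.default_rng(3)
n_steps = 500
upwind = 8.0 + np.cumsum(rng.normal(0.0, 0.2, n_steps))  # hypothetical upwind turbine (random walk)
downwind = np.empty(n_steps)
downwind[0] = upwind[0]
downwind[1:] = upwind[:-1] + rng.normal(0.0, 0.1, n_steps - 1)  # sees upwind speed one step later
speeds = np.column_stack([downwind, upwind])  # column 0 is the turbine being forecast

coef = fit_lagged_regression(speeds[:400], target=0)  # train on the first 400 steps
test = speeds[400:]
pred = coef[0] + test[:-1] @ coef[1:]  # one-step-ahead forecasts for turbine 0
actual = test[1:, 0]
rmse_model = np.sqrt(np.mean((pred - actual) ** 2))
rmse_persistence = np.sqrt(np.mean((test[:-1, 0] - actual) ** 2))  # "no change" baseline
```

Persistence can only repeat the downwind turbine's current speed, whereas the regression exploits the upwind measurement that anticipates it; real wind farms have changing wind directions and spatial dependence that require richer models.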
-