Regression Model For Flight Delay Data Analytics
Keywords:
Flight Delay, Multiple Linear Regression, Gradient Boosted Decision Trees, Machine Learning, Prediction Accuracy
Abstract
Flight delay, which covers both departure and arrival delays, is a serious problem in air transportation. Because flights at an airport are scheduled systematically, passengers are expected to check in and follow the airport's regulations so that boarding proceeds smoothly. However, delays can occur because of weather conditions and human error, and they can throw airport operations into chaos. This study focuses on the flight delay problem. For this purpose, flight on-time performance data from the Bureau of Transportation Statistics (BTS) and airline and airport data from the Kaggle website are extracted. These data are then prepared using a data analytics procedure. In addition, the p-value and the variance inflation factor (VIF) are examined for each feature in the collected data. A multiple linear regression model and a gradient boosted decision trees model are then constructed from the available features. The performance of these models is measured using the mean absolute error (MAE), the root mean square error (RMSE), and the coefficient of determination. The results show that the gradient boosted decision trees model predicts flight delays more accurately than the multiple linear regression model. In conclusion, gradient boosted decision trees are an effective approach for improving prediction performance on the flight delay problem.
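The three evaluation measures named in the abstract can be illustrated with a minimal sketch. The function names (`mae`, `rmse`, `r2`) and the delay values below are hypothetical and chosen only for illustration; they are not taken from the study's data or code, and the study itself may have used a library implementation.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: penalizes large errors more heavily than MAE."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical delays in minutes (illustrative values, not from the BTS data).
actual = [10.0, 0.0, 25.0, 5.0]
predicted = [12.0, 1.0, 20.0, 7.0]

print(mae(actual, predicted))
print(rmse(actual, predicted))
print(r2(actual, predicted))
```

Under these definitions, a lower MAE and RMSE and a coefficient of determination closer to 1 indicate a better fit, which is the sense in which the two models are compared.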