Analysis of Cardiovascular Disease Risk Factors in the United States using Logistic and Probit Regression
Keywords:
Cardiovascular Disease, Risk Factors, Binary Logistic Regression, Probit Regression, United States
Abstract
Cardiovascular Disease (CVD) is a non-communicable disease that remains the most life-threatening disease worldwide, including in the United States. Many studies have applied various statistical methods to identify the significant risk factors of CVD. This study aims to determine the significant risk factors of CVD among residents of the United States in 2021 by constructing a Binary Logistic Regression (logit) model and a Probit Regression model. Additionally, the performance of the logit and probit models is compared using Deviance, AIC and BIC. The dataset is collected from the Behavioral Risk Factor Surveillance System (BRFSS). The five U.S. states with the highest CVD mortality rates are chosen to represent the U.S. population, giving a total sample size of 24,932 respondents. The findings reveal that the logit and probit models identify the same significant risk factors of CVD: male gender, age groups 6 to 13 (age 45 and above), current smokers, non-heavy drinkers, underweight, high blood pressure, high cholesterol, and diabetes. These risk factors are associated with an increased probability of developing CVD among U.S. residents. Lastly, the probit model performed better than the logit model, as it yields slightly lower values of Deviance, AIC and BIC.
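The comparison criteria named in the abstract can be illustrated with a minimal sketch. It is not the study's code: the toy data, the coefficient values, and the function names (`logistic_cdf`, `normal_cdf`, `fit_criteria`) are hypothetical and chosen only for illustration. It assumes ungrouped binary outcomes, for which the deviance equals minus twice the log-likelihood, and uses the standard definitions AIC = deviance + 2k and BIC = deviance + k ln(n), where k is the number of estimated parameters and n is the sample size.

```python
import math


def logistic_cdf(z):
    """Inverse logit link: P(y = 1) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Inverse probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_likelihood(y, p):
    """Bernoulli log-likelihood for 0/1 outcomes y and fitted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def fit_criteria(y, p, k):
    """Return (deviance, AIC, BIC) for a binary model with k parameters."""
    n = len(y)
    deviance = -2.0 * log_likelihood(y, p)  # ungrouped binary data
    aic = deviance + 2 * k
    bic = deviance + k * math.log(n)
    return deviance, aic, bic


# Hypothetical toy data: one predictor x and a binary outcome y.
y = [0, 0, 1, 0, 1, 1]
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

# Hypothetical (intercept, slope) pairs, NOT estimates from the study.
logit_coef = (0.0, 1.6)
probit_coef = (0.0, 1.0)

p_logit = [logistic_cdf(logit_coef[0] + logit_coef[1] * xi) for xi in x]
p_probit = [normal_cdf(probit_coef[0] + probit_coef[1] * xi) for xi in x]

for name, p in (("logit", p_logit), ("probit", p_probit)):
    dev, aic, bic = fit_criteria(y, p, k=2)
    print(f"{name}: deviance={dev:.3f} AIC={aic:.3f} BIC={bic:.3f}")
```

Because both models here have the same number of parameters and the same sample size, the AIC and BIC penalties are identical, so the ranking of the two models is driven entirely by the deviance. Lower values indicate a better fit under each criterion.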