Identifying Genes Related to Diabetes Mellitus Using Penalized Logistic Regression
Keywords:
Diabetes Mellitus, gene expression, penalized logistic regression, Lasso
Abstract
Identifying genes associated with diabetes mellitus is important for early detection of the disease. This study sought potential genes related to type 2 diabetes mellitus (T2DM). We used the GSE25462 dataset and penalized logistic regression, specifically the Lasso. The top eight selected genes were ABRA, EVX1, MIR7-3HG, SAYSD1, SLC26A1, SRGAP3, WFDC1, and 240244_at. On the training data, the model with 8 genes reached an accuracy and kappa of 1. On the testing data, however, the maximum accuracy was 0.9 and the maximum kappa was 0.615, obtained in models with 14 genes. We attribute this gap to the small number of positive-class samples in the dataset. Ensemble learning methods are recommended to combine predictive results. The role of some of the identified genes in T2DM remains unclear, and biology researchers can study it further.
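To illustrate how a test accuracy of 0.9 can coexist with a much lower kappa when the positive class is scarce, the following minimal sketch computes both metrics from a binary confusion matrix. The function name `accuracy_and_kappa` and the counts are hypothetical, chosen only for illustration; they are not the study's actual test results or confusion matrix.

```python
def accuracy_and_kappa(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix."""
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")
    observed = (tp + tn) / n  # observed agreement, identical to accuracy
    # Chance agreement, from the marginal totals of actual and predicted labels.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when only one class occurs; report 0 by convention here.
        return observed, 0.0
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical split (an assumption): 2 positive and 8 negative samples,
# with one positive sample misclassified as negative.
acc, kappa = accuracy_and_kappa(tp=1, fp=0, fn=1, tn=8)
print(f"accuracy={acc:.3f}, kappa={kappa:.3f}")  # prints accuracy=0.900, kappa=0.615
```

With so few positive samples, a single misclassified positive leaves accuracy high but lowers kappa sharply, because most of the agreement is already expected by chance from the dominant negative class.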