A connected network-regularized logistic regression model for feature selection

Image credit: Lingyu Li

摘要

Feature selection on a network structure can not only discover interesting variables but also mine out their intricate interactions. Regularization is often employed to ensure the sparsity and smoothness of the coefficients in logistic regression. However, currently available methods fail to embed the network connectivity in regularized penalty functions. In this paper, a connected network-regularized logistic regression (CNet-RLR) model for feature selection considering the structural connectivity in a network was proposed. Mathematically, it was a convex optimization problem constrained by inequalities reflecting network connectivity. Considering the non-differentiability of Lasso penalty, we constructed an equivalent formulation of CNet-RLR by employing auxiliary variables. An interior-point algorithm was designed to efficiently achieve the solutions. Theoretically, we proved their grouping effect and oracle property and guaranteed algorithmic convergence. In both synthetic simulation data and real-world uterine corpus endometrial carcinoma (UCEC) cancer genomics data, we validated the CNet-RLR model was efficient to identify the connected-network-structured features that can serve as diagnostic biomarkers. In the comparison study, we also proved the proposed CNet-RLR model results in better classification performance and feature interpretability than the other regularized logistic regression (RLR) alternatives and another graph embedded feature selection model.

出版物
In Applied Intelligence
Click the Cite button above to demo the feature to enable visitors to import publication metadata into their reference management software.
Create your slides in Markdown - click the Slides button to check out the example.

Supplementary notes can be added here, including code, data, math, and images.

李苓玉
李苓玉
博士后研究员

研究方向为生物信息学,包括并不限于:空间转录组学分析、稀疏统计学习和生物标志物识别。