Randomized logistic regression (RandomizedLogisticRegression) for feature selection

Randomized logistic regression can be used for feature selection.
It works by subsampling the training data and fitting an L1-penalized LogisticRegression model in which the penalty of a random subset of coefficients has been scaled. By performing this double randomization several times, the method assigns high scores to features that are repeatedly selected across the randomizations. This is known as stability selection. In short, features that are selected often are considered good features.

The relevant API is listed below.
Scikit-Learn API:
sklearn.linear_model — generalized linear models
sklearn.linear_model.LogisticRegression — logistic regression classifier
Methods:
score(X, y[, sample_weight]) — returns the mean accuracy on the given test data and labels.
Parameters:
X: array-like, test samples; y: array-like, true labels for X.
sample_weight: optional, per-sample weights.
Returns:
score: float, mean accuracy of self.predict(X) with respect to y. Note that this is the overall classification accuracy, not a per-feature score; the per-feature scores used for selection come from the fitted RandomizedLogisticRegression's scores_ attribute.
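A minimal sketch of how score() is used; the toy dataset and the variable names (X_train, clf, acc, etc.) are illustrative assumptions, not part of the original post:

```python
# Illustrative only: fit a LogisticRegression on synthetic data and
# evaluate it with score(), which returns mean accuracy on (X_test, y_test).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = clf.score(X_test, y_test)  # mean accuracy of clf.predict(X_test) w.r.t. y_test
print(acc)
```

score() is equivalent to computing clf.predict(X_test) and taking the fraction of labels that match y_test.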
sklearn.linear_model.RandomizedLogisticRegression — randomized logistic regression
(Note: this estimator was deprecated in scikit-learn 0.19 and removed in 0.21, so recent versions of scikit-learn no longer provide it.)
The official documentation describes randomized logistic regression as follows:
Randomized Logistic Regression works by subsampling the training data and fitting a L1-penalized LogisticRegression model where the penalty of a random subset of coefficients has been scaled. By performing this double randomization several times, the method assigns high scores to features that are repeatedly selected across randomizations. This is known as stability selection. In short, features selected more often are considered good features.
Interpretation: the training data is subsampled many times and a penalized regression model is fitted each time, i.e. the selection procedure is run on different subsets of the samples and of the features. After many repetitions, the features with high scores are kept as the important ones. This is the stability-selection method. A feature's score reflects how frequently it is judged important: the number of times it was selected divided by the number of times a subset containing it was tested.
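Since RandomizedLogisticRegression has been removed from scikit-learn, the double randomization described above can be sketched by hand. This is an illustrative approximation, not the library's exact implementation: n_rounds, subsample_frac, and the 0.5 penalty-scaling factor are all assumed values, and the coefficient-penalty rescaling is approximated by rescaling a random subset of feature columns (for an L1 penalty, scaling a column by s is equivalent to scaling that coefficient's penalty by 1/s):

```python
# Hand-rolled sketch of stability selection with L1-penalized logistic
# regression (approximating the removed RandomizedLogisticRegression).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
rng = np.random.RandomState(0)
n_rounds, subsample_frac = 50, 0.75   # assumed settings for this sketch
selected = np.zeros(X.shape[1])

for _ in range(n_rounds):
    # Double randomization: subsample the rows, and randomly rescale a
    # subset of feature columns (which rescales their effective L1 penalty).
    idx = rng.choice(len(X), size=int(subsample_frac * len(X)), replace=False)
    scale = np.where(rng.rand(X.shape[1]) < 0.5, 0.5, 1.0)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    clf.fit(X[idx] * scale, y[idx])
    selected += (clf.coef_.ravel() != 0)  # count features with nonzero weight

scores = selected / n_rounds              # selection frequency per feature
print(np.argsort(scores)[::-1][:5])       # indices of the most stable features
```

Features whose selection frequency stays high across the randomized rounds are the "good features" in the stability-selection sense; a threshold on scores (e.g. 0.5) would then give the final selected subset.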
