Python Machine Learning: Breast Cancer Cell Mining (Part 4)

Python machine learning: mining breast cancer cells with sklearn (video course recorded by the author): http://dwz.date/bwey

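Model Tuning
Hyperparameter tuning is something of a black-box craft, and doing it well usually takes an experienced machine learning engineer. Fortunately, sklearn ships with tuning utilities, so beginners can try it too.
If there are only a few parameters, you can write a tuning function by hand. If there are many, use GridSearchCV. If the grid is so large that the search takes too long, use RandomizedSearchCV to cut the tuning time.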
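As a minimal sketch of the "write a function by hand" option, the loop below scores each candidate n_neighbors with cross-validation and keeps the best one. The function name tune_n_neighbors and the range 1 to 30 are assumptions made for illustration, not taken from the original post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def tune_n_neighbors(x, y, k_values, cv=10):
    """Return (best_k, scores), where scores maps each k to its mean CV accuracy.

    Hypothetical helper written for this example.
    """
    scores = {}
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k)
        # Mean accuracy over the cv folds for this value of k
        scores[k] = cross_val_score(knn, x, y, cv=cv, scoring="accuracy").mean()
    # Pick the k with the highest mean accuracy
    best_k = max(scores, key=scores.get)
    return best_k, scores


cancer = load_breast_cancer()
best_k, scores = tune_n_neighbors(cancer.data, cancer.target, range(1, 31))
print("best k:", best_k)
print("best mean accuracy:", scores[best_k])
```

This is fine for one parameter, but the nested loops grow quickly once a second or third parameter is added, which is where GridSearchCV becomes more convenient.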
GridSearchCV
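When there are many parameters, use GridSearchCV: it tries every combination of the candidate values you supply and scores each one with cross-validation.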

(1) Single-parameter tuning
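A single-parameter search might look like the sketch below. It assumes the same KNN classifier and breast cancer dataset used elsewhere in this post, and tunes only n_neighbors over 1 to 30 (an assumed range).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()
x, y = cancer.data, cancer.target

# One parameter only: 30 candidate values for n_neighbors
param_grid = {"n_neighbors": list(range(1, 31))}

grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring="accuracy")
grid.fit(x, y)

print("candidates evaluated:", len(grid.cv_results_["params"]))
print("best score:", grid.best_score_)
print("best params:", grid.best_params_)
```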

(2) Multi-parameter tuning
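The search covers two parameters, n_neighbors (30 candidate values) and weights (2 candidate values), so it produces 30 × 2 = 60 results.
Searching over both parameters gives the best model: weights=distance, n_neighbors=12.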
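The two-parameter search can be sketched as follows, assuming the same candidate values as the RandomizedSearchCV script later in this post. The best parameters it prints may differ from the ones reported above, depending on the scikit-learn version and how the folds are split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()
x, y = cancer.data, cancer.target

# 30 values of n_neighbors x 2 values of weights = 60 combinations
param_grid = {
    "n_neighbors": list(range(1, 31)),
    "weights": ["uniform", "distance"],
}

grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10, scoring="accuracy")
grid.fit(x, y)

print("candidates evaluated:", len(grid.cv_results_["params"]))
print("best score:", grid.best_score_)
print("best params:", grid.best_params_)
```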

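RandomizedSearchCV
RandomizedSearchCV works like a random sample of the GridSearchCV grid: it evaluates only a fixed number of randomly chosen parameter combinations instead of all of them.
Because it may miss the best combination, its accuracy can be slightly lower than GridSearchCV's, but when a large grid takes too long to search it saves a great deal of time.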
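To make the "sampling of the grid" idea concrete, the sketch below uses scikit-learn's ParameterGrid and ParameterSampler helpers to compare the full grid with a random draw from it. The parameter values and n_iter=10 are assumptions chosen to match the script in this post.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_dist = {
    "n_neighbors": list(range(1, 31)),
    "weights": ["uniform", "distance"],
}

# Every combination that GridSearchCV would evaluate
full_grid = list(ParameterGrid(param_dist))

# The subset that a randomized search with n_iter=10 would evaluate
sampled = list(ParameterSampler(param_dist, n_iter=10, random_state=5))

print("grid size:", len(full_grid))
print("sampled size:", len(sampled))
for params in sampled:
    print(params)
```

With 10-fold cross-validation, that is 100 model fits instead of 600, which is where the time saving comes from.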


The tuning code for RandomizedSearchCV is as follows:
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 16 19:54:25 2018
WeChat public account: pythonEducation

@author: 231469242@qq.com
"""
# sklearn.grid_search and sklearn.cross_validation were removed from
# scikit-learn; both are now part of sklearn.model_selection
from sklearn.model_selection import RandomizedSearchCV
import matplotlib.pyplot as plt
# cross-validation
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier

# load the data
cancer = load_breast_cancer()
x = cancer.data
y = cancer.target

# candidate values for KNN's n_neighbors and weights parameters
k_range = list(range(1, 31))
weight_options = ['uniform', 'distance']
param_dist = dict(n_neighbors=k_range, weights=weight_options)

knn = KNeighborsClassifier()
# n_iter is the number of randomly sampled parameter combinations
rand = RandomizedSearchCV(knn, param_dist, cv=10, scoring='accuracy',
                          n_iter=10, random_state=5)

rand.fit(x, y)
# grid_scores_ was removed from scikit-learn; cv_results_ holds the
# per-combination scores instead
rand.cv_results_
print('best score:', rand.best_score_)
print('best params:', rand.best_params_)
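The per-combination scores live in cv_results_, a dictionary of arrays that is not very readable when printed directly. As a rough sketch, assuming the same search settings as above, the sampled combinations can be listed from best to worst like this:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()
param_dist = {
    "n_neighbors": list(range(1, 31)),
    "weights": ["uniform", "distance"],
}

rand = RandomizedSearchCV(KNeighborsClassifier(), param_dist, cv=10,
                          scoring="accuracy", n_iter=10, random_state=5)
rand.fit(cancer.data, cancer.target)

results = rand.cv_results_
# One row per sampled combination: (mean accuracy, std across folds, params)
rows = sorted(
    zip(results["mean_test_score"], results["std_test_score"], results["params"]),
    key=lambda row: row[0],
    reverse=True,
)
for mean, std, params in rows:
    print(f"{mean:.4f} (+/- {std:.4f})  {params}")
```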
That concludes this introduction to model tuning. You are welcome to enroll in my Python machine learning for bioinformatics course series: http://dwz.date/b9vw
