About XGBoost (Hyperparameter Optimization)

This article summarizes ways to tune XGBoost hyperparameters.
For an overview of XGBoost and its hyperparameters, see the previous article.

Parameter optimization methods examined here

Random search

Builds models from randomly selected combinations of candidate parameter values and picks the best one among them.

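This remains feasible in a realistic amount of time even with many candidates, but because the combinations are drawn at random, whether the best model is found is a matter of chance.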
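The random-search loop can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: the reduced candidate lists, the `toy_score` function (a hypothetical stand-in for a cross-validated score), and the `random_search` helper are all assumptions made for the example, and each draw picks values independently, so the same combination can be drawn twice.

```python
import random

# Hypothetical, reduced set of candidate values (a subset of the ranges used later)
param_grid = {
    "max_depth": [3, 4, 5, 6, 7, 8],
    "learning_rate": [0.1, 0.2, 0.3],
    "subsample": [0.6, 0.7, 0.8, 0.9, 1.0],
}

def toy_score(params):
    """Hypothetical stand-in for a cross-validated score (higher is better); not a real model."""
    return -((params["max_depth"] - 5) ** 2
             + 10 * (params["learning_rate"] - 0.1) ** 2
             + (params["subsample"] - 0.8) ** 2)

def random_search(grid, score_fn, n_iter, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Pick one candidate value per parameter, independently and at random
        params = {name: rng.choice(values) for name, values in grid.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, score = random_search(param_grid, toy_score, n_iter=20)
print(best, score)
```

With only 20 draws out of 6 × 3 × 5 = 90 combinations, the optimum of this toy score (max_depth=5, learning_rate=0.1, subsample=0.8) may or may not be among the draws, which is the "matter of chance" described above.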

Grid search

Builds a model for every combination of the given candidate values and picks the best one among them.

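With many candidates the search takes a very long time, because the number of combinations is the product of the number of candidates for each parameter.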
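To make the growth concrete, the sketch below counts the combinations in the search space that this article later uses for random search (the `cv_params` values are copied from that code) and shows that `itertools.product` enumerates exactly the combinations a full grid search would have to fit.

```python
import itertools
import math

# Search space copied from the random-search cv_params used later in this article
cv_params = {
    "max_depth": [3, 4, 5, 6, 7, 8],
    "min_child_weight": [1, 2, 3, 4, 5],
    "gamma": [i / 10.0 for i in range(0, 6)],
    "subsample": [i / 10.0 for i in range(6, 11)],
    "colsample_bytree": [i / 10.0 for i in range(6, 11)],
    "reg_alpha": [1e-5, 1e-2, 0.1, 1, 100],
    "n_estimators": [1000, 2000],
    "reg_lambda": [1e-5, 1e-2, 0.1, 1],
    "learning_rate": [0.1, 0.2, 0.3],
}

# A full grid search fits one model per combination (per CV fold),
# so the number of combinations is the product of the candidate-list lengths.
n_combinations = math.prod(len(values) for values in cv_params.values())
print(n_combinations)      # 540000
print(n_combinations * 5)  # 2700000 model fits with 5-fold cross-validation

# itertools.product enumerates every combination, as a grid search does
all_combinations = itertools.product(*cv_params.values())
first = dict(zip(cv_params, next(all_combinations)))
print(first)
```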

Bayesian optimization

A type of optimization algorithm that searches efficiently for good parameter values and combinations.

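It uses the parameters evaluated so far and the resulting model scores to build a surrogate model (typically a Gaussian process) that predicts the score of untested parameters, and then chooses the next candidate by maximizing an acquisition function that balances the predicted score against its uncertainty (for example, the expected improvement, or the probability of improving on the best score so far).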
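The loop can be illustrated with a from-scratch sketch in one dimension. It assumes a hypothetical objective, a Gaussian-process surrogate with a fixed RBF kernel (no fitting of kernel hyperparameters), and expected improvement as the acquisition function, evaluated on a fixed grid of candidate points. The `bayes_opt` package used later in this article is more elaborate, so treat this only as an illustration of the "fit surrogate, maximize acquisition, evaluate, repeat" cycle.

```python
import math
import numpy as np

def objective(x):
    """Hypothetical 1-D score to maximize (a stand-in for a cross-validated score)."""
    return -(x - 0.3) ** 2

def rbf_kernel(a, b, length_scale=0.1):
    # Squared-exponential kernel matrix between two 1-D arrays of points
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    # GP posterior mean and standard deviation at x_new (zero prior mean, unit prior variance)
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mean, std, best, xi=0.01):
    # Expected amount by which f(x) exceeds the best score so far (plus a small margin xi)
    imp = mean - best - xi
    z = imp / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return imp * cdf + std * pdf

def bayes_opt(n_init=3, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 501)     # points where the acquisition function is evaluated
    x_obs = rng.uniform(0.0, 1.0, size=n_init)  # random initial points
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        mean, std = gp_posterior(x_obs, y_obs, candidates)  # surrogate built from all results so far
        ei = expected_improvement(mean, std, y_obs.max())
        x_next = candidates[np.argmax(ei)]                   # most promising next point
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]

x_best, y_best = bayes_opt()
print(x_best, y_best)
```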

(Stepwise) grid search

Instead of evaluating all parameter combinations at once, this method optimizes the parameters in stages (first parameter A, then parameter B, then parameter C, ...).

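It reduces the main drawback of grid search, whose cost explodes as the number of parameters grows, at the price of not exploring interactions between parameters tuned in different stages.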
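A minimal sketch of the stepwise idea, assuming a hypothetical `toy_score` in place of a real cross-validated score and hypothetical helpers `grid_best` and `step_search`: each stage runs an exhaustive grid over one group of parameters while the winners of earlier stages stay fixed.

```python
import itertools

def toy_score(params):
    """Hypothetical stand-in for a CV score (higher is better)."""
    return -((params["max_depth"] - 5) ** 2
             + (params["min_child_weight"] - 3) ** 2
             + 10 * (params["gamma"] - 0.2) ** 2)

def grid_best(fixed, sub_grid, score_fn):
    # Exhaustive search over sub_grid while the other parameters stay fixed
    names = list(sub_grid)
    best, best_score, n_evals = None, float("-inf"), 0
    for values in itertools.product(*sub_grid.values()):
        params = {**fixed, **dict(zip(names, values))}
        score = score_fn(params)
        n_evals += 1
        if score > best_score:
            best, best_score = params, score
    return best, n_evals

def step_search(steps, score_fn, defaults):
    params, total_evals = dict(defaults), 0
    for sub_grid in steps:  # tune one group at a time, keeping earlier winners fixed
        params, n = grid_best(params, sub_grid, score_fn)
        total_evals += n
    return params, total_evals

steps = [
    {"max_depth": [3, 4, 5, 6, 7, 8], "min_child_weight": [1, 2, 3, 4, 5]},
    {"gamma": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]},
]
best, n_evals = step_search(steps, toy_score, defaults={"gamma": 0.0})
print(best)
print(n_evals)  # 36 (30 + 6), versus 6 * 5 * 6 = 180 for the full grid
```

Because this toy score is a sum of independent terms, the stepwise search still finds the global optimum; when parameters interact, fixing earlier winners can miss it.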

The code is below.

About the analysis

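Analysis conditions

Data: housing price dataset (the Boston housing sample dataset bundled with scikit-learn)
Test data ratio: 0.2
k-fold (number of splits in cross-validation): 5
early_stopping_rounds (a technique to prevent overfitting) is applied.
Evaluation metric: RMSE (root mean squared error)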
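The early-stopping rule can be sketched as follows. `best_round_with_early_stopping` is a hypothetical helper and the RMSE values are made up; it follows the idea behind early_stopping_rounds (training stops once the validation score has not improved for that many rounds), but the exact bookkeeping inside XGBoost may differ.

```python
def best_round_with_early_stopping(val_scores, early_stopping_rounds):
    """Hypothetical helper: scan per-round validation scores (lower is better, e.g. RMSE)
    and stop once the best score has not improved for early_stopping_rounds rounds.
    Returns the index of the best round seen before stopping."""
    best_round, best_score = 0, float("inf")
    for i, score in enumerate(val_scores):
        if score < best_score:
            best_round, best_score = i, score
        elif i - best_round >= early_stopping_rounds:
            break  # no improvement for early_stopping_rounds consecutive rounds
    return best_round

# Validation RMSE after each boosting round (made-up numbers for illustration)
val_rmse = [5.0, 4.0, 3.5, 3.6, 3.7, 3.4, 3.8, 3.9, 4.0]
print(best_round_with_early_stopping(val_rmse, 2))  # 2: stops at round 4, before the later 3.4
print(best_round_with_early_stopping(val_rmse, 3))  # 5
```

A larger early_stopping_rounds tolerates longer plateaus (here it reaches the 3.4 at round 5) at the cost of training more rounds.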

Analysis flow

Split the data into training data and test data
Optimize the hyperparameters on the training data
Check the model's accuracy on the test data
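The split in the flow above, plus the 5-fold cross-validation used while tuning, can be sketched with index lists in plain Python. The helper names are hypothetical; rounding the test size up and giving the first n % k folds one extra sample are assumptions meant to mirror how I understand scikit-learn's train_test_split and KFold to size things, but the shuffling differs from scikit-learn.

```python
import math
import random

def train_test_indices(n_samples, test_size, seed=0):
    # Shuffle the sample indices, then hold out ceil(n_samples * test_size) of them as test data
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(n_samples * test_size)
    return idx[n_test:], idx[:n_test]

def kfold_indices(indices, n_splits=5):
    # Cut the (training) indices into n_splits contiguous folds; the first
    # len(indices) % n_splits folds get one extra sample. Each fold is the validation set once.
    n = len(indices)
    fold_sizes = [n // n_splits + (1 if i < n % n_splits else 0) for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        yield indices[:start] + indices[start + size:], indices[start:start + size]
        start += size

# 506 samples (the size of the Boston housing data), 20% held out as test data
train_idx, test_idx = train_test_indices(506, 0.2)
print(len(train_idx), len(test_idx))  # 404 102
for cv_train, cv_val in kfold_indices(train_idx, n_splits=5):
    print(len(cv_train), len(cv_val))  # 323 81 (four times), then 324 80
```

The test indices never enter the cross-validation loop; only the training indices are split into folds.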

Data preparation

# Written for older library versions: load_boston was removed in scikit-learn 1.2,
# and the code below passes early_stopping_rounds/eval_metric to fit() (the xgboost 1.x interface).
import numpy as np
import xgboost as xgb
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import pandas as pd
import matplotlib.pyplot as plt

# Load the data
boston = load_boston()
df_X = pd.DataFrame(boston.data, columns=boston.feature_names)
df_y = pd.Series(boston.target)
print('Number of samples:', df_X.shape[0])
#output>>> Number of samples: 506
print('Number of features:', df_X.shape[1])
#output>>> Number of features: 13

# Split into training and test data
X_train, X_test, y_train, y_test = train_test_split(df_X, df_y, test_size=0.2, shuffle=True)

Without hyperparameter tuning

%%time

# Create the xgboost model
model = xgb.XGBRegressor(max_depth=3,n_jobs=-1)

# Train
model.fit(X_train, y_train)

# Predict on the training and test data
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)

# Evaluate the model (compute RMSE)
print('RMSE(train data):',round(np.sqrt(mean_squared_error(y_train, y_pred_train)),3))
print('RMSE(test data):',round(np.sqrt(mean_squared_error(y_test, y_pred_test)),3))
"""output
RMSE(train data): 0.737
RMSE(test data): 2.565
CPU times: user 261 ms, sys: 17 ms, total: 278 ms
Wall time: 114 ms
"""

Random search

%%time

from sklearn.model_selection import RandomizedSearchCV
# Define the search space (parameter candidates): 540,000 combinations in total
cv_params ={'max_depth':[3,4,5,6,7,8],
            'min_child_weight':[1,2,3,4,5],
            'gamma':[i/10.0 for i in range(0,6)],
            'subsample':[i/10.0 for i in range(6,11)],
            'colsample_bytree':[i/10.0 for i in range(6,11)],
            'reg_alpha':[1e-5, 1e-2, 0.1, 1, 100],
            'n_estimators':[1000,2000],
            'reg_lambda':[1e-5, 1e-2, 0.1, 1],
            'learning_rate':[0.1,0.2,0.3]
            }

model = xgb.XGBRegressor(silent=False,n_jobs=-1)
model_rand = RandomizedSearchCV(model, cv_params, n_iter=200, cv=5, n_jobs=-1)
model_rand.fit(X_train,
               y_train,
               early_stopping_rounds=50,
               eval_set=[(X_test, y_test)],
               eval_metric='rmse',
               verbose=0)
print('optimal_parameters\n', model_rand.best_params_)
"""output
optimal_parameters
 {'subsample': 0.8, 'reg_lambda': 0.01, 'reg_alpha': 0.01, 'n_estimators': 1000, 'min_child_weight': 1, 'max_depth': 5, 'learning_rate': 0.1, 'gamma': 0.3, 'colsample_bytree': 1.0}
"""

# Take the model refit with the best parameters
opt_model = model_rand.best_estimator_
y_pred_train = opt_model.predict(X_train)
y_pred_test = opt_model.predict(X_test)

# Evaluate the model (compute RMSE)
print('RMSE(train data):',round(np.sqrt(mean_squared_error(y_train, y_pred_train)),3))
print('RMSE(test data):',round(np.sqrt(mean_squared_error(y_test, y_pred_test)),3))
"""output
RMSE(train data): 1.572
RMSE(test data): 2.66
CPU times: user 7.27 s, sys: 344 ms, total: 7.61 s
Wall time: 2min 52s
"""

Grid search

%%time

from sklearn.model_selection import GridSearchCV

# Define the search space (parameter candidates).
# Evaluating all 540,000 combinations is too many, so the grid is thinned to 192 combinations.
cv_params ={'max_depth':[3,5,7],
            'min_child_weight':[1,3],
            'gamma':[0, 0.2],
            'subsample':[0, 0.2],
            'colsample_bytree':[0.6, 0.7],
            'reg_alpha':[1e-5, 0.1],
            'n_estimators':[1000],
            'reg_lambda':[1e-2,0.1],
            'learning_rate':[0.1]
            }

model = xgb.XGBRegressor(silent=False,n_jobs=-1)
model_grid = GridSearchCV(model, cv_params, cv=5, n_jobs=-1)
model_grid.fit(X_train,
                y_train,
                early_stopping_rounds=50,
                eval_set=[(X_test, y_test)],
                eval_metric='rmse',
                verbose=0)
print('optimal_parameters\n', model_grid.best_params_)
"""output
optimal_parameters
 {'colsample_bytree': 0.6, 'gamma': 0.2, 'learning_rate': 0.1, 'max_depth': 5, 'min_child_weight': 3, 'n_estimators': 1000, 'reg_alpha': 1e-05, 'reg_lambda': 0.1, 'subsample': 0.2}
"""

# Take the model refit with the best parameters
opt_model = model_grid.best_estimator_
y_pred_train = opt_model.predict(X_train)
y_pred_test = opt_model.predict(X_test)

# Evaluate the model (compute RMSE)
print('RMSE(train data):',round(np.sqrt(mean_squared_error(y_train, y_pred_train)),3))
print('RMSE(test data):',round(np.sqrt(mean_squared_error(y_test, y_pred_test)),3))
"""
RMSE(train data): 2.665
RMSE(test data): 2.509
CPU times: user 3.95 s, sys: 198 ms, total: 4.14 s
Wall time: 1min 11s
"""

Bayesian optimization

Install the module first:

pip install bayesian-optimization

%%time
from bayes_opt import BayesianOptimization
from sklearn.model_selection import cross_val_predict

# Define the function that BayesianOptimization maximizes
def xgb_regressor(max_depth, min_child_weight, gamma, subsample, colsample_bytree,reg_alpha, n_estimators, reg_lambda,learning_rate):

    # bayes_opt proposes floats, so integer parameters are cast with int()
    params = {'max_depth':int(max_depth),
                'min_child_weight':int(min_child_weight),
                'gamma':gamma,
                'subsample':subsample,
                'colsample_bytree':colsample_bytree,
                'reg_alpha':reg_alpha,
                'n_estimators':int(n_estimators),
                'reg_lambda':reg_lambda,
                'learning_rate':learning_rate
                }
    # eval_set is an argument of fit(), not of the constructor, and early stopping needs it,
    # so the early-stopping settings are not used inside this cross-validation
    model = xgb.XGBRegressor(**params,
                            silent=False,
                            n_jobs=-1
                            )

    # RMSE of 5-fold cross-validated predictions on the training data
    y_pred_cv = cross_val_predict(model,X_train,y_train,cv=5, n_jobs=-1)
    rmse_cv = np.sqrt(mean_squared_error(y_train, y_pred_cv))

    # BayesianOptimization maximizes, so return the negative RMSE
    return -rmse_cv

# Define the parameter space explored by Bayesian optimization
xgb_bo = BayesianOptimization(xgb_regressor,
                            {'max_depth':(3,8),
                            'min_child_weight':(1,5),
                            'gamma':(0,0.5),
                            'subsample':(0.6,1),
                            'colsample_bytree':(0.6,1),
                            'reg_alpha':(1e-5,100),
                            'n_estimators':(1000,2000),
                            'reg_lambda':(1e-5,1),
                            'learning_rate':(0.1,0.3)
                            })

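# Run Bayesian optimization (search for the parameters that maximize the score)
# init_points: number of randomly chosen points evaluated first
# acq: acquisition function; 'ei' is expected improvement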
xgb_bo.maximize(init_points=5, n_iter=200, acq='ei')

# Get the parameter values with the best score
optimized_params = xgb_bo.max['params']

# Convert the integer parameters back to int
optimized_params['max_depth'] = int(optimized_params['max_depth'])
optimized_params['min_child_weight'] = int(optimized_params['min_child_weight'])
optimized_params['n_estimators'] = int(optimized_params['n_estimators'])

# Check the accuracy with the tuned parameters
opt_model = xgb.XGBRegressor()
opt_model.set_params(**optimized_params)
opt_model.fit(X_train, y_train)
y_pred_train = opt_model.predict(X_train)
y_pred_test = opt_model.predict(X_test)

# Evaluate the model (compute RMSE)
print('RMSE(train data):',round(np.sqrt(mean_squared_error(y_train, y_pred_train)),3))
print('RMSE(test data):',round(np.sqrt(mean_squared_error(y_test, y_pred_test)),3))
"""output
RMSE(train data): 0.426
RMSE(test data): 2.266
CPU times: user 32.1 s, sys: 4.64 s, total: 36.7 s
Wall time: 11min 42s
"""

(Stepwise) grid search

This section is based on "XGBoostのパラメータチューニング実践 with Python" (Practical XGBoost parameter tuning with Python) from かものはしの分析ブログ (Kamonohashi's analysis blog).

%%time

# The procedure is long, so wrap it in a function
def StepSearchCV_xgb(X_train,y_train,X_test,y_test,cv_params,fold_number=5):

    # Optimize max_depth and min_child_weight
    xgb_in_cv = GridSearchCV(xgb.XGBRegressor(silent=True, n_jobs=-1),
                            {'max_depth': cv_params['max_depth'],
                            'min_child_weight':cv_params['min_child_weight']},
                            cv=fold_number,
                            n_jobs=-1)

    xgb_in_cv.fit(X_train, y_train,early_stopping_rounds=50,
                eval_set=[(X_test, y_test)],
                eval_metric='rmse',
                verbose=0)

    optimal_max_depth = xgb_in_cv.best_params_['max_depth']
    optimal_min_child_weight = xgb_in_cv.best_params_['min_child_weight']

    # Optimize gamma
    xgb_in_cv = GridSearchCV(xgb.XGBRegressor(max_depth=optimal_max_depth,
                            min_child_weight=optimal_min_child_weight,
                            silent=True, n_jobs=-1),
                            {'gamma': cv_params['gamma']},
                            cv=fold_number,
                            n_jobs=-1)

    xgb_in_cv.fit(X_train, y_train,
                    early_stopping_rounds=50,
                    eval_set=[(X_test, y_test)],
                    eval_metric='rmse',
                    verbose=0)

    optimal_gamma = xgb_in_cv.best_params_['gamma']

    # Optimize subsample and colsample_bytree
    xgb_in_cv = GridSearchCV(xgb.XGBRegressor(max_depth=optimal_max_depth,
                            min_child_weight=optimal_min_child_weight,
                            gamma=optimal_gamma,
                            silent=True, n_jobs=-1),
                            {'subsample': cv_params['subsample'],
                            'colsample_bytree': cv_params['colsample_bytree']},
                            cv=fold_number,
                            n_jobs = -1)

    xgb_in_cv.fit(X_train, y_train,
                    early_stopping_rounds=50,
                    eval_set=[(X_test, y_test)],
                    eval_metric='rmse',
                    verbose=0)

    optimal_subsample = xgb_in_cv.best_params_['subsample']
    optimal_colsample_bytree = xgb_in_cv.best_params_['colsample_bytree']

    # Optimize reg_alpha and reg_lambda
    xgb_in_cv = GridSearchCV(xgb.XGBRegressor(max_depth=optimal_max_depth,
                            min_child_weight=optimal_min_child_weight,
                            gamma=optimal_gamma,
                            subsample=optimal_subsample,
                            colsample_bytree=optimal_colsample_bytree,
                            silent=True, n_jobs=-1),
                            {'reg_alpha': cv_params['reg_alpha'],
                            'reg_lambda':cv_params['reg_lambda']},
                            cv=fold_number,
                            n_jobs = -1)

    xgb_in_cv.fit(X_train, y_train,
                    early_stopping_rounds=50,
                    eval_set=[(X_test, y_test)],
                    eval_metric='rmse',
                    verbose=0)

    optimal_reg_alpha = xgb_in_cv.best_params_['reg_alpha']
    optimal_reg_lambda = xgb_in_cv.best_params_['reg_lambda']

    # Optimize learning_rate and n_estimators
    xgb_in_cv = GridSearchCV(xgb.XGBRegressor(max_depth=optimal_max_depth,
                            min_child_weight=optimal_min_child_weight,
                            gamma=optimal_gamma,
                            subsample=optimal_subsample,
                            colsample_bytree=optimal_colsample_bytree,
                            reg_alpha=optimal_reg_alpha,
                            reg_lambda=optimal_reg_lambda,
                            silent=True, n_jobs=-1),
                            {'learning_rate': cv_params['learning_rate'],
                            'n_estimators':cv_params['n_estimators']},
                            cv=fold_number,
                            n_jobs = -1)

    xgb_in_cv.fit(X_train, y_train,
                    early_stopping_rounds=50,
                    eval_set=[(X_test, y_test)],
                    eval_metric='rmse',
                    verbose=0)

    optimal_learning_rate = xgb_in_cv.best_params_['learning_rate']
    optimal_n_estimators = xgb_in_cv.best_params_['n_estimators']
    # Build the model with all of the tuned parameters
    optimal_model = xgb.XGBRegressor(max_depth=optimal_max_depth,
                                    min_child_weight=optimal_min_child_weight,
                                    gamma=optimal_gamma,
                                    subsample=optimal_subsample,
                                    colsample_bytree=optimal_colsample_bytree,
                                    reg_alpha=optimal_reg_alpha,
                                    reg_lambda=optimal_reg_lambda,
                                    learning_rate=optimal_learning_rate,
                                    n_estimators=optimal_n_estimators,
                                    silent=True,
                                    n_jobs=-1)

    return optimal_model

cv_params ={'max_depth':[3,4,5,6,7,8],
            'min_child_weight':[1,2,3,4,5],
            'gamma':[i/10.0 for i in range(0,6)],
            'subsample':[i/10.0 for i in range(6,11)],
            'colsample_bytree':[i/10.0 for i in range(6,11)],
            'reg_alpha':[1e-5, 1e-2, 0.1, 1, 100],
            'n_estimators':[1000,2000],
            'reg_lambda':[1e-5, 1e-2, 0.1, 1],
            'learning_rate':[0.1,0.2,0.3]
            }

opt_model = StepSearchCV_xgb(X_train,y_train,X_test,y_test, cv_params)
opt_model.fit(X_train,
                y_train,
                early_stopping_rounds=50,
                eval_set=[(X_test, y_test)],
                eval_metric='rmse',
                verbose=0)
y_pred_train = opt_model.predict(X_train)
y_pred_test = opt_model.predict(X_test)

# Evaluate the model (compute RMSE)
print('RMSE(train data):',round(np.sqrt(mean_squared_error(y_train, y_pred_train)),3))
print('RMSE(test data):',round(np.sqrt(mean_squared_error(y_test, y_pred_test)),3))
"""output
RMSE(train data): 0.173
RMSE(test data): 2.625
CPU times: user 5.66 s, sys: 477 ms, total: 6.14 s
Wall time: 53.3 s
"""

(Reference) Summary of results

Method	Conditions	Wall time	RMSE
No tuning		0.114 s	train: 0.737, test: 2.565
Random search	200 iterations	2min 52s	train: 1.572, test: 2.66
Grid search	192 combinations	1min 11s	train: 2.665, test: 2.509
Bayesian optimization	5 initial points + 200 iterations	11min 42s	train: 0.426, test: 2.266
(Stepwise) grid search		53.3 s	train: 0.173, test: 2.625

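Bayesian optimization gave the lowest test RMSE (that is, the best prediction accuracy on the test data), but it also took by far the longest to compute.
Among the other methods, the choice of method made no large difference in accuracy (possibly because the dataset is simple).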
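To put these differences in proportion, the short calculation below expresses each method's test RMSE as a relative change from the untuned model. The numbers are the test RMSE values printed by the code above (2.66 for random search as printed by its cell); negative means a lower, better RMSE.

```python
# Test RMSE values printed by the code in this article
baseline = 2.565  # no tuning
test_rmse = {
    "Random search": 2.66,
    "Grid search": 2.509,
    "Bayesian optimization": 2.266,
    "Stepwise grid search": 2.625,
}

# Relative change in percent versus the untuned model (negative = better)
changes = {name: (rmse - baseline) / baseline * 100 for name, rmse in test_rmse.items()}
for name in sorted(changes, key=changes.get):
    print(f"{name}: {changes[name]:+.1f}%")
# Bayesian optimization: -11.7%
# Grid search: -2.2%
# Stepwise grid search: +2.3%
# Random search: +3.7%
```

Only Bayesian optimization improves the test RMSE by more than a few percent; the other three stay within about 4% of the untuned model in either direction.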