
XGBoost Model Hyperparameter Tuning Tips


 

- This post is a summary of notes taken while working through the 'Google Advanced Data Analytics Professional Certificate' course.

 

 

Models

Classification model:

from xgboost import XGBClassifier

 

Regression model:

from xgboost import XGBRegressor
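As a quick sketch of how these classes are typically instantiated (the objective strings shown are the library defaults for binary classification and squared-error regression; everything else is illustrative):

from xgboost import XGBClassifier, XGBRegressor

# Illustrative instantiation; hyperparameters are discussed later in this post.
clf = XGBClassifier(objective='binary:logistic', random_state=0)
reg = XGBRegressor(objective='reg:squarederror', random_state=0)

# Both follow the scikit-learn estimator API:
# clf.fit(X_train, y_train); clf.predict(X_test)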

 

 

Evaluation metrics 

Classification models:

from sklearn.metrics import ...

accuracy_score(y_true, y_pred, *[, ...]) - Accuracy classification score
average_precision_score(y_true, ...) - Compute average precision (AP) from prediction scores
confusion_matrix(y_true, y_pred, *) - Compute confusion matrix to evaluate the accuracy of a classification
f1_score(y_true, y_pred, *[, ...]) - Compute the F1 score, also known as balanced F-score or F-measure
fbeta_score(y_true, y_pred, *, beta) - Compute the F-beta score
log_loss(y_true, y_pred, *[, eps, ...]) - Log loss, aka logistic loss or cross-entropy loss
multilabel_confusion_matrix(y_true, ...) - Compute a confusion matrix for each class or sample
precision_recall_curve(y_true, ...) - Compute precision-recall pairs for different probability thresholds
precision_score(y_true, y_pred, *[, ...]) - Compute the precision
recall_score(y_true, y_pred, *[, ...]) - Compute the recall
roc_auc_score(y_true, y_score, *[, ...]) - Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores
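For instance, a minimal sketch computing a few of these metrics on made-up predictions (y_true, y_pred, and y_score below are hypothetical values):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Hypothetical ground truth, predicted labels, and predicted probabilities
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
y_score = [0.2, 0.9, 0.4, 0.1, 0.8]

print(accuracy_score(y_true, y_pred))   # fraction of correct predictions
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_score))   # uses probabilities, not hard labels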

 

회귀 모델:

from sklearn.metrics import ...

mean_absolute_error(y_true, y_pred, *) - Mean absolute error regression loss
mean_squared_error(y_true, y_pred, *) - Mean squared error regression loss
mean_squared_log_error(y_true, y_pred, *) - Mean squared logarithmic error regression loss
median_absolute_error(y_true, y_pred, *) - Median absolute error regression loss
mean_absolute_percentage_error(...) - Mean absolute percentage error (MAPE) regression loss
r2_score(y_true, y_pred, *[, ...]) - R² (coefficient of determination) regression score function
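Similarly, a minimal sketch for the regression metrics (the values below are made up):

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted values from a regression model
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.8, 5.3, 2.9, 6.4]

print(mean_absolute_error(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))
print(r2_score(y_true, y_pred))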

 

Hyperparameters

The following are the most important hyperparameters for a gradient boosting classification model built with the XGBoost library.

 

Data professionals typically consider adjusting the hyperparameters below first, because they are the most intuitive.

 

n_estimators

Hyperparameter | What it does | Input type | Default Value
n_estimators | Specifies the number of boosting rounds (i.e., the number of trees the model builds in the ensemble) | int | 100

Considerations: 

A typical range is 50–500. Consider how much data you have, how deep the trees are allowed to grow, and how many samples are bootstrapped from the overall data to grow each tree (you generally need more trees if they're shallow, and more trees if your bootstrap sample size represents just a small fraction of your overall data). For an extreme but illustrative example, if you have a dataset of 10,000 samples and each tree only bootstraps 20 samples, you'll need more trees than if you gave each tree 5,000 samples. Also keep in mind that, unlike random forest, which can grow base learners in parallel, gradient boosting grows base learners successively, so training can take longer for more trees.
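As a sketch of how this hyperparameter is typically searched (the toy dataset, grid values, and scoring metric below are illustrative, not recommendations from the course):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Toy data purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Search a few values inside the typical 50-500 range
param_grid = {'n_estimators': [50, 100, 300, 500]}
search = GridSearchCV(XGBClassifier(random_state=0), param_grid, scoring='f1', cv=5)
search.fit(X, y)
print(search.best_params_)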

 

max_depth

Hyperparameter | What it does | Input type | Default Value
max_depth | Specifies how many levels your base learner trees can have. If None, trees grow until leaves are pure or until all leaves have less than min_child_weight. | int | 3

Considerations: Controls complexity of the model. Gradient boosting typically uses weak learners, or “decision stumps” (i.e., shallow trees). Restricting tree depth can reduce training times and serving latency as well as prevent overfitting. Consider values 2–6.
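For example, a small sketch comparing shallow depths in the suggested 2–6 range (the toy data and scoring choice are illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Shallower trees are weaker learners but train faster and overfit less
for depth in [2, 4, 6]:
    model = XGBClassifier(max_depth=depth, random_state=0)
    score = cross_val_score(model, X, y, cv=5, scoring='f1').mean()
    print(depth, round(score, 3))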

 

min_child_weight

Hyperparameter | What it does | Input type | Default Value
min_child_weight | Controls the threshold below which a node becomes a leaf, based on the combined weight of the samples it contains | int or float | 1

For regression models, this value is functionally equivalent to a number of samples.
For the binary classification objective, the weight of a sample in a node depends on its probability of response as calculated by that tree; the weight decreases the more certain the model is (i.e., the closer the probability of response is to 0 or 1).

Considerations: Higher values stop trees from splitting further, while lower values allow them to keep splitting.

If the model is underfitting, lowering this value allows it to become more complex.

Conversely, raising this value prevents trees from splitting too finely.
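In code, that tuning direction looks like this (the specific values are illustrative, not recommendations):

from xgboost import XGBClassifier

# If the model underfits, lower min_child_weight so trees can keep splitting;
# if trees split too finely and overfit, raise it.
more_complex = XGBClassifier(min_child_weight=0.5, random_state=0)
more_conservative = XGBClassifier(min_child_weight=5, random_state=0)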

 

learning_rate

Hyperparameter | What it does | Input type | Default Value
learning_rate | Controls how much importance is given to each consecutive base learner in the ensemble's final prediction. Also known as eta or shrinkage | float | 0.1

Considerations: Valid values lie in the interval (0, 1]; typical values range from 0.01 to 0.3.

Lower values mean less weight is given to each consecutive base learner.

Consider how many trees are in your ensemble. Lower values typically benefit from more trees. 
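A sketch of that trade-off between learning_rate and n_estimators (the pairings are illustrative, not tuned values):

from xgboost import XGBClassifier

# A lower learning rate shrinks each tree's contribution, so the ensemble
# typically needs more boosting rounds to reach the same performance.
fast_coarse = XGBClassifier(learning_rate=0.3, n_estimators=100, random_state=0)
slow_fine = XGBClassifier(learning_rate=0.01, n_estimators=500, random_state=0)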

 

colsample_bytree*

Hyperparameter    |     What it does     |     Input type     |     Default Value
colsample_bytree* Specifies the percentage (0–1.0] of features that each tree randomly selects during training  float  1.0

Considerations: Adds randomness to the model to make it robust to noise.

Consider how many features the dataset has and how many trees will be grown.

Fewer features sampled means more base learners might be needed.

On datasets with many features, very small colsample_bytree values can fill the ensemble with trees that have little predictive power.
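For example (the 0.5 value is illustrative):

from xgboost import XGBClassifier

# Each tree is built from a random 50% of the features, which adds
# randomness and can make the ensemble more robust to noise.
model = XGBClassifier(colsample_bytree=0.5, n_estimators=300, random_state=0)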

 

subsample*

Hyperparameter    |     What it does     |     Input type     |     Default Value
subsample* Specifies the percentage (0–1.0] of observations sampled from the dataset to train each base model. float 1.0

Considerations: Adds randomness to the model to make it more robust to noise.

Consider the size of your dataset.

When working with large datasets, it can be beneficial to limit the number of samples in each tree, because doing so can greatly reduce training time and yet still result in a robust model.

For example, 20% of 1 billion might be enough to capture patterns in the data, but if you only have 1,000 samples in your dataset then you’ll probably need to use them all. 

Remember that using fractions of the data to train each base learner can possibly improve model predictions and certainly speed up training times.
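Putting the section together, here is a minimal end-to-end tuning sketch over these hyperparameters (the toy data, grid values, and scoring metric are illustrative, not course recommendations):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Small illustrative grid over the six hyperparameters discussed above
param_grid = {
    'n_estimators': [100, 300],
    'max_depth': [2, 4, 6],
    'min_child_weight': [1, 5],
    'learning_rate': [0.1, 0.3],
    'colsample_bytree': [0.7, 1.0],
    'subsample': [0.7, 1.0],
}

search = GridSearchCV(XGBClassifier(random_state=0), param_grid, scoring='f1', cv=3)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))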

 

 

Additional information

More detailed information about XGBoost can be found in the official XGBoost documentation.
