"The train-test split is stratified to ensure that the train and test samples from each class are almost the same percentage. This may be desirable for imbalanced number of samples as in this case. \n",
"\n",
"In such imbalanced datasets, the stratified K fold cross validation is used instead of the K-fold cross validation"
"Edit: ***To save time, the parameters obtained from the grid search is directly hardcoded into section 3. Full code is xgb with hyperparameter tuning.ipynb"
]
},
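{
 "attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "As a minimal sketch (not part of the original code), this is how a stratified split and stratified K-fold CV are typically set up in scikit-learn; `X` and `y` stand in for the notebook's feature matrix and class labels:\n"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "# Illustrative sketch, assuming X (features) and y (class labels) exist.\n",
  "from sklearn.model_selection import train_test_split, StratifiedKFold\n",
  "\n",
  "# stratify=y keeps the class proportions (roughly) equal in train and test\n",
  "X_train, X_test, y_train, y_test = train_test_split(\n",
  "    X, y, test_size=0.2, stratify=y, random_state=42)\n",
  "\n",
  "# StratifiedKFold preserves the class proportions within every CV fold\n",
  "skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)"
 ]
},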
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
" xgb_clf in section 3 before tuning was previously: xgb_clf = XGBClassifier()\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# params = { 'max_depth': [3,6,10],\n",
"# 'learning_rate': [0.01, 0.05, 0.1],\n",
"# 'n_estimators': [100, 500, 1000],\n",
"# 'colsample_bytree': [0.3, 0.7]}\n",
"\n",
"# from sklearn.model_selection import GridSearchCV\n",
]
},
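{
 "attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
  "One plausible way to wire the grid above into `GridSearchCV` with stratified folds (a sketch, not the exact code from xgb with hyperparameter tuning.ipynb; `X_train`/`y_train` come from the stratified split):\n"
 ]
},
{
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
  "# Sketch only: the authoritative version lives in\n",
  "# 'xgb with hyperparameter tuning.ipynb'; the cv choice here is an assumption.\n",
  "from sklearn.model_selection import GridSearchCV, StratifiedKFold\n",
  "from xgboost import XGBClassifier\n",
  "\n",
  "params = {'max_depth': [3, 6, 10],\n",
  "          'learning_rate': [0.01, 0.05, 0.1],\n",
  "          'n_estimators': [100, 500, 1000],\n",
  "          'colsample_bytree': [0.3, 0.7]}\n",
  "\n",
  "skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)\n",
  "grid = GridSearchCV(XGBClassifier(), params, cv=skf, n_jobs=-1)\n",
  "# grid.fit(X_train, y_train)   # slow: 54 parameter combinations x 5 folds\n",
  "# print(grid.best_params_)     # these values are hardcoded in section 3"
 ]
},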