%% Cell type:markdown id: tags:
<h1>1. Loading Datasets</h1>
%% Cell type:code id: tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Load the labelled training data and the unlabelled testing data
mTrain = pd.read_csv("TrainingDataMulti.csv")
mTest = pd.read_csv("TestingDataMulti.csv")

print("\n[ TrainingDataMulti.csv info ]")
mTrain.info()
print("\n[ TestingDataMulti.csv info ]")
mTest.info()
```
%% Output
[ TrainingDataMulti.csv info ]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6000 entries, 0 to 5999
Columns: 129 entries, R1-PA1:VH to marker
dtypes: float64(112), int64(17)
memory usage: 5.9 MB
[ TestingDataMulti.csv info ]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Columns: 128 entries, R1-PA1:VH to snort_log4
dtypes: float64(104), int64(24)
memory usage: 100.1 KB
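%% Cell type:markdown id: tags:
The training file has 129 columns (128 features plus the marker label), while the testing file has the same 128 feature columns without a label. A one-line check (a minimal sketch, not part of the original pipeline) confirms the feature columns line up:
%% Cell type:code id: tags:
``` python
# The test set should contain exactly the training features, minus the label
print(list(mTrain.columns[:-1]) == list(mTest.columns))
```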
%% Cell type:markdown id: tags:
<h1>1.1 Analysing the Data</h1>
%% Cell type:code id: tags:
``` python
mTrain.dtypes
```
%% Output
R1-PA1:VH float64
R1-PM1:V float64
R1-PA2:VH float64
R1-PM2:V float64
R1-PA3:VH float64
...
snort_log1 int64
snort_log2 int64
snort_log3 int64
snort_log4 int64
marker int64
Length: 129, dtype: object
%% Cell type:code id: tags:
``` python
mTrain['marker'].value_counts()
```
%% Output
marker
0 3000
2 1500
1 1500
Name: count, dtype: int64
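%% Cell type:markdown id: tags:
The classes are imbalanced: 3000 normal samples versus 1500 of each attack class. A quick bar plot (a minimal sketch using the pandas plotting API on the already-loaded data) makes the 2:1:1 ratio visible:
%% Cell type:code id: tags:
``` python
# Visualise the class distribution of the 'marker' label
mTrain['marker'].value_counts().sort_index().plot(kind='bar')
plt.xlabel('marker')
plt.ylabel('count')
plt.show()
```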
%% Cell type:code id: tags:
``` python
mTrain
```
%% Output
      R1-PA1:VH     R1-PM1:V  R1-PA2:VH     R1-PM2:V   R1-PA3:VH  ...  snort_log1  snort_log2  snort_log3  snort_log4  marker
0     70.399324  127673.0908 -49.572308  127648.0176 -169.578319  ...           0           0           0           0       0
1     73.688102  130280.7109 -46.300719  130255.6377 -166.278082  ...           0           0           0           0       0
2     73.733939  130305.7842 -46.254883  130280.7109 -166.232245  ...           0           0           0           0       0
3     74.083443  130581.5902 -45.899649  130556.5169 -165.882741  ...           0           0           0           0       0
4     74.553268  131083.0556 -45.424094  131057.9823 -165.424375  ...           0           0           0           0       0
...         ...          ...        ...          ...         ...  ...         ...         ...         ...         ...     ...
5995 116.889120  131860.3269  -3.076783  131810.1804 -123.094253  ...           0           0           0           0       0
5996 116.849013  131810.1804  -3.116890  131760.0339 -123.128630  ...           0           0           0           0       0
5997 116.384917  131734.9606  -3.586716  131684.8140 -123.586996  ...           0           0           0           0       0
5998 111.125164  130506.3704  -8.846468  130456.2238 -128.858208  ...           0           0           0           0       0
5999 110.878793  130481.2971  -9.092840  130456.2238 -129.104580  ...           0           0           0           0       0

[6000 rows x 129 columns]
%% Cell type:code id: tags:
``` python
# Count missing values per column (none are found), then drop any
# rows containing NaNs as a safeguard
mTrain.isnull().sum()
mTrain = mTrain.dropna()
mTrain
```
%% Output
      R1-PA1:VH     R1-PM1:V  R1-PA2:VH     R1-PM2:V   R1-PA3:VH  ...  snort_log1  snort_log2  snort_log3  snort_log4  marker
0     70.399324  127673.0908 -49.572308  127648.0176 -169.578319  ...           0           0           0           0       0
1     73.688102  130280.7109 -46.300719  130255.6377 -166.278082  ...           0           0           0           0       0
2     73.733939  130305.7842 -46.254883  130280.7109 -166.232245  ...           0           0           0           0       0
3     74.083443  130581.5902 -45.899649  130556.5169 -165.882741  ...           0           0           0           0       0
4     74.553268  131083.0556 -45.424094  131057.9823 -165.424375  ...           0           0           0           0       0
...         ...          ...        ...          ...         ...  ...         ...         ...         ...         ...     ...
5995 116.889120  131860.3269  -3.076783  131810.1804 -123.094253  ...           0           0           0           0       0
5996 116.849013  131810.1804  -3.116890  131760.0339 -123.128630  ...           0           0           0           0       0
5997 116.384917  131734.9606  -3.586716  131684.8140 -123.586996  ...           0           0           0           0       0
5998 111.125164  130506.3704  -8.846468  130456.2238 -128.858208  ...           0           0           0           0       0
5999 110.878793  130481.2971  -9.092840  130456.2238 -129.104580  ...           0           0           0           0       0

[6000 rows x 129 columns]
%% Cell type:code id: tags:
``` python
# Features: every column except the 'marker' class label
X = mTrain.drop(columns='marker')
X
```
%% Output
      R1-PA1:VH     R1-PM1:V  R1-PA2:VH     R1-PM2:V   R1-PA3:VH  ...  snort_log1  snort_log2  snort_log3  snort_log4
0     70.399324  127673.0908 -49.572308  127648.0176 -169.578319  ...           0           0           0           0
1     73.688102  130280.7109 -46.300719  130255.6377 -166.278082  ...           0           0           0           0
2     73.733939  130305.7842 -46.254883  130280.7109 -166.232245  ...           0           0           0           0
3     74.083443  130581.5902 -45.899649  130556.5169 -165.882741  ...           0           0           0           0
4     74.553268  131083.0556 -45.424094  131057.9823 -165.424375  ...           0           0           0           0
...         ...          ...        ...          ...         ...  ...         ...         ...         ...         ...
5995 116.889120  131860.3269  -3.076783  131810.1804 -123.094253  ...           0           0           0           0
5996 116.849013  131810.1804  -3.116890  131760.0339 -123.128630  ...           0           0           0           0
5997 116.384917  131734.9606  -3.586716  131684.8140 -123.586996  ...           0           0           0           0
5998 111.125164  130506.3704  -8.846468  130456.2238 -128.858208  ...           0           0           0           0
5999 110.878793  130481.2971  -9.092840  130456.2238 -129.104580  ...           0           0           0           0

[6000 rows x 128 columns]
%% Cell type:code id: tags:
``` python
# Target: the class label (0 = normal, 1 = data injection attack,
# 2 = command injection attack)
y = mTrain['marker']
```
%% Cell type:markdown id: tags:
<h1>2. Stratified Train-Test Split</h1>
The train-test split is stratified so that each class appears in the training and test samples in (almost) the same proportion as in the full dataset. This is desirable because the class counts are imbalanced: 3000 normal samples versus 1500 of each attack class. For the same reason, stratified K-fold cross-validation is used later instead of plain K-fold cross-validation.
%% Cell type:code id: tags:
``` python
from sklearn.model_selection import train_test_split

# Hold out 15% of the samples, stratified on the class label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=1, test_size=0.15, stratify=y)
```
%% Cell type:code id: tags:
``` python
y_train.value_counts()
```
%% Output
marker
0 2550
2 1275
1 1275
Name: count, dtype: int64
%% Cell type:code id: tags:
``` python
y_test.value_counts()
```
%% Output
marker
0 450
2 225
1 225
Name: count, dtype: int64
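%% Cell type:markdown id: tags:
As a sanity check (a minimal sketch, not part of the original pipeline), the normalised class frequencies of both splits should match the 50/25/25 make-up of the full dataset:
%% Cell type:code id: tags:
``` python
# Both splits should show marker proportions of roughly 0.50 / 0.25 / 0.25
print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))
```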
%% Cell type:markdown id: tags:
<h1>3. Choosing a Model: XGBoost, Training, and Evaluation</h1>
%% Cell type:code id: tags:
``` python
from xgboost import XGBClassifier

# Baseline: an XGBoost classifier with default hyperparameters
xgb_clf = XGBClassifier()
xgb_clf.fit(X_train, y_train)

# Mean accuracy on the held-out test split
score = xgb_clf.score(X_test, y_test)
print(score)
```
%% Output
0.96
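%% Cell type:markdown id: tags:
Beyond hard labels, the fitted classifier also exposes per-class probability estimates. A minimal sketch inspecting the first few rows of predict_proba (the column order follows xgb_clf.classes_):
%% Cell type:code id: tags:
``` python
# Per-class probability estimates for the first five held-out samples
proba = xgb_clf.predict_proba(X_test)
print(xgb_clf.classes_)
print(proba[:5].round(3))
```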
%% Cell type:markdown id: tags:
<h1>4. Improving the Model</h1>
%% Cell type:markdown id: tags:
A grid search will be performed to find the optimal hyperparameters; each point in the grid is evaluated with K-fold cross-validation.
%% Cell type:markdown id: tags:
<h1>4.1 Evaluation Before Tuning</h1>
%% Cell type:code id: tags:
``` python
# Predicted class labels for the held-out split
my_pred = xgb_clf.predict(X_test)
my_pred
```
%% Output
array([2, 0, 0, 0, 2, 2, 1, 1, 0, 0, 0, 0, 1, 2, 1, 0, 0, 2, 1, 0, 1, 0,
       0, 0, 0, 2, 0, 2, 0, 2, 1, 1, 1, 1, 1, 1, 0, 2, 1, 0, 1, 0, 1, 0,
       0, 0, 0, 1, 0, 2, 0, 2, 2, 0, 1, 0, 0, 2, 0, 2, 0, 0, 2, 0, 0, 1,
       0, 2, 0, 1, 0, 0, 2, 0, 0, 2, 1, 1, 1, 2, 2, 2, 0, 0, 1, 1, 0, 1,
       1, 0, 2, 0, 0, 0, 0, 2, 2, 2, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0,
       2, 1, 1, 2, 0, 0, 2, 1, 0, 1, 2, 0, 0, 2, 1, 0, 0, 2, 2, 2, 1, 1,
       1, 1, 2, 2, 0, 0, 1, 1, 2, 1, 0, 2, 1, 2, 0, 0, 0, 1, 0, 2, 1, 0,
       0, 0, 1, 2, 1, 1, 0, 0, 1, 0, 0, 2, 1, 1, 0, 0, 2, 0, 1, 0, 0, 0,
       0, 0, 2, 0, 0, 0, 1, 1, 0, 2, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 2, 0,
       0, 2, 1, 2, 0, 0, 1, 2, 0, 2, 1, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 1,
       0, 1, 1, 0, 0, 2, 0, 0, 0, 1, 2, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 1,
       0, 1, 1, 0, 0, 0, 1, 2, 0, 0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 0, 2, 0,
       0, 0, 0, 0, 1, 0, 2, 2, 0, 0, 1, 1, 0, 2, 0, 0, 0, 1, 0, 0, 0, 2,
       2, 2, 2, 1, 1, 1, 0, 2, 2, 0, 0, 2, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 2, 2, 0, 2, 0, 1, 0, 1, 1, 0, 2, 1,
       0, 1, 1, 2, 0, 1, 0, 2, 2, 0, 0, 1, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0,
       0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 2, 1, 1, 1, 1, 0,
       0, 0, 2, 2, 0, 0, 0, 0, 0, 2, 2, 2, 0, 1, 0, 0, 1, 2, 0, 0, 0, 2,
       0, 1, 2, 0, 2, 0, 0, 0, 1, 1, 2, 0, 2, 0, 2, 1, 0, 2, 0, 0, 2, 0,
       2, 0, 2, 1, 2, 0, 0, 2, 1, 0, 0, 2, 1, 0, 0, 0, 1, 2, 1, 0, 2, 0,
       0, 2, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 2, 2, 0, 2, 0, 1,
       2, 1, 0, 0, 0, 1, 0, 2, 2, 1, 0, 0, 2, 0, 0, 0, 0, 2, 0, 2, 0, 2,
       1, 1, 0, 0, 0, 0, 0, 2, 1, 2, 1, 0, 0, 0, 0, 1, 0, 1, 0, 2, 2, 0,
       2, 0, 1, 0, 1, 0, 0, 1, 2, 1, 2, 0, 0, 0, 0, 2, 2, 0, 2, 0, 1, 0,
       0, 1, 0, 0, 1, 0, 0, 0, 2, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 2, 1, 1,
       1, 2, 0, 0, 2, 0, 0, 1, 2, 1, 0, 0, 1, 0, 2, 2, 1, 0, 0, 1, 0, 2,
       1, 0, 0, 2, 0, 2, 0, 1, 1, 1, 0, 0, 2, 2, 1, 0, 0, 1, 0, 0, 1, 2,
       0, 2, 0, 2, 0, 2, 1, 0, 0, 0, 0, 2, 1, 1, 2, 0, 2, 1, 0, 0, 0, 1,
       0, 0, 2, 1, 0, 0, 2, 2, 1, 1, 2, 1, 1, 0, 0, 1, 0, 0, 2, 1, 2, 0,
       2, 1, 0, 0, 1, 2, 1, 0, 0, 0, 0, 1, 2, 1, 1, 1, 0, 2, 2, 2, 2, 0,
       1, 2, 2, 2, 0, 0, 2, 0, 0, 2, 1, 2, 2, 1, 1, 1, 0, 0, 1, 2, 2, 0,
       1, 0, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 2, 2, 2, 0, 1, 0, 0, 0, 2,
       1, 2, 0, 2, 0, 0, 2, 0, 0, 0, 1, 0, 0, 1, 0, 2, 0, 2, 0, 0, 0, 2,
       0, 1, 0, 0, 1, 2, 0, 0, 2, 1, 0, 1, 1, 0, 0, 1, 2, 2, 2, 0, 1, 0,
       0, 2, 0, 2, 0, 0, 1, 1, 1, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0,
       0, 2, 0, 0, 2, 0, 2, 1, 1, 2, 0, 1, 0, 2, 2, 0, 0, 1, 0, 2, 0, 2,
       0, 2, 2, 0, 0, 0, 2, 0, 0, 2, 0, 1, 1, 0, 2, 0, 0, 0, 0, 1, 0, 0,
       2, 0, 2, 1, 0, 0, 1, 2, 0, 0, 1, 0, 1, 0, 2, 2, 2, 2, 2, 0, 1, 1,
       1, 0, 0, 0, 2, 1, 2, 0, 0, 0, 0, 1, 2, 0, 1, 1, 0, 1, 1, 0, 0, 1,
       0, 0, 1, 0, 0, 2, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0,
       0, 1, 2, 2, 0, 1, 2, 0, 0, 0, 1, 0, 2, 0, 1, 1, 1, 1, 1, 0],
      dtype=int64)
%% Cell type:code id: tags:
``` python
score = xgb_clf.score(X_test, y_test)
score
```
%% Output
0.96
%% Cell type:markdown id: tags:
Confusion Matrix
%% Cell type:code id: tags:
``` python
from sklearn.metrics import confusion_matrix, classification_report, ConfusionMatrixDisplay

confusion_matrix(y_test, my_pred)
```
%% Output
array([[450,   0,   0],
       [  0, 212,  13],
       [ 13,  10, 202]], dtype=int64)
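%% Cell type:markdown id: tags:
The diagonal of the confusion matrix holds the correctly classified counts, so dividing it by the row sums gives per-class recall (a minimal sketch; the same values appear in the classification report below):
%% Cell type:code id: tags:
``` python
# Per-class recall: correct predictions divided by the true class counts
cm = confusion_matrix(y_test, my_pred)
print(cm.diagonal() / cm.sum(axis=1))
```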
%% Cell type:code id: tags:
``` python
print(classification_report(y_test, my_pred))
```
%% Output
              precision    recall  f1-score   support

           0       0.97      1.00      0.99       450
           1       0.95      0.94      0.95       225
           2       0.94      0.90      0.92       225

    accuracy                           0.96       900
   macro avg       0.96      0.95      0.95       900
weighted avg       0.96      0.96      0.96       900
%% Cell type:code id: tags:
``` python
cm = confusion_matrix(y_test, my_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.show()
```
%% Output
%% Cell type:markdown id: tags:
<h1>4.2 Grid Search</h1>
%% Cell type:code id: tags:
``` python
from sklearn.model_selection import GridSearchCV

# Hyperparameter grid: 3 x 3 x 3 x 2 = 54 candidates
params = {'max_depth': [3, 6, 10],
          'learning_rate': [0.01, 0.05, 0.1],
          'n_estimators': [100, 500, 1000],
          'colsample_bytree': [0.3, 0.7]}

# Note: 'neg_mean_squared_error' treats the integer class labels as numbers;
# a classification metric such as 'f1_macro' would be the more usual choice here
im_xgb = GridSearchCV(estimator=xgb_clf,
                      param_grid=params,
                      scoring='neg_mean_squared_error',
                      verbose=1)
im_xgb.fit(X_train, y_train)

print("Best parameters:", im_xgb.best_params_)
print("Lowest RMSE: ", (-im_xgb.best_score_)**(1/2.0))
```
%% Output
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Best parameters: {'colsample_bytree': 0.3, 'learning_rate': 0.05, 'max_depth': 10, 'n_estimators': 1000}
Lowest RMSE: 0.23763541031440186
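%% Cell type:markdown id: tags:
GridSearchCV keeps the full search history in cv_results_ and the refitted winning model in best_estimator_. A minimal sketch listing the top-ranked candidates (using the documented cv_results_ keys):
%% Cell type:code id: tags:
``` python
# Show the five best-ranked hyperparameter combinations from the search
cv_results = pd.DataFrame(im_xgb.cv_results_)
cols = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(cv_results.sort_values('rank_test_score')[cols].head())
```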
%% Cell type:markdown id: tags:
Evaluation after the optimal hyperparameters have been found
%% Cell type:markdown id: tags:
Classification report and confusion matrix, followed by cross-validation
%% Cell type:code id: tags:
``` python
# Evaluate the tuned model on the held-out split
pred = im_xgb.predict(X_test)
labels = ['normal', 'data injection attack', 'command injection attack']
print(classification_report(y_test, pred, target_names=labels))
```
%% Output
                          precision    recall  f1-score   support

                  normal       0.98      1.00      0.99       450
   data injection attack       0.96      0.96      0.96       225
command injection attack       0.96      0.93      0.95       225

                accuracy                           0.97       900
               macro avg       0.97      0.96      0.97       900
            weighted avg       0.97      0.97      0.97       900
%% Cell type:code id: tags:
``` python
from sklearn.metrics import accuracy_score
print('XGBoost model accuracy score: {0:0.4f}'.format(accuracy_score(y_test, pred)))
```
%% Output
XGBoost model accuracy score: 0.9722
%% Cell type:code id: tags:
``` python
cma = confusion_matrix(y_test, pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cma)
disp.plot()
plt.show()
```
%% Output
%% Cell type:code id: tags:
``` python
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold

# Caution: im_xgb is the whole GridSearchCV object, so every outer fold
# re-runs the full 270-fit search (nested cross-validation) -- very slow
cvScore = cross_val_score(im_xgb, X, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1), scoring='f1_macro')
print(cvScore)
print("StratifiedKFold Cross-Validation Accuracy: %0.2f%% | Standard Deviation: %0.2f%%" % (100*cvScore.mean(), 100*cvScore.std()))
```
%% Output
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Fitting 5 folds for each of 54 candidates, totalling 270 fits
Fitting 5 folds for each of 54 candidates, totalling 270 fits
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
Cell In[41], line 5
      1 from sklearn.model_selection import cross_val_score
      2 from sklearn.model_selection import StratifiedKFold
----> 5 cvScore = cross_val_score(im_xgb, X, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1), scoring='f1_macro')
    [... long call stack through sklearn cross_validate, joblib, GridSearchCV.fit and xgboost Booster.update elided ...]
KeyboardInterrupt:
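%% Cell type:markdown id: tags:
The run above was interrupted because cross-validating im_xgb nests the full 270-fit grid search inside every outer fold. A far cheaper alternative (a sketch, assuming the tuned hyperparameters are kept fixed) is to cross-validate only the refitted best model:
%% Cell type:code id: tags:
``` python
# Cross-validate the best model only: 5 fits in total instead of 5 x 270
cvScore = cross_val_score(im_xgb.best_estimator_, X, y,
                          cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1),
                          scoring='f1_macro')
print(cvScore)
print("Mean macro-F1: %0.2f%% | Std: %0.2f%%" % (100 * cvScore.mean(), 100 * cvScore.std()))
```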
%% Cell type:markdown id: tags:
<h1>5. Testing Data</h1>
%% Cell type:code id: tags:
``` python
# Predict labels for the 100 unlabelled test samples
y_testpred = im_xgb.predict(mTest.values)
y_testpred = pd.DataFrame(y_testpred, columns=['predicted marker'])
y_testpred.value_counts()
```
%% Output
predicted marker
1 36
0 32
2 32
Name: count, dtype: int64
%% Cell type:code id: tags:
``` python
# Save the predicted labels on their own
y_testpred.to_csv('xgbPredictedlabels.csv')
```
%% Cell type:code id: tags:
``` python
mTest["marker"] = y_testpred
mTest.to_csv('TestingResultsMulti.csv')
```