Classification example 1 using Health Data with PyCaret
#Fetal cardiotocography example
#Code from https://github.com/pycaret/pycaret/
#Dataset link: https://www.kaggle.com/akshat0007/fetalhr
## Importing necessary libraries
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
## Reading the dataset using pandas
url = 'https://raw.githubusercontent.com/davidrkearney/colab-notebooks/main/datasets/CTG.csv'
df = pd.read_csv(url, on_bad_lines='skip')  # pandas >= 1.3; older versions use error_bad_lines=False
## Taking a look at our data
df.head()
Feature abbreviations used in the dataset:
FileName: name of the CTG examination file
Date: date of the examination
b: start instant
e: end instant
LBE: baseline value (medical expert)
LB: baseline value (SisPorto)
AC: accelerations (SisPorto)
FM: foetal movement (SisPorto)
UC: uterine contractions (SisPorto)
ASTV: percentage of time with abnormal short term variability (SisPorto)
mSTV: mean value of short term variability (SisPorto)
ALTV: percentage of time with abnormal long term variability (SisPorto)
mLTV: mean value of long term variability (SisPorto)
DL: light decelerations
DS: severe decelerations
DP: prolonged decelerations
DR: repetitive decelerations
Width: histogram width
Min: low freq. of the histogram
Max: high freq. of the histogram
Nmax: number of histogram peaks
Nzeros: number of histogram zeros
Mode: histogram mode
Mean: histogram mean
Median: histogram median
Variance: histogram variance
Tendency: histogram tendency: -1=left asymmetric; 0=symmetric; 1=right asymmetric
A: calm sleep
B: REM sleep
C: calm vigilance
D: active vigilance
SH: shift pattern (A or Susp with shifts)
AD: accelerative/decelerative pattern (stress situation)
DE: decelerative pattern (vagal stimulation)
LD: largely decelerative pattern
FS: flat-sinusoidal pattern (pathological state)
SUSP: suspect pattern
CLASS: Class code (1 to 10) for classes A to SUSP
NSP: Normal=1; Suspect=2; Pathologic=3
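Since NSP is the target we will predict, it helps to check the class balance before modelling. A minimal sketch with pandas; the small frame below is synthetic stand-in data, not the real CTG.csv:

```python
import pandas as pd

# Synthetic stand-in for the NSP column; the real distribution comes from CTG.csv
df = pd.DataFrame({"NSP": [1, 1, 1, 2, 3, 1, 2, 1]})

# Map the numeric codes to readable labels and count each class
labels = {1: "Normal", 2: "Suspect", 3: "Pathologic"}
counts = df["NSP"].map(labels).value_counts()
print(counts)
```

On the real dataset the same two lines reveal how imbalanced the three classes are, which is worth knowing when interpreting accuracy scores later.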
## Dropping the columns which we don't need
df = df.drop(["FileName", "Date", "SegFile", "b", "e"], axis=1)
df.head()
df.columns
## This will print the number of rows and columns
print(df.shape)
## Checking for the null values
df.isnull().sum()
## Dropping the rows containing null values
df = df.dropna()
df.isnull().sum()
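To see how many rows `dropna()` actually removed, compare the shape before and after. A short sketch on a toy frame (the NaN placement is illustrative, not from the real data):

```python
import pandas as pd
import numpy as np

# Toy frame with two incomplete rows (illustrative only)
df = pd.DataFrame({"LB": [120, np.nan, 133, 134],
                   "AC": [0.0, 0.003, np.nan, 0.001]})

before = len(df)
df = df.dropna()
print(f"Dropped {before - len(df)} of {before} rows")
```

If the count of dropped rows is large relative to the dataset, imputation may be a better choice than dropping.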
## Checking the data type of the columns
df.dtypes
# Import all the modules from pycaret needed for classification tasks
from pycaret.classification import *
# Setting up the classifier
# Pass the complete dataset as data and the feature to be predicted as target
clf = setup(data=df, target='NSP')
# Compare all available models using cross-validation
compare_models()
xgboost_classifier = create_model('xgboost')
## Let's now check the model hyperparameters
print(xgboost_classifier)
# Whenever we compare or create models, the default hyperparameter values are used.
# Hence, we need to tune our model to get better performance
tuned_xgboost_classifier = tune_model(xgboost_classifier)
We can conclude that our tuned model performed better than the original model with default hyperparameters: the mean accuracy increased from 0.9899 to 0.9906.
The pycaret library makes hyperparameter tuning easy; we just need to pass the model to the following command:
tune_model(model_name)
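Conceptually, tune_model runs a randomized search over a predefined hyperparameter grid with cross-validation. A rough scikit-learn equivalent is sketched below; the data is synthetic and the search space is hypothetical, not PyCaret's actual internal grid:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic 3-class data standing in for the CTG features
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=6, n_classes=3, random_state=0)

# Hypothetical search space; PyCaret ships its own grid per model
param_dist = {"n_estimators": [50, 100, 200],
              "max_depth": [3, 5, 10, None]}

# Sample 5 candidate configurations and score each with 3-fold CV
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_dist, n_iter=5, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The random sampling is why tuned results can vary between runs; increasing the number of iterations explores more of the grid at the cost of training time.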
plot_model(tuned_xgboost_classifier, plot='class_report')
plot_model(tuned_xgboost_classifier, plot='confusion_matrix')
## This can be used to save our trained model for future use.
save_model(tuned_xgboost_classifier, "XGBOOST CLASSIFIER")
## This can be used to load our model, so we don't need to retrain it each time.
saved_model = load_model('XGBOOST CLASSIFIER')