Machine Learning Models for Lung Cancer Prediction

Decision Trees

A decision tree is a flowchart-like structure where each internal node represents a feature (or attribute), the branch represents a decision rule, and each leaf node represents the outcome. The model splits the dataset into smaller subsets while at the same time developing an associated decision tree incrementally.

Confusion Matrix

Random Forest

A random forest is an ensemble learning method that combines multiple decision trees to improve the overall performance. Each tree is trained on a random subset of the data, and the final prediction is made by averaging the predictions of all individual trees. This method reduces the risk of overfitting and improves accuracy.

Confusion Matrix

Support Vector Machine (SVM)

A support vector machine (SVM) is a supervised learning model that finds the optimal hyperplane to classify data points in a high-dimensional space. SVMs are effective in high-dimensional spaces and are versatile with different kernel functions for nonlinear classification.

SVM Implementation

The following SVM implementation uses a grid search with cross-validation to find the best hyperparameters:

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold

param_grid = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid']
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
grid_search = GridSearchCV(SVC(), param_grid, cv=cv, scoring='accuracy', verbose=2, n_jobs=-1)
grid_search.fit(X_train, y_train)

best_svc = grid_search.best_estimator_
y_pred_best = best_svc.predict(X_test)

SVM Results

The SVM model provided the following results:

Classification Report:
               precision    recall  f1-score   support

           0       0.71      0.62      0.67         8
           1       0.95      0.96      0.95        54

    accuracy                           0.92        62
   macro avg       0.83      0.79      0.81        62
weighted avg       0.92      0.92      0.92        62

Confusion Matrix:
 [[ 5  3]
 [ 2 52]]

The classification report shows the precision, recall, and f1-score for each class:

The confusion matrix provides a detailed breakdown of the true positive, false positive, true negative, and false negative predictions:

Confusion Matrix