
Multi-Model Approach for Brain Tumor Classification and Segmentation

Deep Learning Course Exam Project, SDIC Master's Degree, University of Trieste (UniTS)

October 2024

Stefano Lusardi, Marco Tallone, Piero Zappi

Introduction

Introduction

A deep learning project for brain tumor classification and segmentation on MRI images using CNN, U-Net, and ViT models.

  1. Datasets Description
    • Brain Tumor MRI Dataset (classification)
    • BraTS2020 Dataset (segmentation)
  2. Classification Task
    • Custom CNN
    • ViT
    • AlexNet
    • VGG16
  3. Segmentation Task
    • U-Net Models
  4. Conclusion
    • Final considerations
    • Possible improvements
Datasets
Datasets

Brain Tumor MRI Dataset

Classification task dataset

  • Combination of three datasets
  • $7023$ human brain MRI images
  • Four classes: glioma, meningioma, no-tumor and pituitary
Datasets

Brain Tumor MRI Dataset

The dataset is split into training and testing sets with a ratio of $80\%$ to $20\%$, respectively.
Datasets

Resizing and Data Augmentation

Images are resized to $128 \times 128$ pixels to reduce complexity

Transformations applied to the images at each epoch:

  • Random horizontal flip
  • Random rotation up to 10 degrees
  • Random change in brightness, contrast, saturation, and hue

These transformations add variability to the dataset and help the model generalize better
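As a rough illustration, the pipeline above could be expressed with torchvision transforms as sketched below; the exact jitter magnitudes are assumptions, since only the kind of transformation is stated above.

```python
from torchvision import transforms

# Hypothetical augmentation pipeline matching the steps listed above; the
# ColorJitter magnitudes are illustrative assumptions, not the project's values.
train_transform = transforms.Compose([
    transforms.Resize((128, 128)),                 # resize to 128x128 pixels
    transforms.RandomHorizontalFlip(p=0.5),        # random horizontal flip
    transforms.RandomRotation(degrees=10),         # random rotation up to 10 degrees
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.05),  # brightness/contrast/saturation/hue
    transforms.ToTensor(),
])
```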
Datasets

BraTS2020 Dataset

Segmentation task dataset

  • BraTS stands for Brain Tumor Segmentation
  • It consists of $155$ horizontal “slices” of brain MRI images for each of $369$ patients (volumes): $$ 155 \cdot 369 = 57\,195 $$
  • We used $90\%$ of the data for training and $10\%$ for testing
  • We used the $50\%$ “most significant” slices of the dataset
Datasets

BraTS2020 Dataset

Images have 4 channels:

  1. T1 weighted (T1): good for visualizing the brain but not the tumor
  2. T1 weighted with contrast (T1c): acquired with the same technique as T1 but with a contrast agent
  3. T2 weighted (T2): good for visualizing the edema
  4. Fluid Attenuated Inversion Recovery (FLAIR): improves the visualization of the edema
Datasets

BraTS2020 Dataset

Each slice has 3 mask labels (some might be empty):

  1. Necrotic and Non-Enhancing Tumor Core (NCR/NET)
  2. Edema (ED)
  3. Enhancing Tumor (ET)
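A minimal sketch of how the three binary masks can be derived from a BraTS label slice, assuming the standard BraTS label encoding (1 = NCR/NET, 2 = ED, 4 = ET); this is an illustration, not the project's loading code.

```python
import numpy as np

def split_brats_mask(label_slice: np.ndarray) -> np.ndarray:
    """Split a BraTS label slice into 3 binary masks: NCR/NET, ED, ET.

    A channel is left all-zero when the corresponding region is absent
    from the slice (i.e. that mask label is "empty").
    """
    ncr_net = (label_slice == 1).astype(np.float32)     # necrotic / non-enhancing core
    edema = (label_slice == 2).astype(np.float32)       # edema
    enhancing = (label_slice == 4).astype(np.float32)   # enhancing tumor
    return np.stack([ncr_net, edema, enhancing], axis=0)  # shape (3, H, W)
```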
Classification
Classification

Performance Assessment

Loss Function: Cross-entropy loss $$ L(y,\hat{y}) = - \sum_{i=1}^{N} y_i \log(\hat{y}_i) $$
Accuracy: Number of correct predictions divided by total number of predictions
Confidence: Given by the softmax applied to the network output $$ S(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{N} e^{x_j}} $$
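For illustration, the three quantities can be computed per batch as sketched below (assuming PyTorch and 4-class logits); this is not the project's exact evaluation code.

```python
import torch
import torch.nn.functional as F

def evaluate_batch(logits: torch.Tensor, targets: torch.Tensor):
    """Cross-entropy loss, accuracy and mean confidence for one batch.

    `logits` has shape (batch, 4); confidence is the softmax probability
    assigned to the predicted class.
    """
    loss = F.cross_entropy(logits, targets)              # cross-entropy loss
    probs = F.softmax(logits, dim=1)                      # softmax over the 4 classes
    confidence, predictions = probs.max(dim=1)
    accuracy = (predictions == targets).float().mean()   # correct / total
    return loss.item(), accuracy.item(), confidence.mean().item()
```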
Classification

Custom CNN Architecture

Number of parameters: $3\,001\,156$
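The parameter counts reported in these slides can be obtained with a short, generic PyTorch utility such as the following (not specific to this project).

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters of a model (e.g. the custom CNN)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```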
Classification

Training Details

Custom CNN model training parameters:

  • Epochs: 50
  • Optimizer: Adam (weight decay $1 \times 10^{-5}$)
  • Scheduler: StepLR (step size $10$, gamma $0.5$)
  • Loss function: Cross-entropy
  • Learning rate: $1 \times 10^{-4}$
  • Batch size: 64 (both training and validation)
  • Activation function: Mish
  • Dropout rate: $0.4$
  • Image size: $128 \times 128$
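A minimal sketch of a training loop with the hyperparameters above, assuming PyTorch; `model` and `train_loader` are placeholders for the custom CNN and the augmented dataset loader, and this is not the project's exact code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_classifier(model: nn.Module, train_loader: DataLoader, epochs: int = 50) -> None:
    """Sketch of the training setup described above."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
    for _ in range(epochs):
        for images, labels in train_loader:   # batches of 64 images, 128x128
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()                      # halve the learning rate every 10 epochs
```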
Classification

Training Loss and Accuracy

  • Final training loss: $1.4 \cdot 10^{-3}$
  • Final training accuracy: $99.9\%$
Classification

Confidence and Test Accuracy

  • Final training confidence: $99.9\%$
  • Final test confidence: $99.9\%$
  • Final test accuracy: $99.0\%$
Classification

ViT Architecture

Number of parameters: $21\,459\,460$
Classification

Training Details

ViT model training parameters:

  • Epochs: 50
  • Optimizer: Adam (weight decay $1 \times 10^{-5}$)
  • Scheduler: StepLR (step size $10$, gamma $0.5$)
  • Loss function: Cross-entropy
  • Learning rate: $1 \times 10^{-4}$
  • Batch size: 64 (both training and validation)
  • Activation function: Mish
  • Dropout rate: $0.2$
  • Image size and Patch size: $128 \times 128$, $16 \times 16$
  • Number of heads: 8
  • Number of layers: 10
  • Patch embedding dimension: 512
  • Feedforward dimension: 1024
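To make the configuration concrete: a $128 \times 128$ image with $16 \times 16$ patches yields $(128/16)^2 = 64$ patch tokens, each projected to a 512-dimensional embedding before the 10 transformer layers with 8 heads. The patch-embedding stage could look roughly like the sketch below (assuming 3-channel input; not the project's exact module).

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and embed them (illustrative)."""

    def __init__(self, img_size: int = 128, patch_size: int = 16,
                 in_channels: int = 3, embed_dim: int = 512):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2           # 64 patches
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                      # (B, 512, 8, 8)
        return x.flatten(2).transpose(1, 2)   # (B, 64, 512) sequence of patch tokens
```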
Classification

Training Loss and Accuracy

  • Final training loss: $0.27$
  • Final training accuracy: $90\%$
Classification

Confidence and Test Accuracy

  • Final training confidence: $96\%$
  • Final test confidence: $93\%$
  • Final test accuracy: $88\%$
Classification

AlexNet Architecture

Number of parameters: $4\,589\,316$
Classification

VGG16 Architecture

Number of parameters: $65\,070\,916$
Dropout rate: $0.5$
Classification

Setup Differences

Model     | Data augmentation | Scheduler | Activation | L2 regularization
CustomCNN | Yes ✅            | Yes ✅    | Mish       | Yes ✅
AlexNet   | No ❌             | Yes ✅    | ReLU       | Yes ✅
VGG16     | No ❌             | No ❌     | ReLU       | No ❌
ViT       | Yes ✅            | Yes ✅    | Mish       | Yes ✅
  • All the other hyperparameters and settings (batch size, optimizer, epochs, etc.) are the same for all models
  • Note that the CustomCNN has the fewest parameters ($3\,001\,156$) while VGG16 has the most ($65\,070\,916$)
  • VGG16 also has the highest dropout rate ($0.5$)
Classification

Training Loss and Accuracy for AlexNet

  • Final training loss: $1.2 \cdot 10^{-3}$
  • Final training accuracy: $99.9\%$
Classification

Confidence and Test Accuracy for AlexNet

  • Final training confidence: $99.9\%$
  • Final test confidence: $96.5\%$
  • Final test accuracy: $90\%$
Classification

Training Loss and Accuracy for VGG16

  • Final training loss: $8.9 \cdot 10^{-6}$
  • Final training accuracy: $99.9\%$
Classification

Confidence and Test Accuracy for VGG16

  • Final training confidence: $100\%$
  • Final test confidence: $98\%$
  • Final test accuracy: $95\%$
Classification

Training Performance Comparison

Model     | Loss                | Accuracy | Confidence
CustomCNN | $1.4 \cdot 10^{-3}$ | $99\%$   | $100\%$
AlexNet   | $1.2 \cdot 10^{-3}$ | $99\%$   | $99.9\%$
VGG16     | $8.9 \cdot 10^{-6}$ | $99\%$   | $100\%$
ViT       | $0.27$              | $90\%$   | $96.1\%$
Note that these are the values reached during the last epoch
Classification

Focus on Accuracy

Classification

Test Performance Comparison

Model     | Accuracy | Confidence
CustomCNN | $99\%$   | $100\%$
AlexNet   | $90\%$   | $96.5\%$
VGG16     | $95\%$   | $98.0\%$
ViT       | $88\%$   | $93.3\%$
Note that these are the values reached after the last epoch
Classification

Visualizing the 1st layer filters: CustomCNN

Classification

Visualizing the 1st layer filters: AlexNet

Classification

Visualizing the 1st layer filters: VGG16

Segmentation
Segmentation

U-Net Models

$3$ models for the segmentation task:

  • Classic U-Net: baseline U-Net model architecture
  • Improved U-Net: small improvements, fewer parameters
  • Attention U-Net: attention mechanism added
Segmentation

Classic U-Net

  • Input: $4$ channels
  • Output: $3$ channels [ R G B ]
  • $3 \times 3$ convolutions, ReLU activations
  • $2 \times 2$ max pooling
  • $2 \times$ bilinear upsampling
  • Skip connections: concatenation
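A minimal sketch of the building blocks just listed (double $3 \times 3$ convolution with ReLU, $2 \times 2$ max pooling, $2\times$ bilinear upsampling and concatenation skips), assuming PyTorch; layer names and channel handling are illustrative, not the project's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoubleConv(nn.Module):
    """Two 3x3 convolutions with ReLU, used at every U-Net stage."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

class Down(nn.Module):
    """Encoder stage: 2x2 max pooling followed by a double convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.pool_conv = nn.Sequential(nn.MaxPool2d(2), DoubleConv(in_ch, out_ch))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool_conv(x)

class Up(nn.Module):
    """Decoder stage: 2x bilinear upsampling, concatenation skip, double convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = DoubleConv(in_ch, out_ch)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        x = torch.cat([skip, x], dim=1)       # concatenation skip connection
        return self.conv(x)
```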
Segmentation

Improved U-Net

  • $7 \times 7$ kernels
  • Inverse bottleneck
  • Separable convolutions
  • Batch normalization
  • Additive skip connections
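A hypothetical block combining the ingredients above: a $7 \times 7$ depthwise (separable) convolution, batch normalization and an inverse bottleneck that expands the channels before projecting back; in the decoder, encoder features would be combined by addition rather than concatenation. This is a sketch under those assumptions, not the project's exact layers.

```python
import torch
import torch.nn as nn

class ImprovedBlock(nn.Module):
    """7x7 separable convolution + batch norm + inverse bottleneck (illustrative)."""

    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=7,
                                   padding=3, groups=channels)         # depthwise 7x7
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)  # completes the separable conv
        self.norm = nn.BatchNorm2d(channels)
        self.expand = nn.Conv2d(channels, channels * expansion, kernel_size=1)
        self.project = nn.Conv2d(channels * expansion, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pointwise(self.depthwise(x))
        x = self.norm(x)
        return self.project(self.act(self.expand(x)))   # inverse bottleneck

# In the decoder, an additive skip connection would simply be:
#   x = upsampled_features + encoder_features   (instead of torch.cat)
```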
Segmentation

Attention U-Net

Attention gates
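A rough sketch of an additive attention gate in the spirit of Attention U-Net: the decoder's gating signal and the encoder's skip features are projected, summed and turned into a spatial attention map that rescales the skip features. Channel sizes and the assumption of equal spatial sizes are illustrative, not the project's exact module.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate applied to skip-connection features (illustrative)."""

    def __init__(self, skip_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.theta_x = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)  # project skip features
        self.phi_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)    # project gating signal
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)            # collapse to attention map
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # x: encoder (skip) features, g: decoder gating signal, same spatial size assumed
        attn = self.sigmoid(self.psi(self.relu(self.theta_x(x) + self.phi_g(g))))
        return x * attn                       # attention-weighted skip features
```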
Segmentation

Training Details

U-Net models training parameters:

  • Epochs: 20
  • Optimizer: Adam (with weight decay $1 \times 10^{-2}$)
  • Scheduler: Exponential Decay (gamma $0.9$)
  • Loss function: BCE with Logits Loss: $$ \ell(y, \hat{y}) = -[y \log(\sigma(\hat{y})) + (1 - y) \log(1 - \sigma(\hat{y}))] $$
  • Learning rate: $2 \times 10^{-3}$
  • Batch size: 32 (both training and validation)
  • First encoder filters: 32
  • Image size: $240 \times 240$
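A minimal sketch of the segmentation training setup with the hyperparameters above (assuming PyTorch); `model` stands for any of the three U-Nets and `train_loader` yields 4-channel slices with 3-channel binary masks. Not the project's exact code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_unet(model: nn.Module, train_loader: DataLoader, epochs: int = 20) -> None:
    """Sketch of the training setup described above."""
    criterion = nn.BCEWithLogitsLoss()   # sigmoid + binary cross-entropy, per mask channel
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-3, weight_decay=1e-2)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
    for _ in range(epochs):
        for images, masks in train_loader:   # images (B, 4, 240, 240), masks (B, 3, 240, 240)
            optimizer.zero_grad()
            loss = criterion(model(images), masks)
            loss.backward()
            optimizer.step()
        scheduler.step()                     # multiply the learning rate by 0.9 each epoch
```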
Segmentation

Visualizing a prediction

Segmentation

Performance Assessment

  • Dice Coefficient (“overlap” metric): $$\text{Dice} = \frac{2 \times |X \cap Y|}{|X| + |Y|}$$
  • Precision (prediction quality): $$\text{Precision} = \frac{TP}{TP + FP}$$
  • Recall (prediction quantity): $$\text{Recall} = \frac{TP}{TP + FN}$$
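For reference, the three metrics can be computed from binary masks as sketched below, with Dice written equivalently as $2TP / (2TP + FP + FN)$; this is a generic formulation, not the project's exact evaluation code.

```python
import torch

def segmentation_metrics(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """Dice, precision and recall for binary masks (values in {0, 1})."""
    tp = (pred * target).sum()          # true positives: predicted and present
    fp = (pred * (1 - target)).sum()    # false positives
    fn = ((1 - pred) * target).sum()    # false negatives
    dice = (2 * tp) / (2 * tp + fp + fn + eps)   # 2|X ∩ Y| / (|X| + |Y|)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return dice.item(), precision.item(), recall.item()
```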
Segmentation

Visualizing Attention Maps

Conclusion
Conclusion

Possible improvements:

  • Use the original image size for better classification results
  • Use a larger dataset to train the ViT model
  • Enlarge the ViT model for better performance
  • Attempt transfer learning with the ViT model
  • Test different architectures for the segmentation task
  • Fully exploit the segmentation dataset
  • Perform a complete hyperparameter search for the segmentation models
  • Use metadata information to predict the patient's survival days
References

References

  1. B. H. Menze et al. "The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)", IEEE Transactions on Medical Imaging 34(10), 1993-2024 (2015) DOI: 10.1109/TMI.2014.2377694
  2. S. Bakas et al. "Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features", Nature Scientific Data, 4:170117 (2017) DOI: 10.1038/sdata.2017.117
  3. S. Bakas et al. "Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge", arXiv preprint arXiv:1811.02629 (2018)
  4. S. Bakas et al. "Segmentation Labels and Radiomic Features for the Pre-operative Scans of the TCGA-GBM collection", The Cancer Imaging Archive, 2017 DOI: 10.7937/K9/TCIA.2017.KLXWJJ1Q
  5. S. Bakas et al. "Segmentation Labels and Radiomic Features for the Pre-operative Scans of the TCGA-LGG collection", The Cancer Imaging Archive, 2017 DOI: 10.7937/K9/TCIA.2017.GJQ7R0EF
  6. Alex Krizhevsky et al. "ImageNet classification with deep convolutional neural networks", Commun. ACM 60, 6 (June 2017), 84–90, https://doi.org/10.1145/3065386
  7. A. Dosovitskiy et al. "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", International Conference on Learning Representations http://openreview.net/forum?id=YicbFdNTTy
  8. Omer, A.A.M. "Image Classification Based on Vision Transformer", Journal of Computer and Communications, 12, 49-59 (2024) https://doi.org/10.4236/jcc.2024.124005
  9. Ronneberger, O., Fischer, P., & Brox, T. "U-Net: Convolutional Networks for Biomedical Image Segmentation", In Nassir Navab, Joachim Hornegger, William M. Wells, & Alejandro F. Frangi (Eds.), Medical Image Computing and Computer-Assisted Intervention -- MICCAI 2015, 234–241 https://doi.org/10.1007/978-3-319-24574-4_28
  10. Chollet, F. "Xception: Deep Learning with Depthwise Separable Convolutions", CoRR, abs/1610.02357 http://arxiv.org/abs/1610.02357
  11. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., & Xie, S. "A ConvNet for the 2020s", arXiv preprint arXiv:2201.03545 https://arxiv.org/abs/2201.03545
  12. Sandler, M. et al. "Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation", CoRR, abs/1801.04381 http://arxiv.org/abs/1801.04381
  13. Oktay, O. et al. "Attention U-Net: Learning Where to Look for the Pancreas", CoRR, abs/1804.03999 http://arxiv.org/abs/1804.03999