Brain tumor detection

Introduction

Machine learning (ML) has become invaluable in medicine, helping to detect early signs of illness, process large datasets, and manage risk factors effectively. Many medical instruments, such as MRI and CT scanners, ultrasound, and laparoscopy, produce data in the form of images. Computer vision techniques let ML algorithms analyze these images and identify patterns and anomalies that may be hard for the human eye to detect. Human analysis can miss subtle details, and ML combined with computer vision can help clinicians make more accurate and timely decisions.

Dataset

The dataset used is "Brain MRI Images for Brain Tumor Detection" by Navoneel Chakrabarty.

Most of the images were properly prepared, with a few exceptions where the brain didn't occupy the entire frame, resulting in some unnecessary padding. I manually cropped these images, which led to improved accuracy.
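The cropping here was done by hand. For larger datasets, the same step could be automated. The sketch below is an alternative, not the method used in this project: it assumes a dark background and treats every pixel at or below a hypothetical `threshold` as padding, then crops to the bounding box of the remaining pixels. Noisy scans may need a higher threshold or some `margin`.

```python
import numpy as np


def crop_to_brain(image: np.ndarray, threshold: int = 10, margin: int = 0) -> np.ndarray:
    """Crop an image to the bounding box of pixels brighter than `threshold`.

    Assumption: the padding is dark background, so pixels <= threshold are discarded.
    Works for grayscale (H, W) and RGB (H, W, 3) arrays.
    """
    gray = image if image.ndim == 2 else image.mean(axis=2)
    mask = gray > threshold
    if not mask.any():
        return image  # nothing above the threshold: leave the image unchanged
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing at least one bright pixel
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing at least one bright pixel
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin + 1, image.shape[0])
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin + 1, image.shape[1])
    return image[top:bottom, left:right]


# Synthetic example: a bright 40x60 "brain" inside a 100x120 dark frame.
img = np.zeros((100, 120), dtype=np.uint8)
img[20:60, 30:90] = 200
print(crop_to_brain(img).shape)  # (40, 60)
```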

The dataset was split into 80% for training and 20% for testing.
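A reproducible 80/20 split can be sketched by shuffling the image indices with a fixed seed. The function name and the seed are hypothetical; the resulting index lists could then be wrapped with `torch.utils.data.Subset` to build the training and test sets.

```python
import random


def train_test_split_indices(n: int, test_fraction: float = 0.2, seed: int = 42):
    """Shuffle indices 0..n-1 with a fixed seed and split them into train and test lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG so the global random state is untouched
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(100)
print(len(train_idx), len(test_idx))  # 80 20
```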

Data augmentation

Since the dataset is relatively small, each image fed into the model is randomly flipped horizontally with a 50% probability and rotated by a small random angle to augment the data.

Models

AlexNet

This convolutional neural network was introduced in "ImageNet Classification with Deep Convolutional Neural Networks" (2012) and was the state of the art at the time of publication. It uses 5 convolutional layers, with ReLU activations and max pooling, followed by 3 fully connected layers.

VGG-16

VGG-16 is a convolutional neural network introduced in "Very Deep Convolutional Networks for Large-Scale Image Recognition" (2014). The paper studied how network depth affects accuracy by stacking small 3x3 convolutions; VGG-16 has 16 weight layers (13 convolutional and 3 fully connected).
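The reasoning behind VGG's small filters can be shown with a little arithmetic: three stacked 3x3 convolutions (stride 1) see the same 7x7 region as a single 7x7 convolution, but with C input and C output channels they need 27C^2 weights instead of 49C^2, and they add two extra non-linearities. The helper names and the channel count below are illustrative.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of `num_layers` stride-1 convolutions stacked on top of each other."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel: int, channels: int, num_layers: int = 1) -> int:
    """Weights (biases ignored) of `num_layers` kernel x kernel convs with C in and C out channels."""
    return num_layers * kernel * kernel * channels * channels


C = 256
print(stacked_receptive_field(3))        # 7: same field of view as one 7x7 convolution
print(conv_weights(3, C, num_layers=3))  # 1769472 = 27 * C^2
print(conv_weights(7, C))                # 3211264 = 49 * C^2
```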

ResNet50

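ResNet50 is a 50-layer residual neural network introduced in "Deep Residual Learning for Image Recognition" (2015). It adds residual (skip) connections to the architecture: each block learns a residual F(x) and outputs F(x) + x, so gradients can flow directly through the identity path, which mitigates the vanishing-gradient problem in very deep networks.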
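A simplified version of the bottleneck block that ResNet50 stacks is sketched below in PyTorch: a 1x1 convolution reduces the channels, a 3x3 convolution processes them, a 1x1 convolution expands them again, and the input is added back before the final ReLU. This is an illustration, not torchvision's exact implementation.

```python
import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    """Simplified ResNet-50-style bottleneck: 1x1 reduce, 3x3, 1x1 expand, plus a skip connection."""

    expansion = 4

    def __init__(self, in_channels: int, mid_channels: int, stride: int = 1):
        super().__init__()
        out_channels = mid_channels * self.expansion
        self.residual = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        # Projection shortcut when the shape changes; identity shortcut otherwise.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = F(x) + x: the block only has to learn the residual F(x)
        return self.relu(self.residual(x) + self.shortcut(x))


block = Bottleneck(in_channels=256, mid_channels=64)
out = block(torch.randn(2, 256, 56, 56))
print(out.shape)  # torch.Size([2, 256, 56, 56])
```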

EfficientNet

EfficientNet is a convolutional neural network introduced in "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" (2019). It is the result of scaling a baseline CNN uniformly in depth, width, and input resolution with a single compound coefficient, instead of scaling only one of these dimensions.
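The compound scaling rule multiplies depth by alpha^phi, width by beta^phi, and resolution by gamma^phi, under the constraint alpha * beta^2 * gamma^2 of roughly 2, so that each increment of phi roughly doubles the FLOPS. The sketch uses the coefficients reported in the paper (alpha = 1.2, beta = 1.1, gamma = 1.15); the released model variants round their settings, so treat the printed multipliers as an illustration of the rule rather than their exact configurations.

```python
# Coefficients reported in the EfficientNet paper for scaling the B0 baseline.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution


def compound_scale(phi: float):
    """Return (depth, width, resolution, FLOPS) multipliers for compound coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPS grow roughly linearly with depth and quadratically with width and resolution.
    flops = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops


print(round(ALPHA * BETA ** 2 * GAMMA ** 2, 3))  # 1.92, close to the target of 2
for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPS x{f:.2f}")
```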

MobileNetV3

MobileNetV3 was introduced in the paper "Searching for MobileNetV3" (2019), and its architecture was found partly through neural architecture search. Its main goal is to keep the network as small and fast as possible for use on mobile devices while maintaining high accuracy. It builds on the inverted residual blocks with linear bottlenecks introduced in MobileNetV2, which keep the number of parameters low.
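An inverted residual block with a linear bottleneck can be sketched as follows: a 1x1 convolution expands a narrow input, a cheap depthwise 3x3 convolution (one filter per channel) processes it, and a 1x1 projection without any activation brings it back to a narrow output. The skip connection is used only when input and output shapes match. This simplified block uses the hard-swish activation from MobileNetV3 but omits its squeeze-and-excitation modules; the expansion ratio of 4 is an arbitrary choice.

```python
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    """Simplified inverted residual: expand (1x1) -> depthwise 3x3 -> linear projection (1x1)."""

    def __init__(self, in_channels: int, out_channels: int, expand_ratio: int = 4, stride: int = 1):
        super().__init__()
        hidden = in_channels * expand_ratio
        self.use_skip = stride == 1 and in_channels == out_channels
        self.block = nn.Sequential(
            # 1x1 expansion to a wider hidden representation
            nn.Conv2d(in_channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.Hardswish(),
            # depthwise 3x3: groups=hidden gives each channel its own filter
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.Hardswish(),
            # 1x1 linear projection to the narrow bottleneck (no activation)
            nn.Conv2d(hidden, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_skip else out


block = InvertedResidual(24, 24)
print(block(torch.randn(2, 24, 32, 32)).shape)  # torch.Size([2, 24, 32, 32])
```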

Vision Transformer

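The Vision Transformer was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (2020). It applies the Transformer architecture to computer vision by splitting each image into fixed-size patches of 16x16 pixels, embedding every patch as a token (the "words" of the title), and feeding the resulting sequence to a Transformer encoder, whose multi-head self-attention extracts important features by relating every patch to every other patch.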
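The patch embedding step can be sketched in PyTorch: a convolution whose kernel size and stride both equal the patch size is a linear projection of each non-overlapping 16x16 patch, so a 224x224 image becomes 14 x 14 = 196 tokens, plus one learnable class token. The dimensions follow ViT-B/16 (768-dimensional tokens, 12 heads); the two encoder layers are a deliberately small stand-in for the 12 used in ViT-B, and the zero initialization is a simplification. In practice, torchvision's `vit_b_16` provides the full model.

```python
import torch
import torch.nn as nn


class PatchEmbedding(nn.Module):
    def __init__(self, image_size=224, patch_size=16, in_channels=3, dim=768):
        super().__init__()
        self.num_patches = (image_size // patch_size) ** 2  # 14 * 14 = 196 for the defaults
        # kernel = stride = patch size: a linear projection of each non-overlapping patch
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embedding = nn.Parameter(torch.zeros(1, self.num_patches + 1, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.proj(x).flatten(2).transpose(1, 2)  # (B, dim, 14, 14) -> (B, 196, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)   # one class token per image
        tokens = torch.cat([cls, tokens], dim=1)          # (B, 197, dim)
        return tokens + self.pos_embedding                # add learned position information


embed = PatchEmbedding()
tokens = embed(torch.randn(2, 3, 224, 224))
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, dim_feedforward=3072,
                                   activation="gelu", batch_first=True, norm_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
out = encoder(tokens)  # multi-head self-attention over all 197 tokens
print(out.shape)  # torch.Size([2, 197, 768])
```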

Transfer learning

All of the models above are available with weights pre-trained on ImageNet-1K. Each model's classifier was replaced with a Dropout layer followed by a fully connected layer that returns a single output, and only this final layer was updated during training; the pre-trained weights stayed frozen.

Each model was trained on the same dataset for 30 epochs.
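A minimal training loop for a single-output model might look like this. Binary cross-entropy on the raw logit (`BCEWithLogitsLoss`) is the natural loss for one output, but the optimizer (Adam) and learning rate are assumptions, not settings taken from this project. `train_loader` can be any iterable of `(images, labels)` batches with 0/1 labels, such as a `DataLoader`.

```python
import torch
import torch.nn as nn


def train(model: nn.Module, train_loader, epochs: int = 30, lr: float = 1e-3, device: str = "cpu"):
    """Train `model` and return the average training loss of every epoch."""
    model.to(device)
    criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy on a single logit
    # Only parameters that are not frozen are passed to the optimizer.
    optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=lr)
    history = []
    for _ in range(epochs):
        model.train()
        total, count = 0.0, 0
        for images, labels in train_loader:
            images = images.to(device)
            labels = labels.float().unsqueeze(1).to(device)  # (B,) -> (B, 1) to match the logits
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            total += loss.item() * images.size(0)
            count += images.size(0)
        history.append(total / count)
    return history
```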

Loss curves

Every model learned from the dataset and reduced the loss on both the training and the test data. Newer architectures generally learned more effectively, with the Vision Transformer achieving the lowest test loss. The only exception was MobileNetV3, whose design prioritizes small size and low latency over raw accuracy.

Accuracies

Most models reached an accuracy of around 90%. MobileNetV3 performed comparably to the larger but older AlexNet. The Vision Transformer outperformed all other models, reaching 96-98% accuracy.

Confusion matrices

Most models reliably detect a tumor when one is present (high sensitivity), but they also produce a significant number of false positives (lower specificity). Reviewing the dataset, I noticed that many images labeled as negative still show various changes in brain structure. These abnormalities take different shapes, and it is unclear whether they are tumors or something else, which may explain some of the false positives.
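The pattern described above can be quantified from a confusion matrix. The sketch below computes it in plain Python together with sensitivity (recall on tumor images), specificity (recall on tumor-free images), precision, and accuracy. With a single-logit model, a prediction would typically be 1 when the logit is above 0 (sigmoid above 0.5), though that threshold is a choice. The labels in the example are made up to mirror the "many false positives" case.

```python
def binary_metrics(y_true, y_pred):
    """Confusion matrix and common metrics for binary labels (1 = tumor, 0 = no tumor)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual (no, yes); columns: predicted
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Made-up example: every tumor is found, but half of the healthy scans are flagged too.
metrics = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 0, 0])
print(metrics["sensitivity"], metrics["specificity"])  # 1.0 0.5
```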

Time used for training

All models were trained on a relatively old GPU, an NVIDIA GeForce GTX 1050 Ti, so training would likely be much faster on a modern GPU designed for machine learning. Training times did not vary much between models, despite large differences in their parameter counts. This is mainly because only a small portion of each model (the new classifier layer) was actually trained, and data loading added considerable overhead. Speed differences between the models would likely become more apparent after deployment, where these limitations have less impact.
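One rough way to check how much of an epoch is spent waiting for data is to time the data loader and the model separately, as in this hypothetical helper. It measures forward passes only, so it understates the compute share of training. On CUDA, the explicit synchronization is needed because GPU calls return before the work finishes. If data loading dominates, the `num_workers` and `pin_memory` options of `torch.utils.data.DataLoader` are the usual first things to tune.

```python
import time

import torch


def profile_epoch(model, loader, device: str = "cpu"):
    """Split one pass over `loader` into time spent waiting for data and time spent in the model."""
    data_time, compute_time = 0.0, 0.0
    model.to(device).eval()
    end = time.perf_counter()
    with torch.no_grad():
        for images, _ in loader:
            start = time.perf_counter()
            data_time += start - end  # time spent producing the next batch
            model(images.to(device))
            if device.startswith("cuda"):
                torch.cuda.synchronize()  # wait for queued GPU work before reading the clock
            end = time.perf_counter()
            compute_time += end - start
    return data_time, compute_time
```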

Conclusions

Computer vision has advanced significantly over the past decade and is expected to progress even further in the coming years. State-of-the-art models are continually being replaced by newer, more accurate, and more efficient alternatives. This rapid pace of improvement suggests that we will soon see models with even higher precision, lower error rates, and broader applications across various fields, including healthcare and autonomous vehicles. As computer vision models continue to evolve, they will likely play an increasingly critical role in industries that rely on detailed image analysis, enabling breakthroughs in diagnosis, safety, and data-driven decision-making.

Notebooks