Internship at AUH, DCPT

Over six months, I interned at the Danish Center for Particle Therapy (DCPT) at Aarhus University Hospital (AUH). I developed machine learning algorithms and models with YOLOv8 as part of a larger project focused on breast cancer detection.

Project Owner

DCPT, AUH

Developed

2024

Type

E-commerce

Role

Solo-dev

Challenge

The goal of the project was to develop an image sorting and classification tool for chest datasets collected from multiple hospitals. The main challenges included:

  • Data variability across hospitals (Aarhus, Aalborg, Odense), with inconsistent file naming conventions, metadata availability, and image quality.

  • Large data volume: processing over 35,000 images while keeping results reproducible and memory usage manageable.

  • Balancing classification accuracy against generalization across diverse image distributions.

  • Automating renaming and sorting using metadata (year, patient ID) or filenames when metadata was incomplete or missing.

    The project is part of a larger project, BCCT (Breast Conservative Cosmetic Treatment).
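
The metadata-or-filename fallback described in the last challenge can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the filename pattern (`<patient_id>_<year>`), the CSV column names (`filename`, `patient_id`, `year`), and all function names are hypothetical, since the real conventions differed between hospitals.

```python
import csv
import io
import re
from pathlib import Path

# Hypothetical pattern: numeric patient ID, underscore, four-digit year,
# e.g. "0042_2019.jpg". Real hospital naming conventions will differ.
FILENAME_PATTERN = re.compile(r"^(?P<patient_id>\d+)_(?P<year>(?:19|20)\d{2})$")


def load_metadata(csv_text):
    """Index CSV rows by filename (the column names are assumptions)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["filename"]: row for row in reader}


def resolve_identity(filename, metadata):
    """Return (patient_id, year): CSV metadata first, filename as fallback."""
    row = metadata.get(filename)
    if row and row.get("patient_id") and row.get("year"):
        return row["patient_id"], row["year"]
    match = FILENAME_PATTERN.match(Path(filename).stem)
    if match:
        return match["patient_id"], match["year"]
    return None  # unresolved: flag for manual review instead of guessing


def new_name(filename, metadata):
    """Build a standardized file name, or None if identity is unknown."""
    identity = resolve_identity(filename, metadata)
    if identity is None:
        return None
    patient_id, year = identity
    return f"{patient_id}_{year}{Path(filename).suffix.lower()}"
```

Returning `None` for unresolved files, rather than inventing a name, keeps incomplete records visible so they can be checked by hand.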

Results

  • A trained YOLOv8 classifier achieving >90% accuracy on validation data, with clear separation of patients' hand positions.

  • Modular Python scripts to process all datasets, classify images, and sort them into structured folders by class.

  • A renaming system incorporating patient randomization numbers and years from either filenames or a CSV metadata file.

  • Verified output across all hospitals, ensuring correct categorization and reproducibility of results.

  • Clear documentation to support further scaling or adaptation to additional datasets.
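
The sort-by-class step could look roughly like the sketch below. It assumes a `classify` callable that maps a list of image paths to a list of class names (in this project that role was played by the trained YOLOv8 classifier); the function name, the batch size, and the set of image suffixes are illustrative assumptions.

```python
import shutil
from pathlib import Path

# Assumed set of image extensions; adjust to the dataset at hand.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def sort_by_class(source_dir, output_dir, classify, batch_size=256):
    """Copy each image into output_dir/<predicted class>/ and count per class.

    `classify` is any callable taking a list of paths and returning one
    class name per path (a stand-in for the trained model).
    """
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    # Sorting the file list makes the processing order reproducible.
    images = sorted(
        p for p in source_dir.rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES
    )
    counts = {}
    # Fixed-size batches keep memory use bounded on large datasets.
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        for path, label in zip(batch, classify(batch)):
            target = output_dir / label
            target.mkdir(parents=True, exist_ok=True)
            # Note: files with identical names would overwrite each other,
            # so renaming to unique names should happen before this step.
            shutil.copy2(path, target / path.name)
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Copying instead of moving leaves the source data untouched, so a run can be repeated and compared against earlier output.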

92%

Image classification accuracy

0.89

F1 score

Process

Requirements Analysis & Technical Research:
I reviewed medical imaging workflows and clinical requirements for breast cancer detection and anatomical landmark localization.


System Architecture & Model Design:
Based on the tasks, I designed dedicated pipelines:

  • An image classification pipeline using Convolutional Neural Networks for cancer detection.

  • A landmark detection pipeline for precise sternal notch localization.

  • Modular components for preprocessing, model training, and validation.


Data Preparation & Model Implementation:
I performed normalization, augmentation, and resizing to improve generalization. Using frameworks such as TensorFlow and PyTorch, I developed and trained CNN architectures, tuning hyperparameters iteratively to optimize performance.
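
To make the three preprocessing steps concrete, here is a dependency-free sketch on a grayscale image stored as a list of rows. In practice these steps would be done with TensorFlow or PyTorch utilities; the nearest-neighbor interpolation, the [0, 1] target range, and the horizontal flip as the augmentation are assumptions chosen for illustration only.

```python
import random


def resize_nearest(image, new_h, new_w):
    """Resize a 2D grayscale image (list of rows) by nearest-neighbor sampling."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def normalize(image, max_value=255):
    """Scale pixel intensities to the range [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def random_horizontal_flip(image, rng, probability=0.5):
    """Mirror the image left-to-right with the given probability."""
    if rng.random() < probability:
        return [row[::-1] for row in image]
    return image


def preprocess(image, size, rng, augment=True):
    """Resize, optionally augment, then normalize a single image."""
    out = resize_nearest(image, size, size)
    if augment:  # augmentation belongs in training only, not validation
        out = random_horizontal_flip(out, rng)
    return normalize(out)
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation repeatable between runs.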


Integration & Testing:
End-to-end tests verified predictions on diverse image sets, including real-world clinical data collected during the internship. I evaluated models with precision, recall, and F1-score, and performed error analysis to refine the pipelines.
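
For reference, the three evaluation metrics can be computed from true and predicted labels as in the sketch below. The function names and the one-vs-rest treatment of a single `positive` class are assumptions for illustration; a library such as scikit-learn offers equivalent, more complete routines.

```python
def binary_metrics(y_true, y_pred, positive):
    """Return (precision, recall, f1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or seen.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def misclassified(y_true, y_pred):
    """Indices where prediction and truth disagree, for error analysis."""
    return [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
```

The indices returned by `misclassified` point back to the individual images worth inspecting during error analysis.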

Conclusion

This project resulted in a reliable, automated image classification and organization pipeline that simplifies processing of large-scale datasets from multiple sources. It establishes a strong foundation for future improvements, such as incorporating additional posture classes, refining metadata extraction, or scaling to even larger datasets.