Internship at AUH, DCPT
During a six-month internship at the Danish Center for Particle Therapy (DCPT) at Aarhus University Hospital (AUH), I developed machine learning algorithms and models as part of a larger project focused on breast cancer detection using YOLOv8.
DCPT, AUH
2024
E-commerce
Solo-dev
Challenge
The goal of the project was to develop an image sorting and classification tool for chest datasets collected from multiple hospitals. The main challenges included:
Data variability across hospitals (Aarhus, Aalborg, Odense), with inconsistent file naming conventions, metadata availability, and image quality.
Large data volume: more than 35,000 images had to be processed while keeping results reproducible and memory usage manageable.
Balancing classification accuracy vs. generalization to handle diverse image distributions.
Automating renaming and sorting using metadata (year, patient ID) or filenames when metadata was incomplete or missing.
The project is part of the larger BCCT (Breast Conservative Cosmetic Treatment) project.
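The renaming fallback described in the challenges above can be sketched as follows. This is a minimal illustration, not the project's actual code: the filename pattern (a four-digit year plus an ID such as `P0123`), the CSV column names (`filename`, `year`, `patient_id`), and the helper names are all assumptions made for the example.

```python
import csv
import io
import re
from pathlib import Path
from typing import Dict, Optional, Tuple

# Hypothetical filename convention: a 4-digit year and an ID like "P0123"
# somewhere in the name, e.g. "aarhus_2019_P0123_front.jpg".
YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")
PATIENT_RE = re.compile(r"P(\d{3,6})", re.IGNORECASE)


def parse_filename(name: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (year, patient_id) found in a filename; None for missing parts."""
    year = YEAR_RE.search(name)
    patient = PATIENT_RE.search(name)
    return (year.group(0) if year else None,
            patient.group(1) if patient else None)


def load_metadata(csv_text: str) -> Dict[str, Dict[str, str]]:
    """Index CSV rows by original filename (column names are assumptions)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["filename"]: row for row in reader}


def new_name(name: str, metadata: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Build '<patient>_<year><ext>', preferring metadata over the filename."""
    year, patient = parse_filename(name)
    row = metadata.get(name, {})
    year = row.get("year") or year
    patient = row.get("patient_id") or patient
    if not year or not patient:
        return None  # neither source is complete: leave for manual review
    return f"{patient}_{year}{Path(name).suffix}"
```

With these assumptions, a file listed in the CSV is renamed from its metadata row, a file missing from the CSV is renamed from its own name, and a file with neither source returns `None` so it can be reviewed by hand instead of being renamed incorrectly.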
Results
A trained YOLOv8 classifier achieving >90% accuracy on validation data, with clear separation of the patients' hand positions.
Modular Python scripts to process all datasets, classify images, and sort them into structured folders by class.
A renaming system incorporating patient randomization numbers and years from either filenames or a CSV metadata file.
Verified output across all hospitals, ensuring correct categorization and reproducibility of results.
Clear documentation to support further scaling or adaptation to additional datasets.
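The sorting step listed above (placing classified images into one folder per class) might look roughly like the sketch below. It assumes the classifier has already produced a mapping from image path to predicted class name; the function name and the class labels used in the usage note are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Dict


def sort_by_class(predictions: Dict[Path, str], output_root: Path) -> Dict[str, int]:
    """Copy each image into output_root/<class_name>/ and return per-class counts.

    `predictions` maps an image path to its predicted class name. Files are
    copied rather than moved so the original dataset stays untouched.
    """
    counts: Dict[str, int] = {}
    # Iterate in sorted order so repeated runs process files identically.
    for image_path, class_name in sorted(predictions.items()):
        target_dir = output_root / class_name
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(image_path, target_dir / image_path.name)
        counts[class_name] = counts.get(class_name, 0) + 1
    return counts
```

For example, calling `sort_by_class({Path("a.jpg"): "hands_up", Path("b.jpg"): "hands_down"}, Path("sorted"))` would create `sorted/hands_up/a.jpg` and `sorted/hands_down/b.jpg`. Copying instead of moving is a design choice that makes the run repeatable, at the cost of extra disk space.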
92% image classification accuracy
0.89 F1 score

Process
Requirements Analysis & Technical Research:
I reviewed medical imaging workflows and clinical requirements for breast cancer detection and anatomical landmark localization.
System Architecture & Model Design:
Based on the tasks, I designed dedicated pipelines:
An image classification pipeline using convolutional neural networks (CNNs) for cancer detection.
A landmark detection pipeline for precise sternal notch localization.
Modular components for preprocessing, model training, and validation.
Data Preparation & Model Implementation:
I performed normalization, augmentation, and resizing to improve generalization. Using frameworks such as TensorFlow and PyTorch, I developed and trained CNN architectures, tuning hyperparameters iteratively to optimize performance.
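To make the three preprocessing operations concrete, here is a dependency-free sketch that works on a grayscale image stored as a nested list of pixel values. It is only an illustration: in practice these steps would normally be done with the transforms provided by TensorFlow or PyTorch, and the function names, the nearest-neighbour resizing method, and the choice of a horizontal flip as the augmentation are assumptions.

```python
from typing import List

Image = List[List[float]]  # rows of pixel intensities


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Resize with nearest-neighbour sampling (simple, not anti-aliased)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image: Image, max_value: float = 255.0) -> Image:
    """Scale pixel values from [0, max_value] to [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def horizontal_flip(image: Image) -> Image:
    """Mirror the image left to right, one common augmentation."""
    return [row[::-1] for row in image]
```

Whether a horizontal flip is a safe augmentation depends on the task: for classes that encode left/right information it would change the label, so it should be checked against the class definitions before use.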
Integration & Testing:
End-to-end tests verified predictions on diverse image sets, including real-world clinical data collected during the internship. I evaluated the models with precision, recall, and F1-score, and performed error analysis to refine the pipelines.
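For reference, the three metrics named above can be computed for one class as in the following sketch. The function name and the label values in the usage note are hypothetical, and the project itself may have used a library implementation instead.

```python
from typing import Dict, Sequence


def precision_recall_f1(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Dict[str, float]:
    """Compute precision, recall, and F1 for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Calling it with true labels `["up", "up", "down", "down", "up"]`, predictions `["up", "down", "down", "up", "up"]`, and `positive="up"` gives two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.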
Stack


Conclusion
This project resulted in a reliable, automated image classification and organization pipeline that simplifies processing of large-scale datasets from multiple sources. It establishes a strong foundation for future improvements, such as incorporating additional posture classes, refining metadata extraction, or scaling to even larger datasets.