In March of 2020, I joined the ADA group to work on my Introductory research project, as part of my master program in Computer Science, under the supervision of Dr. Mitra Baratchi and Dr. Jan van Rijn. As my research topic was related to satellite data, I had the opportunity to collaborate with Dr. Andreas Vollrath, who was working at the ESA Phi-lab. My project was titled “Building Deep Learning Models for Remote Sensing Applications Through Automated Machine Learning”. Based on the results of our assumption test and preliminary experiments, we presented an online poster for the ESA Phi-week 2020. The presentation video can be watched here. We demonstrated the potential of applying the most recent advances in AutoML, specifically hyperparameter optimisation, to a remote sensing dataset. Starting from that, we proposed to build an AutoML system focused on Earth Observation data. This project proposal led to my thesis work “Automated Machine Learning for Satellite Data: Integrating Remote Sensing Pre-trained Models into AutoML Systems”.
Application areas that can benefit from machine learning models extracted from Satellite data are numerous. Environmental mappings, crops monitoring, urban planning and emergency response are some examples. However, in order to leverage the latest advancements in machine learning, much expert knowledge and hands-on expertise is required. Automated Machine Learning (AutoML) aims to automate the different stages of a machine learning pipeline, making crucial decisions in a data-driven and objective way. To reach this goal, AutoML systems have been developed.
Open source AutoML systems are usually benchmarked with natural image datasets. Satellite images differ from natural images in various aspects. One of the most important differences is related to the number and type of spectral bands composing the image. Natural colour images have three bands (red, green & blue), but in satellite images, more spectral bands are available. Therefore, the input data can be composed of more and different types of spectral bands (a common practice in real-world applications). These differences made us wonder how the current AutoML systems work for satellite datasets. To come up with an answer, the first part of our project focused on testing the image classification task available in the AutoML system of AutoKeras on a varied remote sensing benchmark suite of 7 datasets that we compiled by reviewing the remote sensing literature. We found good performance in almost all of these datasets, with impressive results for the RGB version of two of these datasets (EuroSAT and So2Sat). However, we did not achieve better results than manually designed models for more real-world application datasets. This showed the importance of customising these systems to achieve better performance on remote sensing data.
Furthermore, as acquiring ground truth label data is quite expensive, a problem in the field of remote sensing is designing high performing models in the lack of large labelled datasets. The use of pre-trained models has given promising results to address this challenge. As a next step, we added remote sensing pre-trained models to AutoKeras. This helped to improve the accuracy on non-RGB datasets further. In our experiments, we compared 3 AutoML approaches (initializing architectures (i) without pre-training, (ii) pre-trained on Imagenet and (iii) pre-trained on remote sensing datasets) against manually designed architectures on our benchmark set. AutoML architectures outperformed the manually designed architectures in 5 of these datasets. We showed that the best AutoML approach depends on the properties of the dataset at hand, and these results further highlighted the importance of having customised Earth Observation AutoML systems. To know more about this project, please take a look at the project page.
By using AutoML methods to reduce the gap between state-of-the-art machine learning research and applied machine learning, the use of advanced AI for remote sensing applications can be more accessible. One of my personal goals is to keep updating and documenting our project repository so people of different backgrounds can use it.
We believe that making this benchmark publicly available will enable the community to further experiment with relevant remote sensing datasets and expand the AutoML systems for different application goals.