AutoAI4EO: NAS with AutoKeras for Earth Observation (Part 1)

This is the first post in a series about our research on Neural Architecture Search (NAS) for Earth Observation (EO). This blog post is an introduction to NAS for EO with AutoKeras . In Part 2 we will talk about NAS for the classification of satellite imagery and transfer learning in AutoKeras. Part 3 will cover NAS for super-resolution for satellite imagery and advanced search space modifications in AutoKeras.  

Today we will talk about what NAS is, and why it is so useful for the analysis of EO imagery. We’ll discuss AutoKeras, and how we have used it in our research to create methods customized for EO. 

What is Neural Architecture Search? 

Deep neural networks have become ubiquitous algorithms for automating different tasks ranging from language processing to facial recognition. These powerful methods can be used to model complex relationships in data without using manually engineered features. However, the design of the neural networks themselves can be a tedious and time-consuming process, during which many different architectures are examined by deep learning experts. Neural network architectures are controlled by hyperparameters that define the types of layers used, the number of layers, but also training parameters like the choice of optimizer or learning rate. The number of choices that have to be made makes it hard for other scientists to use these tools for their research: as a consequence, they often use simpler, available options like CNNs and miss out on the full capabilities of state-of-the-art neural network architecture (think: GANs, transformers, etc.).  

Figure 1: Diagram of a standard NAS framework. A search strategy s samples candidate models from the search space S. The candidate model is evaluated. The search strategy is updated with the evaluation result. 

There is currently a need for these state-of-the-art architectures to become more accessible to researchers from other fields, this is where NAS comes in. NAS frameworks can automatically design and optimize neural networks based on the input data, thus circumventing a human design expert. To do this, the framework needs 3 components: 

  1. A search space populated with possible hyperparameters (e.g., layer type, activation functions). Candidate neural networks are built from components in the search space. 
  1. A search strategy: you need an algorithm that will traverse the search space and intelligently design neural networks. The total number of possible networks that can be constructed from a given search space is often much larger than the number of those that can reasonably be evaluated (for example, there are approximately 1015 architectures in the ENAS search space . This number is so large, because there is a choice of 4 different activation functions for the 12 layers that are in the base module that is repeated to create the network, resulting in 412  or approximately 1015 options [1]), therefore you need a way to decide on the next candidate architecture to consider. 
  1. An evaluation metric: finally, you want to be able to evaluate and compare the architectures that your framework has generated so you can find the best one. 

The search space and the evaluation metric are often task-specific and used for many NAS frameworks. However, the search strategy can vary strongly from framework to framework. There are many search strategies you can choose from, including ones using Reinforcement Learning (e.g., NAS-RL [2]), Evolutionary Algorithms (e.g., Large-scale Evolution [3]), or even stochastic gradient descent (e.g., DARTS [4]). NAS libraries often allow you to use these strategies as well as others like random search and Bayesian optimization. So far, our research is focused on designing the search space and thus we used the standard search strategy offered by the NAS library. 

Why use NAS for EO? 

We’ve established that NAS has great potential to make state-of-the-art neural networks more accessible to scientists from all domains. But why is it especially well-suited to address EO problems? 

Let’s think about a typical EO analysis pipeline for the task of image classification. You want to classify your images based on land cover: whether it is a city, a forest, a desert, etc. First, the raw data would be obtained by a measurement instrument, like Sentinel-2. These images first need to be preprocessed: for example, you want to calibrate the colors in the image to account for different lighting conditions, you want to remove artifacts, and filter your data for clouds. Then, you could consider using techniques like super-resolution to increase the quality of the data. Finally, you can classify your images using either a trained classifier or take the extra steps to label your data and train a classifier specifically for your dataset.  

Usually, you would manually select the methods and procedures you would use at each step. This is a time-consuming process and makes it harder to automatically process the vast amounts of data that are generated by EO instruments. An additional challenge is that each step requires different expertise, and thus often different people to carry out these steps. This process could be automated with the help of AutoML techniques, saving researchers valuable hours. Additionally, it is really not possible for humans to find the best pipeline by (informed) trial and error if the pipeline is very complex and there are many design choices to be made.  AutoML can make it possible to automate tasks based on EO data and as a result analyze more data.  

Figure 2: Left: Sentinel-2 image. 27 January 2019. European Space Agency. Right: Sample of a frog from the CIFAR-10 dataset [5]. This dataset is often used for image classification.

Interest is rising in AI4EO: artificial intelligence techniques for Earth Observation, that are not simply direct applications of existing machine learning methods, but take unique properties of  EO data into account. EO data exists in many forms, from measurements of wind direction to optical satellite images. EO imagery can contain many more features of different scales than natural images (for instance, of faces). Additionally, the tasks performed with EO images can be very different from the tasks performed with natural images. For example, in deforestation mapping, small differences between individual images can be of great importance. Therefore, we cannot simply use techniques for natural images. Besides, the performance of ML methods for EO problems can be greatly increased by using available knowledge and theory of physical models, as well as having the benefit of making these ML models more explainable. 

In the case of NAS, there are examples of adaptions of existing NAS frameworks for related domains such as spatio-temporal forecasting. For instance, AutoST [6] modifies the DART framework with knowledge of spatio-temporal systems. The resulting framework can generate networks that outperform state-of-the-art approaches to various forecasting problems. 

AutoKeras’ role 

As mentioned in the previous section, there are many options in terms of NAS frameworks. In this section, we will describe one of those, AutoKeras [7], which we have used for our research on NAS for EO. AutoKeras is a Python library that enables users to implement NAS in Keras. It offers some ready-made options like NAS for Image Classification and NAS for Regression. There are many options to populate the search space, including existing models like ResNet [8] and Transformers as well as different search strategies. AutoKeras generates candidate architectures by mutating and repeating so-called blocks: these blocks are sub-networks, like a stack of one or more CNN layers or even complete models.  Block parameters like the number of layers or the kernel size are automatically configured by AutoKeras, but the framework can also select different types of blocks and stack them. 

AutoKeras can be used to create customized search spaces by changing the block parameters, but users can also create custom blocks. Additionally, it is possible to load pre-trained weights to speed up training. We can use this functionality to customize our methods for EO. We need to do this, because we want to use characteristics of EO data to achieve better results on EO tasks than we could by simply using techniques developed for natural images. Additionally, customising our search space to our task will help reduce the search space. Though an infinite search space can, in theory, help us discover neural networks that a human would not think of, in practice, this can result in prohibitively long running times before a good architecture is found.  

In the coming blog posts, we will discuss two examples of how we have used AutoKeras in our research to create NAS methods specifically for EO. Next up will be the classification of satellite imagery, where we have used custom blocks and the power of transfer learning to achieve state-of-the-art results in image classification.  


[1] Hieu Pham, Melody Guan, Barret Zoph, Quoc Le, Jeff Dean.  Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4095-4104, 2018. 

[2]  B. Zoph, Q.V. Le, Neural Architecture Search with reinforcement learning, Proceedings of the International Conference on Learning Representations (ICLR), 2017. 

[3] Real, E., Moore, S., Selle, A., Saxena, S., Suematsu, Y. L., Tan, J., … & Kurakin, A. (2017, August). Large-scale evolution ofimage classifiers. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 2902-2911).JMLR. org 

[4] Liu, H., Simonyan, K., & Yang, Y. (2018). Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055. 

[5] A. Krizhevsky, “Learning Multiple Layers of Features from
Tiny Images,” Technical report, 2009.

[6] Li, T., Zhang, J., Bao, K., Liang, Y., Li, Y., Zheng, Y. (n.d.). AutoST: Efficient Neural Architecture Search for Spatio-Temporal Prediction. KDD, 20. 

[7] Haifeng Jin, Qingquan Song, and Xia Hu. 2019. Auto-Keras: An Efficient Neural Architecture Search System. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’19). Association for Computing Machinery, New York, NY, USA, 1946–1956. 

[8] K. He, X. Zhang, S. Ren and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90. 


Exact stochastic constraint optimisation with applications in network analysis

In business, governance, science as well as in our daily lives, we often have to solve problems that involve optimal decision-making under constraints (limitations on the choices that we can make) and uncertainty (probabilistic knowledge about the future).

There are many stochastic constraint (optimisation) problems (SCPs) that arise from probabilistic networks. These are networks where we associate a probability with the nodes and/or edges of the network.

Going viral

One of those problems is the spread of influence (or viral marketing) problem. We are given a social network with probabilistic influence relationships. Suppose Alexa has a probability of 0.8 to convince Claire to do something. We can model this with a node that symbolises Alexa and a node that symbolises Claire, and an edge pointing from Alexa to Claire, labelled with probability 0.8.

In this network, we assume all influence relationships to be symmetric. Hence, there is an edge with label 0.8 pointing from node a (Alexa) to node c (Claire), and the same arrow pointing back, resulting in one undirected edge connecting these two nodes.

Suppose now that we are a company planning a marketing campaign to promote our product. We want to use a word-of-mouth strategy on the probabilistic social network that we have of our potential customers. The way to start the process is to give free samples of our product to certain people in the network, and hope that they like it and convince their friends to buy the product, who then convince their friends, and so on, until our product goes viral. We have a limited budget for this (the constraint), and want to maximise the expected number of people who will eventually buy our product (the optimisation criterion). Which people in the network should get a free sample of our product?

Alternatively, we can require that the expected number of people who become customers exceeds a certain threshold (the constraint), while we aim to minimise the number of free samples that are distributed (the optimisation criterion). In practice, we can solve a stochastic constraint optimisation problem with a stochastic optimisation criterion and a budget constraint by formulating it as a stochastic constraint satisfaction problem (CSP) with a stochastic constraint and budget constraint. We then start the solving process with a very low threshold for the stochastic constraint (e.g., 0), and increase it each time we find a solution to the CSP. Then, we continue the search for a better solution until no better solution can be found. By then, the last solution we have found is an optimum solution.

Finding an optimum solution is hard

It is the stochastic constraint mentioned above that makes it hard to find a solution to SCPs. Given a strategy (a choice for which people receive a free sample), computing the probability that a person will become one of our customers is expensive. It involves summing probabilities over many, often partially overlapping, paths in the network. We then have to do this for all people in the network (to get the expected number of customers), and potentially have to repeat this for all possible strategies, in order to find a strategy that maximises the expected number of customers (by repeatedly finding an expected number of customers that exceeds an ever-increasing threshold of the stochastic constraint).

Joining forces

To find a solution to the problem, we therefore need two things: a way to quickly reason about probabilities, and a way to traverse the search space of strategies intelligently. We can combine the technique of knowledge compilation with the paradigm of constraint programming to create data structures (specifically, decision diagrams) that capture information about probability distributions, such that we can quickly compute probabilities, and to create constraint propagation algorithms on those data structures, which prune the search space.

Dedicated constraint propagation yields faster solving

A constraint propagation algorithm helps to prune the search space by removing values from the domains of variables. In particular, it ensures that variable domains contain no values that would falsify the constraint(s) in which the variables are present. Comparing different approaches that combine knowledge compilation and constraint programming, we find that having a dedicated constraint propagation algorithm for stochastic constraints tends to yield shorter solving times than approaches that decompose stochastic constraints into small, linear constraints (for which propagators already exist). We attribute this to the fact that in the decomposition process, we lose information about the relationship between different variables that participate in the stochastic constraint. Hence, the pruning power of the propagation methods is less, and the solver cannot prune certain parts of the search space that do not contain any solutions.


You can learn more about different ways of combining knowledge compilation and constraint programming in our paper “Exact stochastic constraint optimisation with applications in network analysis”, which was published in Artificial Intelligence, vol 304, March 2022.
Authors: Anna L.D. Latour, Behrouz Babaki, Daniël Fokkinga, Marie Anastacio, Holger H. Hoos, Siegfried Nijssen.