Sparkle: Accessible Meta-Algorithmics

Many fields of computational science advance by improving the algorithms for solving key problems, such as SAT solving or binary classification. When comparing these algorithms, they are usually ranked based on some performance metric, with the algorithm that performs best over a set of problem instances getting the highest score. In other words, the generalist is considered to be the best algorithm.


Meta-algorithmic techniques, such as automated algorithm selection, are able to provide great performance improvements over such a generalist approach. They do this by taking advantage of the complementary strengths of different algorithms, or in the case of algorithm configuration, of configurable or tunable algorithm components. Algorithms that are specially selected or configured for a specific subset of problem instances usually perform better, on this subset, than a generalist algorithm.

However, meta-algorithms are complex and difficult to use correctly. At the same time, when using them incorrectly, you may not benefit from their full potential, or in the extreme case, you may even obtain worse performance than without using them.


The Sparkle platform aims to make meta-algorithms more accessible to novice users. To achieve this, Sparkle implements standard protocols for algorithm selection and algorithm configuration that support their correct use. After an experiment, the results, as well as the problem instances, algorithms and protocol used, are presented in an automatically generated LaTeX/PDF document, parts of which can be easily copied and pasted into scientific papers.

As an example, we outline below how algorithm selection can be used in Sparkle. Once the platform is initialised, the components that we need (problem instances, solvers and a feature extractor) are added to Sparkle. Using these, the feature values for each instance, and the performance values of each solver on each instance can be computed. With those values a portfolio selector is constructed, which can be used to automatically select a solver for new problem instances. Finally, we generate a report that describes which components were used, which process Sparkle used to construct a portfolio selector with those components, and how large the marginal contribution of each solver is to this portfolio selector. --run-solver-later --run-extractor-later PTN/ --run-solver-later --deterministic 0 PbO-CCSAT-Generic/ --run-solver-later --deterministic 0 CSCCSat/ --run-solver-later --deterministic 0 MiniSAT/ --run-extractor-later SAT-features-competition2012_revised_without_SatELite_sparkle/


With the first version of Sparkle ready, we are working to improve it further by making things easier to use, and providing more meta-algorithmic techniques. For this we are supported by the software lab at LIACS, headed by Prof. Joost Visser. From this lab, Jérémie Gobeil helps to both improve Sparkle itself, as well as the development processes. Additionally, several students, including Jeroen Rook and Richard Middelkoop, also put work into Sparkle as part of their studies.


You can take a look, explore and try the Sparkle platform by getting it from the bitbucket repository at


AutoML for Dynamic Recommender Systems

What can we do when the performance of an AI system, such as a Recommender System, degrades due to changes in the data that it is processing? This is the central question we seek to answer in our recent project: “Hyper-Parameter Optimization for Latent Spaces in Dynamic Recommender Systems”

When an AI system is deployed, it is usually assumed that the data the system is processing is similar to the data it has been trained on. However, there are many scenarios in which this assumption is likely to be violated. Think, for example, of a recommendation system. These are systems designed to predict user preferences for items, such as movies, songs or books, and are commonly used by technology companies like Amazon, Netflix or Spotify to recommend new items or content to their users. These systems are a good example to illustrate the relevance of the problem studied in this work: User preferences are based on their taste and we all know that taste most definitely changes over time, just as the collection of items is constantly updated, with new items added or old items removed. Overall, we are dealing with a dynamic, rather than a static system, which means we have to rethink the way a learning algorithm is deployed.

Overview of the proposed method. The Nelder-Mead algorithm produces a set of configurations, from which the best one is selected and deployed.

Recommender systems are often based on collaborative filtering algorithms, and these algorithms have associated hyper-parameters, such as embedding size or learning rate, affecting the behavior and performance of the system. When the incoming data changes, for example because the collection of items is updated, there might be a new, unknown set of hyper-parameters that works more effectively on this new data than the previous setting. Our approach therefore involves i) detecting when system performance degrades (concept drift) and ii) initialise re-configuration of the matrix factorization algorithm. For this, we employ an adaption of the Nelder-Mead algorithm, which uses a set of heuristics to produce a number of potentially good configurations, from which the best one is selected and deployed.

To optimally control the characteristics of the stream, we also built a data generator, which produces realistic streams of preference data and injects drift into the data by means of expanding or shrinking the latent feature space or, in other words, change the user preferences.

Results on synthetic data stream. The y-axis shows RMSE, whereas the x-axis shows the number of samples processed.

We consider multiple baselines, including an online adaption of the well-known configurator SMAC as well as a carefully tuned static model, and yield promising results. In future work, we want to apply this method to situations in which the parameter space is more complex, for example, when using deep learning models requiring multiple components and specific tuning of the underlying components, such as convolutional architectures, recurrent layers or even attention models based on transformers.

This project is joint work by Matthias König and Holger H. Hoos (ADA group at LIACS, Leiden University), together with Bruno Veloso (Portucalense University, Portugal), Luciano Caroprese (ICAR-CNR, Italy), Sónia Teixeira (University of Porto, Portugal & LIAAD-INESC TEC, Portugal), Giuseppe Manco (ICAR-CNR, Italy) and João Gama (University of Porto, Portugal & LIAAD-INESC TEC, Portugal). It has been executed in the context of the HumanE-AI-Net, one of the two ICT-48 Networks of Centres of Excellence recently established by the European commission, in which ADA plays a key role. The accompanying paper will be presented at ECML PKDD 2021 and can be found here, along with the code and other materials.

AutoML for Earth Observation data

In March of 2020, I joined the ADA group to work on my Introductory research project, as part of my master program in Computer Science, under the supervision of Dr. Mitra Baratchi and Dr. Jan van Rijn. As my research topic was related to satellite data, I had the opportunity to collaborate with Dr. Andreas Vollrath, who was working at the ESA Phi-lab. My project was titled “Building Deep Learning Models for Remote Sensing Applications Through Automated Machine Learning”. Based on the results of our assumption test and preliminary experiments, we presented an online poster for the ESA Phi-week 2020. The presentation video can be watched here. We demonstrated the potential of applying the most recent advances in AutoML, specifically hyperparameter optimisation, to a remote sensing dataset. Starting from that, we proposed to build an AutoML system focused on Earth Observation data. This project proposal led to my thesis work “Automated Machine Learning for Satellite Data: Integrating Remote Sensing Pre-trained Models into AutoML Systems”.

Cover image showing AutoML being used for satellite data and representing the collaboration between Earth Observation and Machine Learning fields to achieve the best results.

Application areas that can benefit from machine learning models extracted from Satellite data are numerous. Environmental mappings, crops monitoring, urban planning and emergency response are some examples. However, in order to leverage the latest advancements in machine learning, much expert knowledge and hands-on expertise is required. Automated Machine Learning (AutoML) aims to automate the different stages of a machine learning pipeline, making crucial decisions in a data-driven and objective way. To reach this goal, AutoML systems have been developed. 

Open source AutoML systems are usually benchmarked with natural image datasets. Satellite images differ from natural images in various aspects. One of the most important differences is related to the number and type of spectral bands composing the image. Natural colour images have three bands (red, green & blue), but in satellite images, more spectral bands are available. Therefore, the input data can be composed of more and different types of spectral bands (a common practice in real-world applications). These differences made us wonder how the current AutoML systems work for satellite datasets. To come up with an answer, the first part of our project focused on testing the image classification task available in the AutoML system of AutoKeras on a varied remote sensing benchmark suite of 7 datasets that we compiled by reviewing the remote sensing literature. We found good performance in almost all of these datasets, with impressive results for the RGB version of two of these datasets (EuroSAT and So2Sat). However, we did not achieve better results than manually designed models for more real-world application datasets. This showed the importance of customising these systems to achieve better performance on remote sensing data.

Furthermore, as acquiring ground truth label data is quite expensive, a problem in the field of remote sensing is designing high performing models in the lack of large labelled datasets. The use of pre-trained models has given promising results to address this challenge. As a next step, we added remote sensing pre-trained models to AutoKeras. This helped to improve the accuracy on non-RGB datasets further. In our experiments, we compared 3 AutoML approaches (initializing architectures (i) without pre-training, (ii) pre-trained on Imagenet and (iii) pre-trained on remote sensing datasets) against manually designed architectures on our benchmark set. AutoML architectures outperformed the manually designed architectures in 5 of these datasets. We showed that the best AutoML approach depends on the properties of the dataset at hand, and these results further highlighted the importance of having customised Earth Observation AutoML systems. To know more about this project, please take a look at the project page.

By using AutoML methods to reduce the gap between state-of-the-art machine learning research and applied machine learning, the use of advanced AI for remote sensing applications can be more accessible. One of my personal goals is to keep updating and documenting our project repository so people of different backgrounds can use it.
We believe that making this benchmark publicly available will enable the community to further experiment with relevant remote sensing datasets and expand the AutoML systems for different application goals.

Gilles Ottervanger finishes successful Master’s project at ADA

Last month, I successfully defended my Master’s thesis titled “MultiETSC: Automated Machine Learning for Time Series Classification”. I have been working on my MultiETSC project for a little over a year as a part of the ADA group, supervised by Can Wang, Mitra Baratchi and Holger H. Hoos. In this project, we developed an AutoML approach to the problem of early time series classification (ETSC).

ETSC is a problem that is relevant for time-critical applications of time series classifications. It arises from the desire to get classification output before the full time series is observed. ETSC requires a classifier to decide at what point to classify based on the partially observed time series. This decision trades off the earliness of the classification for the accuracy of the classification. Therefore, any ETSC algorithm is faced with two competing objectives: earliness and accuracy.

For MutliETSC, we took eight existing and one naive self-produced ETSC algorithms and partly rewrote them to share a common interface. On this set of algorithms, we applied the MO-ParamILS configurator, which can optimise multiple objectives simultaneously. MultiETSC produces a set of configurations that give multiple options for trading off earliness against accuracy, while non of these configurations is strictly better than the other. This allows a user to pick a configuration that fits the ETSC problem at hand.

In our research, we show that leaving out either the algorithm selection or the multi-objective optimisation parts will both result in inferior results. This means that both these aspects significantly contribute to the quality of MultiETSC’s output.

If you want to know more about this project, have a look at the project page.

Finishing this project, sadly means I will be leaving the ADA group, but I am happy to have had the opportunity to work in such a great environment with such smart and motivated people as in the ADA group.

Another familiar student rejoins the ADA group for his PhD

It would seem that former Master student Laurens Arp will not be leaving the ADA group just yet, as he has just renewed his membership for the group, this time as a PhD student.

Prior to starting his PhD studies, Laurens worked on his Master’s Thesis on spatial interpolation within the ADA group supervised by dr. Mitra Baratchi and prof.dr. Holger Hoos, culminating in his MSc degree obtained in 2020.

The research Laurens will be carrying out now is part of dr. Mitra Baratchi’s “Physics-aware Spatio-temporal Machine Learning for Earth Observation Data” project, which involves a collaboration with the European Space Agency. In this project, the goal is to estimate environmental parameters (leaf density, CO2 levels in the atmosphere, etc.) from optical Earth observation data acquired by satellites. Generally speaking, there are two approaches one might take to this end. The first of those, traditionally favoured by the ecology community, would be to create inverted versions of deterministic radiative transfer models, which model how radiation (captured in Earth observation data) is affected by environmental parameters (ground truth). The other approach, perhaps more familiar to the AI and machine learning communities, would be to use spatio-temporal machine learning methods (e.g., temporal CNNs) to model the correlation between data (spectral images of radiation) and ground truth (environmental parameters). Both approaches have their own strengths and weaknesses, which motivates the development of hybrid models benefiting from the strengths of both. Thus, the creation of such hybrid models is what the project will aim to achieve.

Between global warming, the loss of biodiversity through mass extinction and other ecological and environmental perils the Earth is faced with these days, the challenges we, as residents of the Earth, are faced with can be daunting indeed. It is our hope that the models we will create will provide a meaningful contribution to the global efforts to address these issues.

The ADA research group welcomes Richard Middelkoop as a new Bachelor student

Besides working in the ADA research group, Richard studies Computer Science & Economics where his main interest lies in being able to apply codes to real-life applications. Therefore, he found the ADA group an excellent environment to work on his bachelor project.

Richards’ project is concerned with Parallel Algorithm Portfolios, and adding this new functionality to the Sparkle platform. The problems which can be solved by the functionality will encompass decision problems, optimisation problems with a single solution and optimisation problems with several possible solution. All being well,
the project will showcase a practical application of Parallel Portfolios.

AutoML adoption in software engineering for machine learning

This blog post has originally been published on

By Koen van der Blom, Holger Hoos, Alex Serban, Joost Visser

In our global survey among teams that build ML applications, we found ample room for increased adoption of AutoML techniques. While AutoML is adopted at least partially by more than 70% of teams in research labs and tech companies, for teams in non-tech and government organisations adoption is still below 50%. Full adoption remains limited to under 10% in research and tech, and to less than 5% in non-tech and government.

Software engineering for machine learning

With the inclusion of ML techniques in software, the risks and requirements of the software also changes. In turn, the best ways to maintain and develop software with ML components is different from traditional software engineering (TSE). We call this new field software engineering for machine learning (SE4ML).

Based on a study of the academic and “grey” literature (the latter comprises blog articles, presentation slides and white papers), we identified best practices for SE4ML, and composed a recommended reading list on the topic. These practices include the use of AutoML techniques to improve the use of ML components in software. All practices are described in our practice catalogue. We encourage readers to have a look and to send us their suggestions for additions.

Figure 1: Engineering best practices for machine learning.

Having a solid overview of the best practices, we set up a survey to find out the extent to which these SE4ML practices have already been adopted. We asked teams working on software with machine learning components to which degree they followed each practice in their work. In the resulting data we saw that tech companies have the highest adoption of the new ML related best practices, while research labs have the highest adoption of AutoML. Overall, more practices are adopted by teams that are larger and more experienced, with practices originating from TSE seeing slightly lower adoption than the new ML specific practices.

Effects of best practice adoption

Besides the adoption of best practices, we also investigated the effects of the practices. This resulted in insights into which specific practices relate to which desired outcomes. For instance, we found that software quality is influenced most by continuous integration, automated regression tests, and static analysis. On the other hand, agility is primarily affected by automated model deployment, having a cohesive- multi-disciplinary team, and enabling parallel training experiments. With these insights into the effectiveness of different practices, we hope to increase practice adoption and improve the quality of software with ML components. 

A brief overview of additional survey results can be found in our piece titled “The 2020 State of Engineering Practices for Machine Learning”, and in our technical publication “Adoption and Effects of Software Engineering Best Practices in Machine Learning”.

How about automated ML?

A recent line of research advancements in ML focuses on automated machine learning (AutoML), an area that aims to automate (parts of) the construction and use of ML pipelines to enable a wider audience to make effective and responsible use of ML, without needing to become an expert in the field. We took a closer look at our survey results and found that, compared to non-tech companies and governmental organisations, research labs and tech companies are ahead in the adoption of AutoML practices (see Figure 2).

Figure 2: AutoML adoption by organisation type.

While overall AutoML adoption is similar across continents, non-profit and low-tech organisations see higher adoption in Europe than in North America. We also found that teams with multiple years of experience are more likely to adopt AutoML techniques. Finally, across the board, there is significant room to increase adoption of AutoML, but this is especially true for non-tech companies and governmental organisations.

More detailed information on our findings regarding AutoML adoption can be found in our piece titled “The 2020 State of AutoML Adoption”.

Looking forward

Based on feedback from and interviews with participants, we recently revised our survey to learn more. Specifically, in our latest version of the survey, we ask in more detail about responsible use of ML and about the adoption of different AutoML techniques such as automated feature selection and neural architecture search. You can help us by taking the 10-minute survey and by spreading the word. If you want to stay up to date with our progress, keep an eye on our website.

PhD Student Yasha Pushak and Professor Holger Hoos Win Second Award for Ground-Breaking Research on Automated Algorithm Configuration

Just two years after receiving the best paper award at the 15th International Conference on Parallel Problem Solving from Nature (PPSN 2018), PhD student and Vanier Scholar Yasha Pushak and Professor Holger Hoos received another best paper award at the 22nd international Genetic and Evolutionary Computation Conference (GECCO 2020) for a line of research on automated algorithm configuration. Their second paper, titled “Golden Parameter Search: Exploiting Structure to Quickly Configure Parameters in Parallel” was the single winner of the ECOM Track at GECCO 2020, determined by voting of the audience.

Important and challenging computation problems are typically solved through the use of highly parameterized algorithms. Finding high-quality parameter settings for the algorithm can often improve their performance by several orders of magnitude. In the last two decades, automated algorithm configuration has emerged as a hot topic of research.

State of the art algorithm configurators rely on powerful search heuristics to explore the space of parameter configurations; however, they are typically quite costly, often requiring days or even weeks to find high-quality parameter configurations. These methods assume that finding high-quality parameter configurations is challenging. Consider, for example, the problem of trying to climb to the highest point of a rugged mountain with a large number of peaks in a dense fog: If you always climb up the steepest hill, you will likely get stuck at a peak that is far from the highest on the mountain. To avoid this, most configurators spend time exploring the entire “landscapes” of the algorithm configuration problem — including regions that don’t initially appear promising, just to be certain they haven’t missed anything. 

In their 2018 PPSN award-winning paper, “Algorithm Configuration Landscapes: More Benign than Expected?” Yasha Pushak and Holger Hoos hypothesized that the landscapes induced by algorithm configuration problems are likely much simpler and more structured than previously assumed. By way of analogy, consider the task of choosing a temperature at which to bake a cake: If the temperature is too high the cake will burn, but if it is too low the cake won’t bake. Therefore, finding the optimal temperature simply corresponds to trading off between “too high” and “too low”. Yasha Pushak and Holger Hoos developed a statistical procedure to test for similar structure in the response of algorithm parameters. They applied their procedure to a large set of parameters and were unable to reject their hypotheses in 99.5% of the cases they studied. Read more about algorithm configuration landscapes here.

Capitalizing on these insights, Yasha Pushak and Holger Hoos developed a new automated algorithm configuration procedure: Golden Parameter Search (GPS). GPS is the first of a new generation of automated algorithm configuration procedures designed to exploit structural properties of algorithm configuration landscapes. In their 2020 GECCO award-winning paper, they showed that GPS was often capable of finding similar-or better-quality parameter configurations using a fraction of the computational budget required by several other long-standing, state-of-the-art algorithm configuration procedures. Read more about GPS here.

Laurens Arp presents his work titled “Dynamic macro-scale traffic flow optimisation using crowd-sourced urban movement data” at IEEEMDM 2020

Traffic congestion tends to be bad for the environment, the economy, and above all: the drivers’ moods. As such, it is a worthwhile cause to pursue improvements for; in particular, being computer- and data scientists, using data-driven methods to try to alleviate this problem seemed a particularly exciting approach. This is just what we (Laurens Arp, Dyon van Vreumingen, Daniela Gahwens and Mitra Baratchi) published a paper for in the IEEE MDM 2020 proceedings. The project, which started as a course project for Mitra’s Urban Computing course at Leiden University, evolved first into a short paper submission to the Netmob Future Cities Challenge (FCC), and subsequently into a full paper submission to the MDM conference.

The main idea of the method proposed was to redistribute traffic by imposing external costs (or rewards) to specific road segments. We used a movement dataset for Tokyo, provided by Foursquare for the FCC, from which we could derive which proportion of drivers would want to go to and from specific parts of the city. By combining this with a macro-scale traffic flow model (Greenshields), we were able to compute the occupancy of specific roads and the total travel time this resulted in. This model could then be used as an objective function for black-box optimisation; we would optimise the cost parameters of road segments so that the optimal number of desired routes got redirected to alternative roads such that the overall traffic time was minimised.

The amount of improvement (under the Greenshields model) we were able to achieve was highly dependent on the number of cars we estimated would be on the road at the same time. The best improvement achieved was 13.15% for a little over 13 500 cars (925 hours), and the worst was 1.35% for just under 113 000 cars (437 hours). Interestingly, even the relatively modest improvements, occurring for large amounts of cars, could still be meaningful, because there are more drivers to benefit from them. We also added a few fairness analyses to the paper, the results of which did not seem to indicate any unfair disadvantage to individual drivers. You can find the pre-recorded conference presentation here.

We hope that our paper will be able to contribute to the perpetually on-going efforts worldwide of causing less damage to the environment, boosting the economy, and perhaps also helping some drivers’ moods.

The ADA research group welcomes Bram Renting as a group member

The ADA research group welcomes Bram Renting as a new PhD student.

Bram joined ADA in July 2020 as a PhD student under the supervision of Prof. dr. Holger Hoos and Prof. dr. Catholijn Jonker. He holds a master’s degree in computer science from TU Delft and has subsequently spent a year in the industry as a data scientist. He works at the ADA research group through the Hybrid Intelligence Centre consortium, which is a Dutch consortium that pushes for the development of human-AI collaboration.

Bram will research multi-agent systems where agents both have to cooperate and have individual preferences, so-called mixed-motive problems. His focus lies on scenarios where consensus between agents is required before actions are taken. Bargaining methods are used to reach this consensus. He starts with a focus on decentralised scheduling problems, which has applications in real life, such as calendar scheduling. The goal is to enhance human performance in these jobs through AI.