Anna Latour promoted to doctor

On Tuesday 13 September 2022, Anna Latour successfully defended her dissertation in the Great Auditorium of Leiden’s beautiful (and ancient) Academy Building.

Anna’s dissertation, titled “Optimal decision-making under constraints and uncertainty”, was formally approved by her doctorate committee in February. Now, finally, it was time for her to defend it before a committee of opponents. She is the first of prof. Hoos’s doctoral students from ADA Leiden to receive her title.

The Great Auditorium of the Academy Building in Leiden.

In true Leiden tradition, the ceremony was solemn, but festive. Anna’s opponents (prof. Plaat, dr. Van Leeuwen, prof. Kersting, prof. Stuckey, prof. Bonsangue, and prof. Kleijn) questioned her on the semantics and pros and cons of different probabilistic languages, deep learning, the interpretation of probabilities in logic, the choking hazards presented by printed copies of dissertations, adversarial settings in stochastic constraint (optimisation) problems, and the use of gradient methods in constraint solving. This resulted in a good discussion, characterised by mutual respect and a sense of humour. In keeping with good COVID practice, the ceremony was hybrid, with two opponents and even a paranymph participating virtually from Germany, Australia and Canada, and with all in-person attendees wearing masks. Anna’s promotores, prof. Hoos and prof. Kok, as well as her co-promotor, dr. Nijssen, were present in person.

Still from the livestream.
Anna with the certificate declaring that she has the right to the title of “doctor”.

The discussion was ended by a loud “Hora Est!” from the Beadle, after which the committee left the room for deliberation. Upon their return, it was prof. Hoos’s duty to speak the “magic formula” that promoted Anna to doctor in Mathematics and the Natural Sciences. This has to be done in Dutch, and since it was his first time promoting a student to doctor in the Netherlands, he was somewhat nervous. He did well, and Anna received her certificate (with a giant seal!) from professor Kleijn, the doctorate committee’s secretary.

Anna’s co-promotor, dr. Nijssen, then praised her in his laudatio as a scientist of many talents, stating that she has not only proved her scientific competence by earning her doctorate, but also demonstrated her proficiency in writing, presenting and teaching by winning awards and scholarships for those skills during her time as a PhD candidate.

Following the ceremony, we had a reception in the beautiful atrium of the Academy Building, with a view of the spiral staircase, surrounded by the busts of dead scientists. Afterwards, the freshly minted doctor, her (co-)promotores, her opponents and her paranymph went for an Italian lunch and a chat about the future.

The ancient Romans held a feast in honour of the gods Jupiter, Juno and Minerva (Epulum Jovis) on the 13th of September. We think it is only fitting that, on a day that celebrates the Goddess of Knowledge (who adorns Leiden University’s logo), Leiden University gains another female doctor, from a research group named after the Godmother of Programming. Gaudeamus Igitur!

The next step in Anna’s career is a postdoctoral position as Research Fellow in prof. Meel’s group at the School of Computing of the National University of Singapore. There, she will work on problems in the field of “Beyond NP”, focusing on (optimisation versions of) Boolean satisfiability and counting. You can follow her and her career through her website.

You can download dr. Latour’s dissertation from the Leiden University dissertation repository.

The ADA research group welcomes Julia Wąsala

Julia recently joined the ADA group as a PhD candidate after completing her master’s thesis in the group and graduating cum laude. Before starting her master’s in Computer Science: Artificial Intelligence at Leiden University, she completed a bachelor’s degree in Astronomy at Leiden.

In her master’s she specialised in Automated Machine Learning for Earth Observation. She developed a neural architecture search framework for super-resolution of Earth Observation images. Super-resolution is a technique used to increase the resolution of images. Increasing the resolution of satellite imagery can help achieve better results on a variety of tasks that are relevant to applications such as land and forestry management and disaster relief.

During her PhD, she will continue to work on the topic of Automated Machine Learning for Earth Observation, in collaboration with ESA Phi-Lab and SRON. The manual design of Earth Observation analysis pipelines is a time-consuming process. Developing automated tools can save time and make techniques like deep learning more accessible to non-domain experts.

Julia will be supervised by LIACS researchers dr. Mitra Baratchi and prof. dr. Holger Hoos, as well as prof. dr. Ilse Aben and dr. Bram Maasakkers from SRON, and dr. Rochelle Schneider from ESA-ESRIN.

The ADA research group welcomes Annelot Bosman as a group member

Annelot recently joined the ADA group as a PhD candidate. Before coming to Leiden she completed both her bachelor’s and master’s in Econometrics at the Erasmus University Rotterdam. During her studies, she specialised in Operations Research and Quantitative Logistics.

Her PhD topic focuses on (complete) robustness verification of deep neural networks, which involves verification based on mixed integer programming (MIP). The topic is especially important as neural networks are used more and more in practical applications: verifying the safety of neural networks is an important step towards being able to trust the decisions they make under arbitrary circumstances.
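To give a flavour of what MIP-based verification looks like, below is a small, purely illustrative sketch (not Annelot’s code): a tiny two-layer ReLU network is encoded with the standard big-M formulation, and a MIP solver (via the PuLP library) is asked for the worst-case class margin under a small input perturbation. All weights, the input point and the epsilon are made up for illustration.

# Illustrative sketch of complete robustness verification via a big-M MIP
# encoding of a tiny ReLU network, using PuLP. All numbers below are made up.
import pulp

# Toy network: 2 inputs -> 2 hidden ReLU units -> 2 outputs (class scores).
W1 = [[1.0, -0.5], [0.3, 0.8]]
b1 = [0.1, -0.2]
W2 = [[0.7, -1.0], [-0.4, 0.9]]
b2 = [0.0, 0.05]

x0 = [0.5, -0.3]   # nominal input, classified as class 0 by this toy network
eps = 0.1          # allowed L-infinity perturbation
M = 100.0          # big-M constant (must upper-bound the pre-activations)

prob = pulp.LpProblem("robustness_check", pulp.LpMinimize)

# Perturbed input variables within the epsilon-box around x0.
x = [pulp.LpVariable(f"x{i}", x0[i] - eps, x0[i] + eps) for i in range(2)]

# Hidden layer: pre-activation z, post-activation h, ReLU phase indicator d.
h = []
for j in range(2):
    zj = pulp.LpVariable(f"z{j}", -M, M)
    hj = pulp.LpVariable(f"h{j}", 0, M)
    dj = pulp.LpVariable(f"d{j}", cat="Binary")
    prob += zj == pulp.lpSum(W1[j][i] * x[i] for i in range(2)) + b1[j]
    # Big-M encoding of h = max(0, z):
    prob += hj >= zj
    prob += hj <= zj + M * (1 - dj)
    prob += hj <= M * dj
    h.append(hj)

# Output scores of the network.
out = [pulp.lpSum(W2[k][j] * h[j] for j in range(2)) + b2[k] for k in range(2)]

# Minimise the margin of the predicted class over the whole perturbation set.
prob += out[0] - out[1]
prob.solve(pulp.PULP_CBC_CMD(msg=False))

margin = pulp.value(prob.objective)
print("worst-case margin:", margin)
print("robust within eps" if margin > 0 else "adversarial example may exist")

If the worst-case margin is positive, no perturbation within the box can flip the predicted class of this toy network; real verifiers of course scale this idea to much larger networks.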

Annelot will be supervised by both prof. dr. Holger Hoos and dr. Jan van Rijn. Holger Hoos is part of the recently established Alexander von Humboldt Chair in AI Methodology (AIM) at RWTH Aachen University, which is an integral part of the multi-institutional ADA Research Group.

Exact stochastic constraint optimisation with applications in network analysis

In business, governance, science as well as in our daily lives, we often have to solve problems that involve optimal decision-making under constraints (limitations on the choices that we can make) and uncertainty (probabilistic knowledge about the future).

There are many stochastic constraint (optimisation) problems (SCPs) that arise from probabilistic networks. These are networks where we associate a probability with the nodes and/or edges of the network.

Going viral

One of those problems is the spread of influence (or viral marketing) problem. We are given a social network with probabilistic influence relationships. Suppose Alexa has a probability of 0.8 to convince Claire to do something. We can model this with a node that symbolises Alexa and a node that symbolises Claire, and an edge pointing from Alexa to Claire, labelled with probability 0.8.

In this network, we assume all influence relationships to be symmetric. Hence, there is an edge with label 0.8 pointing from node a (Alexa) to node c (Claire), and an identical edge pointing back, which we represent as a single undirected edge connecting these two nodes.

Suppose now that we are a company planning a marketing campaign to promote our product. We want to use a word-of-mouth strategy on the probabilistic social network that we have of our potential customers. The way to start the process is to give free samples of our product to certain people in the network, and hope that they like it and convince their friends to buy the product, who then convince their friends, and so on, until our product goes viral. We have a limited budget for this (the constraint), and want to maximise the expected number of people who will eventually buy our product (the optimisation criterion). Which people in the network should get a free sample of our product?
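To make the quantity we are optimising concrete, here is a small, purely illustrative sketch that estimates the expected number of customers for a given set of free-sample recipients by Monte Carlo simulation on a toy network. Note that the work described on this page computes such probabilities exactly rather than by sampling; the network and probabilities below are made up.

# Illustrative only: estimate the expected spread for a given seed set by
# sampling deterministic "worlds" from the probabilistic network.
import random

# Undirected probabilistic influence network: edge -> influence probability.
edges = {
    ("alexa", "claire"): 0.8,
    ("claire", "bob"): 0.4,
    ("alexa", "dana"): 0.3,
}
nodes = {u for e in edges for u in e}

def sampled_spread(seeds, samples=10_000):
    """Average number of people reached when each edge 'works' with its probability."""
    total = 0
    for _ in range(samples):
        # Sample a world: keep each edge independently with its probability.
        kept = [e for e, p in edges.items() if random.random() < p]
        # Breadth-first search from the seed set in the sampled world.
        reached, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for a, b in kept:
                for v in ((b,) if a == u else (a,) if b == u else ()):
                    if v not in reached:
                        reached.add(v)
                        frontier.append(v)
        total += len(reached)
    return total / samples

# With a budget of one free sample, which single seed maximises expected spread?
best = max(nodes, key=lambda s: sampled_spread({s}))
print(best, sampled_spread({best}))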

Alternatively, we can require that the expected number of people who become customers exceeds a certain threshold (the constraint), while we aim to minimise the number of free samples that are distributed (the optimisation criterion). In practice, we can solve a stochastic constraint optimisation problem with a stochastic optimisation criterion and a budget constraint by formulating it as a stochastic constraint satisfaction problem (CSP) with a stochastic constraint and a budget constraint. We start the solving process with a very low threshold for the stochastic constraint (e.g., 0) and increase it each time we find a solution to the CSP, continuing the search until no better solution can be found. The last solution found is then an optimal solution.
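In outline, and assuming a hypothetical routine solve_csp(threshold) that either returns a strategy whose expected number of customers meets the threshold (while respecting the budget) or reports that none exists, the scheme looks as follows; the step by which the threshold is tightened is a placeholder.

def optimise(step=1.0):
    threshold, best = 0.0, None
    while True:
        solution = solve_csp(threshold)     # hypothetical CSP solver call
        if solution is None:
            return best                     # last solution found is optimal
        best = solution
        # demand a strictly better expected number of customers next time
        threshold = solution.expected_customers + step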

Finding an optimum solution is hard

It is the stochastic constraint mentioned above that makes it hard to find solutions to SCPs. Given a strategy (a choice of which people receive a free sample), computing the probability that a person will become one of our customers is expensive: it involves summing probabilities over many, often partially overlapping, paths in the network. We then have to do this for all people in the network (to get the expected number of customers), and potentially have to repeat this for all possible strategies in order to find a strategy that maximises the expected number of customers (by repeatedly finding an expected number of customers that exceeds an ever-increasing threshold of the stochastic constraint).

Joining forces

To find a solution to the problem, we therefore need two things: a way to quickly reason about probabilities, and a way to traverse the search space of strategies intelligently. We can combine the technique of knowledge compilation with the paradigm of constraint programming to create data structures (specifically, decision diagrams) that capture information about probability distributions, such that we can quickly compute probabilities, and to create constraint propagation algorithms on those data structures, which prune the search space.
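The following toy sketch illustrates the first ingredient: once knowledge about the probability distribution has been compiled into a decision diagram, the probability of an event can be read off in a single bottom-up pass over the diagram. The diagram below is hand-made for the Alexa–Claire–Bob example and is not the output of an actual knowledge compiler.

# A node is (variable, probability that it is true, child if true, child if false).
TRUE, FALSE = "T", "F"

# Toy diagram encoding the event "Bob is reached via Alexa -> Claire -> Bob".
diagram = ("alexa_convinces_claire", 0.8,
           ("claire_convinces_bob", 0.4, TRUE, FALSE),
           FALSE)

def probability(node):
    if node == TRUE:
        return 1.0
    if node == FALSE:
        return 0.0
    _, p, hi, lo = node
    # Weighted sum over the two branches of the decision variable.
    return p * probability(hi) + (1 - p) * probability(lo)

print(probability(diagram))   # 0.8 * 0.4 = 0.32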

Dedicated constraint propagation yields faster solving

A constraint propagation algorithm helps to prune the search space by removing values from the domains of variables. In particular, it ensures that variable domains contain no values that would falsify the constraint(s) in which the variables occur. Comparing different approaches that combine knowledge compilation and constraint programming, we find that a dedicated constraint propagation algorithm for stochastic constraints tends to yield shorter solving times than approaches that decompose stochastic constraints into small, linear constraints (for which propagators already exist). We attribute this to the fact that in the decomposition process, we lose information about the relationships between the variables that participate in the stochastic constraint. Hence, the propagation methods have less pruning power, and the solver cannot prune certain parts of the search space that do not contain any solutions.
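As a tiny illustration of what pruning means, consider a budget constraint sum(x) <= budget over 0/1 decision variables: once the budget is used up, the value 1 can be removed from every remaining domain. (The dedicated propagator studied in this work operates on decision diagrams for stochastic constraints; the sketch below only shows the general idea of domain pruning.)

# Minimal illustration of domain pruning for a budget constraint sum(x) <= budget.
def propagate_budget(domains, budget):
    used = sum(1 for d in domains if d == {1})          # variables already fixed to 1
    if used >= budget:
        # Budget exhausted: remove value 1 from every unfixed domain.
        return [d - {1} if d != {1} else d for d in domains]
    return domains

domains = [{1}, {0, 1}, {0, 1}]              # x0 fixed to 1, x1 and x2 still open
print(propagate_budget(domains, budget=1))   # -> [{1}, {0}, {0}]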

Paper

You can learn more about different ways of combining knowledge compilation and constraint programming in our paper “Exact stochastic constraint optimisation with applications in network analysis”, which was published in Artificial Intelligence, vol 304, March 2022.
Authors: Anna L.D. Latour, Behrouz Babaki, Daniël Fokkinga, Marie Anastacio, Holger H. Hoos, Siegfried Nijssen.
Link: doi.org/10.1016/j.artint.2021.103650

The ADA research group welcomes Mike Huisman as a group member

Author: Mike Huisman

After obtaining his BSc in Artificial Intelligence and MSc in Computer Science, Mike Huisman joined LIACS in March 2021 as a teaching PhD candidate. He is part of the Reinforcement Learning Group at LIACS and also joined the ADA group in November 2021.

His research is in the field of deep meta-learning, where the goal is to learn an efficient deep learning algorithm that can learn new tasks quickly (from a limited amount of data). This can democratize the use of deep learning by lowering the threshold, in terms of required compute resources and data, for using powerful deep learning technologies. He conducts his research under the supervision of Prof. Dr. A. Plaat and Dr. J.N. van Rijn.

Thesis: Parallel Algorithm Portfolios in Sparkle

Author: Richard Middelkoop

On 29 July, I defended my bachelor’s thesis, entitled “Parallel Algorithm Portfolios in Sparkle”, after half a year of work following the project start in February 2021. For this project, I was part of the ADA group and was supervised by Koen van der Blom and Holger Hoos.

In a parallel portfolio, a set of independent, non-communicating solvers solve problem instances in parallel. The thesis gives insight into the design and implementation of parallel portfolios in Sparkle. Additionally, experiments show the validity of the implementation and the practical capabilities of parallel portfolios as implemented in Sparkle.

My work built on the existing platform created by the Sparkle development team. It is of value to the platform since the state of the art is often represented by a set of algorithms rather than a single one, and parallel algorithm portfolios provide an easily accessible way to conduct experiments with such sets.

Below, an example is outlined that shows how parallel algorithm portfolios can be used in Sparkle. Once the platform is initialised, the needed components (problem instances and solvers) are added to Sparkle. Using these, the portfolio can be constructed. With the constructed portfolio and a selection of instances, an experiment can be conducted. Finally, we generate a report that describes the components used and gives an empirical evaluation of the results of the experiment.

initialise.py                                                           # set up a new Sparkle platform
add_instances.py --run-solver-later --run-extractor-later PTN/          # add the PTN problem instances
add_solver.py --run-solver-later --deterministic 0 CSCCSat/             # add the solvers to include
add_solver.py --run-solver-later --deterministic 0 MiniSAT/
add_solver.py --run-solver-later --deterministic 0 PbO-CCSAT-Generic/
construct_sparkle_parallel_portfolio.py --nickname runtime_experiment   # construct the parallel portfolio
run_sparkle_parallel_portfolio.py --instance-paths Instances/PTN/ --portfolio-name runtime_experiment   # run it on the instances
generate_report.py                                                      # generate the experiment report

First results in the thesis suggest that running up to 64 instances of a nondeterministic solver with different random seeds in parallel yields performance gains compared to lower numbers of instances. With 128 solver instances, performance is similar to that with 64, and there no longer seems to be a benefit to using more solver instances. With 256 or more solver instances, the overhead appears to increase and performance starts to drop again in this practical implementation.

If you are interested in the project, feel free to take a look at the bitbucket project page and try Sparkle for yourself.

AutoML adoption is low, what is stopping people?

Machine learning (ML) is used more and more in a wide range of applications, but there are not enough machine learning experts to properly support this growth. With automated machine learning (AutoML) the goal is to make ML easier to use. As a result, experts should be able to deploy more ML systems, and less expertise would be needed to work with AutoML than when working with ML directly.

AutoML adoption

For AutoML to have this effect, however, it first needs to be adopted in practice. To study adoption levels in practice, we held a survey among teams that engineer software with ML components. As it turns out, adoption is not actually very high. In the figure below, we can see that, depending on the AutoML technique, more than 20 or even 30% of the respondents did not adopt AutoML at all. In addition, another 50 to 60% adopt AutoML techniques only partially.

Adoption per technique. The first bar shows adoption of the feature technique including ‘Implicit, e.g. deep learning’ answers, while in the second bar they are excluded.

What holds back adoption?

Although the survey allowed us to assess adoption levels, it does not tell us why people do or do not adopt these AutoML techniques. To find out, we started holding interviews, and have already gained some insights from two initial interviews.

The people we talked to raised a number of usability concerns. Firstly, there is a perception that AutoML techniques are hard to use and require a significant initial investment to learn. Secondly, someone indicated that in practice AutoML systems sometimes fail to work on the data used, but do not always make clear what the problem is. Thirdly, a participant noted that it is difficult to predict a good run length for AutoML systems. Related to this, there were also concerns about the computational resources required to use AutoML.

Looking forward

Clearly, there is room to improve AutoML adoption, and there are still some issues to resolve that could help to increase it. To get a crisper view of which issues are common and which are more atypical, we plan to hold more interviews. In addition to learning about common issues, we also hope to find out whether there are differences between organisation types. Government organisations dealing with citizen data may, for instance, be more hesitant to adopt AutoML for privacy- or bias-sensitive applications.

Paper

If you are interested in the details, you can learn more by reading our paper “AutoML Adoption in ML Software”, which was published in the AutoML workshop at ICML 2021.
Authors: Koen van der Blom, Alex Serban, Holger Hoos, Joost Visser.
Link: https://ada.liacs.nl/papers/BloEtAl21.pdf

Sparkle: Accessible Meta-Algorithmics

Many fields of computational science advance by improving the algorithms for solving key problems, such as SAT solving or binary classification. When these algorithms are compared, they are usually ranked based on some performance metric, with the algorithm that performs best over a set of problem instances getting the highest score. In other words, the generalist is considered to be the best algorithm.

Meta-algorithms

Meta-algorithmic techniques, such as automated algorithm selection, are able to provide great performance improvements over such a generalist approach. They do this by taking advantage of the complementary strengths of different algorithms, or in the case of algorithm configuration, of configurable or tunable algorithm components. Algorithms that are specially selected or configured for a specific subset of problem instances usually perform better, on this subset, than a generalist algorithm.
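As a toy illustration of per-instance algorithm selection (this is not how Sparkle constructs its selector), one can train a standard classifier to map instance features to the solver that performed best on that instance; the data below is randomly generated for the sketch.

# Toy per-instance algorithm selector trained on made-up features and runtimes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = rng.random((100, 5))          # made-up instance features
runtimes = rng.random((100, 3))          # made-up runtimes of three solvers
best_solver = runtimes.argmin(axis=1)    # label: fastest solver per instance

selector = RandomForestClassifier().fit(features, best_solver)
new_instance = rng.random((1, 5))
print("run solver", selector.predict(new_instance)[0])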

However, meta-algorithms are complex and difficult to use correctly. Moreover, when they are used incorrectly, you may not benefit from their full potential or, in the extreme case, you may even obtain worse performance than without using them.

Sparkle

The Sparkle platform aims to make meta-algorithms more accessible to novice users. To achieve this, Sparkle implements standard protocols for algorithm selection and algorithm configuration that support their correct use. After an experiment, the results, as well as the problem instances, algorithms and protocol used, are presented in an automatically generated LaTeX/PDF document, parts of which can be easily copied and pasted into scientific papers.

As an example, we outline below how algorithm selection can be used in Sparkle. Once the platform is initialised, the components that we need (problem instances, solvers and a feature extractor) are added to Sparkle. Using these, the feature values for each instance, and the performance values of each solver on each instance can be computed. With those values a portfolio selector is constructed, which can be used to automatically select a solver for new problem instances. Finally, we generate a report that describes which components were used, which process Sparkle used to construct a portfolio selector with those components, and how large the marginal contribution of each solver is to this portfolio selector.

initialise.py                                                            # set up a new Sparkle platform
add_instances.py --run-solver-later --run-extractor-later PTN/           # add the PTN problem instances
add_solver.py --run-solver-later --deterministic 0 PbO-CCSAT-Generic/    # add the solvers to choose from
add_solver.py --run-solver-later --deterministic 0 CSCCSat/
add_solver.py --run-solver-later --deterministic 0 MiniSAT/
add_feature_extractor.py --run-extractor-later SAT-features-competition2012_revised_without_SatELite_sparkle/   # add the feature extractor
compute_features.py                                                      # compute feature values per instance
run_solvers.py                                                           # measure solver performance per instance
construct_sparkle_portfolio_selector.py                                  # build the portfolio selector
generate_report.py                                                       # generate the experiment report

Team

With the first version of Sparkle ready, we are working to improve it further by making things easier to use and by providing more meta-algorithmic techniques. For this, we are supported by the software lab at LIACS, headed by Prof. Joost Visser. From this lab, Jérémie Gobeil helps to improve both Sparkle itself and its development processes. Additionally, several students, including Jeroen Rook and Richard Middelkoop, have also put work into Sparkle as part of their studies.

Interested?

You can take a look, explore and try the Sparkle platform by getting it from the bitbucket repository at https://bitbucket.org/sparkle-ai/sparkle/.

AutoML for Dynamic Recommender Systems

What can we do when the performance of an AI system, such as a Recommender System, degrades due to changes in the data that it is processing? This is the central question we seek to answer in our recent project: “Hyper-Parameter Optimization for Latent Spaces in Dynamic Recommender Systems”

When an AI system is deployed, it is usually assumed that the data the system is processing is similar to the data it has been trained on. However, there are many scenarios in which this assumption is likely to be violated. Think, for example, of a recommender system. These are systems designed to predict user preferences for items, such as movies, songs or books, and are commonly used by technology companies like Amazon, Netflix or Spotify to recommend new items or content to their users. These systems are a good example to illustrate the relevance of the problem studied in this work: users’ preferences are based on their taste, and we all know that taste most definitely changes over time, just as the collection of items is constantly updated, with new items added or old items removed. Overall, we are dealing with a dynamic, rather than a static system, which means we have to rethink the way a learning algorithm is deployed.

Overview of the proposed method. The Nelder-Mead algorithm produces a set of configurations, from which the best one is selected and deployed.

Recommender systems are often based on collaborative filtering algorithms, and these algorithms have associated hyper-parameters, such as embedding size or learning rate, affecting the behavior and performance of the system. When the incoming data changes, for example because the collection of items is updated, there might be a new, unknown set of hyper-parameters that works more effectively on this new data than the previous setting. Our approach therefore involves i) detecting when system performance degrades (concept drift) and ii) initiating a re-configuration of the matrix factorization algorithm. For this, we employ an adaptation of the Nelder-Mead algorithm, which uses a set of heuristics to produce a number of potentially good configurations, from which the best one is selected and deployed.
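The sketch below conveys the general idea only (it is not the actual implementation): a simple drift trigger based on recent prediction errors, followed by a Nelder-Mead search over two hyper-parameters. Here, evaluate_rmse is a hypothetical function that retrains or updates the matrix factorization model with the given hyper-parameters on a recent window of the stream and returns its RMSE.

# Illustrative sketch: drift detection plus Nelder-Mead re-configuration.
import numpy as np
from scipy.optimize import minimize

def drift_detected(recent_errors, reference_rmse, factor=1.2):
    # Simple illustrative trigger: recent RMSE exceeds the reference by 20%.
    return np.mean(recent_errors) > factor * reference_rmse

def reconfigure(current_config, evaluate_rmse):
    # current_config = (log10 of the learning rate, number of latent factors)
    result = minimize(
        lambda cfg: evaluate_rmse(learning_rate=10 ** cfg[0],
                                  n_factors=max(1, int(round(cfg[1])))),
        x0=np.asarray(current_config, dtype=float),   # start from the deployed setting
        method="Nelder-Mead",
        options={"maxfev": 30},                       # small budget: this runs online
    )
    return result.x                                   # new configuration to deploy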

To optimally control the characteristics of the stream, we also built a data generator, which produces realistic streams of preference data and injects drift into the data by expanding or shrinking the latent feature space or, in other words, by changing the user preferences.

Results on synthetic data stream. The y-axis shows RMSE, whereas the x-axis shows the number of samples processed.

We consider multiple baselines, including an online adaptation of the well-known configurator SMAC as well as a carefully tuned static model, and obtain promising results. In future work, we want to apply this method to situations in which the parameter space is more complex, for example, when using deep learning models that require multiple components and specific tuning of the underlying components, such as convolutional architectures, recurrent layers or even attention models based on transformers.

This project is joint work by Matthias König and Holger H. Hoos (ADA group at LIACS, Leiden University), together with Bruno Veloso (Portucalense University, Portugal), Luciano Caroprese (ICAR-CNR, Italy), Sónia Teixeira (University of Porto, Portugal & LIAAD-INESC TEC, Portugal), Giuseppe Manco (ICAR-CNR, Italy) and João Gama (University of Porto, Portugal & LIAAD-INESC TEC, Portugal). It has been executed in the context of the HumanE-AI-Net, one of the two ICT-48 Networks of Centres of Excellence recently established by the European Commission in which ADA plays a key role. The accompanying paper will be presented at ECML PKDD 2021 and can be found here, along with the code and other materials.

AutoML for Earth Observation data

In March of 2020, I joined the ADA group to work on my introductory research project, as part of my master’s programme in Computer Science, under the supervision of Dr. Mitra Baratchi and Dr. Jan van Rijn. As my research topic was related to satellite data, I had the opportunity to collaborate with Dr. Andreas Vollrath, who was working at the ESA Phi-lab. My project was titled “Building Deep Learning Models for Remote Sensing Applications Through Automated Machine Learning”. Based on the results of our assumption test and preliminary experiments, we presented an online poster at the ESA Phi-week 2020. The presentation video can be watched here. We demonstrated the potential of applying the most recent advances in AutoML, specifically hyperparameter optimisation, to a remote sensing dataset. Starting from that, we proposed to build an AutoML system focused on Earth Observation data. This project proposal led to my thesis work “Automated Machine Learning for Satellite Data: Integrating Remote Sensing Pre-trained Models into AutoML Systems”.

Cover image showing AutoML being used for satellite data and representing the collaboration between Earth Observation and Machine Learning fields to achieve the best results.

Application areas that can benefit from machine learning models built from satellite data are numerous: environmental mapping, crop monitoring, urban planning and emergency response are just some examples. However, in order to leverage the latest advancements in machine learning, a great deal of expert knowledge and hands-on experience is required. Automated Machine Learning (AutoML) aims to automate the different stages of a machine learning pipeline, making crucial decisions in a data-driven and objective way. To reach this goal, AutoML systems have been developed.

Open-source AutoML systems are usually benchmarked on natural image datasets. Satellite images differ from natural images in various aspects. One of the most important differences concerns the number and type of spectral bands composing the image. Natural colour images have three bands (red, green & blue), but in satellite images more spectral bands are available. Therefore, the input data can be composed of more and different types of spectral bands (a common practice in real-world applications). These differences made us wonder how current AutoML systems work for satellite datasets. To come up with an answer, the first part of our project focused on testing the image classification task available in the AutoML system AutoKeras on a varied remote sensing benchmark suite of 7 datasets that we compiled by reviewing the remote sensing literature. We found good performance on almost all of these datasets, with impressive results for the RGB versions of two of them (EuroSAT and So2Sat). However, we did not achieve better results than manually designed models on datasets closer to real-world applications. This showed the importance of customising these systems to achieve better performance on remote sensing data.
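As a rough illustration (not the actual thesis code), the kind of AutoKeras experiment described above might look as follows; it assumes the EuroSAT RGB dataset can be loaded through tensorflow_datasets, and the search and training budgets are placeholders.

# Illustrative sketch: AutoKeras image classification on a remote sensing dataset.
import autokeras as ak
import tensorflow_datasets as tfds

# EuroSAT ships a single split, so we carve out a test set ourselves.
(train_ds, test_ds), info = tfds.load(
    "eurosat/rgb", split=["train[:80%]", "train[80%:]"],
    as_supervised=True, with_info=True)

clf = ak.ImageClassifier(max_trials=10, overwrite=True)   # architecture search budget
clf.fit(train_ds.batch(64), epochs=20)
print("test performance:", clf.evaluate(test_ds.batch(64)))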

Furthermore, as acquiring ground truth labels is quite expensive, a problem in the field of remote sensing is designing high-performing models in the absence of large labelled datasets. The use of pre-trained models has given promising results in addressing this challenge. As a next step, we added remote sensing pre-trained models to AutoKeras. This helped to further improve the accuracy on non-RGB datasets. In our experiments, we compared 3 AutoML approaches (initialising architectures (i) without pre-training, (ii) pre-trained on ImageNet and (iii) pre-trained on remote sensing datasets) against manually designed architectures on our benchmark set. AutoML architectures outperformed the manually designed architectures on 5 of these datasets. We showed that the best AutoML approach depends on the properties of the dataset at hand, and these results further highlighted the importance of having customised Earth Observation AutoML systems. To learn more about this project, please take a look at the project page.

By using AutoML methods to reduce the gap between state-of-the-art machine learning research and applied machine learning, advanced AI can become more accessible for remote sensing applications. One of my personal goals is to keep updating and documenting our project repository so that people from different backgrounds can use it.
We believe that making this benchmark publicly available will enable the community to further experiment with relevant remote sensing datasets and expand the AutoML systems for different application goals.