Models, code, and papers for "Benjamin Doerr":

The $(1+(\lambda,\lambda))$ genetic algorithm is one of the few algorithms for which a super-constant speed-up through the use of crossover could be proven. So far, this algorithm has been used with parameter settings based partly on intuitive considerations. In this work, we rigorously examine the whole parameter space and show that the asymptotic time complexity proven by Doerr and Doerr (GECCO 2015) for the intuitive choice is best possible among all settings for population size, mutation probability, and crossover bias.

While evolutionary algorithms are known to be very successful for a broad range of applications, the algorithm designer is often left with many algorithmic choices, for example, the size of the population, the mutation rates, and the crossover rates of the algorithm. These parameters are known to have a crucial influence on the optimization time, and thus need to be chosen carefully, a task that often requires substantial effort. Moreover, the optimal parameters can change during the optimization process. It is therefore of great interest to design mechanisms that dynamically choose best-possible parameters. An example of such an update mechanism is the one-fifth success rule for step-size adaptation in evolutionary strategies. While in continuous domains this principle is well understood also from a mathematical point of view, no comparable theory is available for problems in discrete domains. In this work we show that the one-fifth success rule can be effective also in discrete settings. We consider the $(1+(\lambda,\lambda))$~GA proposed in [Doerr/Doerr/Ebel: From black-box complexity to designing new genetic algorithms, TCS 2015]. We prove that if its population size is chosen according to the one-fifth success rule then the expected optimization time on \textsc{OneMax} is linear. This is better than what \emph{any} static population size $\lambda$ can achieve and is asymptotically optimal also among all adaptive parameter choices.
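The one-fifth success rule for the population size can be sketched as follows. This is a minimal illustration, not the paper's exact scheme: the update factor `F = 1.5` and the bounds are illustrative choices of ours.

```python
def one_fifth_rule(lam, success, F=1.5, lam_min=1.0, lam_max=None):
    """Update the population size lambda after one iteration.

    On success (fitness improved) lambda shrinks by the factor F; on
    failure it grows by F**(1/4), so that one success per five
    iterations keeps lambda stable on average.  F = 1.5 and the bounds
    are illustrative choices, not taken from the paper.
    """
    if success:
        lam = max(lam_min, lam / F)
    else:
        lam = lam * F ** 0.25
        if lam_max is not None:
            lam = min(lam_max, lam)
    return lam
```

Note that one success followed by four failures leaves $\lambda$ unchanged, which is exactly the fixed point the rule aims for.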

Understanding how crossover works is still one of the big challenges in evolutionary computation research, and making our understanding precise and proven by mathematical means might be an even bigger one. As one of the few examples where crossover provably is useful, the $(1+(\lambda, \lambda))$ Genetic Algorithm (GA) was proposed recently in [Doerr, Doerr, Ebel: TCS 2015]. Using the fitness level method, the expected optimization time on general OneMax functions was analyzed and a $O(\max\{n\log(n)/\lambda, \lambda n\})$ bound was proven for any offspring population size $\lambda \in [1..n]$. We improve this work in several ways, leading to sharper bounds and a better understanding of how the use of crossover speeds up the runtime in this algorithm. We first improve the upper bound on the runtime to $O(\max\{n\log(n)/\lambda, n\lambda \log\log(\lambda)/\log(\lambda)\})$. This improvement is made possible by the observation that in the parallel generation of $\lambda$ offspring via crossover (but not mutation), the best of them is often better than the expected value, and hence several fitness levels can be gained in one iteration. We then present the first lower bound for this problem. It matches our upper bound for all values of $\lambda$. This allows us to determine the asymptotically optimal value for the population size. It is $\lambda = \Theta(\sqrt{\log(n)\log\log(n)/\log\log\log(n)})$, which gives an optimization time of $\Theta(n \sqrt{\log(n)\log\log\log(n)/\log\log(n)})$. Hence the improved runtime analysis gives a better runtime guarantee along with a better suggestion for the parameter $\lambda$. We finally give a tail bound for the upper tail of the runtime distribution, which shows that the actual runtime exceeds our runtime guarantee by a factor of $(1+\delta)$ with probability $O((n/\lambda^2)^{-\delta})$ only.
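The algorithm analyzed above can be sketched in a few lines. This is a hedged, runnable sketch on OneMax with the standard parameter coupling $p = \lambda/n$ (mutation probability) and $c = 1/\lambda$ (crossover bias); all helper names are ours, not from the paper.

```python
import random

def onemax(x):
    return sum(x)

def one_plus_lambda_lambda_ga(n, lam, max_iters=10**6, seed=0):
    """Sketch of the (1+(lambda,lambda)) GA on OneMax, assuming the
    standard coupling p = lam/n and c = 1/lam."""
    rng = random.Random(seed)
    p, c = lam / n, 1.0 / lam
    x = [rng.randrange(2) for _ in range(n)]
    for it in range(max_iters):
        if onemax(x) == n:
            return x, it
        # Mutation phase: draw ell ~ Bin(n, p) once, then create lam
        # offspring by flipping ell uniformly chosen distinct bits each.
        ell = sum(rng.random() < p for _ in range(n))
        best_mut = None
        for _ in range(lam):
            y = x[:]
            for i in rng.sample(range(n), ell):
                y[i] = 1 - y[i]
            if best_mut is None or onemax(y) > onemax(best_mut):
                best_mut = y
        # Crossover phase: biased uniform crossover of x and best_mut,
        # taking each bit from the mutation winner with probability c.
        best_cross = x
        for _ in range(lam):
            z = [best_mut[i] if rng.random() < c else x[i] for i in range(n)]
            if onemax(z) >= onemax(best_cross):
                best_cross = z
        x = best_cross  # elitist replacement: never worse than x
    return x, max_iters
```

The crossover phase acts as a repair step: it keeps the good bits that the (aggressive) mutation phase found while discarding the collateral damage.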

In the first and so far only mathematical runtime analysis of an estimation-of-distribution algorithm (EDA) on a multimodal problem, Hasen\"ohrl and Sutton (GECCO 2018) showed for any $k = o(n)$ that the compact genetic algorithm (cGA) with any hypothetical population size $\mu = \Omega(ne^{4k} + n^{3.5+\varepsilon})$ with high probability finds the optimum of the $n$-dimensional jump function with jump size $k$ in time $O(\mu n^{1.5} \log n)$. We significantly improve this result for small jump sizes $k \le \frac 1 {20} \ln n -1$. In this case, already for $\mu = \Omega(\sqrt n \log n) \cap \text{poly}(n)$ the runtime of the cGA with high probability is only $O(\mu \sqrt n)$. For the smallest admissible values of $\mu$, our result gives a runtime of $O(n \log n)$, whereas the previous one only shows $O(n^{5+\varepsilon})$. Since it is known that the cGA with high probability needs at least $\Omega(\mu \sqrt n)$ iterations to optimize the unimodal OneMax function, our result shows that the cGA in contrast to most classic evolutionary algorithms here is able to cross moderate-sized valleys of low fitness at no extra cost. For large $k$, we show that the exponential (in $k$) runtime guarantee of Hasen\"ohrl and Sutton is tight and cannot be improved, not even by using a smaller hypothetical population size. We prove that any choice of the hypothetical population size leads to a runtime that, with high probability, is at least exponential in the jump size $k$. This result might be the first non-trivial exponential lower bound for EDAs that holds for arbitrary parameter settings.

In the first runtime analysis of an estimation-of-distribution algorithm (EDA) on the multimodal jump function class, Hasen\"ohrl and Sutton (GECCO 2018) proved that the runtime of the compact genetic algorithm with suitable parameter choice on jump functions with high probability is at most polynomial (in the dimension) if the jump size is at most logarithmic (in the dimension), and is at most exponential in the jump size if the jump size is super-logarithmic. The exponential runtime guarantee was achieved with a hypothetical population size that is also exponential in the jump size. Consequently, this setting cannot lead to a better runtime. In this work, we show that any choice of the hypothetical population size leads to a runtime that, with high probability, is at least exponential in the jump size. This result might be the first non-trivial exponential lower bound for EDAs that holds for arbitrary parameter settings.
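For reference, the jump function class discussed in these abstracts can be written down in a few lines. Conventions differ slightly between papers; this is one common definition, not necessarily the exact one used above.

```python
def jump(x, k):
    """One common definition of the jump function with jump size k:
    fitness increases with the number of ones, except that the k levels
    just below the all-ones optimum form a valley of low fitness that
    has to be 'jumped' over."""
    n, ones = len(x), sum(x)
    if ones <= n - k or ones == n:
        return k + ones
    return n - ones
```

For $n = 10$ and $k = 3$, a string with 8 ones lies in the valley (fitness 2), while the optimum has fitness $n + k = 13$.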

We prove that the compact genetic algorithm (cGA) with hypothetical population size $\mu = \Omega(\sqrt n \log n) \cap \text{poly}(n)$ with high probability finds the optimum of any $n$-dimensional jump function with jump size $k < \frac 1 {20} \ln n$ in $O(\mu \sqrt n)$ iterations. Since it is known that the cGA with high probability needs at least $\Omega(\mu \sqrt n + n \log n)$ iterations to optimize the unimodal OneMax function, our result shows that the cGA in contrast to most classic evolutionary algorithms here is able to cross moderate-sized valleys of low fitness at no extra cost. Our runtime guarantee improves over the recent upper bound $O(\mu n^{1.5} \log n)$ valid for $\mu = \Omega(n^{3.5+\varepsilon})$ of Hasen\"ohrl and Sutton (GECCO 2018). For the best choice of the hypothetical population size, this result gives a runtime guarantee of $O(n^{5+\varepsilon})$, whereas ours gives $O(n \log n)$. We also provide a simple general method based on parallel runs that, under mild conditions, (i)~overcomes the need to specify a suitable population size, but gives a performance close to the one stemming from the best-possible population size, and (ii)~transforms EDAs with high-probability performance guarantees into EDAs with similar bounds on the expected runtime.
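The compact genetic algorithm referred to above maintains a single frequency vector instead of a population. The following is a hedged sketch on OneMax, assuming the usual update rule: frequencies move by $1/\mu$ toward the better of two samples and are capped to $[1/n, 1-1/n]$; the concrete parameters are illustrative.

```python
import random

def cga(n, mu, max_iters=10**5, seed=3):
    """Sketch of the compact GA (cGA) on OneMax with hypothetical
    population size mu; returns the iteration in which the optimum is
    first sampled, or None on budget exhaustion."""
    rng = random.Random(seed)
    p = [0.5] * n                            # one frequency per bit
    for it in range(max_iters):
        x = [1 if rng.random() < pi else 0 for pi in p]
        y = [1 if rng.random() < pi else 0 for pi in p]
        if sum(y) > sum(x):
            x, y = y, x                      # x now denotes the winner
        for i in range(n):
            if x[i] != y[i]:
                step = 1.0 / mu if x[i] == 1 else -1.0 / mu
                p[i] = min(1 - 1.0 / n, max(1.0 / n, p[i] + step))
        if sum(x) == n:
            return it                        # optimum sampled
    return None
```

The cap at $1/n$ and $1 - 1/n$ prevents frequencies from fixating, which is what allows the cGA to keep sampling across a fitness valley.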

Apart from a few exceptions, the mathematical runtime analysis of evolutionary algorithms is mostly concerned with expected runtimes. In this work, we argue that stochastic domination is a notion that should be used more frequently in this area. Stochastic domination allows us to formulate much more informative performance guarantees, it allows us to decouple the algorithm analysis into the true algorithmic part of detecting a domination statement and the probability-theoretical part of deriving the desired probabilistic guarantees from this statement, and it helps find simpler and more natural proofs. As particular results, we prove a fitness level theorem which shows that the runtime is dominated by a sum of independent geometric random variables, we prove the first tail bounds for several classic runtime problems, and we give a short and natural proof for Witt's result that the runtime of any $(\mu,p)$ mutation-based algorithm on any function with unique optimum is subdominated by the runtime of a variant of the \oea on the \onemax function. As side-products, we determine the fastest unbiased (1+1) algorithm for the \leadingones benchmark problem, both in the general case and when restricted to static mutation operators, and we prove a Chernoff-type tail bound for sums of independent coupon collector distributions.

This chapter collects several probabilistic tools that proved to be useful in the analysis of randomized search heuristics. This includes classic material like Markov, Chebyshev and Chernoff inequalities, but also lesser known topics like stochastic domination and coupling or Chernoff bounds for geometrically distributed random variables and for negatively correlated random variables. Most of the results presented here have appeared previously, some, however, only in recent conference publications. While the focus is on collecting tools for the analysis of randomized search heuristics, many of these may be useful as well in the analysis of classic randomized algorithms or discrete random structures.

We give an elementary proof of the fact that a binomial random variable $X$ with parameters $n$ and $0.29/n \le p < 1$ with probability at least $1/4$ strictly exceeds its expectation. We also show that for $1/n \le p < 1 - 1/n$, $X$ exceeds its expectation by more than one with probability at least $0.0370$. Both probabilities approach $1/2$ when $np$ and $n(1-p)$ tend to infinity.
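The claim about $P[X > E[X]]$ is easy to check numerically via the binomial pmf. The following helper is ours, added for illustration; it computes the probability exactly rather than by sampling.

```python
from math import comb

def prob_strictly_exceeds_mean(n, p):
    """Exact P[X > E[X]] for X ~ Bin(n, p), summing the binomial pmf
    over all outcomes strictly above the mean n*p."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1) if k > n * p)
```

For example, with $n = 100$ and $p = 1/2$ the mean is an integer, so by symmetry $P[X > 50] = (1 - P[X = 50])/2 \approx 0.46$, comfortably above the $1/4$ bound.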

We show that for all $1<k \leq \log n$ the $k$-ary unbiased black-box complexity of the $n$-dimensional $\onemax$ function class is $O(n/k)$. This indicates that the power of higher arity operators is much stronger than what the previous $O(n/\log k)$ bound by Doerr et al. (Faster black-box algorithms through higher arity operators, Proc. of FOGA 2011, pp. 163--172, ACM, 2011) suggests. The key to this result is an encoding strategy, which might be of independent interest. We show that, using $k$-ary unbiased variation operators only, we may simulate an unrestricted memory of size $O(2^k)$ bits.

Parameter control aims at realizing performance gains through a dynamic choice of the parameters which determine the behavior of the underlying optimization algorithm. In the context of evolutionary algorithms this research line has for a long time been dominated by empirical approaches. With the significant advances in running time analysis achieved in the last ten years, the parameter control question has become accessible to theoretical investigations. A number of running time results for a broad range of different parameter control mechanisms have been obtained in recent years. This book chapter surveys these works, and puts them into context, by proposing an updated classification scheme for parameter control.

Motivated by a problem in the theory of randomized search heuristics, we give a very precise analysis for the coupon collector problem where the collector starts with a random set of coupons (chosen uniformly from all sets). We show that the expected number of rounds until we have a coupon of each type is $nH_{n/2} - 1/2 \pm o(1)$, where $H_{n/2}$ denotes the $(n/2)$th harmonic number when $n$ is even, and $H_{n/2}:= (1/2) H_{\lfloor n/2 \rfloor} + (1/2) H_{\lceil n/2 \rceil}$ when $n$ is odd. Consequently, the coupon collector with random initial stake is by half a round faster than the one starting with exactly $n/2$ coupons (apart from additive $o(1)$ terms). This result implies that the classic simple heuristic called \emph{randomized local search} needs an expected number of $nH_{n/2} - 1/2 \pm o(1)$ iterations to find the optimum of any monotonic function defined on bit-strings of length $n$.
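Randomized local search, the heuristic the coupon collector result applies to, can be sketched as follows on OneMax (a monotone function). The sketch starts from a uniformly random bit string, matching the random-initial-stake setting above; the acceptance logic is specialized to OneMax for brevity.

```python
import random

def rls_onemax(n, seed=0):
    """Randomized local search on OneMax: flip one uniformly chosen bit
    per iteration and accept whenever the fitness does not decrease.
    On OneMax this means exactly the flips of zero-bits are accepted,
    so each zero-bit behaves like a missing coupon type."""
    rng = random.Random(seed)
    x = [rng.randrange(2) for _ in range(n)]   # random initial stake
    iters = 0
    while sum(x) < n:
        i = rng.randrange(n)
        iters += 1
        if x[i] == 0:          # flipping a zero never hurts on OneMax
            x[i] = 1
    return iters
```

Each iteration draws a uniform index, so collecting the remaining zero-positions is precisely a coupon collector process started with a random set of coupons.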

We propose and analyze a self-adaptive version of the $(1,\lambda)$ evolutionary algorithm in which the current mutation rate is part of the individual and thus also subject to mutation. A rigorous runtime analysis on the OneMax benchmark function reveals that a simple local mutation scheme for the rate leads to an expected optimization time (number of fitness evaluations) of $O(n\lambda/\log\lambda+n\log n)$ when $\lambda$ is at least $C \ln n$ for some constant $C > 0$. For all values of $\lambda \ge C \ln n$, this performance is asymptotically best possible among all $\lambda$-parallel mutation-based unbiased black-box algorithms. Our result shows that self-adaptation in evolutionary computation can find complex optimal parameter settings on the fly. At the same time, it proves that a relatively complicated self-adjusting scheme for the mutation rate proposed by Doerr, Gie{\ss}en, Witt, and Yang~(GECCO~2017) can be replaced by our simple endogenous scheme. On the technical side, the paper contributes new tools for the analysis of two-dimensional drift processes arising in the analysis of dynamic parameter choices in EAs, including bounds on occupation probabilities in processes with non-constant drift.
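The local mutation scheme for the rate can be sketched as follows. This is a hedged illustration of the self-adaptive mechanism described above; the rate caps $[2, n/4]$, the tie-breaking, and the budget are illustrative choices of ours.

```python
import random

def self_adaptive_one_comma_lambda(n, lam, max_gens=20000, seed=1):
    """Sketch of the self-adaptive (1,lambda) EA on OneMax: each
    offspring first halves or doubles the inherited rate r (each with
    probability 1/2), then flips each bit with probability r/n; the
    best offspring and its rate replace the parent (comma selection).
    Returns the generation in which the optimum is found, else None."""
    rng = random.Random(seed)
    x = [rng.randrange(2) for _ in range(n)]
    r = 2.0                                   # initial mutation rate
    for gen in range(max_gens):
        if sum(x) == n:
            return gen
        best, best_r = None, r
        for _ in range(lam):
            r2 = r / 2 if rng.random() < 0.5 else 2 * r
            r2 = min(n / 4.0, max(2.0, r2))   # illustrative caps
            y = [1 - b if rng.random() < r2 / n else b for b in x]
            if best is None or sum(y) > sum(best):
                best, best_r = y, r2
        x, r = best, best_r                   # rate travels with winner
    return None
```

Because the rate is inherited from the winning offspring, rates that currently yield good offspring spread, which is the self-adaptation principle at work.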

Drift analysis aims at translating the expected progress of an evolutionary algorithm (or more generally, a random process) into a probabilistic guarantee on its run time (hitting time). So far, drift arguments have been successfully employed in the rigorous analysis of evolutionary algorithms, however only for situations in which the progress is constant or becomes weaker as the target is approached. Motivated by questions like how fast fit individuals take over a population, we analyze random processes exhibiting a multiplicative growth in expectation. We prove a drift theorem translating this expected progress into a hitting time. This drift theorem gives a simple and insightful proof of the level-based theorem first proposed by Lehre (2011). Our version of this theorem has, for the first time, the best-possible linear dependence on the growth parameter $\delta$ (the previous-best was quadratic). This gives immediately stronger run time guarantees for a number of applications.

Despite significant progress in the theory of evolutionary algorithms, the theoretical understanding of true population-based evolutionary algorithms remains challenging and only few rigorous results exist. Already for the most basic problem, the determination of the asymptotic runtime of the $(\mu+\lambda)$ evolutionary algorithm on the simple OneMax benchmark function, only the special cases $\mu=1$ and $\lambda=1$ have been solved. In this work, we analyze this long-standing problem and show the asymptotically tight result that the runtime $T$, the number of iterations until the optimum is found, satisfies \[E[T] = \Theta\bigg(\frac{n\log n}{\lambda}+\frac{n}{\lambda / \mu} + \frac{n\log^+\log^+ \lambda/ \mu}{\log^+ \lambda / \mu}\bigg),\] where $\log^+ x := \max\{1, \log x\}$ for all $x > 0$. The same methods allow us to improve the previous-best $O(\frac{n \log n}{\lambda} + n \log \lambda)$ runtime guarantee for the $(\lambda+\lambda)$~EA with fair parent selection to a tight $\Theta(\frac{n \log n}{\lambda} + n)$ runtime result.
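The algorithm whose runtime is characterized above can be sketched concisely. This is a hedged sketch on OneMax, assuming uniform parent selection, standard bit mutation with rate $1/n$, and plus-selection; concrete parameters in the usage are illustrative.

```python
import random

def mu_plus_lambda_ea(n, mu, lam, max_gens=10**5, seed=2):
    """Sketch of the (mu+lambda) EA on OneMax: each generation creates
    lam offspring by uniform parent selection and standard bit mutation,
    then the best mu of parents and offspring survive.  Returns the
    generation in which the optimum enters the population, else None."""
    rng = random.Random(seed)
    pop = [[rng.randrange(2) for _ in range(n)] for _ in range(mu)]
    for gen in range(max_gens):
        if any(sum(ind) == n for ind in pop):
            return gen
        offspring = []
        for _ in range(lam):
            parent = rng.choice(pop)
            child = [1 - b if rng.random() < 1.0 / n else b
                     for b in parent]
            offspring.append(child)
        # Plus-selection: keep the mu fittest of parents + offspring.
        pop = sorted(pop + offspring, key=sum, reverse=True)[:mu]
    return None
```

The three terms of the $\Theta$-bound above correspond roughly to the classic coupon-collector part, the cost of large populations, and the gain from taking the best of many offspring per level.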

We conduct a first fundamental analysis of the working principles of binary differential evolution (BDE), an optimization heuristic for binary decision variables that was derived by Gong and Tuson (2007) from the very successful classic differential evolution (DE) for continuous optimization. We show that unlike most other optimization paradigms, it is stable in the sense that neutral bit values are sampled with probability close to $1/2$ for a long time. This is generally a desirable property, however, it makes it harder to find the optima for decision variables with small influence on the objective function. This can result in an optimization time exponential in the dimension when optimizing simple symmetric functions like OneMax. On the positive side, BDE quickly detects and optimizes the most important decision variables. For example, dominant bits converge to the optimal value in time logarithmic in the population size. This enables BDE to optimize the most important bits very fast. Overall, our results indicate that BDE is an interesting optimization paradigm having characteristics significantly different from classic evolutionary algorithms or estimation-of-distribution algorithms (EDAs). On the technical side, we observe that the strong stochastic dependencies in the random experiment describing a run of BDE prevent us from proving all desired results with the mathematical rigor that was successfully used in the analysis of other evolutionary algorithms. Inspired by mean-field approaches in statistical physics we propose a more independent variant of BDE, show experimentally its similarity to BDE, and prove some statements rigorously only for the independent variant. Such a semi-rigorous approach might be interesting for other problems in evolutionary computation where purely mathematical methods failed so far.

Estimation-of-distribution algorithms (EDAs) are randomized search heuristics that maintain a probabilistic model of the solution space. This model is updated from iteration to iteration, based on the quality of the solutions sampled according to the model. As previous works show, this short-term perspective can lead to erratic updates of the model, in particular, to bit-frequencies approaching a random boundary value. Such frequencies take long to be moved back to the middle range, leading to significant performance losses. In order to overcome this problem, we propose a new EDA based on the classic compact genetic algorithm (cGA) that takes into account a longer history of samples and updates its model only with respect to information which it classifies as statistically significant. We prove that this significance-based compact genetic algorithm (sig-cGA) optimizes the commonly regarded benchmark functions OneMax, LeadingOnes, and BinVal all in $O(n\log n)$ time, a result shown for no other EDA or evolutionary algorithm so far. For the recently proposed scGA -- an EDA that tries to prevent erratic model updates by imposing a bias to the uniformly distributed model -- we prove that it optimizes OneMax only in a time exponential in the hypothetical population size $1/\rho$. Similarly, we show that the convex search algorithm cannot optimize OneMax in polynomial time.

To gain a better theoretical understanding of how evolutionary algorithms cope with plateaus of constant fitness, we analyze how the $(1 + 1)$~EA optimizes the $n$-dimensional $Plateau_k$ function. This function has a plateau of second-best fitness in a radius of $k$ around the optimum. As optimization algorithm, we regard the $(1 + 1)$~EA using an arbitrary unbiased mutation operator. Denoting by $\alpha$ the random number of bits flipped in an application of this operator and assuming $\Pr[\alpha = 1] = \Omega(1)$, we show the surprising result that for $k \ge 2$ the expected optimization time of this algorithm is \[\frac{n^k}{k!\Pr[1 \le \alpha \le k]}(1 + o(1)),\] that is, the size of the plateau times the expected waiting time for an iteration flipping between $1$ and $k$ bits. Our result implies that the optimal mutation rate for this function is approximately~$k/en$. Our main analysis tool is a combined analysis of the Markov chains on the search point space and on the Hamming level space, an approach that promises to be useful also for other plateau problems.
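One way to formalize the $Plateau_k$ function described above is the following; definitions vary slightly in the literature, so this sketch should be read as an illustration, not as the paper's exact definition.

```python
def plateau(x, k):
    """Plateau_k as described above: all non-optimal points within
    Hamming distance k of the all-ones optimum share the second-best
    fitness n - k; elsewhere the fitness counts the ones (OneMax-like
    slope leading to the plateau)."""
    n, ones = len(x), sum(x)
    if ones == n:
        return n                 # unique optimum
    if ones >= n - k:
        return n - k             # plateau of second-best fitness
    return ones
```

On the plateau all points look identical to the algorithm, which is why the analysis has to track the random walk over the $\binom{n}{\le k}$ plateau points rather than a fitness gradient.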

The $(1+(\lambda,\lambda))$ genetic algorithm, first proposed at GECCO 2013, showed a surprisingly good performance on some optimization problems. The theoretical analysis so far was restricted to the OneMax test function, where this GA profited from the perfect fitness-distance correlation. In this work, we conduct a rigorous runtime analysis of this GA on random 3-SAT instances in the planted solution model having at least logarithmic average degree, which are known to have a weaker fitness-distance correlation. We prove that this GA with fixed not too large population size again obtains runtimes better than $\Theta(n \log n)$, which is a lower bound for most evolutionary algorithms on pseudo-Boolean problems with unique optimum. However, the self-adjusting version of the GA risks reaching population sizes at which the intermediate selection of the GA, due to the weaker fitness-distance correlation, is not able to distinguish a profitable offspring from others. We show that this problem can be overcome by equipping the self-adjusting GA with an upper limit for the population size. Apart from sparse instances, this limit can be chosen in a way that the asymptotic performance does not worsen compared to the idealistic OneMax case. Overall, this work shows that the $(1+(\lambda,\lambda))$ GA can provably have a good performance on combinatorial search and optimization problems also in the presence of a weaker fitness-distance correlation.

We show that the unrestricted black-box complexity of the $n$-dimensional XOR- and permutation-invariant LeadingOnes function class is $O(n \log (n) / \log \log n)$. This shows that the recent natural-looking $O(n\log n)$ bound is not tight. The black-box optimization algorithm leading to this bound can be implemented in a way that only 3-ary unbiased variation operators are used. Hence our bound is also valid for the unbiased black-box complexity recently introduced by Lehre and Witt (GECCO 2010). The bound also remains valid if we impose the additional restriction that the black-box algorithm does not have access to the objective values but only to their relative order (ranking-based black-box complexity).