GA - INTRO

Evolutionary type algorithms are an essential part of the Artificial Intelligence as well as Computational Intelligence and Machine Learning toolkits^7,13,29. The first record of the proposal to mutation/selection learning machines is probably that of Alan Turing in the article on Computing Machinery and Intelligence published in 1950. Since this period, Genetic Algorithms (abbreviated GA) and their variants have been used with success in a variety of fields; a detailed list of genetic algorithm real-world applications is provided in Wikipedia . Since 2004, the Genetic and Evolutionary Computation Conference boasts the "Humies" competition awarding prizes to human-competitive results produced by any form of genetic and evolutionary computation. Besides their numerical ability to generate high-quality solutions to mathematical optimization and solution search problems, evolutionary algorithms are very often presented on specific problem instances with limited regard to their rigorous mathematical foundations, without any rigorous analysis of the underlying random evolutionary processes. Their performance often relies on intuition and empirical evidence driven by numerical observations, so that evolutionary computing research sometimes falls in obscure inspired paradigms and a crucial question arise:

To what extent do we accept numerical evidence as a proof when dealing with complex systems of high risk as nuclear plants, health care management, security or defense systems?

In the context of optimization problems this question doesn't seem so critical. Population-based algorithms are used as hole punchers for drilling and perforating complex solution state spaces. If you can move mountains you can move molehills; the central idea here is to "mimic" natural-evolution-type mechanisms "inspired by Darwin's theory of evolution in Nature" to improve gradually step by step the population adaptation without the need of the fitness derivative information. For convex optimization problems, the performance and convergence of the algorithm follows from standard analysis. For more complex optimization problems with local minima, we expect large population explorations to escape from suboptimal or inadequate solutions.

When the population size is sufficiently large but finite, Genetic Algorithms equipped with an inverse temperature pressure and vanishing mutations behave as a generalized simulated annealing algorithm on product spaces⁴, see also sections 4 and 5 in the review article on genetic algorithms⁷. This approach allows to develop a global optimization theory based on generalized simulated annealing theory on product spaces, see for instance the articles^4,7 and the pioneering article by Cerf published in 1995. Leaving aside computational cost considerations, the population size can be seen as a precision parameter, larger population increases the chance to find the optimal solution and prevent being stuck in local optima. Increasing the computational power with the population size should significantly improve the optimization performance. The concentration properties of the infinite population GA are discussed in the article on annealed Feynman-Kac Models²⁶.

Most of genetic type evolutionary computing algorithms can be seen as an advanced sophisticated Monte Carlo method that converges towards some particular probability distributions (a.k.a. the infinite population model) as the number of individuals/samples tends to $\infty$. For instance, as shown in the first rigorous article on this subject¹ published in 1996, when the number of samples tends to $\infty$, the occupation measures of the simple Genetic Algorithm , the most popular type of evolutionary type algorithm , converge towards a conditional probability measure w.r.t. a series of conditioning events that depends on the fitness functions. Simple GA with mutation/selection transitions can be interpreted as a mean-field particle methodology to sample Feynman-Kac probability measures . These probability measures represent the distribution of the paths of a reference Markov process, weighted by a collection of potential/fitness functions^5,12,16. These functional models are natural mathematical extensions of the traditional change of probability measures, commonly used in importance sampling. For regulation problems, and open loop optimal control problems, these two viewpoints can also be encapsulated in a single mathematical framework⁸.

This GA Monte Carlo sampling methodology and their limiting Feynman-Kac probability measures (in discrete time as well as in continuous time^{5,16,17,21,23}) encapsulates in a single mathematical framework a variety of algorithms known by many other names and often used as natural heuristics since the beginning of the 1950s in different scientific disciplines, to name a few:

The Pruned-Enriched Rosenbluth Method (a.k.a. PERM) in molecular simulation, the Reconfiguration Monte Carlo as well as the Stochastic Reconfiguration, and the Population Monte Carlo method in theoretical physics. Stochastic reconfiguration techniques and related branching variants are discussed in the review article by Assaraf, Caffarel, and Khelif and in the article by Sorella and Capriotti ; see also the genetic-type branching methodologies developed in the articles^7,24 as well as Chapter 11 on Feynman-Kac and interacting particle recipes in the book¹⁸.
The Particle filter, the Bootstrap filter and the Monte Carlo filter arising in signal processing coincide a simple mutation/selection genetic algorithm.

Different terminologies are also often used depending on the choice of the fitness function and the mutation transitions:
- Branching/lookahead strategies based on fitness dependent mutations are discussed in the article on nonlinear filtering², these fitness depending mutations coincide with the conditional explorations presented in section 2.3.2 in the review article on branching genetic-type models⁵ as well as in section 11.5 in the book on Feynman-Kac formulae¹⁸,
- Interacting Kalman filter methodology, a.k.a. Rao-Blackwellized particle filter, based on Kalman filter mutations, are presented in section 12.6.7 in the book on Feynman-Kac formulae¹⁸ as well as section 8.3 in the book on Mean field simulation techniques¹⁸. When applied to fixed parameter estimation in a linear-Gaussian model, every Kalman filter is associated with a parameter. The mutation is defined as a MCMC local move on the parameter space with the corresponding Kalman updates/prediction. The selection favors parameters and Kalman predictors with high likelihood.
- Interacting Particle filter methodology, defined as in the Interacting Kalman filter methodology by replacing Kalman filter mutations by Particle filter mutations. These island-type particle quenched and annealed GA models based on the unnormalized properties of GA Monte Carlo methodologies are presented in section 8.4 and section 8.5 in the book on Mean field simulation techniques¹⁸. When applied to fixed parameter estimation in a nonlinear model, every Particle filter is associated with a parameter. The mutation is defined as a MCMC local move on the parameter space with the corresponding Particle filter updates/prediction. The selection favors parameters and Particle predictors with high likelihood. This Interacting Particle filter MCMC methodology, a.k.a. SMC2, is discussed in some details in section 2.2.5 in the book¹². The MCMC-local move can be replaced by a local move vanishing when the number of particles increases. This model is sometimes called Nested particle filter.
- Sequential Monte Carlo for approximate Bayesian computation, a.k.a. Sequential Monte Carlo without likelihood and related convolution particle filters evolve a genetic algorithm on the state-observation space to solve filtering problems with an unknown or computationally very costly to evaluate likelihood function, These genetic type particle methodologies on augmented state spaces are discussed in Case C on page 15 in the article on filtering³⁴.
Twisted/Guided mutations and adaptive selection times procedures ^1,20,35 are sometimes called adaptive multilevel splitting and adaptive SMC. The selection periods are computed online with respect to some adaptive empirical criteria. These importance sampling/splitting techniques are used to reduce the bias and the variance of the GA Monte Carlo samplers.
In molecular chemistry, branching/lookahead (importance sampling) strategies are sometimes called Markovian anticipation as well as Pilot explorations.
In statistical machine learning literature the GA methodology is also called Sequential Monte Carlo ³², the time homogeneous version is also termed Interacting Metropolis algorithm³³.
The go-with-the-winner heuristic (see section 3.3 in the book¹⁸) and Gibbs-cloning algorithms in operation research.
Subset simulation (see also section 1.5 in the article¹⁹, section 2.2. in the article ¹² as well as section 4.3.4 in the book¹⁷) and multi-level splitting in rare event simulation^9,28.
In continuous time settings, the GA methodology is sometimes called Branching/Elimination , and Killing/Regeneration, as well as Dying/Branching and Killing/Cloning processes; see also the article in theoretical physics by Giardina, Kurchan and Peliti .
The continuous time GA methodology also coincides with the Fleming-Viot & Moran-type particle simulation techniques for sampling non absorbed processes^5,17,21,25, also known as Diffusion Quantum Monte Carlo techniques for the calculations of quasi-invariant measures, fixed point of normalized Feynman-Kac semigroups and related ground state energies of Schrödinger operators^21,22 arising in quantum mechanics and molecular chemistry^5,17 (see also chapter 27 in the book¹⁶).

Most of the terminology encountered in the literature arises from the application domains as well as with the branching/genetic evolution of the Monte Carlo methodology. As underlined in the articles^25,30, from a mathematical viewpoint, only the terminologies "Fleming-Viot" and "Moran" are misleading. The reasons are two-fold: firstly, the Moran particle model is a finite population model that converges as the number of particles tends to infinity towards a stochastic Fleming Viot super process and secondly, the genetic noise arising in the limit requires a Moran finite population process with symmetric-selection jump rates. In the context of GA, the selection/killing-jump rates are clearly far from being symmetric, the empirical measures of the finite population model are biased and the limiting Feynman-Kac semigroup, as the number of particles tends to infinity, is purely deterministic.

Contrary to much of what is written in the literature on evolutionary algorithms, the GA methodologies discussed above belong to the class of (mean field type) Monte Carlo methods. They involve generating random samples/particles from a probability distribution. As the law of large numbers or as the ergodic theorem describing the behavior of sample averages, these GA Monte Carlo methodologies cannot be classified as metaheuristics.

These GA Monte Carlo techniques have been successfully applied in many different fields including biology, risk analysis, computer vision, signal processing, tracking, optimal control, econometrics, finance, robotics, and statistical inference. For a more detailed discussion we refer to the books^16,17,18, see also the applications to statistical learning and rare event simulation discussed in the slides: MCQMC-2012 slides and the ENS-Lyon-2012 slides.

This webpage presents a brief pedagogical introduction to some of the probabilistic concepts and convergence results currently popular in discrete generation GA Monte Carlo particle-type samplers. It is by no means exhaustive and it is mainly based on our own work with close collaborators. More detailed references can be found on the complementary sites on Feynman-Kac probability measures and their genetic-type particle interpretations. The material covered includes unnormalized and unbiased measures, infinite population models, limiting quasi-invariant measures, parallel island models, genealogical tree models as well as backward ancestral measures. We also present some illustrations related to Markov chain conditioning, filtering and Boltzmann-Gibbs measures. The mathematical concepts described below provide a very natural and unifying mathematical basis for a large class of genetic type Monte Carlo algorithms. We hope that this unifying point of view will help to develop fruitfully this field further.

A Simple Genetic Algorithm with Mutation & Selection

Suppose we are given some Markov process $X_{n}$ evolving at every time step $n\in \mathbb {N} $ in some state/solution space $S_n$ which may vary with the time parameter $ n\in \mathbb {N}$. This first basic ingredient represents the mutation transitions of the genetic algorithm. Without selection, the GA with $N$ individuals behaves as $N$ independent copies of the Markov chain $X_{n}$.

Let $ g_{n}~:~x\in S_n\mapsto g_{n}(x)\in \mathbb {[} 0,\infty ]$ be some collection of bounded fitness functions on the solution space $S_n$ indexed by $ n\in \mathbb {N} $. This second ingredient represents the fitness of the selection transitions of the genetic algorithm.

Consider a population of $N$ individuals represented by $N$ independent copies $ \xi _{0}:=\left(\xi _{0}^{i}\right)_{1\leq i\leq N}\in S_0^N$ of some random variable $ X_{0}$ on $S_0$.

The simple genetic algorithm associated with the Markov chain $X_n$ and the fitness function $g_n$ is defined the selection-mutation transitions given by the synthetic diagram

$$\displaystyle \xi _{n}:=\left(\xi _{n}^{\,i}\right)_{1\leq i\leq N}\in S_n^{N}~~~~{\stackrel {\mbox{selection $~~~~~~~~~~~~$}}{-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!\longrightarrow }}~~~~{\widehat {\xi }}_{n}:=\left({\widehat {\xi }}_{n}^{\,i}\right)_{1\leq i\leq N}\in S_n^{N}~~~~~{\stackrel {\mbox{mutation $~~~~~~~~~~~~$}}{-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!\longrightarrow }}~~~\xi _{n+1}:=\left(\xi _{n+1}^{\,i}\right)_{1\leq i\leq N}\in S_{n+1}^{N}$$

The upper index $(.)^i$ represents the label of the $i$-th individual, the lower index $(.)_n$ represents the time parameter $n\in \mathbb {N}$, and $S_n^N=S_n\times\ldots\times S_n$ stands for the $N$-fold Cartesian product space of the set $S_n$.

During the selection transition:

Given the random variables $\xi _{n}:=\left(\xi _{n}^{\,i}\right)_{1\leq i\leq N}\in S_n^{N}$ at time $n\in \mathbb {N} $, sample N (conditionally) independent random variables $\widehat{\xi}_{n}:=\left(\widehat {\xi}_{n}^{\,i}\right)_{1\leq i\leq N}$ with common (conditional) distribution

$$ Proba\left({\widehat {\xi }}_{n}^{\,i}=\xi _{n}^{\,j}~|~\xi _{n}\right)={\frac {g_{n}(\xi _{n}^{\,j})}{\sum _{1\leq k\leq N}g_{n}(\xi _{n}^{\,k})}},~~~~{\mbox{for each}}~j\in \{1,\ldots ,N\}$$

In other words, if $ g_{n}(\xi _{n}^{\,j})$ is the fitness of the $j$-th individual $ \xi _{n}^{\,j}$ in the population $ \xi _{n}$, its probability of being selected in the next generation $\widehat {\xi }_{n}$ is equal to $\displaystyle\frac {g_{n}(\xi _{n}^{\,j})}{\sum _{1\leq k\leq N}g_{n}(\xi _{n}^{\,k})}$.

This selection transition is also called the fitness-proportionate selection a.k.a. the roulette-wheel selection. A proportion of the wheel is assigned to each of the $N$ possible selected individual $\xi _{n}^{\,j}$ based on their fitness values $g_{n}(\xi _{n}^{\,j})$, with $j\in\{1,\ldots,N\}$. As shown above, the proportion of the wheel is achieved by dividing the fitness of a selection by the total fitness of all the selections. Every random selection is made by rotating the wheel. Several branching variants are developed in the articles^7,24, see also chapter 11 in the book on Feynman-Kac formulae¹⁸

For $[0,1]$-valued fitness functions an alternative acceptance-rejection type selection transition can be underlined:

For each $ 1\leq i\leq N$, with a probability $g_{n}(\xi _{n}^{\,i})$ we accept the individual; that is, we set $\widehat {\xi }_{n}^{\,i}=\xi _{n}^{\,i}$. With a probability $(1-g_{n}(\xi _{n}^{\,i}))$ we sample a new individual ${\widetilde {\xi }}_{n}^{\,i}$ in the current population with the probability measure $$ Proba\left({\widetilde {\xi }}_{n}^{\,i}=\xi _{n}^{\,j}~|~\xi _{n}\right)={\frac {g_{n}(\xi _{n}^{\,j})}{\sum _{1\leq k\leq N}g_{n}(\xi _{n}^{\,k})}},~~~~{\mbox{for each}}~j\in \{1,\ldots ,N\}\qquad\mbox{and we set}\qquad \widehat {\xi }_{n}^{\, i}=\widetilde {\xi }_{n}^{\, i}.$$ This selection scheme can be interpreted as an acceptance-rejection sampler equipped with a recycling mechanism. It can also be seen as a killing/regeneration transition with killing rate $(1-g_n(x))$. In this interpretation, with probability $(1-g_{n}(\xi _{n}^{\,i}))$ the $i$-th individual is killed and instantly replaced by a new individual ${\widetilde {\xi }}_{n}^{\,i}$ chosen in the pool, with a probability proportional to the fitness of the individuals.

During the mutation transition:

From each selected particle $\widehat {\xi }_{n}^{\, i}\in S_n$ we sample independently a transition $\widehat {\xi }_{n}^{\, i}\in S_n~\longrightarrow \xi _{n+1}^{\, i}\in S_{n+1}$ of the Markov chain $ X_{n}\in S_n~\rightarrow ~X_{n+1}\in S_{n+1}$ starting at $X_{n}=\widehat {\xi }_{n}^{\, i}$, for $i\in \{1,\ldots ,N\}$.

Some Convergence Results

Under some regularity conditions, when the number of individuals (and the computational power) $ N\uparrow \infty $, for any time index $n\in \mathbb{N}$ and any function $ f_n~:~x\in S_n\mapsto f_n(x)\in \mathbb {R}$, we have the almost sure convergence results $$ {\frac {1}{N}}\sum _{1\leq i\leq N}~f_n\left(\xi _{n}^{\,i}\right)~\approx _{N\uparrow\infty }~{\frac {\mathbb{E}\left(f_n(X_{n})~\prod _{0\leq k<n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k<n}g_{k}(X_{k})\right)}}\qquad {\mbox{as well as}}\qquad {\frac {1}{N}}\sum _{1\leq i\leq N}~f_n\left({\widehat {\xi }}_{n}^{\,i}\right)~\approx _{N\uparrow \infty }~{\frac {\mathbb{E}\left(f_n(X_{n})~\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}}$$ In addition, we have the unbiased estimate $$ \frac {1}{N} \sum _{1\leq i\leq N}f_n(\xi _{n}^{\,i}) ~~~\times~~~ \left( \prod _{0\leq k<n}{\frac {1}{N}}\sum _{1\leq j\leq N}g_{k}(\xi _{k}^{\,j})\right)~\approx _{N\uparrow \infty }~\mathbb{E}\left(f_n(X_{n})~\prod _{0\leq k<n}g_{k}(X_{k})\right) $$ The unbiasedness property refers to the fact that the expected values of the estimator coincides with the limiting value; that is, we have that the so-called unnormalized formulae $$ \mathbb{E}\left( \frac {1}{N} \sum _{1\leq i\leq N}f_n(\xi _{n}^{\,i}) ~~~\times~~~ \left( \prod _{0\leq k<n}{\frac {1}{N}}\sum _{1\leq j\leq N}g_{k}(\xi _{k}^{\,j})\right)\right)~=~\mathbb{E}\left(f_n(X_{n})~\prod _{0\leq k<n}g_{k}(X_{k})\right) $$ In statistics and probability theory the limiting expectations in the right hand side displays are called Feynman-Kac formulae. The first rigorous proof of the above convergence results including the unbiasedness property and the design of adaptive selection rules can be found in the article ¹ in the context of nonlinear filtering published in 1996, see also the series of articles ^2,3,4,5,6,12. Under some stability conditions, the above convergence results can be turned into uniform in time estimates^3,5,6, see also the time-uniform exponential concentration inequalities developed in the books^12,17,18

Mutation transitions that depend on the fitness functions (a.k.a. lookahead strategies) are also discussed in the article on nonlinear filtering², see also the interacting Kalman filter methodology (a.k.a. Rao-Blackwellized particle filter) presented in section 12.6.7 in the book on Feynman-Kac formulae¹⁸ as well as section 8.3 in the book on Mean field simulation techniques¹⁸.
The GA with roulette-wheel selection and unit fitness function coincides with the neutral genetic model¹⁰. In this context, it is possible to design an explicit, finite, exact, genealogical tree based representation of stationary populations that holds both for finite and infinite types (or alleles) models.
The link between Poisson branching processes and the multinomial branching interpretation of GA is provided in section 2.3 in the article discrete generation branching interpretations of the filtering equation²⁴, see also the branching random walk method introduced by J.B. Anderson for solving the Schrödinger equation for molecular wavefunctions as well as the continuous time branching models discussed in section 6.2 in the article⁷.
The stability properties of the positive semigroups associated with these infinite population models are developed in some details in the review article²⁷. Their concentration properties around the global minima of the fitness function are discussed in the article on annealed Feynman-Kac Models²⁶, see also chapter 6 in the book¹⁸.

The infinite population GA model

The limiting Feynman-Kac probability measures discussed above are closely related to the so-called infinite population model arising in evolutionary computation literature. In this connection, we underline that these probability measures and the filtering equations discussed in the articles on the convergence properties of particle filters^1,2,3,5,6 as well as the ones in the articles by Gordon-Salmon and Smith on the Bootstrap filter and by Kitagawa on the Monte Carlo filter coincide with the infinite population GA evolution equations (without crossover) discussed by Ermakov and Zhiglyavskii in an article on GA published in 1984 (see equation (6) as well as equations (2.5) and (2.13) in the earlier article by Karlin published in 1979), as well as in the article by Qi and Palmeri (see equation (4)), the one by Vose (see section 6) and more recently in the article by Song and Li . This article also discusses issues and wrong convergence proofs in a widely cited studies on genetic algorithms in continuous space (see equation (1)). The commonly made mistake is to apply de Finetti theorem for finite sequences without ensuring the asymptotic independence of finite sequences (a.k.a. propagations of chaos properties). As expected and as mentioned earlier, Particle filters, Bootstrap filters as well as Monte Carlo filters also coincide with the GA with roulette-wheel selection discussed above as well as in the series of articles presented above.

As the population size $N$ goes to infinity, the occupation measures of the individuals before and after selection converge to deterministic probability distributions denoted respectively by $\eta_n(dx)$ and $\widehat{\eta}_n(dx)$; that is, for any function $ f_n~:~x\in S_n\mapsto f_n(x)\in \mathbb {R}$, we have the almost sure convergence results $$ {\frac {1}{N}}\sum _{1\leq i\leq N}~f_n\left(\xi _{n}^{\,i}\right)~\approx _{N\uparrow\infty }~ \int~f_n(x)~\eta_n(dx) \qquad {\mbox{as well as}}\qquad {\frac {1}{N}}\sum _{1\leq i\leq N}~f_n\left({\widehat {\xi }}_{n}^{\,i}\right)~\approx _{N\uparrow \infty }~\int~f_n(x)~\widehat{\eta}_n(dx) $$ For time homogeneous and finite type state spaces $S$, these probability are represented by vectors $x\in S\mapsto \eta_n(x)\in [0,1]$ and $x\in S\mapsto \widehat{\eta}_n(x)\in [0,1]$ such that $\sum_{x\in S} \eta_n(x)=1=\sum_{x\in S} \widehat{\eta}_n(x)$. In this context if we choose the indicator function $f_n(x)=1_{A}(x)$ of some subset $A$ of the state space $S$, then $$ \eta_n(A):=\sum_{x\in S}~1_{A}(x)~\eta_n(x)\quad\mbox{can be seen as the limiting proportion in the population consisting of individuals corresponding to types in $A$} $$ Obviously, the probability distributions $\eta_n(dx)$ and $\widehat{\eta}_n(dx)$ coincide with the Feynman-Kac measures discussed above. They are described sequentially by so-called infinite population evolution equations $$ \eta_n(dx)~~~~{\stackrel {\mbox{selection $~~~~~~~~~~~~$}}{-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!\longrightarrow }}~~~~ \widehat{\eta}_n(dx)~~~~~{\stackrel {\mbox{mutation $~~~~~~~~~~~~$}}{-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!\longrightarrow }}~~~ \eta_{n+1}(dx)$$ The probability distributions in the $n$-th generation described above are calculated sequentially by the integral formulae $$ \widehat{\eta}_n(dx)=\frac{g_n(x)~\eta_n(dx)}{\int~g_n(z)~\eta_n(dz)} \qquad {\mbox{and}}\qquad \eta_{n+1}(dx)=\int~\widehat{\eta}_n(dz)~M_{n+1}(z,dx) $$ with the transition probability of the mutation process $$ M_{n+1}(z,dx):= Proba(X_{n+1}\in dx~|~X_{n}=z) $$

Invariant/Limiting measures (a.k.a. quasi-invariant measures)

For time homogeneous models $(g_n,M_n)=(g,M)$, the infinite population evolution equations discussed above take the following form $$ \widehat{\eta}_n(dx)=\Psi_{g}(\eta_n)(dx):=\frac{g(x)~\eta_n(dx)}{\int~g(z)~\eta_n(dz)} \qquad {\mbox{and}}\qquad \eta_{n+1}(dx)=(\widehat{\eta}_nM)(dx):=\int~\widehat{\eta}_n(dz)~M(z,dx) $$ In a more synthetic form, we have $$ \widehat{\eta}_n=\widehat{\Phi}\left(\widehat{\eta}_{n-1}\right):=\Psi_{g}\left(\widehat{\eta}_{n-1}M\right) \quad\mbox{and}\quad \eta_n=\Phi\left(\eta_{n-1}\right):=\Psi_{g}(\eta_{n-1})M $$ Under some stability conditions ^{3,5,6,10,21,22} (see also the books ^17,18 and the more recent articles^27,30) whenever the time parameter $n\rightarrow\infty$ the above recursions converge (exponentially fast) toward a single limiting probability measure $$ \widehat{\eta}_n \longrightarrow_{n\rightarrow\infty} \widehat{\eta}_{\infty} =\widehat{\Phi}\left(\widehat{\eta}_{\infty}\right) \quad\mbox{and}\quad \eta_n\longrightarrow_{n\rightarrow\infty}\eta_{\infty}=\Phi\left(\eta_{\infty}\right) $$ These limiting probability measures are clearly connected by the formulae $$ \widehat{\eta}_{\infty} = \Psi_{g}\left(\eta_{\infty}\right) \quad\mbox{and}\quad \eta_{\infty}= \widehat{\eta}_{\infty}M $$ Assume that the mutation transition $M(x,dy)$ is reversible with respect to some probability measure $\mu(dx)$; that is, we have that $$ \mu(dx)~M(x,dy)=\mu(dy)~M(y,dx) $$ Also assume there exists some function $h(x)\geq 0$ and some parameter $\lambda>0$ such that $$ g(x)~\int~M(x,dz)~h(z)=\lambda~ h(x) $$ In this situation, as shown for instance in Section 12.4 in the book¹⁸ we have $$ \widehat{\eta}_{\infty}(dx)=\frac{h(x)}{\int h(z)\mu(dz)}~\mu(dx) $$

Parallel Island models

The unbiasedness property can be rewritten as follows: $$ \mathbb{E}\left( F_n(\chi_n) \prod _{0\leq k<n}G_k(\chi_k)\right)~=~\mathbb{E}\left(f_n(X_{n})~\prod _{0\leq k<n}g_{k}(X_{k})\right) $$ with the Markov chain $\chi_n=(\xi_n^i)_{1\leq i\leq N}\in S_n^N$, the function $$ F_n(\chi_n):= \frac {1}{N} \sum _{1\leq i\leq N}f_n(\xi _{n}^{i})\quad \mbox{and the fitness function}\quad G_n(\chi_n):= \frac {1}{N} \sum _{1\leq i\leq N}g_n(\xi _{n}^{i}) $$ The simple genetic algorithm associated with the Markov chain $\chi_n$ and the fitness function $G_n(\chi_n)$ can be interpreted as an island genetic algorithm^12,13,14 (a.k.a. double bootstrap of $SMC^2$ in filtering theory and Bayesian inference).

Genealogical trees

Tracing back in time the ancestral lines of each individual $\widehat {\xi }_{n}^{i}$ in the population after the $n$-th selection, we define the genealogical tree of the genetic population $$ \widehat {\xi }_{0,n}^{i}\longleftarrow ~{\widehat {\xi }}_{1,n}^{i}\longleftarrow ~\ldots \longleftarrow {\widehat {\xi }}_{k-1,n}^{i}\longleftarrow {\widehat {\xi }}_{k,n}^{i}\longleftarrow \ldots \longleftarrow{\widehat {\xi }}_{n-1,n}^{i}\longleftarrow {\widehat {\xi }}_{n,n}^{i}={\widehat {\xi }}_{n}^{i} $$ The lower indices $ (.)_{k,n}$ represent the level $k$ of the ancestor of $\widehat {\xi }_{n}^{i}$.

When the number of individuals (and the computational power) $ N\uparrow \infty $, for any function $ f_{n}~:~(x_{0},\ldots ,x_{n})\in (S_0\times\ldots \times S_n)\mapsto f_{n}(x_{0},\ldots ,x_{n})\in \mathbb {R} $ we have the Monte Carlo approximation $$ {\frac {1}{N}}\sum _{1\leq i\leq N}~f_{n}\left({\widehat {\xi }}_{0,n}^{i},{\widehat {\xi }}_{1,n}^{i},\ldots ,{\widehat {\xi }}_{n-1,n}^{i},{\widehat {\xi }}_{n,n}^{i}\right)~\approx _{N\uparrow \infty }~{\frac {\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}}$$

In the same vein, tracing back in time the ancestral lines of each individual $\xi_{n}^{i}$ in the population after the $n$-th mutation, we define the genealogical tree of the genetic population $$ \xi_{0,n}^{i}\longleftarrow ~\xi_{1,n}^{i}\longleftarrow ~\ldots \longleftarrow\xi_{k-1,n}^{i}\longleftarrow \xi_{k,n}^{i}\longleftarrow \ldots \longleftarrow\xi_{n-1,n}^{i}\longleftarrow \xi_{n,n}^{i}=\xi_{n}^{i} $$ The lower indices $ (.)_{k,n}$ represent the level $k$ of the ancestor of $\xi_{n}^{i}$

When the number of individuals (and the computational power) $ N\uparrow \infty $, for any function $ f_{n}~:~(x_{0},\ldots ,x_{n})\in (S_0\times\ldots \times S_n)\mapsto f_{n}(x_{0},\ldots ,x_{n})\in \mathbb {R} $ we have the Monte Carlo approximation $$ {\frac {1}{N}}\sum _{1\leq i\leq N}~f_{n}\left(\xi_{0,n}^{i},\xi_{1,n}^{i},\ldots,\xi_{n-1,n}^{i},\xi_{n,n}^{i}\right)~\approx _{N\uparrow \infty }~{\frac {\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~\prod _{0\leq k< n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k< n}g_{k}(X_{k})\right)}}$$

In addition, we have the unbiasedness property $$ \mathbb{E}\left( \frac {1}{N} \sum _{1\leq i\leq N}f_{n}\left(\xi_{0,n}^{i},\xi_{1,n}^{i},\ldots,\xi_{n-1,n}^{i},\xi_{n,n}^{i}\right) ~~~\times~~~ \left( \prod _{0\leq k<n}{\frac {1}{N}}\sum _{1\leq j\leq N}g_{k}(\xi _{k}^{\,j})\right)\right)~=~\mathbb{E}\left(f_n(X_{0},\ldots ,X_{n})~\prod _{0\leq k<n}g_{k}(X_{k})\right) $$ Choosing the historical process $$ {\cal X}_n:=(X_0,\ldots,X_n) \in {\cal S}_n:=(S_0\times\ldots \times S_n)\quad \mbox{and the fitness function}\quad {\cal G}_n({\cal X}_n):=g_n(X_n)$$ we clearly have $$ \mathbb{E}\left(f_{n}({\cal X}_n)~\prod _{0\leq k\leq n} {\cal G}_k({\cal X}_k)\right)=\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~\prod _{0\leq k\leq n}g_{k}(X_{k})\right)\quad \mbox{and}\quad \mathbb{E}\left(f_{n}({\cal X}_n)~\prod _{0\leq k< n} {\cal G}_k({\cal X}_k)\right)=\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~\prod _{0\leq k< n}g_{k}(X_{k})\right) $$

This shows that the genealogical tree evolution of the genetic population is itself a simple genetic algorithm with a mutation transition on path space representing the evolution of the historical process. In other words, the genealogical tree evolution of the simple genetic algorithm associated with the Markov chain $X_n$ and the fitness function $g_n$ on $S_n$ coincides with the simple genetic algorithm associated with the historical Markov chain ${\cal X}_n$ and the fitness function ${\cal G}_n$ on ${\cal S}_n$.

Backward ancestral lines

To avoid unnecessary measure theoretic considerations, consider a Markov chain $X_n$ on a finite state space $S$ and set $$ p_{n+1}(x_{n},x_{n+1}):= Proba(X_{n+1}=x_{n+1}~|~X_{n}=x_{n})\quad \mbox{and}\quad h_{n+1}(x_{n},x_{n+1}):=g_{n}(x_{n})~p_{n+1}(x_{n},x_{n+1}) $$ Also consider the random backward transitions $$ \mathbb{M}_{n+1,m(\xi_{n})}(x_{n+1},x_{n}):=\frac{m(\xi_{n})(x_n)~h_{n+1}(x_{n},x_{n+1})}{\sum_{z_n\in S}m(\xi_{n})(z_n)~ h_{n+1}(z_{n},x_{n+1})} \quad \mbox{with}\quad m(\xi_{n})(x_n):=\frac{1}{N}\sum_{1\leq i\leq N}~1_{\xi^i_n}(x_n)\quad \mbox{with}\quad 1_{\xi^i_n}(x_n)=\left\{ \begin{array}{rcl} 1 &\mbox{if} & x_n=\xi^i_n\\ 0 &\mbox{if} & x_n\not=\xi^i_n \end{array}\right. $$ Note that $$ \sum_{x_n\in S} \mathbb{M}_{n+1,m(\xi_{n})}(x_{n+1},x_{n})=1 \quad \mbox{and}\quad \sum_{x_n\in S} m(\xi_{n})(x_n)=1 $$ In this notation, given the populations of individuals $(\xi_0,\ldots,\xi_n)$, the conditional probability distribution of a randomly chosen ancestral line $(\mathbb{X}_{0,n},\mathbb{X}_{1,n},\ldots,\mathbb{X}_{n,n})$ in the genealogical tree after the $n$-th mutation is given by the backward Markov chain formula^11,15: $$ Proba\left(\mathbb{X}_{0,n}=x_0,\mathbb{X}_{1,n}=x_1,\ldots,\mathbb{X}_{n-1,n}=x_{n-1},\mathbb{X}_{n,n}=x_n~|~\xi_0,\ldots,\xi_n\right)= m(\xi_n)(x_n)~\mathbb{M}_{n,m(\xi_{n-1})}(x_{n},x_{n-1})\ldots \mathbb{M}_{1,m(\xi_0)}(x_{1},x_{0}) $$ In addition, for any function $ f_{n}~:~(x_{0},\ldots ,x_{n})\in (S_0\times\ldots \times S_n)\mapsto f_{n}(x_{0},\ldots ,x_{n})\in \mathbb {R} $ we have $$ \mathbb{E}\left(f_n\left(\mathbb{X}_{0,n},\mathbb{X}_{1,n},\ldots,\mathbb{X}_{n,n}\right)~|~\xi_0,\ldots,\xi_n\right)= \mathbb{E}\left({\frac {1}{N}}\sum _{1\leq i\leq N}~f_{n}\left(\xi_{0,n}^{i},\xi_{1,n}^{i},\ldots,\xi_{n-1,n}^{i},\xi_{n,n}^{i}\right)~|~\xi_0,\ldots,\xi_n\right) ~\approx _{N\uparrow \infty }~\frac{\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~\prod _{0\leq k< n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k< n}g_{k}(X_{k})\right)} $$

Continuous time GA models

Let $Y_t$ be a continuous time Markov process evolving in some state space $S$, indexed by a continuous time index $ t\in \mathbb{R}_+:=[0,\infty[$. Consider an increasing time mesh $t_0=0< t_1<\ldots < t_{n-1}< t_{n} $ witth a time step $\Delta =t_n-t_{n-1}\approx 0$ and $V~:~x\in S\mapsto \mathbb{R}_+:=[0,\infty[$ some potential function on some state space $S$. We associate with these objects, the Markov chain $X_n$ and the $[0,1]$-valued potential function$g_n$ defined by $$ X_n:= Y_{t_n} \qquad \mbox{and}\qquad g_n(x):=e^{-v(x)\Delta } $$ In this case, choosing $t_n=[t/\Delta]\Delta \approx _{\Delta \downarrow 0} t$ we have the continuous time Feynman-Kac path integral formulae $$ \mathbb{E}\left(f(X_{n})~\prod _{0\leq k<n}g_{k}(X_{k})\right) =\mathbb{E}\left(f(Y_{t_n})~ \exp{\left(-\sum_{0\leq k<n}v(Y_{t_k}) ~(t_k-t_{k-1})\right)}\right)\approx _{\Delta\downarrow 0} \mathbb{E}\left(f(Y_{t})~ \exp{\left(-\int_0^tv(Y_{s}) ~ds\right)}\right) $$ When the time step $\Delta \downarrow 0$, the GA equipped with the killing/regeneration transition converge to a continuous time GA process $\xi_t:=(\xi^i_t)_{1\leq i\leq N}$ with a killing rate $v(x)$. When an individual is killed, instantly another individual in the pool duplicates. More general killing/regeneration models associated with signed potential functions are discussed in the series of studies ^{5,16,17,21,23}. In theoretical physics, these killing/regeneration continuous time GA are often termed killing/cloning models , see the article by Angeli-Grosskinsky-Johansen and the one by Lecomte and Tailleur.

Next we state the continuous time versions of some convergence results stated above, more detailed statements and proofs are provided in^{5,17,18,21,23,31}, see also the article of Rousset on Schrödinger's ground states particle estimates . Under some regularity conditions, when the number of individuals (and the computational power) $ N\uparrow \infty $, for any continuous time index $t\in \mathbb{R}_+:=[0,\infty[$ and any function $ f~:~x\in S\mapsto f(x)\in \mathbb {R}$, we have the almost sure convergence results $$ {\frac {1}{N}}\sum _{1\leq i\leq N}~f\left(\xi _{t}^{\,i}\right)~\approx _{N\uparrow\infty }~{\frac {\mathbb{E}\left(f(Y_{t})~\exp{\left(-\int_0^tV(Y_{s}) ~ds\right)}\right)}{\mathbb{E}\left(\exp{\left(-\int_0^tV(Y_{s}) ~ds\right)}\right)}} $$ In addition, we have the unbiased estimate $$ \frac {1}{N} \sum _{1\leq i\leq N}f(\xi _{t}^{\,i}) ~~~\times~~~ \exp{\left(-\int_0^t \frac {1}{N} \sum _{1\leq j\leq N}V(\xi^j_s) ~ds\right)}~\approx _{N\uparrow \infty }~\mathbb{E}\left(f(Y_{t})~\exp{\left(-\int_0^tV(Y_{s}) ~ds\right)}\right) $$ The unbiasedness property can be rewritten as follows: $$ \mathbb{E}\left( F(\chi_t) \exp{\left(-\int_0^tV(\chi_{s}) ~ds\right)}\right)~=~\mathbb{E}\left(f(Y_{t})~\exp{\left(-\int_0^tV(Y_{s}) ~ds\right)}\right) $$ with the Markov chain $\chi_t=(\xi_t^i)_{1\leq i\leq N}\in S^N$, the function $$ F(\chi_t):= \frac {1}{N} \sum _{1\leq i\leq N}f(\xi _{t}^{i})\quad \mbox{and the potential function}\quad V(\chi_t):= \frac {1}{N} \sum _{1\leq i\leq N} v(\xi _{t}^{i}). $$

Tracing back in time the ancestral lines of each individual $\widehat {\xi }_{t}^{i}$, we define the genealogical tree of the genetic population $$ \widehat {\xi }_{0,t}^{i}\longleftarrow ~\ldots \longleftarrow ~{\widehat {\xi }}_{s,t}^{i}\longleftarrow ~\ldots \longleftarrow {\widehat {\xi }}_{t,t}^{i}={\widehat {\xi }}_{t}^{i} $$ The lower indices $ (.)_{s,t}$ represent the level $s\leq t$ of the ancestor of $\widehat {\xi }_{t}^{i}$. For any function $f_t$ on the space of paths with terminal time horizon $t$ we have $$ \frac {1}{N} \sum _{1\leq i\leq N}f_t\left(\left(\xi _{s,t}^{\,i}\right)_{0\leq s\leq t}\right) \approx _{N\uparrow \infty }~ {\frac {\mathbb{E}\left(f_t\left((Y_{s})_{0\leq s\leq t}\right)~\exp{\left(-\int_0^tV(Y_{s}) ~ds\right)}\right)}{\mathbb{E}\left(\exp{\left(-\int_0^tV(Y_{s}) ~ds\right)}\right)}} $$

Some Illustrations

Feynman-Kac path integrals arise a variety of scientific disciplines, including in computational physics, biology, information theory and computer sciences. Their different interpretations depend on the application domain. We illustrate these path integration models with some concrete examples taken from the slides MCQMC-2012 slides and the ENS-Lyon-2012 slides:

If we choose the indicator fitness function $g_{n}(x_{n})=1_{A_n}(x_{n})$ of some subset $A_n$ of the state space, they represent the conditional distribution of a Markov chain given it stays in a given tube; that is, for any function $ f_{n}~:~(x_{0},\ldots ,x_{n})\in (S_0\times\ldots \times S_n)\mapsto f_{n}(x_{0},\ldots ,x_{n})\in \mathbb {R} $ we have that $$ \mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~|~X_{0}\in A_0,~\ldots ,X_{n}\in A_n\right)=\displaystyle {\frac {\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}}\quad {\mbox{and}}\quad \mathbb{E}\left(\prod _{0\leq k\leq n}g_{k}(X_{k})\right)=P\left(X_{0}\in A_0,~\ldots ,X_{n}\in A_n\right)$$ (as soon as the normalizing constant is strictly positive). In addition, we have the unbiased estimate$$ \prod _{0\leq k\leq n}{\frac {1}{N}}\sum _{1\leq i\leq N}1_{A_n}(\xi _{k}^{i})~\approx _{N\uparrow \infty }~Proba\left(X_{0}\in A_0,\ldots ,X_{n}\in A_n\right)$$

Let $X_n$ be a simple random walk on the $2$-dimensional lattice $S:=\mathbb{Z}^2$ starting at the origin. Choosing the historical process $$ {\cal X}_n:=(X_0,\ldots,X_n) \in {\cal S}_n:=S^{(n+1)}\quad \mbox{and the fitness function}\quad {\cal G}_n({\cal X}_n):=1_{S-\{X_0,\ldots,X_{n-1}\}}(X_n)$$ we clearly have the self-avoiding Markov chain formulae $$ \mathbb{E}\left(f_{n}(X_0,\ldots,X_n)~|~\forall 0\leq p< q \leq n\quad X_p\not=X_q\right)= \frac{\mathbb{E}\left(f_{n}({\cal X}_n)~\prod _{0\leq k\leq n} {\cal G}_k({\cal X}_k)\right)}{ \mathbb{E}\left(\prod _{0\leq k\leq n} {\cal G}_k({\cal X}_k)\right)} \quad \mbox{and}\quad \mathbb{E}\left(\prod _{0\leq k\leq n} {\cal G}_k({\cal X}_k)\right)=Proba\left(\forall 0\leq p< q \leq n\quad X_p\not=X_q\right) $$

In filtering problems, the Markov chain $X_{n}$ represents the signal process evolving on some state space $S$. This Markov chain can represent the evolution of some target (missile, plane, boat, and so on). Suppose the state $X_{n}$ is partially observed by some sensor of the form $$ Y_{n}=h(X_{n})+V_{n}$$ for some function $h~:~x\in S\mapsto h(x)\in \mathbb {R} $ and some sequence of random variables with a probability density $g(v)$. We fix the observations $Y_{n}=y_{n}$ and we set $ g_{n}(x)=g\left(y_{n}-h(x)\right)$. In this situation, the Feynman-Kac probability measures defined above coincide with the conditional distributions of the signal given the observation sequence; that is, we have that $$ \mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~|~Y_{0}=y_{0},~\ldots ,Y_{n}=y_{n}\right)=\displaystyle {\frac {\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}}$$ In addition, we have the unbiased estimate $$ \prod _{0\leq k\leq n}{\frac {1}{N}}\sum _{1\leq i\leq N}g_{k}(\xi _{k}^{i})~\approx _{N\uparrow \infty }~p(y_{0},\ldots ,y_{n})$$ where $p(y_{0},\ldots ,y_{n})$ stands for the probability density of the random variable $(Y_{0},\ldots ,Y_{n})$ evaluated at $(y_{0},\ldots ,y_{n})$. In signal processing and Bayesian inference, the mutation-selection genetic algorithm is also termed particle filter or Sequential Monte Carlo^1,5.

Consider a sequence of Gibbs probability measures $ \pi _{\beta }$ on a finite solution space S indexed by some inverse temperature parameter $ \beta >0$ and defined by $$ \pi _{\beta }(x)={\frac {e^{-\beta V(x)}}{\sum _{y\in S}e^{-\beta V(y)}}}\quad {\mbox{for some function}}~V~:~x\in S\mapsto V(x)\in [0,\infty [$$ These probability measures concentrate to the set of global minima of the function V(.). More precisely, we have $$ V_{\star }:=\inf _{x\in S}V(x)\quad {\mbox{and}}\quad {\mathcal {V}}:=\{x\in S~:~V(x)=V_{\star }\}\Rightarrow ~\forall x\in {\mathcal {V}}\quad \pi _{\beta }(x)={\frac {e^{-\beta \left(V(x)-V_{\star }\right)}}{\sum _{y\in S}e^{-\beta \left(V(y)-V_{\star }\right)}}}={\frac {1}{{\mbox{Card}}({\mathcal {V}})+\sum _{y\in S-{\mathcal {V}}}e^{-\beta \left(V(y)-V_{\star }\right)}}}\approx _{\beta \uparrow \infty }~{\frac {1}{{\mbox{Card}}({\mathcal {V}})}}$$ and for $ x\not \in {\mathcal {V}}$ $$ \pi _{\beta (x)}\approx _{\beta \uparrow \infty }0$$ This implies that $$ \pi _{\beta }(x)~\approx _{\beta \uparrow \infty }~{\frac {1}{{\mbox{Card}}({\mathcal {V}})}}~1_{\mathcal {V}}(x)$$ In other words, sampling a random variable with probability distribution $ \pi _{\beta }(x)$ is almost equivalent to that of sampling uniformly one of the global minima of the function V(.).
Let $ \beta _{n}\geq \beta _{n-1}$ be a increasing sequence of parameters with $ \beta _{0}=0$. Consider a Markov chain $X_{n}$ starting with the uniform distribution $ \pi _{0}(x)=1/{\mbox{Card}}(S)$ on $ S$, and such that $ \pi _{\beta _{n}}$ is an invariant probability measure of the transition $X_{n-1}\rightarrow X_{n}$ at time n; that is, for any $n\geq 1$ we have that $$ \pi _{\beta _{n}}(x)=\sum _{y\in S}~Proba(X_{n}=x~|~X_{n-1}=y)~\pi _{\beta _{n}}(y)$$ These Markov transitions can be designed using Markov Chain Monte Carlo (MCMC) methods such as the Metropolis-Hasting algorithm, or the Gibbs sampler. We consider the fitness function $$ g_{n}(x):=\exp {\left[-(\beta _{n+1}-\beta _{n})V(x)\right]}=\frac{\gamma_{\beta_{n+1}}(x)}{\gamma_{\beta_{n}}(x)} \quad\mbox{with}\quad \gamma _{\beta_n }(x)=e^{-\beta_n V(x)}~\lambda(x) \quad\mbox{and the counting measure}\quad \lambda(x):=\frac {1}{\mbox{Card}(S)} $$ In the above displayed formulae $ \mbox{Card}(S)$ stands for the cardinality of the set $S$. In this situation, for any function $f~:~x\in S\mapsto f(x)\in \mathbb {R} $, we have $$ \frac{ \mathbb{E}\left(f(X_{n})\prod _{0\leq k<n}g_{k}(X_{k})\right) }{ \mathbb{E}\left(\prod _{0\leq k<n}g_{k}(X_{k})\right) } = \sum _{x\in S}~f(x)~\pi _{\beta _{n}}(x) \quad \mbox{and}\quad \mathbb{E}\left(\prod _{0\leq k<n}g_{k}(X_{k})\right) =\frac {1}{\mbox{Card}(S)}~\sum _{x\in S}~e^{-\beta _{n}V(x)}=\sum _{x\in S}~\gamma_{\beta_{n}}(x) $$ The corresponding genetic algorithm can be seen as an interacting simulated annealing algorithm. It is defined in terms of MCMC mutations, the selection transitions consists of selecting the particles which are better adapted to the change of temperature. At every time step, we have the particle estimates $$ \frac {1}{N}\sum _{1\leq i\leq N}~f\left(\xi _{n}^{i}\right) ~\approx _{N\uparrow \infty } ~\sum _{x\in S}~f(x)~\pi _{\beta _{n}}(x) \quad \mbox{and the unbiased approximation}\quad \prod _{0\leq k<n}\frac {1}{N}\sum _{1\leq i\leq N}\exp { \left[-(\beta _{k+1}-\beta _{k})V(\xi _{k}^{i})\right] }~\approx _{N\uparrow \infty }~ \frac {1}{\mbox{Card}(S)}\sum _{x\in S}~e^{-\beta _{n}V(x)} $$ In particular, this implies that $$ \lim _{n\uparrow \infty }\beta _{n}=\infty ~\Longrightarrow ~\lim _{n\uparrow \infty }\lim _{N\uparrow \infty }{\frac {1}{N}}\sum _{1\leq i\leq N}~f\left(\xi _{n}^{i}\right)={\frac {1}{{\mbox{Card}}({\mathcal {V}})}}\sum _{x\in {\mathcal {V}}}~f(x) $$ The choice of the parameters $\beta_n$ as well as the number of MCMC/mutation moves can be done in an adaptive way²⁰.

Let $\lambda(x)=Proba(Z=x)$ be some probability distribution of some random variable $Z$ on a finite set $S$, that is $\lambda(x)\geq 0$ and $\sum_{x\in S}\lambda(x)=1$. Choose a decreasing sequence of subsets $A_{n+1}\subset A_n\subset A_0=S$ such that $\sum_{x\in A_n}\lambda(x)>0$, denote by $1_{A_n}$ the indicator function of the subset $A_n$ and set $$ \gamma_n(x):= 1_{A_n}(x)~\lambda(x)\qquad \eta_n(x):=\frac{ \gamma_n(x)}{\sum_{y\in S} \gamma_n(y)}=\frac{1_{A_n}(x)~\lambda(x)}{\sum _{y\in A_n}\lambda(y)}=Proba(Z=x~|~Z\in A_n) \quad \mbox{and the fitness function}\quad g_{n}(x):=\frac{\gamma_{n+1}(x)}{\gamma_{n}(x)}=1_{A_{n+1}}(x) $$ Consider a Markov chain $X_{n}$ starting with the uniform distribution $ \eta _{0}(x)$ on $ S$, and such that $ \eta_n(x)$ is an invariant probability measure of the transition $X_{n-1}\rightarrow X_{n}$ at time n; that is, for any $n\geq 1$ we have that $$ \eta_n(x)=\sum _{y\in S}~Proba(X_{n}=x~|~X_{n-1}=y)~ \eta_n(y)$$ These Markov chains are sometimes shakers of the sets $A_n$; starting from a random variable $X_{n-1}$ distributed with $\eta_n$ on the set $A_n$, after a local random move the random variable $X_{n}$ is also distributed with $\eta_n$ on the set $A_n$. In this situation, for any function $f~:~x\in S\mapsto f(x)\in \mathbb {R} $, we have $$ \frac{ \mathbb{E}\left(f(X_{n})\prod _{0\leq k<n}g_{k}(X_{k})\right) }{ \mathbb{E}\left(\prod _{0\leq k<n}g_{k}(X_{k})\right) } = \sum _{x\in S}~f(x)~\eta_n(x) \quad \mbox{and}\quad \mathbb{E}\left(\prod _{0\leq k<n}g_{k}(X_{k})\right) =\sum _{x\in S}~\gamma_{n}(x)=\sum _{x\in A_n}\lambda(x)=Proba(Z\in A_n) $$ The corresponding genetic algorithm can be seen as an interacting MCMC/subset shaker. It is defined in terms of mutation/shakers of the set $A_n$, the selection transitions consists of selecting the particles which have succeeded to enter into the next level $A_{n+1}$. At every time step, we have the particle estimates $$ \frac {1}{N}\sum _{1\leq i\leq N}~f\left(\xi _{n}^{i}\right) ~\approx _{N\uparrow \infty } ~\sum _{x\in S}~f(x)~\eta_n(x)= \mathbb{E}\left(f(Z)~|~Z\in A_n\right) \quad \mbox{and the unbiased approximation}\quad \prod _{0\leq k<n}\frac {1}{N}\sum _{1\leq i\leq N}~1_{A_{k+1}}(\xi^i_k)~\approx _{N\uparrow \infty }~Proba(Z\in A_n) $$ The above methodology applies to non necessarily finite state space models and general probability measures on decreasing subsets. For instance, we can choose a multidimensional random variable $Z$ on $S= \mathbb{R}^d$ and some observable function $\varphi~:~z\in \mathbb{R}^d\mapsto \varphi(x) \in [0,\infty[$ such that $S=\{z~:~\varphi(z)\geq 0\}$. Given a increasing sequence of parameters $\beta_0=0< \beta_n< \beta_{n+1}$ we set $$ A_n=\{z\in S~:~\varphi(z)>\beta_n\} $$ The above rather elementary example coincides with the so-called Subset simulation method discussed in reliability engineering, uncertainty propagation literature as well as in rare event simulation (see also the article¹⁴, section 1.5 in the article¹⁹, section 2.2. in the article ¹² as well as section 4.3.4 in the book¹⁷) and multi-level splitting in rare event simulation. The choice of the parameters $\beta_n$ as well as the number of MCMC/mutation moves can be done in an adaptive way²⁰.

^{1.
Non Linear Filtering: Interacting Particle Solution.
Del Moral, Pierre. Markov Processes and Related Fields, (4): pp. 555--580 (1996). [preprint]
↩}

^{2.
Measure Valued Processes and Interacting Particle Systems. Application to Non Linear Filtering Problems.
Del Moral, Pierre. Annals of Applied Probability, vol. 8 , no. 2, pp. 438-495 (1998).

↩}

^{3. On the stability of Measure Valued Processes with Applications to filtering.
Del Moral, Pierre; Guionnet Alice. C.R. Acad. Sci. Paris.
t. 329, Série I, pp. 429--434 (1998).
↩}

^{4. On the Convergence and the Applications of the Generalized Simulated Annealing.
Del Moral, Pierre; Miclo, Laurent. SIAM Control and Opt.,
vol. 37, no. 4, pp. 1222--1250 (1999).
↩}

^{5. Branching and Interacting Particle Systems Approximations of Feynman-Kac Formulae.
Del Moral, Pierre; Miclo, Laurent. Séminaire de Proba. XXXIV. Lecture Notes in Maths, Springer, Vol. 1729, pp. 1-145 (2000).

↩}

^{6. On the stability of interacting processes with applications to filtering and genetic algorithms. Del Moral, Pierre; Guionnet Alice. Annales de l'Institut H. Poincaré Prob. & Stats., vol. 37, No. 2, pp. 155--194 (2001).
↩}

^{7. Asymptotic Results for Genetic Algorithms.
Del Moral, Pierre; Miclo, Laurent. (Natural Computing Series) - Theoretical Aspects of Evolutionary Computing, Springer (2001).
↩}

^{8. Open-loop regulation and tracking control based on a genealogical decision tree.
Najim, Karim, Ikonen, Enzo; Del Moral, Pierre. Neural Computing and Applications.
vol. 15, no. 3/4, pp. 339--349 [ IFAC (2005). ]
↩}

^{9. Genetic Genealogical models in rare event analysis. Cérou, Frederic; Del Moral, Pierre; Le Gland Francois; Lézaud, Pascal Alea 1, pp. 181-203 (2006).
↩}

^{10. The convergence to equilibrium of neutral genetic models
Del Moral, Pierre; Miclo, Laurent; Patras, Frédéric; Rubenthaler, Sylvain. Stochastic Analysis and Applications.
vol. 28, no. 1, pp. 123-143 (2009).
↩}

^{11. A Backward Particle Interpretation of Feynman-Kac Formulae.
Del Moral, Pierre; Doucet, Arnaud; Singh, Sumeet. M2AN.
vol 44, no. 5, pp. 947--976 (2010).
↩}

^{12. On the concentration properties of Interacting particle processes. Del Moral, Pierre; Hu, Peng; Wu, Li Ming. Foundations and Trends in Machine Learning, Vol. 3, No. 3 - 4, 225--389 (2012).

↩}

^{13. On the Foundations and the Applications of Evolutionary Computing Del Moral, Pierre; Tantar, Alexandru-Adrian; Tantar, Emilia. Studies in Computational Intelligence, Springer vol 447. pp. 3-89 (2013).

↩}

^{14. An island particle Markov chain Monte Carlo algorithm for safety analysis. Vergé, Christelle; Morio, Jérôme; Del Moral, Pierre. Reliability Engineering & System Safety.
vol. 149, pp. 63-75 (2016) .

↩}

^{15. On particle Gibbs samplers. Del Moral, Pierre; Kohn, Robert; Patras, Frédéric. Annales de l'Institut Henri Poincaré Probab. & Statist.
vol 52, no. 4, pp. 1687--1733 (2016).
↩}

^{16. Stochastic Processes: From Applications to Theory. Del Moral, Pierre; Penev, Spiridon. Chapman & Hall/CRC Press (2014). [some slides]

↩}

^{17.
Mean field simulation for Monte Carlo integration. Del Moral, Pierre. Chapman & Hall/CRC Press (2013).

↩}

^{18.
Feynman-Kac formulae. Genealogical and interacting particle approximations. Del Moral, Pierre. Springer, series: Probability and Its Applications (2004).

↩}

^{19. Particle Methods: An introduction with applications. Del Moral, Pierre. ESAIM Proceedings, Journées MAS (2012).

↩}

^{20.
On Adaptive Resampling Procedures for Sequential Monte Carlo Methods. Del Moral, Pierre; Doucet, Arnaud; Jasra, Ajay. Bernoulli.
vol. 18, no. 1, pp. 252--278 (2012).

↩}

^{21.
Particle approximations of Lyapunov exponents connected to Schrödinger operators and Feynman-Kac semigroups. Del Moral, Pierre; Miclo, Laurent. ESAIM: Probab. and Stats.
t. 7, pp. 171--208 (2003).

↩}

^{22. Particle Motions in Absorbing Medium with Hard and Soft Obstacles. Del Moral, Pierre; Doucet, Arnaud. Stochastic Analysis and Applications.
vol. 22, no. 5, pp. 1175--1207 (2004). [ [Preprint version]

↩}

^{23.

A duality formula and a particle Gibbs sampler for continuous time Feynman-Kac measures on path spaces. Arnaudon, Marc; Del Moral, Pierre. Electronic Journal of Probability, 25, pp. 1--54 (2004).

↩}

^{24.
Discrete Filtering Using Branching and Interacting Particle Systems. Crisan, Dan; Del Moral, Pierre; Lyons, Terry. Markov Processes and Related Fields.
vol. 5, no. 3, pp. 293--318 (1999).

↩}

^{25.
A Moran particle system approximation of Feynman-Kac formulae. Del Moral, Pierre; Miclo, Laurent. Stochastic Processes and their Applications,
vol. 86, no.2, pp. 193--216 (2000). [preprint]

↩}

^{26.
Annealed Feynman-Kac Models. Del Moral, Pierre; Miclo, Laurent. Communications in Mathematical Physics (2003). [preprint]

↩}

^{27.
A Lyapunov approach to stability of positive semigroups: An overview with illustrations Arnaudon, Marc; Del Moral, Pierre; Ouhabaz, El Maati . Stochastic Analysis and Applications (2023). [preprint]

↩}

^{28.
Branching and interacting particle interpretation of rare event probabilities.
Del Moral, Pierre; Lézaud, Pascal Stochastic Hybrid Systems. Springer-Verlag, Heidelberg.
Theory and Safety Critical Applications, eds. H. Blom and J. Lygeros (2006).
↩}

^{29.
Modeling genetic algorithms with interacting particle systems.
Del Moral, Pierre; Kallel, Leila; Rowe, Jonathan E. Revista de Matematica, Teoria y aplicaciones, vol.8, no. 2 (2001).
↩}

^{30.
On the Stability of Positive Semigroups. Del Moral, Pierre; Horton, Emma; Jasra, Ajay. Annals of Applied Probability (to appear).
(2023).
↩}

^{31. Strong propagations of chaos in Moran's type particle interpretations of Feynman-Kac measures. Del Moral, Pierre; Miclo, Laurent. Stochastic Analysis and Applications.
vol. 25, no. 3, 519--575 (2007).
↩}

^{32. Sequential Monte Carlo Samplers. Del Moral, Pierre; Doucet, Arnaud; Jasra, Ajay. Journal of the Royal Statistical Society, Series B.,
vol. 68, no. 3, pp. 411--436 (2006).
↩}

^{33. On a class of genealogical and interacting Metropolis Models. Del Moral, Pierre; Doucet, Arnaud. J Séminaire de Probabilités. Lecture Notes in Mathematics.
no. XXXVII, 1832, Springer, pp. 415--446 (2003).
↩}

^{34. The Monte-Carlo Method for filtering with discrete-time observations. Del Moral, Pierre; Jacod, Jean; Philip Protter Probability Theory and Related Fields.
vol. 120, no. 3, pp. 346--368 (2001).
↩}

^{35. Some recent improvements to importance splitting. F Cerou, P Del Moral, A Guyader, F Le Gland, P Lezaud, H Topart Proceedings of 6th International Workshop on Rare Event Simulation, Bamberg (2006).
↩}