[ Main Page | Research | Publications | Management | Grants | Industrial contracts | Courses | Conferences | Collaborators ] |
Evolutionary type algorithms are an essential part of the Artificial Intelligence as well as Computational Intelligence and Machine Learning toolkits^{7,13,29}. The first record of the proposal to mutation/selection learning machines is probably that of Alan Turing in the article on Computing Machinery and Intelligence published in 1950. Since this period, Genetic Algorithms (abbreviated GA) and their variants have been used with success in a variety of fields; a detailed list of genetic algorithm real-world applications is provided in Wikipedia . Since 2004, the Genetic and Evolutionary Computation Conference boasts the "Humies" competition awarding prizes to human-competitive results produced by any form of genetic and evolutionary computation. Besides their numerical ability to generate high-quality solutions to mathematical optimization and solution search problems, evolutionary algorithms are very often presented on specific problem instances with limited regard to their rigorous mathematical foundations, without any rigorous analysis of the underlying random evolutionary processes. Their performance often relies on intuition and empirical evidence driven by numerical observations, so that evolutionary computing research sometimes falls in obscure inspired paradigms and a crucial question arise:
In the context of optimization problems this question doesn't seem so critical. Population-based algorithms are used as hole punchers for drilling and perforating complex solution state spaces. If you can move mountains you can move molehills; the central idea here is to "mimic" natural-evolution-type mechanisms "inspired by Darwin's theory of evolution in Nature" to improve gradually step by step the population adaptation without the need of the fitness derivative information. For convex optimization problems, the performance and convergence of the algorithm follows from standard analysis. For more complex optimization problems with local minima, we expect large population explorations to escape from suboptimal or inadequate solutions.
When the population size is sufficiently large but finite, Genetic Algorithms equipped with an inverse temperature pressure and vanishing mutations behave as a generalized simulated annealing algorithm on product spaces^{4}, see also sections 4 and 5 in the review article on genetic algorithms^{7}. This approach allows to develop a global optimization theory based on generalized simulated annealing theory on product spaces, see for instance the articles^{4,7} and the pioneering article by Cerf published in 1995.
Leaving aside computational cost considerations, the population size can be seen as a precision parameter, larger population increases the chance to find the optimal solution and prevent being stuck in local optima. Increasing the computational power with the population size should significantly improve the optimization performance. The concentration properties of the infinite population GA are discussed in the article on annealed Feynman-Kac Models^{26}.
Most of genetic type evolutionary computing algorithms can be seen as an advanced sophisticated Monte Carlo method that converges towards some particular probability distributions (a.k.a. the infinite population model) as the number of individuals/samples tends to $\infty$.
For instance, as shown in the first rigorous article on this subject^{1} published in 1996,
when the number of samples tends to $\infty$, the occupation measures of the simple Genetic Algorithm , the most popular type of evolutionary type algorithm , converge towards a conditional probability measure w.r.t. a series of conditioning events that depends on the fitness functions. Simple GA with mutation/selection transitions can be interpreted as a
mean-field particle methodology
to sample Feynman-Kac probability measures . These probability measures represent the distribution of the paths of a reference Markov process, weighted by a collection of potential/fitness functions^{5,12,16}. These functional models are natural mathematical extensions of the traditional change of probability measures, commonly used in importance sampling. For regulation problems, and open loop optimal control problems, these two viewpoints can also be encapsulated in a single mathematical framework^{8}.
This GA Monte Carlo sampling methodology and their limiting Feynman-Kac probability measures (in discrete time as well as in continuous time^{5,16,17,21,23}) encapsulates in a single mathematical framework a variety of algorithms known by many other names and often used as natural heuristics since the beginning of the 1950s in different scientific disciplines, to name a few:
Most of the terminology encountered in the literature arises from the application domains as well as with the branching/genetic evolution of the Monte Carlo methodology. As underlined in the articles^{25,30}, from a mathematical viewpoint, only the terminologies "Fleming-Viot" and "Moran" are misleading. The reasons are two-fold: firstly, the Moran particle model is a finite population model that converges as the number of particles tends to infinity towards a stochastic Fleming Viot super process and secondly, the genetic noise arising in the limit requires a Moran finite population process with symmetric-selection jump rates. In the context of GA, the selection/killing-jump rates are clearly far from being symmetric, the empirical measures of the finite population model are biased and the limiting Feynman-Kac semigroup, as the number of particles tends to infinity, is purely deterministic.
Contrary to much of what is written in the literature on evolutionary algorithms, the GA methodologies discussed above belong to the class of (mean field type) Monte Carlo methods. They involve generating random samples/particles from a probability distribution. As the law of large numbers or as the ergodic theorem describing the behavior of sample averages, these GA Monte Carlo methodologies cannot be classified as metaheuristics.
These GA Monte Carlo techniques have been successfully applied in many different fields including biology, risk analysis, computer vision, signal processing, tracking, optimal control, econometrics, finance, robotics, and statistical inference. For a more detailed discussion we refer to the books^{16,17,18}, see also the applications to statistical learning and rare event simulation discussed in the slides: MCQMC-2012 slides and the ENS-Lyon-2012 slides.
This webpage presents a brief pedagogical introduction to some of the probabilistic concepts and convergence results currently popular in discrete generation GA Monte Carlo particle-type samplers. It is by no means exhaustive and it is mainly based on our own work with close collaborators. More detailed references can be found on the complementary sites on Feynman-Kac probability measures and their genetic-type particle interpretations. The material covered includes unnormalized and unbiased measures, infinite population models, limiting quasi-invariant measures, parallel island models, genealogical tree models as well as backward ancestral measures. We also present some illustrations related to Markov chain conditioning, filtering and Boltzmann-Gibbs measures. The mathematical concepts described below provide a very natural and unifying mathematical basis for a large class of genetic type Monte Carlo algorithms. We hope that this unifying point of view will help to develop fruitfully this field further.
Suppose we are given some Markov process $X_{n}$ evolving at every time step $n\in \mathbb {N} $ in some state/solution space $S_n$ which may vary with the time parameter $ n\in \mathbb {N}$. This first basic ingredient represents the mutation transitions of the genetic algorithm. Without selection, the GA with $N$ individuals behaves as $N$ independent copies of the Markov chain $X_{n}$.
Let $ g_{n}~:~x\in S_n\mapsto g_{n}(x)\in \mathbb {[} 0,\infty ]$ be some collection of bounded fitness functions on the solution space $S_n$ indexed by $ n\in \mathbb {N} $. This second ingredient represents the fitness of the selection transitions of the genetic algorithm.
Consider a population of $N$ individuals represented by $N$ independent copies $ \xi _{0}:=\left(\xi _{0}^{i}\right)_{1\leq i\leq N}\in S_0^N$ of some random variable $ X_{0}$ on $S_0$.
The simple genetic algorithm associated with the Markov chain $X_n$ and the fitness function $g_n$ is defined the selection-mutation transitions given by the synthetic diagram
The upper index $(.)^i$ represents the label of the $i$-th individual, the lower index $(.)_n$ represents the time parameter $n\in \mathbb {N}$, and $S_n^N=S_n\times\ldots\times S_n$ stands for the $N$-fold Cartesian product space of the set $S_n$.
During the selection transition:
Given the random variables $\xi _{n}:=\left(\xi _{n}^{\,i}\right)_{1\leq i\leq N}\in S_n^{N}$ at time $n\in \mathbb {N} $, sample N (conditionally) independent random variables $\widehat{\xi}_{n}:=\left(\widehat {\xi}_{n}^{\,i}\right)_{1\leq i\leq N}$ with common (conditional) distribution
In other words, if $ g_{n}(\xi _{n}^{\,j})$ is the fitness of the $j$-th individual $ \xi _{n}^{\,j}$ in the population $ \xi _{n}$, its probability of being selected in the next generation $\widehat {\xi }_{n}$ is equal to $\displaystyle\frac {g_{n}(\xi _{n}^{\,j})}{\sum _{1\leq k\leq N}g_{n}(\xi _{n}^{\,k})}$.
This selection transition is also called the fitness-proportionate selection a.k.a. the roulette-wheel selection. A proportion of the wheel is assigned to each of the $N$ possible selected individual $\xi _{n}^{\,j}$ based on their fitness values $g_{n}(\xi _{n}^{\,j})$, with $j\in\{1,\ldots,N\}$. As shown above, the proportion of the wheel is achieved by dividing the fitness of a selection by the total fitness of all the selections. Every random selection is made by rotating the wheel. Several branching variants are developed in the articles^{7,24}, see also chapter 11 in the book on Feynman-Kac formulae^{18}
For $[0,1]$-valued fitness functions an alternative acceptance-rejection type selection transition can be underlined:
For each $ 1\leq i\leq N$, with a probability $g_{n}(\xi _{n}^{\,i})$ we accept the individual; that is, we set $\widehat {\xi }_{n}^{\,i}=\xi _{n}^{\,i}$. With a probability $(1-g_{n}(\xi _{n}^{\,i}))$ we sample a new individual ${\widetilde {\xi }}_{n}^{\,i}$ in the current population with the probability measure $$ Proba\left({\widetilde {\xi }}_{n}^{\,i}=\xi _{n}^{\,j}~|~\xi _{n}\right)={\frac {g_{n}(\xi _{n}^{\,j})}{\sum _{1\leq k\leq N}g_{n}(\xi _{n}^{\,k})}},~~~~{\mbox{for each}}~j\in \{1,\ldots ,N\}\qquad\mbox{and we set}\qquad \widehat {\xi }_{n}^{\, i}=\widetilde {\xi }_{n}^{\, i}.$$ This selection scheme can be interpreted as an acceptance-rejection sampler equipped with a recycling mechanism. It can also be seen as a killing/regeneration transition with killing rate $(1-g_n(x))$. In this interpretation, with probability $(1-g_{n}(\xi _{n}^{\,i}))$ the $i$-th individual is killed and instantly replaced by a new individual ${\widetilde {\xi }}_{n}^{\,i}$ chosen in the pool, with a probability proportional to the fitness of the individuals.
During the mutation transition:
From each selected particle $\widehat {\xi }_{n}^{\, i}\in S_n$ we sample independently a transition $\widehat {\xi }_{n}^{\, i}\in S_n~\longrightarrow \xi _{n+1}^{\, i}\in S_{n+1}$ of the Markov chain $ X_{n}\in S_n~\rightarrow ~X_{n+1}\in S_{n+1}$ starting at $X_{n}=\widehat {\xi }_{n}^{\, i}$, for $i\in \{1,\ldots ,N\}$.
Under some regularity conditions, when the number of individuals (and the computational power) $ N\uparrow \infty $, for any time index $n\in \mathbb{N}$ and any function $ f_n~:~x\in S_n\mapsto f_n(x)\in \mathbb {R}$, we have the almost sure convergence results $$ {\frac {1}{N}}\sum _{1\leq i\leq N}~f_n\left(\xi _{n}^{\,i}\right)~\approx _{N\uparrow\infty }~{\frac {\mathbb{E}\left(f_n(X_{n})~\prod _{0\leq k<n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k<n}g_{k}(X_{k})\right)}}\qquad {\mbox{as well as}}\qquad {\frac {1}{N}}\sum _{1\leq i\leq N}~f_n\left({\widehat {\xi }}_{n}^{\,i}\right)~\approx _{N\uparrow \infty }~{\frac {\mathbb{E}\left(f_n(X_{n})~\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}}$$ In addition, we have the unbiased estimate $$ \frac {1}{N} \sum _{1\leq i\leq N}f_n(\xi _{n}^{\,i}) ~~~\times~~~ \left( \prod _{0\leq k<n}{\frac {1}{N}}\sum _{1\leq j\leq N}g_{k}(\xi _{k}^{\,j})\right)~\approx _{N\uparrow \infty }~\mathbb{E}\left(f_n(X_{n})~\prod _{0\leq k<n}g_{k}(X_{k})\right) $$ The unbiasedness property refers to the fact that the expected values of the estimator coincides with the limiting value; that is, we have that the so-called unnormalized formulae $$ \mathbb{E}\left( \frac {1}{N} \sum _{1\leq i\leq N}f_n(\xi _{n}^{\,i}) ~~~\times~~~ \left( \prod _{0\leq k<n}{\frac {1}{N}}\sum _{1\leq j\leq N}g_{k}(\xi _{k}^{\,j})\right)\right)~=~\mathbb{E}\left(f_n(X_{n})~\prod _{0\leq k<n}g_{k}(X_{k})\right) $$ In statistics and probability theory the limiting expectations in the right hand side displays are called Feynman-Kac formulae. The first rigorous proof of the above convergence results including the unbiasedness property and the design of adaptive selection rules can be found in the article ^{1} in the context of nonlinear filtering published in 1996, see also the series of articles ^{2,3,4,5,6,12}. Under some stability conditions, the above convergence results can be turned into uniform in time estimates^{3,5,6}, see also the time-uniform exponential concentration inequalities developed in the books^{12,17,18}
.Mutation transitions that depend on the fitness functions (a.k.a. lookahead strategies) are also discussed in the article on nonlinear filtering^{2}, see also the interacting Kalman filter methodology (a.k.a. Rao-Blackwellized particle filter) presented in section 12.6.7 in the book on Feynman-Kac formulae^{18} as well as section 8.3 in the book on Mean field simulation techniques^{18}.
The GA with roulette-wheel selection and unit fitness function coincides with the neutral genetic model^{10}. In this context, it is possible to design an explicit, finite, exact, genealogical tree based representation of stationary populations that holds both for finite and infinite types (or alleles) models.
The link between Poisson branching processes and the multinomial branching interpretation of GA is provided in section 2.3 in the article discrete generation branching interpretations of the filtering equation^{24}, see also the branching random walk method introduced by J.B. Anderson for solving the Schrödinger equation for molecular wavefunctions as well as the continuous time branching models discussed in section 6.2 in the article^{7}.
The stability properties of the positive semigroups associated with these infinite population models are developed in some details in the review article^{27}. Their concentration properties around the global minima of the fitness function are discussed in the article on annealed Feynman-Kac Models^{26}, see also chapter 6 in the book^{18}.
The limiting Feynman-Kac probability measures discussed above are closely related to the so-called infinite population model arising in evolutionary computation literature. In this connection, we underline that these probability measures and the filtering equations discussed in the articles on the convergence properties of particle filters^{1,2,3,5,6} as well as the ones in the articles by Gordon-Salmon and Smith on the Bootstrap filter and by Kitagawa on the Monte Carlo filter coincide with the infinite population GA evolution equations (without crossover) discussed by Ermakov and Zhiglyavskii in an article on GA published in 1984 (see equation (6) as well as equations (2.5) and (2.13) in the earlier article by Karlin published in 1979), as well as in the article by Qi and Palmeri (see equation (4)), the one by Vose (see section 6) and more recently in the article by Song and Li . This article also discusses issues and wrong convergence proofs in a widely cited studies on genetic algorithms in continuous space (see equation (1)). The commonly made mistake is to apply de Finetti theorem for finite sequences without ensuring the asymptotic independence of finite sequences (a.k.a. propagations of chaos properties). As expected and as mentioned earlier, Particle filters, Bootstrap filters as well as Monte Carlo filters also coincide with the GA with roulette-wheel selection discussed above as well as in the series of articles presented above.
As the population size $N$ goes to infinity, the occupation measures of the individuals before and after selection converge to deterministic probability distributions denoted respectively by $\eta_n(dx)$ and $\widehat{\eta}_n(dx)$; that is, for any function $ f_n~:~x\in S_n\mapsto f_n(x)\in \mathbb {R}$, we have the almost sure convergence results $$ {\frac {1}{N}}\sum _{1\leq i\leq N}~f_n\left(\xi _{n}^{\,i}\right)~\approx _{N\uparrow\infty }~ \int~f_n(x)~\eta_n(dx) \qquad {\mbox{as well as}}\qquad {\frac {1}{N}}\sum _{1\leq i\leq N}~f_n\left({\widehat {\xi }}_{n}^{\,i}\right)~\approx _{N\uparrow \infty }~\int~f_n(x)~\widehat{\eta}_n(dx) $$ For time homogeneous and finite type state spaces $S$, these probability are represented by vectors $x\in S\mapsto \eta_n(x)\in [0,1]$ and $x\in S\mapsto \widehat{\eta}_n(x)\in [0,1]$ such that $\sum_{x\in S} \eta_n(x)=1=\sum_{x\in S} \widehat{\eta}_n(x)$. In this context if we choose the indicator function $f_n(x)=1_{A}(x)$ of some subset $A$ of the state space $S$, then $$ \eta_n(A):=\sum_{x\in S}~1_{A}(x)~\eta_n(x)\quad\mbox{can be seen as the limiting proportion in the population consisting of individuals corresponding to types in $A$} $$ Obviously, the probability distributions $\eta_n(dx)$ and $\widehat{\eta}_n(dx)$ coincide with the Feynman-Kac measures discussed above. They are described sequentially by so-called infinite population evolution equations $$ \eta_n(dx)~~~~{\stackrel {\mbox{selection $~~~~~~~~~~~~$}}{-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!\longrightarrow }}~~~~ \widehat{\eta}_n(dx)~~~~~{\stackrel {\mbox{mutation $~~~~~~~~~~~~$}}{-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!-\!\!\!\!\longrightarrow }}~~~ \eta_{n+1}(dx)$$ The probability distributions in the $n$-th generation described above are calculated sequentially by the integral formulae $$ \widehat{\eta}_n(dx)=\frac{g_n(x)~\eta_n(dx)}{\int~g_n(z)~\eta_n(dz)} \qquad {\mbox{and}}\qquad \eta_{n+1}(dx)=\int~\widehat{\eta}_n(dz)~M_{n+1}(z,dx) $$ with the transition probability of the mutation process $$ M_{n+1}(z,dx):= Proba(X_{n+1}\in dx~|~X_{n}=z) $$
For time homogeneous models $(g_n,M_n)=(g,M)$, the infinite population evolution equations discussed above take the following form $$ \widehat{\eta}_n(dx)=\Psi_{g}(\eta_n)(dx):=\frac{g(x)~\eta_n(dx)}{\int~g(z)~\eta_n(dz)} \qquad {\mbox{and}}\qquad \eta_{n+1}(dx)=(\widehat{\eta}_nM)(dx):=\int~\widehat{\eta}_n(dz)~M(z,dx) $$ In a more synthetic form, we have $$ \widehat{\eta}_n=\widehat{\Phi}\left(\widehat{\eta}_{n-1}\right):=\Psi_{g}\left(\widehat{\eta}_{n-1}M\right) \quad\mbox{and}\quad \eta_n=\Phi\left(\eta_{n-1}\right):=\Psi_{g}(\eta_{n-1})M $$ Under some stability conditions ^{3,5,6,10,21,22} (see also the books ^{17,18} and the more recent articles^{27,30}) whenever the time parameter $n\rightarrow\infty$ the above recursions converge (exponentially fast) toward a single limiting probability measure $$ \widehat{\eta}_n \longrightarrow_{n\rightarrow\infty} \widehat{\eta}_{\infty} =\widehat{\Phi}\left(\widehat{\eta}_{\infty}\right) \quad\mbox{and}\quad \eta_n\longrightarrow_{n\rightarrow\infty}\eta_{\infty}=\Phi\left(\eta_{\infty}\right) $$ These limiting probability measures are clearly connected by the formulae $$ \widehat{\eta}_{\infty} = \Psi_{g}\left(\eta_{\infty}\right) \quad\mbox{and}\quad \eta_{\infty}= \widehat{\eta}_{\infty}M $$ Assume that the mutation transition $M(x,dy)$ is reversible with respect to some probability measure $\mu(dx)$; that is, we have that $$ \mu(dx)~M(x,dy)=\mu(dy)~M(y,dx) $$ Also assume there exists some function $h(x)\geq 0$ and some parameter $\lambda>0$ such that $$ g(x)~\int~M(x,dz)~h(z)=\lambda~ h(x) $$ In this situation, as shown for instance in Section 12.4 in the book^{18} we have $$ \widehat{\eta}_{\infty}(dx)=\frac{h(x)}{\int h(z)\mu(dz)}~\mu(dx) $$
The unbiasedness property can be rewritten as follows: $$ \mathbb{E}\left( F_n(\chi_n) \prod _{0\leq k<n}G_k(\chi_k)\right)~=~\mathbb{E}\left(f_n(X_{n})~\prod _{0\leq k<n}g_{k}(X_{k})\right) $$ with the Markov chain $\chi_n=(\xi_n^i)_{1\leq i\leq N}\in S_n^N$, the function $$ F_n(\chi_n):= \frac {1}{N} \sum _{1\leq i\leq N}f_n(\xi _{n}^{i})\quad \mbox{and the fitness function}\quad G_n(\chi_n):= \frac {1}{N} \sum _{1\leq i\leq N}g_n(\xi _{n}^{i}) $$ The simple genetic algorithm associated with the Markov chain $\chi_n$ and the fitness function $G_n(\chi_n)$ can be interpreted as an island genetic algorithm^{12,13,14} (a.k.a. double bootstrap of $SMC^2$ in filtering theory and Bayesian inference).
Tracing back in time the ancestral lines of each individual $\widehat {\xi }_{n}^{i}$ in the population after the $n$-th selection, we define the genealogical tree of the genetic population $$ \widehat {\xi }_{0,n}^{i}\longleftarrow ~{\widehat {\xi }}_{1,n}^{i}\longleftarrow ~\ldots \longleftarrow {\widehat {\xi }}_{k-1,n}^{i}\longleftarrow {\widehat {\xi }}_{k,n}^{i}\longleftarrow \ldots \longleftarrow{\widehat {\xi }}_{n-1,n}^{i}\longleftarrow {\widehat {\xi }}_{n,n}^{i}={\widehat {\xi }}_{n}^{i} $$ The lower indices $ (.)_{k,n}$ represent the level $k$ of the ancestor of $\widehat {\xi }_{n}^{i}$.
When the number of individuals (and the computational power) $ N\uparrow \infty $, for any function $ f_{n}~:~(x_{0},\ldots ,x_{n})\in (S_0\times\ldots \times S_n)\mapsto f_{n}(x_{0},\ldots ,x_{n})\in \mathbb {R} $ we have the Monte Carlo approximation $$ {\frac {1}{N}}\sum _{1\leq i\leq N}~f_{n}\left({\widehat {\xi }}_{0,n}^{i},{\widehat {\xi }}_{1,n}^{i},\ldots ,{\widehat {\xi }}_{n-1,n}^{i},{\widehat {\xi }}_{n,n}^{i}\right)~\approx _{N\uparrow \infty }~{\frac {\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}}$$
In the same vein, tracing back in time the ancestral lines of each individual $\xi_{n}^{i}$ in the population after the $n$-th mutation, we define the genealogical tree of the genetic population $$ \xi_{0,n}^{i}\longleftarrow ~\xi_{1,n}^{i}\longleftarrow ~\ldots \longleftarrow\xi_{k-1,n}^{i}\longleftarrow \xi_{k,n}^{i}\longleftarrow \ldots \longleftarrow\xi_{n-1,n}^{i}\longleftarrow \xi_{n,n}^{i}=\xi_{n}^{i} $$ The lower indices $ (.)_{k,n}$ represent the level $k$ of the ancestor of $\xi_{n}^{i}$
When the number of individuals (and the computational power) $ N\uparrow \infty $, for any function $ f_{n}~:~(x_{0},\ldots ,x_{n})\in (S_0\times\ldots \times S_n)\mapsto f_{n}(x_{0},\ldots ,x_{n})\in \mathbb {R} $ we have the Monte Carlo approximation $$ {\frac {1}{N}}\sum _{1\leq i\leq N}~f_{n}\left(\xi_{0,n}^{i},\xi_{1,n}^{i},\ldots,\xi_{n-1,n}^{i},\xi_{n,n}^{i}\right)~\approx _{N\uparrow \infty }~{\frac {\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~\prod _{0\leq k< n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k< n}g_{k}(X_{k})\right)}}$$
In addition, we have the unbiasedness property $$ \mathbb{E}\left( \frac {1}{N} \sum _{1\leq i\leq N}f_{n}\left(\xi_{0,n}^{i},\xi_{1,n}^{i},\ldots,\xi_{n-1,n}^{i},\xi_{n,n}^{i}\right) ~~~\times~~~ \left( \prod _{0\leq k<n}{\frac {1}{N}}\sum _{1\leq j\leq N}g_{k}(\xi _{k}^{\,j})\right)\right)~=~\mathbb{E}\left(f_n(X_{0},\ldots ,X_{n})~\prod _{0\leq k<n}g_{k}(X_{k})\right) $$ Choosing the historical process $$ {\cal X}_n:=(X_0,\ldots,X_n) \in {\cal S}_n:=(S_0\times\ldots \times S_n)\quad \mbox{and the fitness function}\quad {\cal G}_n({\cal X}_n):=g_n(X_n)$$ we clearly have $$ \mathbb{E}\left(f_{n}({\cal X}_n)~\prod _{0\leq k\leq n} {\cal G}_k({\cal X}_k)\right)=\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~\prod _{0\leq k\leq n}g_{k}(X_{k})\right)\quad \mbox{and}\quad \mathbb{E}\left(f_{n}({\cal X}_n)~\prod _{0\leq k< n} {\cal G}_k({\cal X}_k)\right)=\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~\prod _{0\leq k< n}g_{k}(X_{k})\right) $$This shows that the genealogical tree evolution of the genetic population is itself a simple genetic algorithm with a mutation transition on path space representing the evolution of the historical process. In other words, the genealogical tree evolution of the simple genetic algorithm associated with the Markov chain $X_n$ and the fitness function $g_n$ on $S_n$ coincides with the simple genetic algorithm associated with the historical Markov chain ${\cal X}_n$ and the fitness function ${\cal G}_n$ on ${\cal S}_n$.
To avoid unnecessary measure theoretic considerations, consider a Markov chain $X_n$ on a finite state space $S$ and set $$ p_{n+1}(x_{n},x_{n+1}):= Proba(X_{n+1}=x_{n+1}~|~X_{n}=x_{n})\quad \mbox{and}\quad h_{n+1}(x_{n},x_{n+1}):=g_{n}(x_{n})~p_{n+1}(x_{n},x_{n+1}) $$ Also consider the random backward transitions $$ \mathbb{M}_{n+1,m(\xi_{n})}(x_{n+1},x_{n}):=\frac{m(\xi_{n})(x_n)~h_{n+1}(x_{n},x_{n+1})}{\sum_{z_n\in S}m(\xi_{n})(z_n)~ h_{n+1}(z_{n},x_{n+1})} \quad \mbox{with}\quad m(\xi_{n})(x_n):=\frac{1}{N}\sum_{1\leq i\leq N}~1_{\xi^i_n}(x_n)\quad \mbox{with}\quad 1_{\xi^i_n}(x_n)=\left\{ \begin{array}{rcl} 1 &\mbox{if} & x_n=\xi^i_n\\ 0 &\mbox{if} & x_n\not=\xi^i_n \end{array}\right. $$ Note that $$ \sum_{x_n\in S} \mathbb{M}_{n+1,m(\xi_{n})}(x_{n+1},x_{n})=1 \quad \mbox{and}\quad \sum_{x_n\in S} m(\xi_{n})(x_n)=1 $$ In this notation, given the populations of individuals $(\xi_0,\ldots,\xi_n)$, the conditional probability distribution of a randomly chosen ancestral line $(\mathbb{X}_{0,n},\mathbb{X}_{1,n},\ldots,\mathbb{X}_{n,n})$ in the genealogical tree after the $n$-th mutation is given by the backward Markov chain formula^{11,15}: $$ Proba\left(\mathbb{X}_{0,n}=x_0,\mathbb{X}_{1,n}=x_1,\ldots,\mathbb{X}_{n-1,n}=x_{n-1},\mathbb{X}_{n,n}=x_n~|~\xi_0,\ldots,\xi_n\right)= m(\xi_n)(x_n)~\mathbb{M}_{n,m(\xi_{n-1})}(x_{n},x_{n-1})\ldots \mathbb{M}_{1,m(\xi_0)}(x_{1},x_{0}) $$ In addition, for any function $ f_{n}~:~(x_{0},\ldots ,x_{n})\in (S_0\times\ldots \times S_n)\mapsto f_{n}(x_{0},\ldots ,x_{n})\in \mathbb {R} $ we have $$ \mathbb{E}\left(f_n\left(\mathbb{X}_{0,n},\mathbb{X}_{1,n},\ldots,\mathbb{X}_{n,n}\right)~|~\xi_0,\ldots,\xi_n\right)= \mathbb{E}\left({\frac {1}{N}}\sum _{1\leq i\leq N}~f_{n}\left(\xi_{0,n}^{i},\xi_{1,n}^{i},\ldots,\xi_{n-1,n}^{i},\xi_{n,n}^{i}\right)~|~\xi_0,\ldots,\xi_n\right) ~\approx _{N\uparrow \infty }~\frac{\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~\prod _{0\leq k< n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k< n}g_{k}(X_{k})\right)} $$
Let $Y_t$ be a continuous time Markov process evolving in some state space $S$, indexed by a continuous time index $ t\in \mathbb{R}_+:=[0,\infty[$. Consider an increasing time mesh $t_0=0< t_1<\ldots < t_{n-1}< t_{n} $ witth a time step $\Delta =t_n-t_{n-1}\approx 0$ and $V~:~x\in S\mapsto \mathbb{R}_+:=[0,\infty[$ some potential function on some state space $S$. We associate with these objects, the Markov chain $X_n$ and the $[0,1]$-valued potential function$g_n$ defined by $$ X_n:= Y_{t_n} \qquad \mbox{and}\qquad g_n(x):=e^{-v(x)\Delta } $$ In this case, choosing $t_n=[t/\Delta]\Delta \approx _{\Delta \downarrow 0} t$ we have the continuous time Feynman-Kac path integral formulae $$ \mathbb{E}\left(f(X_{n})~\prod _{0\leq k<n}g_{k}(X_{k})\right) =\mathbb{E}\left(f(Y_{t_n})~ \exp{\left(-\sum_{0\leq k<n}v(Y_{t_k}) ~(t_k-t_{k-1})\right)}\right)\approx _{\Delta\downarrow 0} \mathbb{E}\left(f(Y_{t})~ \exp{\left(-\int_0^tv(Y_{s}) ~ds\right)}\right) $$ When the time step $\Delta \downarrow 0$, the GA equipped with the killing/regeneration transition converge to a continuous time GA process $\xi_t:=(\xi^i_t)_{1\leq i\leq N}$ with a killing rate $v(x)$. When an individual is killed, instantly another individual in the pool duplicates. More general killing/regeneration models associated with signed potential functions are discussed in the series of studies ^{5,16,17,21,23}. In theoretical physics, these killing/regeneration continuous time GA are often termed killing/cloning models , see the article by Angeli-Grosskinsky-Johansen and the one by Lecomte and Tailleur.
Next we state the continuous time versions of some convergence results stated above, more detailed statements and proofs are provided in^{5,17,18,21,23,31}, see also the article of Rousset on Schrödinger's ground states particle estimates . Under some regularity conditions, when the number of individuals (and the computational power) $ N\uparrow \infty $, for any continuous time index $t\in \mathbb{R}_+:=[0,\infty[$ and any function $ f~:~x\in S\mapsto f(x)\in \mathbb {R}$, we have the almost sure convergence results $$ {\frac {1}{N}}\sum _{1\leq i\leq N}~f\left(\xi _{t}^{\,i}\right)~\approx _{N\uparrow\infty }~{\frac {\mathbb{E}\left(f(Y_{t})~\exp{\left(-\int_0^tV(Y_{s}) ~ds\right)}\right)}{\mathbb{E}\left(\exp{\left(-\int_0^tV(Y_{s}) ~ds\right)}\right)}} $$ In addition, we have the unbiased estimate $$ \frac {1}{N} \sum _{1\leq i\leq N}f(\xi _{t}^{\,i}) ~~~\times~~~ \exp{\left(-\int_0^t \frac {1}{N} \sum _{1\leq j\leq N}V(\xi^j_s) ~ds\right)}~\approx _{N\uparrow \infty }~\mathbb{E}\left(f(Y_{t})~\exp{\left(-\int_0^tV(Y_{s}) ~ds\right)}\right) $$ The unbiasedness property can be rewritten as follows: $$ \mathbb{E}\left( F(\chi_t) \exp{\left(-\int_0^tV(\chi_{s}) ~ds\right)}\right)~=~\mathbb{E}\left(f(Y_{t})~\exp{\left(-\int_0^tV(Y_{s}) ~ds\right)}\right) $$ with the Markov chain $\chi_t=(\xi_t^i)_{1\leq i\leq N}\in S^N$, the function $$ F(\chi_t):= \frac {1}{N} \sum _{1\leq i\leq N}f(\xi _{t}^{i})\quad \mbox{and the potential function}\quad V(\chi_t):= \frac {1}{N} \sum _{1\leq i\leq N} v(\xi _{t}^{i}). $$
Tracing back in time the ancestral lines of each individual $\widehat {\xi }_{t}^{i}$, we define the genealogical tree of the genetic population $$ \widehat {\xi }_{0,t}^{i}\longleftarrow ~\ldots \longleftarrow ~{\widehat {\xi }}_{s,t}^{i}\longleftarrow ~\ldots \longleftarrow {\widehat {\xi }}_{t,t}^{i}={\widehat {\xi }}_{t}^{i} $$ The lower indices $ (.)_{s,t}$ represent the level $s\leq t$ of the ancestor of $\widehat {\xi }_{t}^{i}$. For any function $f_t$ on the space of paths with terminal time horizon $t$ we have $$ \frac {1}{N} \sum _{1\leq i\leq N}f_t\left(\left(\xi _{s,t}^{\,i}\right)_{0\leq s\leq t}\right) \approx _{N\uparrow \infty }~ {\frac {\mathbb{E}\left(f_t\left((Y_{s})_{0\leq s\leq t}\right)~\exp{\left(-\int_0^tV(Y_{s}) ~ds\right)}\right)}{\mathbb{E}\left(\exp{\left(-\int_0^tV(Y_{s}) ~ds\right)}\right)}} $$
Feynman-Kac path integrals arise a variety of scientific disciplines, including in computational physics, biology, information theory and computer sciences.
Their different interpretations depend on the application domain. We illustrate these path integration models with some concrete examples taken from the slides MCQMC-2012 slides and the ENS-Lyon-2012 slides:
If we choose the indicator fitness function $g_{n}(x_{n})=1_{A_n}(x_{n})$ of some subset $A_n$ of the state space, they represent the conditional distribution of a Markov chain given it stays in a given tube; that is, for any function $ f_{n}~:~(x_{0},\ldots ,x_{n})\in (S_0\times\ldots \times S_n)\mapsto f_{n}(x_{0},\ldots ,x_{n})\in \mathbb {R} $ we have that $$ \mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})~|~X_{0}\in A_0,~\ldots ,X_{n}\in A_n\right)=\displaystyle {\frac {\mathbb{E}\left(f_{n}(X_{0},\ldots ,X_{n})\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}{\mathbb{E}\left(\prod _{0\leq k\leq n}g_{k}(X_{k})\right)}}\quad {\mbox{and}}\quad \mathbb{E}\left(\prod _{0\leq k\leq n}g_{k}(X_{k})\right)=P\left(X_{0}\in A_0,~\ldots ,X_{n}\in A_n\right)$$ (as soon as the normalizing constant is strictly positive). In addition, we have the unbiased estimate$$ \prod _{0\leq k\leq n}{\frac {1}{N}}\sum _{1\leq i\leq N}1_{A_n}(\xi _{k}^{i})~\approx _{N\uparrow \infty }~Proba\left(X_{0}\in A_0,\ldots ,X_{n}\in A_n\right)$$
Let $X_n$ be a simple random walk on the $2$-dimensional lattice $S:=\mathbb{Z}^2$ starting at the origin. Choosing the historical process $$ {\cal X}_n:=(X_0,\ldots,X_n) \in {\cal S}_n:=S^{(n+1)}\quad \mbox{and the fitness function}\quad {\cal G}_n({\cal X}_n):=1_{S-\{X_0,\ldots,X_{n-1}\}}(X_n)$$ we clearly have the self-avoiding Markov chain formulae $$ \mathbb{E}\left(f_{n}(X_0,\ldots,X_n)~|~\forall 0\leq p< q \leq n\quad X_p\not=X_q\right)= \frac{\mathbb{E}\left(f_{n}({\cal X}_n)~\prod _{0\leq k\leq n} {\cal G}_k({\cal X}_k)\right)}{ \mathbb{E}\left(\prod _{0\leq k\leq n} {\cal G}_k({\cal X}_k)\right)} \quad \mbox{and}\quad \mathbb{E}\left(\prod _{0\leq k\leq n} {\cal G}_k({\cal X}_k)\right)=Proba\left(\forall 0\leq p< q \leq n\quad X_p\not=X_q\right) $$