The bounding box
Traditional crossover operators in genetic algorithms, such as one-point, two-point, and uniform crossover, are not global operators: crossing two vectors cannot produce just any new vector. Think of each vector as a point in space. The two parents sit at opposite corners of a hypercube, and every child of a crossover lies on one of that hypercube's corners (in 3D space, the corners of a box), because each gene is copied unchanged from one parent or the other. If we treat a population Pop as a set of points in space, crossover can therefore only produce offspring that lie within the bounding box surrounding Pop. Crossover alone cannot expand this bounding box, so it confines the search to the region inside it.
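The bounding-box claim can be checked empirically. The following is a minimal sketch, assuming real-valued vectors stored as Python lists; the function names (one_point_crossover, uniform_crossover, bounding_box, inside) and the population size are illustrative choices, not part of the original text.

```python
import random

def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length vectors at a random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def uniform_crossover(a, b, rng, p=0.5):
    """Swap each gene between the two children independently with probability p."""
    c, d = list(a), list(b)
    for i in range(len(a)):
        if rng.random() < p:
            c[i], d[i] = d[i], c[i]
    return c, d

def bounding_box(pop):
    """Per-dimension (min, max) over a population of vectors."""
    return [(min(col), max(col)) for col in zip(*pop)]

def inside(vec, box):
    """True if every coordinate of vec lies within the box."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(vec, box))

rng = random.Random(0)
pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(6)]
box = bounding_box(pop)

children = []
for _ in range(200):
    a, b = rng.sample(pop, 2)
    children.extend(one_point_crossover(a, b, rng))
    children.extend(uniform_crossover(a, b, rng))

# Every child gene is copied from a parent, so no child can leave the box.
all_inside = all(inside(c, box) for c in children)
print(all_inside)
```

Because each coordinate of a child is an exact copy of a parent's coordinate, adding the children to the population leaves the bounding box unchanged.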
As crossover and selection are applied repeatedly, some alleles (specific values at positions within the vector) may disappear, causing the bounding box to shrink in certain dimensions. Over time, this process often leads the population to converge prematurely into genetically identical individuals. At this point, crossing an individual with itself generates no diversity, making it impossible to explore new solutions. To avoid this, a mutation operation is essential for introducing fresh genetic material and enabling exploration outside the current bounds.
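A small sketch of a lost allele, assuming boolean vectors and a simple bit-flip mutation (the mutation rate of 0.1 and the helper names are hypothetical choices for illustration). In the population below, every individual has 0 at index 2, so crossover can never produce a 1 there, while mutation can.

```python
import random

def uniform_crossover(a, b, rng):
    """Build one child by copying each gene from a randomly chosen parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def bit_flip(vec, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in vec]

rng = random.Random(1)

# The allele 1 at index 2 has disappeared from the population.
pop = [[1, 0, 0, 1], [0, 1, 0, 1], [1, 1, 0, 0]]

crossed = []
for _ in range(500):
    a, b = rng.sample(pop, 2)
    crossed.append(uniform_crossover(a, b, rng))

# Crossover only recombines existing alleles, so index 2 stays 0.
recovered_by_crossover = any(c[2] == 1 for c in crossed)

# Mutation introduces new material and can restore the lost allele.
mutated = [bit_flip(c, rng, rate=0.1) for c in crossed]
recovered_by_mutation = any(m[2] == 1 for m in mutated)

print(recovered_by_crossover, recovered_by_mutation)
```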
Why use crossover at all?
The purpose of crossover is rooted in the idea that highly fit individuals tend to share common features, known as building blocks. In fixed-length vectors, a building block is a subset of genes with specific alleles. For instance, in the boolean vector 10110101, the pattern 1011**** could represent a building block (where * marks positions outside the block). In problems where crossover is effective, an individual's fitness is often linked to the presence of these building blocks. Crossover serves to spread these building blocks through the population quickly, improving overall fitness. This idea is central to early genetic algorithm analysis and is formalised in schema theory.
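As a rough illustration, a building block can be written as a schema string and matched against individuals. This sketch assumes individuals are bit strings and that the block occupies the first four positions, written as the schema 1011****; the example population and the helper names matches and frequency are made up for illustration.

```python
def matches(schema, vec):
    """True if vec agrees with schema at every position that is not '*'."""
    return len(schema) == len(vec) and all(
        s == "*" or s == v for s, v in zip(schema, vec)
    )

def frequency(schema, pop):
    """Fraction of the population that carries the building block."""
    return sum(matches(schema, v) for v in pop) / len(pop)

schema = "1011****"  # '*' marks positions outside the block
pop = ["10110101", "10111110", "00110101", "10100001"]

carriers = [v for v in pop if matches(schema, v)]
print(carriers)
print(frequency(schema, pop))
```

Tracking how this frequency changes from one generation to the next is one simple way to observe whether crossover and selection are spreading a block through the population.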
However, the effectiveness of crossover is also tied to the concept of epistasis, which refers to how much genes interact with each other to influence fitness. Highly epistatic genes are strongly interdependent: for example, genes G1 and G2 might only contribute to fitness when both are set to 1; if either is 0, the other gene's effect is nullified. On the other hand, genes with low epistasis act independently, meaning their effects on fitness are additive rather than interdependent.
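The difference between high and low epistasis can be made concrete with two toy fitness functions over genes G1 and G2. Both functions are hypothetical examples chosen to mirror the description above, not fitness functions from any particular problem.

```python
def additive_fitness(g1, g2):
    """Low epistasis: each gene contributes independently."""
    return g1 + g2

def epistatic_fitness(g1, g2):
    """High epistasis: the genes contribute only when both are 1."""
    return 2 * (g1 & g2)

def marginal_effect(fitness, other):
    """Fitness gained by setting G1 to 1 while G2 is held at `other`."""
    return fitness(1, other) - fitness(0, other)

for name, f in [("additive", additive_fitness), ("epistatic", epistatic_fitness)]:
    effects = [marginal_effect(f, other) for other in (0, 1)]
    print(name, effects)
```

In the additive case the effect of G1 is the same whatever G2 is; in the epistatic case the effect of G1 vanishes when G2 is 0.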
The impact of crossover on epistatic genes depends on their linkage: the likelihood that the genes will be inherited together by an offspring. In one-point or two-point crossover, this likelihood depends on how close the genes are on the vector, since a cut point is less likely to fall between closely positioned genes. In general, if genes are highly epistatic, you want them to be closely linked so that their beneficial interactions survive crossover. By contrast, uniform crossover treats every gene independently, so it is just as likely to separate adjacent genes as distant ones, which increases the chance of disrupting epistatic relationships. It is therefore important to organise vectors carefully, placing interacting genes near each other, so that crossover preserves beneficial combinations rather than breaking them apart.
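The effect of position on linkage can be quantified under simple assumptions. This sketch assumes one-point crossover picks its cut uniformly among the n - 1 gaps between genes, and that uniform crossover swaps each gene independently with probability p; the function names and the example indices are illustrative.

```python
import random

def split_prob_one_point(i, j, n):
    """Probability that a one-point cut separates genes at indices i < j,
    assuming the cut is uniform over the n - 1 gaps between genes."""
    return (j - i) / (n - 1)

def split_prob_uniform(p=0.5):
    """Probability that uniform crossover sends two genes to different
    children when each gene is swapped independently with probability p."""
    return 2 * p * (1 - p)

def simulate_one_point(i, j, n, trials, rng):
    """Monte Carlo estimate of the one-point split probability."""
    splits = 0
    for _ in range(trials):
        cut = rng.randint(1, n - 1)  # the head is indices [0, cut)
        splits += (i < cut <= j)     # i in the head, j in the tail
    return splits / trials

rng = random.Random(42)
n = 10
adjacent = split_prob_one_point(2, 3, n)  # neighbouring genes
distant = split_prob_one_point(0, 9, n)   # genes at opposite ends
estimate = simulate_one_point(2, 3, n, 20000, rng)

print(adjacent, distant, split_prob_uniform())
print(estimate)
```

Under these assumptions, adjacent genes are rarely separated by one-point crossover, genes at opposite ends are always separated, and uniform crossover separates any pair with the same probability regardless of position.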