TL;DR Groups often make better decisions than individuals due to the diversity of biases, and this idea applies to machine learning. In genetic programming (GP), the use of ensembles of individuals (aka teams) can offer several interesting advantages, many of which parallel the benefits observed in traditional ensemble machine learning methods.

L'unione fa la forza (There is strength in numbers)
Generalities
Groups often make better decisions than individuals due to the diversity of biases, and this idea applies to machine learning. Ensemble methods combine multiple learners to improve performance, often allowing the use of simpler models. Ensembles are inherently parallel, making training and testing more efficient with multiple processors. Instead of learning a single classifier, multiple classifiers (decision tree, perceptron, KNN...) are trained, and predictions are made by voting (there are alternatives, see below).
The advantage of ensembles is that it's unlikely all classifiers make the same error, so results improve as long as each error is made by only a minority. Ensembles also reduce variance, making them useful for classifiers that are sensitive to changes in the data. Voting schemes work for multi-class classification, but not for regression or ranking, where averaging or other combination methods are better suited.
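The voting idea is simple enough to sketch in a few lines. This is an illustrative snippet (not tied to any specific library): each classifier casts a vote and the most frequent label wins.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]

# Three classifiers vote on the same sample: two are right, one errs,
# and the ensemble still answers correctly.
votes = ["cat", "cat", "dog"]
print(majority_vote(votes))  # cat
```

Note how a single wrong vote is absorbed: the ensemble only fails when a majority of its members err on the same sample.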
To build an ensemble, instead of using different classifiers, you can train a single type of classifier on different subsets of the data; however, this risks poor performance, since each subset is small.
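A common way to produce those data subsets is bootstrap sampling (drawing with replacement), as in bagging. A minimal sketch, with `bootstrap_subsets` as a hypothetical helper name:

```python
import random

def bootstrap_subsets(data, n_subsets, subset_size, seed=0):
    """Draw n_subsets random samples (with replacement) from data."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in range(subset_size)]
            for _ in range(n_subsets)]

# Five overlapping training sets, each a fraction of the original data.
data = list(range(100))
subsets = bootstrap_subsets(data, n_subsets=5, subset_size=30)
```

Sampling with replacement lets the subsets overlap, which softens (but does not eliminate) the small-subset problem mentioned above.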
Ensembles in Genetic Programming
In genetic programming (GP), the use of ensembles of individuals (aka teams) can offer several interesting advantages, many of which parallel the benefits observed in traditional ensemble machine learning methods.
In GP, each individual in the population represents a potential solution to a problem. By evolving an ensemble of individuals, you can capture a wide variety of solutions, each with its own strengths and weaknesses. This diversity of solutions reduces the risk of the population converging on a suboptimal solution, a common problem in GP known as premature convergence.
Similar to traditional ensemble methods, an ensemble of individuals in GP can combine predictions through voting mechanisms, for classification tasks, or averaging, for regression tasks. This allows for a more robust solution, as the ensemble can effectively reduce the impact of any one individual's poor performance. The wisdom of the crowd effect helps mitigate the individual biases and errors, as not all individuals are likely to make the same mistake.
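Treating each evolved individual as a callable, a team's combined prediction can be sketched as follows (a toy illustration, not the internals of any GP system; the threshold "individuals" stand in for evolved programs):

```python
from collections import Counter
from statistics import mean

def ensemble_classify(individuals, x):
    """Majority vote over the class labels returned by each individual."""
    votes = [ind(x) for ind in individuals]
    return Counter(votes).most_common(1)[0][0]

def ensemble_regress(individuals, x):
    """Average the numeric outputs of each individual."""
    return mean(ind(x) for ind in individuals)

# Toy 'evolved' individuals: threshold classifiers with different cut-offs.
team = [lambda x: int(x > 0.4), lambda x: int(x > 0.5), lambda x: int(x > 0.9)]
print(ensemble_classify(team, 0.6))  # 1 (two of the three vote for class 1)
```

The third individual misclassifies the sample, but its vote is outvoted: this is the wisdom-of-the-crowd effect in miniature.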
Also consider how certain problems may be too complex for a single GP individual to solve effectively. By leveraging an ensemble, different individuals can specialise in different aspects of the problem, thus breaking down complex tasks into more manageable components. This increases the likelihood of finding a near-optimal solution.
Like in traditional machine learning, ensembles in GP can be highly parallelisable. Different individuals can be evaluated and evolved independently (or in parallel), leading to more efficient training, especially in computational environments that support parallel processing.
In genetic programming, as in other machine learning paradigms, individual models can have high variance, especially when evolved on limited or noisy data sets. Ensembles of GP individuals help smooth out these high-variance errors by combining diverse models, reducing the overall variance while maintaining accuracy.
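The variance-reduction claim is easy to check with a small simulation. Here each "model" predicts the true value plus independent zero-mean noise; averaging ten of them shrinks the spread of the prediction (by roughly the square root of the ensemble size). The linear target and noise level are arbitrary choices for the demonstration:

```python
import random
import statistics

def noisy_model(x, rng):
    """A 'model' whose prediction of f(x) = 2x carries zero-mean noise."""
    return 2 * x + rng.gauss(0, 1)

rng = random.Random(42)
x = 3.0

# 1000 predictions from single models vs. 1000 from 10-model ensembles.
singles = [noisy_model(x, rng) for _ in range(1000)]
ensembles = [statistics.mean(noisy_model(x, rng) for _ in range(10))
             for _ in range(1000)]

print(statistics.pstdev(singles) > statistics.pstdev(ensembles))  # True
```

The same mechanism applies when the "models" are GP individuals evolved on noisy data, provided their errors are not perfectly correlated.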
In GP, solutions are evolved over time, and ensembles allow for adaptation during the evolutionary process. You can modify how individuals are combined, introduce or remove members from the ensemble, or even evolve the ensemble structure itself. This adaptability can lead to more effective problem-solving as the algorithm evolves and adapts to the problem landscape.
Alternatives to the voting method
There are several alternatives to the voting mechanism in ensemble methods. These alternatives often provide more nuanced ways of combining predictions from multiple classifiers, depending on the type of problem (classification, regression...).
Here we briefly describe two alternatives used in the Ultra framework:
- for certain classifiers that provide a confidence score along with their prediction, you can select the prediction with the highest confidence score across all models. This allows the final decision to be based on the model that is most confident in its output (maximising confidence);
- for regression tasks, where the output is a continuous value, you can use mean or median averaging to combine the predictions of multiple models. Each model predicts a value, and the final output is the average or median of these predictions.
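Both schemes are straightforward to express in code. The snippet below is a sketch of the two ideas, not the Ultra framework's actual API (the function names are invented for illustration):

```python
from statistics import median

def max_confidence(predictions):
    """predictions: list of (label, confidence) pairs; keep the surest label."""
    return max(predictions, key=lambda p: p[1])[0]

def median_combine(values):
    """Combine regression outputs; the median resists outlier models."""
    return median(values)

# Confidence-based selection: the 'ham' model is the most confident.
print(max_confidence([("spam", 0.7), ("ham", 0.9), ("spam", 0.6)]))  # ham

# Median averaging: the outlier prediction 9.0 barely affects the result.
print(median_combine([1.0, 1.2, 9.0]))  # 1.2
```

The median is often preferred over the mean for regression ensembles precisely because a single badly wrong individual cannot drag the combined output far off.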
References
- A Course in Machine Learning by Hal Daumé III (see Ensemble Methods).
- Evolving Teams of Predictors with Linear Genetic Programming by Markus Brameier and Wolfgang Banzhaf.