Premise
If you've ever used genetic programming (GP) for symbolic regression, you'll know that the standard arithmetic operators (addition, subtraction, multiplication, and division) form the backbone of most function sets. Division, in particular, feels essential: it allows trees to split quantities, create ratios, and capture complex relationships. But there's a hidden catch: division can introduce instability into your models.
In traditional GP, division by zero is a well-known problem. Some systems use unprotected division (UPD), which blindly follows IEEE 754 semantics: dividing a non-zero value by zero yields ±infinity, 0/0 yields NaN (not a number), and very small denominators produce extreme spikes in the function output. Others prefer protected division (PD), where dividing by zero simply returns a fixed value (such as 1). While PD prevents outright crashes, it introduces discontinuities in the function: sudden, artificial jumps that can mislead the evolutionary search. These quirks lead to erratic behaviour: one GP run might give you a good model, while the next explodes into huge test errors. It's a bit like building on quicksand.
This is where the analytic quotient (AQ) operator comes in. Introduced by Ji Ni, Russ Drieberg, and Peter Rockett in their paper 'The Use of an Analytic Quotient Operator in Genetic Programming', AQ is a simple but elegant replacement for traditional division. It's defined as:

\[
\mathrm{AQ}(x_1, x_2) = \frac{x_1}{\sqrt{1 + x_2^2}}
\]

The AQ is smooth, continuous, and everywhere differentiable. It behaves like ordinary division when \(|x_2|\) is large, but stays bounded as \(x_2\) approaches zero instead of spiking to infinity. No NaNs. No discontinuities. Just a stable, well-behaved operator.
The authors tested AQ on six benchmark regression problems, using both single-objective and multiobjective GP, and compared it against both PD and UPD. Their experiments were thorough: multiple GP setups (generational and steady-state), both single and multiobjective formulations, 100 training runs per problem, and a huge 100,000-instance test set for evaluation.
The results were striking:
- AQ consistently delivered lower test errors across all problems and setups;
- test errors were more stable. The variance was much smaller, making results more predictable and repeatable;
- AQ avoided the problem of occasional catastrophic runs where PD or UPD trees would produce huge errors;
- even though AQ trees tended to be slightly larger, the increase in tree size was a small price for the gain in stability and accuracy.
Why does AQ perform well? The authors suggest it's all about stability: AQ smooths out the wild spikes and discontinuities that (U)PD can introduce. This variance stabilisation means AQ trees generalise better; low training error translates to low test error, which is not always the case with (U)PD. Think of it like taming a wild animal: AQ turns unpredictable GP models into reliable performers.
If you're working on symbolic regression or data-driven modelling with GP, replacing division with AQ could be a simple yet powerful upgrade. AQ is a drop-in replacement that can improve the quality of your models, with minimal changes to your existing code.
Of course, there's still room for exploration. The paper suggests that tuning the AQ's internal parameters (like the constant 1 in the denominator) could further refine its behaviour. But even in its basic form, AQ offers a powerful, principled solution to a long-standing problem in genetic programming.
Ready to stabilise your GP models? Give the analytic quotient a try; your trees (and your test errors) will thank you.
Implementing the Analytic Quotient (AQ) operator in Ultra
Incorporating the analytic quotient (AQ) operator in Ultra is remarkably simple: no need to define it yourself! Ultra already includes an aq operator in the ultra::real namespace.
Here's how to set up a symbolic regression problem using AQ:
#include "kernel/ultra.h"

#include <iostream>
#include <sstream>

int main()
{
  // Define the training data.
  std::istringstream training_data(R"(
-9.456,-10.0
-8.989, -8.0
-5.721, -6.0
-3.243, -4.0
-2.909, -2.0
0.000, 0.0
2.909, 2.0
3.243, 4.0
5.721, 6.0
8.989, 8.0
)");

  // Create the symbolic regression problem.
  ultra::src::problem prob(training_data);

  // Insert basic operators and AQ.
  prob.insert<ultra::real::add>();
  prob.insert<ultra::real::sub>();
  prob.insert<ultra::real::mul>();
  prob.insert<ultra::real::sin>();
  prob.insert<ultra::real::aq>();  // analytic quotient operator

  // Run the search.
  ultra::src::search search(prob);
  auto result = search.run();

  // Output the best individual and its fitness.
  std::cout << "Best individual: " << result.best_individual << std::endl;
  std::cout << "Fitness: " << *result.best_measurements.fitness << std::endl;
}
That's it! By using prob.insert<ultra::real::aq>();, you get the benefits of the analytic quotient operator - smoothness, stability, and reliable generalisation - without extra coding effort.