
When performance and correctness collide, the usual fear is that one must be sacrificed for the other. This article tells the story of a refactoring in ULTRA where tightening correctness guarantees around interpreter lifetimes did not slow the system down... it made it faster.
This was not obvious in advance. It only became clear after carefully rethinking invariants, ownership and cache semantics, and then validating everything with a concrete benchmark.
The original design problem
In ULTRA (the successor of VITA), a gp::interpreter executes a gp::individual over many examples. Internally, the interpreter owns:
- a pointer to the program (a gp::individual);
- a cache matrix;
- an instruction pointer.
The cache is expensive to initialise but cheap to reuse, so interpreters are typically constructed once and reused across many evaluations.
To support this, early versions of reg_oracle_storage embedded an interpreter directly:
gp::individual ind_;
mutable src::interpreter int_;
This worked... until it didn't.
The hidden invariant
The interpreter contains a pointer to the individual it evaluates. That pointer must always refer to the ind_ stored inside the same reg_oracle_storage object.
Unfortunately, C++ does not enforce that invariant automatically.
After a copy, move or container reallocation, the interpreter could still point to the old individual. The object looked valid, but calling run() could explode spectacularly.
This was not a theoretical concern: std::vector reallocation alone was enough to break the invariant.
For example, consider:
struct reg_oracle_storage
{
  gp::individual ind_;
  mutable src::interpreter int_;  // holds a pointer to ind_

  explicit reg_oracle_storage(const gp::individual &ind)
    : ind_(ind), int_(ind_)
  {}

  value_t run(const std::vector<value_t> &input) const
  {
    return int_.run(input);
  }
};
At construction time, everything looks correct:
- ind_ stores the individual;
- int_ stores a pointer to ind_;
- run() simply delegates to the interpreter.
The implicit invariant is:
int_ must always refer to the ind_ stored in the same object.
Unfortunately, this invariant is not preserved automatically by the language. Placing these objects inside a std::vector:
std::vector<reg_oracle_storage> v;
v.emplace_back(ind1);
v.emplace_back(ind2);
v.emplace_back(ind3); // may trigger reallocation
During reallocation:
- reg_oracle_storage objects are moved (or copied);
- ind_ is relocated to a new address;
- int_ is bitwise-moved and still points to the old ind_.
After reallocation, the object looks valid but the invariant is broken:
&v[i].int_.program() != &v[i].ind_; // dangling reference
Calling run() now results in undefined behaviour. The bug is subtle because:
- all members are private;
- the type is copyable and movable without any diagnostic;
- the failure depends on container growth patterns;
- the crash may occur far from the actual mistake.
This is a classic example of a hidden lifetime invariant: correctness depends on an unstated relationship between two members.
First reactions (and why they were unsatisfying)
Several obvious fixes were considered:
- delete copy and move constructors;
- store everything behind std::unique_ptr;
- reconstruct the interpreter on every call.
All of these "worked" but at a cost:
- reduced composability;
- more heap allocations;
- unnecessary interpreter rebuilds in hot paths.
What we really wanted was simpler: treat the interpreter as a cache, not as part of the logical state.
That idea turned out to be the key.
A better invariant: the interpreter is ephemeral
The breakthrough was to stop trying to keep the interpreter always valid.
Instead, we adopted a much stronger and simpler rule: the interpreter is only required to be valid inside run().
Every other member function must behave correctly even if no interpreter exists at all.
This immediately suggested a new design. The interpreter is stored as an optional, lazily constructed on first use, and rebound on every execution:
template<>
class reg_oracle_storage<gp::individual, true>
{
public:
  explicit reg_oracle_storage(const gp::individual &ind) : ind_(ind)
  {
    Ensures(is_valid());
  }

  reg_oracle_storage(std::istream &in, const symbol_set &ss)
  {
    if (!ind_.load(in, ss))
      throw exception::data_format("Cannot load individual");

    int_.reset();

    Ensures(is_valid());
  }

  template<class ...Args>
  [[nodiscard]] value_t run(Args &&...args) const
  {
    if (int_) [[likely]]
      int_->rebind(ind_);
    else
      int_.emplace(ind_);

    return int_->run(std::forward<Args>(args)...);
  }

  [[nodiscard]] bool is_valid() const
  {
    return ind_.is_valid();
  }

  bool save(std::ostream &out) const
  {
    return ind_.save(out);
  }

private:
  gp::individual ind_;
  mutable std::optional<src::interpreter> int_ {};
};
Key properties of this design:
- the interpreter is not part of the observable state;
- copying or moving the object is always safe;
- the invariant is restored every time it matters;
- cache reuse is preserved;
- future maintainers cannot "forget" to rebind.
Correctness is now enforced by structure, not by convention.
What about performance?
This is where things got interesting.
To ensure this refactoring did not introduce a regression, a dedicated benchmark was written to compare four strategies:
- Embedded interpreter (baseline)
- Interpreter constructed on the fly
- Single interpreter with explicit rebind
- Optional interpreter with lazy construction and rebind
Each strategy evaluated:
- 20,000 examples;
- 1,000 GP individuals;
- identical programs and inputs.
The results (Xeon E-2146G, clang++ -O2):
Embedded On the fly Rebind Rebind optional
( 100, 100) 1ms 3ms 1ms 1ms
( 100, 1000) 18ms 40ms 18ms 18ms
( 100,10000) 372ms 412ms 339ms 374ms
( 100,30000) 1241ms 1501ms 1216ms 1248ms
( 100,50000) 2140ms 2625ms 2133ms 2154ms
( 500, 100) 8ms 18ms 8ms 8ms
( 500, 1000) 87ms 199ms 88ms 87ms
( 500,10000) 1756ms 1995ms 1665ms 1762ms
( 500,30000) 6115ms 7187ms 6067ms 6103ms
( 500,50000) 10611ms 12943ms 10592ms 10531ms
(1000, 100) 16ms 38ms 17ms 17ms
(1000, 1000) 172ms 410ms 174ms 172ms
(1000,10000) 3391ms 4077ms 3249ms 3389ms
(1000,30000) 12116ms 14490ms 12080ms 12130ms
(1000,50000) 21263ms 25998ms 21324ms 21168ms
(5000, 100) 78ms 192ms 80ms 78ms
(5000, 1000) 852ms 2026ms 858ms 853ms
(5000,10000) 16801ms 20575ms 16694ms 16723ms
(5000,30000) 60418ms 71138ms 60482ms 60190ms
(5000,50000) 105140ms 128479ms 105374ms 104777ms
The (1000, 10000) row is a representative mid-scale configuration: large enough to amortise overheads, small enough to stay readable, and it matches the pattern seen at every other scale.
And the pattern holds as the workload grows: the gap between strategies widens with scale.
Embedded, rebind and rebind optional are statistically indistinguishable; on-the-fly is consistently and increasingly worse. Why?
- interpreter construction is not free;
- cache reuse dominates execution cost;
- the branch on std::optional is perfectly predictable;
- rebinding is cheaper than reconstruction.
In other words, tightening the correctness model has zero structural cost.
This is a rare and satisfying outcome. Across all tested scales, from tens of thousands to hundreds of millions of evaluations, lazy interpreter rebinding matches the performance of an embedded interpreter while eliminating a fragile hidden invariant. The cost of restoring correctness is below measurement noise, while the cost of reconstructing interpreters is substantial and scales with population size.
Note on thread safety
Although run() is marked const, it mutates a mutable std::optional<src::interpreter> used as an execution cache.
As a result, reg_oracle_storage is not thread-safe for concurrent calls to run() on the same instance.
In a genetic programming context (including ULTRA), this is usually acceptable: individuals are evaluated in parallel by giving each worker its own oracle or by partitioning the population.
Sharing a single reg_oracle_storage across threads would require additional synchronisation (e.g. a std::mutex or thread-local interpreters), which would reintroduce overhead in the hot path.