Premise

In ULTRA Genetic programming individuals are internally represented as a linear genome but semantically they expose two very different views:

all genes, stored contiguously;
active genes (exons), dynamically determined by dependency analysis starting from the output locus.

ULTRA (and VITA before) handled this distinction by exposing:

standard container-style iteration (begin(), end(), operator[]) over the full genome;
a bespoke exons() helper returning a pair of special iterators.

While functional, this design predated C++20 ranges and left several questions unanswered (what kind of range is exon traversal? Can it be composed with standard algorithms? What guarantees does the iterator provide?).

In Genetic Programming introns are extraneous regions in an individual which neither add nor detract from its fitness, because they are ignored or do nothing. They are "unnecessary" since they can be removed from the program without altering the solution the program represents (reality is more complex :).

Exons and introns in biology

This notion of introns stems from microbiology. In real chromosomes, genes consist both of regions which are expressed in final protein end-products and regions which are edited out somewhere in the translation and transcription process. The expressed areas of these genes are known as exons, whereas the ignored regions are introns.

Exons aren't a container

A key realisation is that exons are not a container and should never pretend to be one.

Active genes:

are discovered incrementally;
depend on graph traversal, not indices;
have no cheap size();
cannot be iterated twice without recomputation.

In other words, exon traversal is inherently single-pass. Treating it like a forward or random-access range is misleading at best and dangerous at worst.

This insight guided every design choice that followed.

A deliberate choice: `input_range`

The new design models exon traversal as a std::ranges::input_range:

single-pass;
sentinel-terminated;
no promise of multi-pass stability;
no accidental use of algorithms requiring stronger guarantees.

This is not a limitation, it's a precise specification of behaviour:

static_assert(std::ranges::input_range<const_exon_view>);
static_assert(!std::ranges::forward_range<const_exon_view>);

These assertions are not decoration, they document intent and protect future maintainers from "optimising" semantics away.

Views instead of helper functions

Instead of returning a custom range object or iterator pair, ULTRA now exposes:

These are lightweight, non-owning views that:

store only a pointer to the individual;
construct iterators lazily on begin();
integrate seamlessly with <ranges> algorithms.

This enables expressive, modern usage:

for (auto &g : ind.exons())
  mutate(g);

auto n = std::ranges::distance(ind.cexons());

Crucially, this works without materialising exon lists or caching results.

Sentinel-based termination

End-of-range is represented by a dedicated sentinel type rather than an "empty iterator".

It avoids the need for a default-constructed iterator; it also matches the logical model ("iteration ends when no active loci remain"). Last it reduces iterator state and simplifies correctness reasoning.

This aligns with modern standard-library practice (e.g. istream_view).

One iterator, two constness modes

The underlying iterator is implemented once as:

template<bool is_const> class basic_exon_iterator;

This avoids code duplication while preserving correct const propagation, identical traversal semantics and minimal surface area.

The iterator explicitly declares:

using iterator_concept = std::input_iterator_tag;

making its capabilities unambiguous to the ranges framework.

Why `begin()` is const

Both exon_view::begin() and const_exon_view::begin() are const member functions.

This is intentional and correct:

views are cheap, immutable adapters;
calling begin() conceptually creates an iterator, it doesn't mutate the view;
it enables use with const views and temporary expressions.

The iterator itself owns the traversal state, the view does not.

Performance considerations

The iterator currently maintains a std::set of loci to explore, ordered by descending locus index.

This choice prioritises correctness, determinism and simplicity.

However, the new structure makes optimisation easier and safer:

alternative containers (flat_set, small_vector + heap) can be explored;
caching strategies can be added at the view level;
traversal policies can be changed without touching user code.

The abstraction boundary is now correct, which is the most important optimisation of all.

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search