In the ivory tower of algorithm analysis, the order of operations often doesn't matter. But in the trenches of C++ systems programming, where cache lines and allocator overhead dictate reality, the order is everything.
Here is a common scenario: you are managing a std::set (typically implemented as a Red-Black Tree). You have a node A that has been processed and you need to replace it with N new nodes.
You have two choices:
- insert first: insert the N new nodes, then delete A;
- delete first: delete A, then insert the N new nodes.
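Under hypothetical names (`my_set`, `old_value` for A's key, `new_values` for the N replacements), the two orderings can be sketched as:

```cpp
#include <set>
#include <vector>

// Insert first: the tree grows to M + N nodes before we prune it.
void insert_first(std::set<int> &my_set, int old_value,
                  const std::vector<int> &new_values)
{
    for (int v : new_values)
        my_set.insert(v);      // tree inflates first...
    my_set.erase(old_value);   // ...then we delete from the tallest tree
}

// Delete first: prune while the tree is still small.
void delete_first(std::set<int> &my_set, int old_value,
                  const std::vector<int> &new_values)
{
    my_set.erase(old_value);
    for (int v : new_values)
        my_set.insert(v);
}
```

Both orderings leave the set in the same final state; the rest of this article is about why the second one tends to be cheaper.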
If you look at the Big-O complexity, both approaches are \(O(N \log M)\) (\(M\) is the total number of nodes in the std::set). Academically, they are identical. Practically, "delete first" is almost always the superior strategy.
Here is why.
The geometry of the tree
A Red-Black Tree's performance is tied to its height. Every operation requires a traversal from the root (or a nearby rebalance point), costing roughly \(2 \log_2 (M+1)\).
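To make that bound concrete, take a set of one million elements:

\[
h \le 2 \log_2 (M + 1) = 2 \log_2 (10^6 + 1) \approx 39.9
\]

so each lookup, insertion, or deletion touches at most about 40 nodes in the worst case.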
If you insert first, you are inflating the tree. If N is large, you might increase the tree's height before you ever get around to removing A. When you finally delete A, you are forcing the CPU to perform the most expensive tree operation (deletion and rebalancing) on the largest, deepest version of that structure.
If you delete first, you prune the tree immediately. You perform the complex rebalancing logic on the smallest possible version of the dataset. The subsequent insertions then occur on a "lighter" tree.
The "hot cache" optimization
This is where the hardware comes into play. Memory allocators and CPU caches are not abstract concepts; they are physical bottlenecks.
When you call set.erase(iterator), the node containing A is deallocated. Modern memory allocators (like glibc's malloc or jemalloc) often place recently freed blocks into a "free list" for immediate reuse.
If you immediately follow this with a set.insert(...):
- the allocator is very likely to hand you back the same (or nearby) memory address you just freed;
- that memory block is likely still resident in the L1 or L2 cache of your CPU.
By deleting first, you create a "recycle bin" effect that saves allocator search time and reduces cache misses.
The C++ secret weapon: insertion hints
Another argument for delete first isn't about the hardware; it's about the STL API itself.
In C++, std::set::erase returns an iterator to the element following the erased one (or end()). This is a gift. If your new data is close in key order to the old data (which is very common in update logic), you can use this iterator as a hint.
In the insert first way, the tree is traversed from the root for every insertion, for a total cost of \(O(N \log M)\).
With the delete first way, we use the iterator returned by erase to tell the tree where to start looking:

```cpp
// Find the old node.
auto it = my_set.find(old_value);

// Delete it and capture the 'hint'.
// 'hint' now points to the successor of `old_value`.
auto hint = my_set.erase(it);

// Insert new values using the hint.
for (const auto &new_val : new_values)
{
    // If new_val belongs right before `hint`, this becomes O(1).
    hint = my_set.insert(hint, new_val);
}
```
If the hint is correct, the insertion complexity drops from logarithmic to amortized constant time (\(O(1)\)). By deleting first, you gain a navigational anchor that speeds up everything that follows.
While asymptotic analysis says "it doesn't matter," system architecture begs to differ. Remember, out with the old, then in with the new.