An example of this "over-allocating" strategy can be found in std::vector, which, like most dynamically sized array implementations, will allocate more memory than necessary, so that capacity >= size. Otherwise every push_back call would trigger a new memory allocation and a copy of the whole content. That would make the expected runtime for inserting N values O(N²), which is a lot!
An interesting side effect is that naive use of std::vector::reserve can turn an O(N) algorithm into O(N^2). If you just call push_back() N times, it's guaranteed to be O(N) overall. If you have a function that inserts 10 elements, and prefix it with reserve(v.size() + 10), then a typical implementation will reserve only ten more slots; if you keep calling it, the result is O(N^2).
The amortized runtime for N push_back calls will only be O(N) if the underlying array is resized to a constant factor > 1 times the required size (e.g. doubling it each time it's full). When each batch is prefixed with a reserve of exactly the required size, then, as you say, the total runtime for the N push_back calls will be O(N^2).
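To put rough numbers on that, here is a minimal sketch that only counts how many elements get copied under each strategy. It is a simulation, not std::vector itself; the function names and the batch size of 10 are made up for illustration, and it assumes reserve allocates exactly what is asked for.

```python
def copies_with_exact_reserve(n, batch=10):
    """Element copies when each batch is preceded by reserve(size + batch).

    Assumes reserve() allocates exactly the requested capacity.
    """
    size = capacity = copies = 0
    while size < n:
        if size + batch > capacity:
            copies += size            # reallocation moves every existing element
            capacity = size + batch   # exact fit, no slack for the next batch
        size += batch
    return copies


def copies_with_doubling(n):
    """Element copies when capacity doubles each time the array is full."""
    size = capacity = copies = 0
    for _ in range(n):
        if size == capacity:
            copies += size            # reallocation moves every existing element
            capacity = max(1, 2 * capacity)
        size += 1
    return copies


for n in (1000, 2000, 4000):
    print(n, copies_with_exact_reserve(n), copies_with_doubling(n))
```

Doubling n roughly quadruples the copy count for the exact-fit version, while the doubling version stays below 2 * n.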
I once investigated a particularly slow routine that imported a CSV file of a million lines or so into a NumPy array.
It read the CSV lines into a NumPy array, calling hstack for each new line, resulting in huge runtimes.
Since then I have seen similar misuses of hstack and the like.
When you do something similar, it is important to resize (by hstack or whatever) to a constant factor > 1 times the required size. Remember to keep track of the current size separately, because the size of the underlying array is now the capacity.
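A minimal sketch of that bookkeeping, assuming a growth factor of 2. The class name GrowableBuffer and its methods are hypothetical. Python lists already over-allocate internally, so the list here only stands in for a fixed-size array (such as a NumPy array) that has to be copied in order to grow.

```python
class GrowableBuffer:
    """Fixed-size backing storage plus a separately tracked logical size."""

    def __init__(self, growth=2.0):
        self._data = [None]        # len(self._data) is the capacity
        self._size = 0             # number of slots actually in use
        self._growth = growth      # must be > 1 for amortized O(1) appends
        self.reallocations = 0

    @property
    def capacity(self):
        return len(self._data)

    def __len__(self):
        return self._size

    def append(self, value):
        if self._size == len(self._data):
            # Grow by a constant factor, not just by the one slot we need.
            new_capacity = max(self._size + 1,
                               int(len(self._data) * self._growth))
            new_data = [None] * new_capacity
            new_data[:self._size] = self._data[:self._size]
            self._data = new_data
            self.reallocations += 1
        self._data[self._size] = value
        self._size += 1

    def view(self):
        """Only the used part; the tail beyond _size is unused capacity."""
        return self._data[:self._size]


buf = GrowableBuffer()
for i in range(1000):
    buf.append(i)
print(len(buf), buf.capacity, buf.reallocations)
```

With these settings, 1000 appends end with a capacity of 1024 after 10 reallocations, instead of one reallocation per append.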
Yeah, the tricky part is to figure out how much to over-allocate. It's often even harder to know when to scale it down.
An advantage of using realloc is that a smart memory allocator can sometimes get away with not moving the memory, either by extending the allocation if the memory beyond it happens to be free, or by remapping memory addresses.
A factor of two is always a good start. There was a claim that 1.5 is better realloc-hole-wise, but the tested difference was shown to be negligible. Sorry for no links, it's early morning. Scaling down or preallocating is simple: expose fit(n) and regrow() functions, which fit the capacity to at least max(used, n) items or reset your guess back to the natural one. There is an end user somewhere up the call stack who knows when bufwriting is over and no growth/shrinkage will happen for a while. If automatic shrinkage is implemented, it should have significant hysteresis to prevent push/pop/push/pop realloc jitter. The really tricky part is that generic containers often skip this API and rely on guesswork that sucks.
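One possible reading of that API, as a sketch that tracks only size and capacity. The class name, the constants, and the interpretation of regrow() as "the capacity that plain doubling would have reached for the current size" are all assumptions. The shrink threshold of one quarter is just one way to get hysteresis.

```python
class CapacityPolicy:
    """Bookkeeping only; a real container would move its data in _resize."""

    GROWTH = 2         # grow by doubling
    SHRINK_AT = 4      # shrink only when size < capacity / 4
    MIN_CAPACITY = 4

    def __init__(self):
        self.size = 0
        self.capacity = self.MIN_CAPACITY
        self.reallocations = 0

    def _resize(self, new_capacity):
        if new_capacity != self.capacity:
            self.capacity = new_capacity
            self.reallocations += 1

    def _natural(self):
        # Capacity that plain doubling would have reached for this size.
        cap = self.MIN_CAPACITY
        while cap < self.size:
            cap *= self.GROWTH
        return cap

    def push(self):
        if self.size == self.capacity:
            self._resize(self.capacity * self.GROWTH)
        self.size += 1

    def pop(self):
        if self.size == 0:
            raise IndexError("pop from empty container")
        self.size -= 1
        # Hysteresis: grow when full, but shrink only below a quarter, and
        # only by half, so the container is half empty after shrinking.
        if (self.capacity > self.MIN_CAPACITY
                and self.size * self.SHRINK_AT < self.capacity):
            self._resize(max(self.MIN_CAPACITY, self.capacity // self.GROWTH))

    def fit(self, n):
        """Caller knows best: make capacity exactly max(used, n)."""
        self._resize(max(self.size, n))

    def regrow(self):
        """Drop the caller's hint and return to the natural guess."""
        self._resize(self._natural())


p = CapacityPolicy()
for _ in range(9):
    p.push()
before = p.reallocations
for _ in range(100):   # push/pop right at a growth boundary
    p.pop()
    p.push()
print(p.capacity, p.reallocations == before)
```

Because the grow and shrink thresholds are far apart, the push/pop loop at the boundary causes no reallocations. In this sketch an automatic shrink in pop() can still undo an earlier fit(n); a real container might want to suppress that until regrow() is called.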