Current superscalar processors use a reorder buffer (ROB) to track the instructions in flight. The ROB is implemented as a FIFO queue into which instructions are inserted in program order after being decoded, and from which they are extracted when they commit, also in program order. This hardware structure provides simple support for speculation, precise exceptions, and register reclamation. However, retiring instructions in program order can degrade performance significantly when a long-latency operation blocks the ROB head. Several proposals have been published to address this problem. Most of them allow instructions to be retired out of order in a speculative manner, so they require checkpoints to roll the processor back to a precise state when speculation fails. Checkpoint management usually involves costly hardware and enlarges other major processor structures, which in turn may impact the processor cycle time. This problem affects most state-of-the-art microprocessors, regardless of whether they are single- or multithreaded, or whether they implement one or multiple cores. This thesis studies non-speculative out-of-order retirement of instructions in superscalar, multithreaded, and multicore processors.

First, the Superscalar Validation Buffer architecture is proposed as a processor pipeline design in which instructions are retired out of program order in a non-speculative manner, and hence without checkpoints. The ROB is replaced with a smaller FIFO queue, called the Validation Buffer (VB), which instructions can leave as soon as they are classified as either non-speculative or misspeculated, irrespective of their execution state. The management of the VB is complemented with an aggressive register reclamation technique that decouples physical register release from instruction retirement. The VB architecture largely alleviates the ROB performance bottleneck and reduces the complexity of other processor structures. For example, a VB half the size of a ROB can outperform it while also reducing hardware cost.
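The difference between the two retirement policies can be illustrated with a toy cycle-level model. The sketch below is not the simulator or the pipeline evaluated in the thesis: the function `simulate`, the `(validate_delay, latency)` instruction format, the dispatch/retire width, and all latency values are assumptions made for illustration. It ignores data dependences, misspeculation recovery, and register reclamation. It only contrasts a FIFO whose head leaves on completion (ROB-like) with one whose head leaves once it is classified as non-speculative (VB-like).

```python
from collections import deque


def simulate(instrs, size, width=4, retire_on="complete"):
    """Return the cycle count of a toy FIFO retirement buffer.

    instrs: list of (validate_delay, latency) pairs, both counted in cycles
        after dispatch. These are hypothetical numbers, not measured data.
    retire_on: "complete" gates the head on execution completion (ROB-like);
        "validate" gates it on classification as non-speculative (VB-like),
        even if the instruction is still executing.
    """
    buf = deque()    # cycle at which each in-flight entry may leave, in program order
    last_done = 0    # latest execution-completion cycle seen so far
    nxt = 0          # index of the next instruction to dispatch
    cycle = 0
    while nxt < len(instrs) or buf:
        cycle += 1
        # Retire: only the head may leave, up to `width` entries per cycle.
        retired = 0
        while buf and retired < width and buf[0] <= cycle:
            buf.popleft()
            retired += 1
        # Dispatch: insert in program order while there is a free entry.
        dispatched = 0
        while nxt < len(instrs) and dispatched < width and len(buf) < size:
            validate_delay, latency = instrs[nxt]
            gate = latency if retire_on == "complete" else validate_delay
            buf.append(cycle + gate)
            last_done = max(last_done, cycle + latency)
            nxt += 1
            dispatched += 1
    # Execution must still finish after every entry has left the queue.
    return max(cycle, last_done)


# Hypothetical workload: two cache-missing loads, each followed by 7 short ops.
miss = (2, 20)   # validated 2 cycles after dispatch, completes after 20
alu = (1, 1)     # validated and completed 1 cycle after dispatch
workload = ([miss] + [alu] * 7) * 2

print("ROB-like, 8 entries:", simulate(workload, 8, retire_on="complete"))
print("VB-like,  4 entries:", simulate(workload, 4, retire_on="validate"))
```

With these made-up latencies, the validation-gated queue with 4 entries finishes earlier than the completion-gated queue with 8 entries, because the second miss is dispatched while the first is still outstanding. This only illustrates the mechanism; it says nothing about the magnitude of the gains reported in the thesis.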

Second, the Multithreaded Validation Buffer architecture extends the VB design to different multithreading organizations, namely coarse-grain, fine-grain, and simultaneous multithreading. Multithreaded processors became popular as an evolution of superscalar processors that increases issue bandwidth utilization. Likewise, out-of-order retirement of instructions helps reduce issue waste by avoiding frequent pipeline stalls caused by a full ROB. The evaluation of the VB architecture on multithreaded processors again shows significant performance gains and/or a reduction in complexity. For example, the number of supported hardware threads can be reduced, or the multithreading paradigm can be simplified, without affecting performance.

Finally, the Multicore Validation Buffer architecture is presented as an out-of-order retirement approach for multicore processors, which are the dominant trend in the current market. Wide instruction windows are very beneficial to multiprocessors that implement a strict memory model, especially when both loads and stores encounter long latencies due to cache misses and the resulting stalls must be overlapped with instruction execution to bridge the memory gap. Extending the VB architecture to a multiprocessor environment allows core pipelines to retire instructions out of program order while still enforcing sequential consistency. This proposal provides performance similar to that of ROB-based multiprocessor architectures implementing a relaxed memory model, and it outperforms sequentially consistent multiprocessors with in-order retirement.