In HPC, there are two ways to look at the reliability of a supercomputer.
Top-down reliability starts with what a job running at full system scale actually experiences and breaks that experience down into its contributing causes. It is governed by metrics that characterize job reliability.
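As a minimal sketch of the top-down view, assume mean time to interrupt (MTTI) is the job-level metric. The Python below computes MTTI from a log of full-scale job runs and then breaks the interrupts down by cause. The `JobRecord` type, its fields, and the sample numbers are hypothetical and do not come from any real system.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRecord:
    # Hypothetical record of one full-scale job run.
    hours: float                    # wall-clock hours the job ran
    interrupt_cause: Optional[str]  # None if the job finished uninterrupted


def mean_time_to_interrupt(jobs):
    """Total job hours divided by the number of interrupted jobs."""
    interrupts = sum(1 for j in jobs if j.interrupt_cause is not None)
    if interrupts == 0:
        return float("inf")  # no interrupts observed yet
    return sum(j.hours for j in jobs) / interrupts


def interrupt_breakdown(jobs):
    """Fraction of all interrupts attributed to each cause."""
    causes = Counter(j.interrupt_cause for j in jobs if j.interrupt_cause is not None)
    total = sum(causes.values())
    return {cause: n / total for cause, n in causes.items()}


# Hypothetical job log, for illustration only.
jobs = [
    JobRecord(12.0, None),
    JobRecord(3.5, "node failure"),
    JobRecord(8.0, "interconnect"),
    JobRecord(0.5, "node failure"),
    JobRecord(24.0, None),
]
print(mean_time_to_interrupt(jobs))  # 48.0 hours / 3 interrupts = 16.0
print(interrupt_breakdown(jobs))
```

The job is the unit of observation here: nothing is assumed about individual components, and causes are only assigned after an interrupt has been seen.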
Bottom-up reliability starts with individual components and builds a reliability model by connecting those components in series and in parallel. It is governed by metrics that characterize component reliability.
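A minimal bottom-up sketch follows. Components in series must all work, so their reliabilities multiply. Components in parallel (redundancy) fail only if every one of them fails. The code assumes each component has a constant failure rate, so its reliability over a job of length t is exp(-t / MTBF). The helper names, MTBF values, and node count are hypothetical.

```python
import math


def series(*r):
    """Every component must work: R = R_1 * R_2 * ... * R_n."""
    return math.prod(r)


def parallel(*r):
    """At least one component must work: R = 1 - (1 - R_1) * ... * (1 - R_n)."""
    return 1.0 - math.prod(1.0 - x for x in r)


def component_reliability(mtbf_hours, mission_hours):
    """Assumes an exponential lifetime (constant failure rate)."""
    return math.exp(-mission_hours / mtbf_hours)


# Hypothetical numbers, for illustration only.
job_hours = 24.0
node = component_reliability(mtbf_hours=100_000.0, mission_hours=job_hours)
psu = component_reliability(mtbf_hours=50_000.0, mission_hours=job_hours)
redundant_psu = parallel(psu, psu)          # two power supplies; either one suffices
node_with_psu = series(node, redundant_psu)  # node works only if its power works
system = series(*[node_with_psu] * 1000)     # a full-scale job needs all 1,000 nodes
print(f"{system:.3f}")
```

Even with highly reliable individual nodes, the series product over 1,000 of them pulls the probability of a 24-hour full-scale job finishing well below one. This is why the component view and the job view can give very different impressions of the same machine.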