SEMP: Suburban Emergency Management Project

What Is Hidden Failure in Critical Infrastructure?

Biot Report #112: September 06, 2004

Why is the integrity of critical infrastructures at risk worldwide? The answer lies, in part, in local disturbances and hidden failures, as described below.*

Characteristics of Networked Infrastructures

“Critical infrastructure systems such as water, energy, telecommunications, computer, and networked banking, are becoming increasingly interdependent as the digital society matures on a global scale. Consequently, the vulnerability of these stratified networks is raising major concerns worldwide. For instance, the normal operation of water, telecommunications, and banking systems is maintained only if there is a steady supply of electric energy. On the other hand, the generation and delivery of electric power cannot be ensured without the provision of fuel, water, and various telecommunications and computer services for data transfer and control purposes to the power plants and networks. These interdependencies are strengthening their grip as the usage of the Internet and other computer networks becomes prevalent.

“Energy and telecommunications infrastructures each form a complex interconnected networked system that stretches over a large geographical area to reach in principle every household and economic entity in a region. Their functions in a society are similar to those of arteries and veins that branch out through the human body to nourish every cell with vital nutrients. Infrastructure’s covered geographical areas may include a continent or even the whole world. In fact, the public telephone system can be regarded as the first infrastructure to reach a global scale since a point-to-point connection can be established between any pair of telephones around the world. This is achieved via an adequate combination of cables and wireless technologies, including several constellations of satellite systems at low, medium, and geostationary earth orbits (see Biot #110). Computer networks have a similar propensity to globalization as more and more countries equip themselves with Internet technologies.

“While being recognized as the most complex system ever built by human beings, electrical power systems have not yet attained a global scale. However, some, such as the North American and European interconnected power systems, reach continental size.”
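
The interdependencies described in the passage above can be pictured as a directed dependency graph. The following is a minimal sketch, not a model taken from the source: the system names, the `DEPENDS_ON` map, and the `affected_by` helper are hypothetical, and the sketch makes the pessimistic assumption that a system stops working as soon as any one of its required inputs is lost.

```python
from collections import deque

# Hypothetical dependency map: each system lists the inputs it needs.
# The edges follow the examples quoted in the text; names are illustrative.
DEPENDS_ON = {
    "electric_power": {"fuel", "water", "telecom", "computer_services"},
    "water": {"electric_power"},
    "telecom": {"electric_power"},
    "banking": {"electric_power"},
    "computer_services": set(),
    "fuel": set(),
}

def affected_by(failed, depends_on):
    """Return every system that loses a required input, directly or indirectly.

    Assumption: losing any single input is enough to stop a system.
    """
    # Invert the map: supplier -> systems that depend on it.
    dependents = {name: set() for name in depends_on}
    for system, needs in depends_on.items():
        for supplier in needs:
            dependents.setdefault(supplier, set()).add(system)

    # Breadth-first walk over the dependents of each newly affected system.
    affected = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for system in dependents.get(current, ()):
            if system not in affected:
                affected.add(system)
                queue.append(system)
    return affected

print(sorted(affected_by("fuel", DEPENDS_ON)))
```

Under these assumptions, a loss of fuel supply reaches electric power first and then every system that needs electricity, while a failure confined to banking affects nothing else.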

Increasing Risks of Cascading Failures Triggered by Local Disturbances

“The integrity of critical infrastructures is at risk worldwide…because they are increasingly vulnerable to local disturbances. This is in part due to the strong reliance of critical infrastructure systems on one another, which may turn a local disturbance in one system into a large-scale failure via cascading events that have catastrophic consequences on society as a whole. It is also in part due to the current trend to operate critical networked systems closer to their stability or capacity limits. One compelling reason for this practice is, of course, economics. Providing this infrastructure with some degree of resiliency comes at [the price of achieving the required level of redundancy in the equipment]. This is even more true in developing countries, where the expansion of critical infrastructure systems does not keep pace with rapid growth in demand.
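
The effect of operating closer to capacity limits can be illustrated with a toy load-redistribution model. This is only a sketch under stated assumptions, not a power-flow calculation and not a method from the cited paper: the `cascade` function, the equal sharing of shed load among surviving lines, and the numbers used are all hypothetical.

```python
def cascade(loads, capacities, first_failure):
    """Return the indices of all lines that end up tripped.

    Assumption: when lines trip, their load is shared equally among the
    survivors, and any survivor pushed above its capacity trips next.
    """
    loads = list(loads)
    alive = [True] * len(loads)
    to_trip = [first_failure]
    while to_trip:
        # Remove the tripping lines and collect the load they were carrying.
        shed = 0.0
        for i in to_trip:
            alive[i] = False
            shed += loads[i]
            loads[i] = 0.0
        survivors = [i for i, ok in enumerate(alive) if ok]
        if not survivors:
            break
        # Spread the shed load evenly over the remaining lines.
        share = shed / len(survivors)
        for i in survivors:
            loads[i] += share
        # Any survivor now above its capacity trips in the next round.
        to_trip = [i for i in survivors if loads[i] > capacities[i]]
    return [i for i, ok in enumerate(alive) if not ok]

capacities = [100.0] * 5
print(cascade([60.0] * 5, capacities, first_failure=0))  # ample margin
print(cascade([85.0] * 5, capacities, first_failure=0))  # little margin
```

With five lines loaded to 60 percent of capacity, the loss of one line is absorbed by the other four. With the same lines loaded to 85 percent, the same single failure overloads every survivor and the whole set trips, which is the trade-off between economics and redundancy that the passage describes.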

“Another reason for the degradation of infrastructure reliability is the detrimental role played by hidden failures in the equipment. Hidden failures are hardware or software failures that are only exposed when a system or a portion of a system is highly stressed due to congestion or fault. In other terms, hidden failures cannot be revealed before the system is perturbed. In particular, routine maintenance testing may not detect them or, even worse, may induce them. This was precisely the case in the 1977 New York blackout, where a protection relay was damaged during a testing procedure a few weeks before the power system failure. Another cause of hidden relay failures is the present practice in electric power systems to favor dependability over security in relay settings to ensure the isolation of a fault with high probability while allowing the tripping of non-faulty devices from time to time. Hence, it should not come as a surprise that the North American Electric Reliability Council (1988) blamed hidden failures in the protection systems for aggravating the situation in 73.5 percent of significant disturbances that were investigated in the U.S. electric power transmission network. This is a sizable portion of major failures in power systems that should call out for the development of mitigation measures not only in the U.S., but also in other developed and developing countries.

“Another example of a hidden failure that wreaked havoc with the normal operation of critical infrastructure was a software bug that existed in the switching systems of the AT&T long-distance public telephone network before its general breakdown via cascading failures on January 15, 1990. The triggering event was a failure in a switch whose detection was passed to all its neighbors. As a result, the latter switches correctly took action to not forward calls through the failed one. Unfortunately, a software bug in all of the switches prompted each of them to mistakenly notify their neighbors that they were failing and that calls were not to be forwarded through them. This domino effect resulted in the interruption of the long-distance telephone service in the U.S. for nine hours.”
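
The domino effect described above can be sketched as message passing between neighboring switches. The code below is a simplified illustration of the behavior as the passage describes it, not a reconstruction of the actual AT&T switching software: the `propagate` function, the `buggy` flag, and the four-switch ring are hypothetical.

```python
from collections import deque

def propagate(neighbors, first_failed, buggy):
    """Return the set of switches that declare themselves out of service."""
    out_of_service = {first_failed}
    # Each notice is (sender, receiver): "sender is failing, avoid it".
    notices = deque((first_failed, n) for n in neighbors[first_failed])
    while notices:
        sender, receiver = notices.popleft()
        if receiver in out_of_service:
            continue
        # Correct behavior: the receiver simply stops routing calls
        # through `sender` and stays in service.
        if buggy:
            # Faulty behavior as described in the text: handling the notice
            # makes the receiver report itself as failing to its neighbors.
            out_of_service.add(receiver)
            notices.extend((receiver, n) for n in neighbors[receiver])
    return out_of_service

# Hypothetical network of four switches connected in a ring.
RING = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}

print(sorted(propagate(RING, "A", buggy=False)))
print(sorted(propagate(RING, "A", buggy=True)))
```

Without the bug, only the switch that actually failed drops out. With the bug present in every switch, the same single failure takes the whole ring out of service, even though the defect was invisible until the first failure occurred.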

*Hazard Risk Unit publications of the World Bank: “Mitigating the Vulnerability of Critical Infrastructure in Developing Countries” by Lamine Mili (2002). Full text available at: http://www.worldbank.org/hazards/files/conference_papers/mili.pdf.


Summary Points

Characteristics of Networked Infrastructures, according to the International Institute for Critical Infrastructure (CRIS)

  1. These are complex large-scale networked systems that have a continental or global size.
  2. A local disturbance may cascade into a system-wide failure within and across infrastructures.
  3. Critical infrastructures are increasingly operated at the limit of their capacity, which leaves them increasingly vulnerable to catastrophic failures.


Equipment Hidden Failures

  1. The majority of the blackouts in electric power networks are due to misoperation of the protection systems, called hidden failures.
  2. Telecommunication networks are also exposed to hidden failures. An example is the failure of the AT&T long-distance public telephone system on Jan. 15, 1990, due to a software bug in the switches.
  3. Hidden failures are hardware or software failures that are only exposed when a subsystem is highly stressed, for example due to congestion or a fault.