Can High-Reliability Organisations Ever Be Error Free? Critically Evaluate Using Appropriate Concepts, Models and Frameworks From the Unit and Real-world Examples to Support Your Argument.
The idea that any organisation can be error free sounds like a pipe dream at first. Knee-jerk responses invoking Murphy's Law, or the Titanic being declared unsinkable, tend to surface in any ensuing conversation. However, some organisations that deal with high-risk situations or products, where any error could have catastrophic consequences, have on examination had remarkably few accidents. This paper examines these High-Reliability Organisations (HROs) and the possibility of attaining the ultimate goal of error-free operations. We look briefly at the main theories pertaining to HROs and examine their common characteristics, and then critically investigate whether HROs can ever truly be error free. We conclude that error-free organisations cannot exist, owing to the inherent fallibility of both individuals and organisations. However, HRO theory provides a framework within which errors can be anticipated and damage contained, so that these organisations are safer and more resilient.
There are two dominant schools of thought that try to explain what causes accidents in complex, high-risk organisations: Normal Accident Theory (NAT) and High Reliability Organisation Theory (HROT) (Lekka 2011). More recently, the concept of Resilience Engineering (RE) has also become popular with safety scientists (Hopkins 2014). However, the concepts put forward by RE overlap substantially with the principles of HROT (Lekka 2011), so much so that this paper, following Le Coze and Dupre (2006) and others, treats the two as more or less interchangeable.
According to Perrow's Normal Accident Theory, accidents are inevitable in complex organisations that operate high-risk technologies. In particular, Perrow contends that two characteristics make accidents in complex, high-risk organisations practically inevitable (Perrow 1984): tight coupling and interactive complexity. Tight coupling refers to the degree of interdependence between a system's components, such as people, equipment and processes. Interactive complexity refers to the extent to which interactions among those components are unexpected and invisible. A system is considered tightly coupled where one process follows rapidly or invariably from another; such systems are generally highly automated, with little or no opportunity for human intervention (Hopkins 1999). Because these tasks and processes are interdependent, a failure in one part of the system can cascade quickly to other parts. Where both characteristics are present, there is insufficient time and, owing to the system's complexity, insufficient knowledge to fully understand and intervene in potential failures (Perrow 1984). Perrow therefore argues that accidents are inevitable in systems that are tightly coupled, interactively complex and high risk (Hopkins 1999). He went on to classify systems such as nuclear weapons and aircraft as high risk, while manufacturing plants such as oil refineries and chemical works were classified as lower risk (Lekka 2011).
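Perrow's cascade argument can be given a simple quantitative flavour. As a deliberately stylised illustration of our own (not a calculation Perrow himself performs), treat a tightly coupled system as a chain of $n$ interdependent components, each with reliability $r$, so that the system as a whole works only if every component does:

$$R_{\text{system}} = r^{n}$$

Even with components that are 99.9 per cent reliable ($r = 0.999$), a system of $n = 100$ such components has $R_{\text{system}} = 0.999^{100} \approx 0.905$, close to a one-in-ten chance of failure. The figure illustrates, rather than proves, Perrow's point: in tightly coupled systems, reliability erodes rapidly as interdependence grows.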
HROT has gained popularity in academia and the "real world", having been used in the investigations of the Columbia space shuttle disaster and the Buncefield incident in the UK, among many others (Hopkins 2014). Some of its concepts have also been incorporated into other frameworks, such as the ITIL service management framework. Early HRO research focused on an organisation's ability to maintain an error-free record over long periods of time (Roberts 1990); the field has since evolved and now looks more closely at how HROs manage their risks through organisational control of hazards and probability (Rochlin 1993). Weick and Sutcliffe (2001) describe a model by which HROs manage their risks through five processes: preoccupation with failure, reluctance to simplify interpretations, sensitivity to operations, commitment to resilience, and deference to expertise (particularly in a crisis).
The concepts behind the HRO perspective and the principles of resilience engineering overlap extensively (Lekka 2011). Resilience engineering has been applied across several industries, including aviation, petrochemicals and nuclear power. Nemeth and Cook (2007) define resilience as "the ability of systems to survive and return to normal operations after encountering a challenge". Wreathall includes in his definition an organisation's ability to maintain a safe mode of operation and to resume normal operations after an incident occurs (Lekka 2011). An organisation's ability to restore operations to normal is thus an essential trait of resilience.
The question of whether HROs can ever be error free needs to be examined with the above theories and frameworks in mind. Each has its own drawbacks when applied to real-world situations. If one applied only NAT, the answer would be an automatic "no", since the theory holds that errors and accidents are unavoidable.
NAT, as put forward by Perrow, states that a system that is sufficiently tightly coupled and complex will inevitably experience an accident, regardless of how well it is managed (Perrow 2011). The problem with this is that it fails to take into account that many systems are in fact badly managed. Perrow himself states that most of the major accidents of the past decade or so have resulted from such mundane causes as poor maintenance, cost pressures, negligence or incompetence (Perrow 1994). Perrow uses the Three Mile Island nuclear reactor as his model of a complex, tightly coupled system, and the reactor's near meltdown as his example of a normal accident (Perrow 1984). However, Hopkins points out that there were several incidents before the accident that could have served as defensive barriers of the kind described in James Reason's Swiss cheese model (Reason 1997); proper reporting and action at any of these points would have prevented the accident (Hopkins 2001), so it was not inevitable at all. NAT has been further criticised as too limited in its definitions of complexity and tight coupling, and in the narrow range of organisations to which it applies (Lekka 2011). Furthermore, while it attempts to explain the causes of accidents, it does little to suggest how such accidents might be prevented (Hopkins 1999).
Given these shortcomings, we cannot answer our question reliably using NAT as a framework. Its narrow scope, its focus on causation and its pronouncement of inevitability render it an inadequate tool for our purpose.
Looking at the question through the lens of HROT and RE provides a different perspective, one focused on understanding the conditions under which complex systems do not fail, informed by research in technologically complex organisations that sustain high levels of safety performance. HRO researchers argue that accidents are not inevitable, because processes can be put in place that significantly lower the probability of catastrophic errors and contain those that do occur (La Porte and Consolini 1998). HROT argues that organisations can improve their reliability and resilience by engineering a positive safety culture that embraces inquiry, bottom-up communication, fault reporting, a just culture and management commitment to safety. Significantly, the HROT and RE perspectives acknowledge that failures will occur, and focus instead on learning from incidents and "near misses" in order to improve processes. In this way accidents can either be avoided completely, through early intervention at a lower layer of the Swiss cheese model, or contained to less than disastrous levels (Weick, Sutcliffe and Obstfeld 1999).
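The logic of defence in depth can be sketched in the same stylised way (again our own illustration, not drawn from Reason's text). If an organisation maintains $n$ independent barriers and each fails with probability $p$, a catastrophe requires every barrier to fail at once:

$$P(\text{accident}) = p^{n}$$

With four barriers that are each 90 per cent reliable ($p = 0.1$), $P = 0.1^{4} = 0.0001$, or one in ten thousand. Yet for any $p > 0$ the probability never reaches zero, and real barriers are rarely independent; Reason's image of the holes lining up is precisely a warning about common-cause failure. The arithmetic thus anticipates the "No, but…" answer developed below.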
The HROT/RE lens is more realistic and useful in real-world situations: it recognises that organisations are dynamic in nature and that failures can occur without being catastrophic, and it offers a framework for reducing the risk of accidents. Applying HROT to our question, the answer would have to be "No, but…". HROT does not profess to make organisations error free. Instead, it provides a framework whereby potential failures can be anticipated, contained and recovered from in a timely manner. It should be noted that the HRO model is an ideal which, as Weick and Sutcliffe admit, no real organisation could completely live up to, owing to social, environmental or political constraints (Weick and Sutcliffe 2001; Weick, Sutcliffe and Obstfeld 1999).
Our analysis of NAT and HROT in relation to producing error-free organisations has shown that NAT is an unsuitable model: it assumes from the outset that disasters are inevitable in complex organisations and provides no guidance on preventing them. HROT and RE similarly assume that errors will occur, but they also provide a theoretical framework that organisations can implement to help anticipate and, more importantly, contain the effects of errors below a catastrophic level. While it is acknowledged that no organisation could completely live up to the HROT/RE ideal, even partial implementation is better than nothing in a world where anything that can go wrong, will go wrong.
References
Hopkins, A. 1999. “The Limits of Normal Accident Theory.” Safety Science 32: 93-102.
Hopkins, A. 2001. “Was Three Mile Island a ‘Normal Accident’?” Journal of Contingencies and Crisis Management 9 (2): 65-72.
Hopkins, A. 2014. “Issues in Safety Science.” Safety Science 67: 6-14.
La Porte, T, and P Consolini. 1998. “Theoretical and Operational Challenges of ‘High Reliability Organisations’: Air Traffic Control and Aircraft Carriers.” International Journal of Public Administration 21 (6-8): 847-852.
Le Coze, Jean-Christophe, and Michele Dupre. 2006. “How to Prevent a Normal Accident in a High Reliable Organisation? The Art of Resilience, a Case Study in the Chemical Industry.” Resilience Engineering Symposium, Juan-les-Pins, France, 181-190. Accessed October 13, 2022. https://hal-ineris.archives-ouvertes.fr/ineris-00973243.
Lekka, Chrysanti. 2011. High Reliability Organisations: A Review of the Literature. Health and Safety Executive, 1-34.
Muhren, Willem J, Gerd Van Den Eede, and Bartel Van de Walle. 2007. “Organizational Learning for the Incident Management Process: Lessons from High Reliability Organizations.” ECIS 2007 Proceedings, 65. Accessed October 20, 2022. https://aisel.aisnet.org/ecis2007/65/.
Nemeth, C, and R Cook. 2007. “Reliability Versus Resilience: What Does Healthcare Need?” Proceedings of the Human Factors and Ergonomics Society Annual Meeting 51 (11): 621-625.
Pedram, Shiva, Pascal Perez, and Bruce Dowsett. 2013. “Assessing the Impact of Virtual Reality-Based Training on Health and Safety Issues in the Mining Industry.”
Perrow, C. 1984. Normal Accidents: Living with High-Risk Technologies. New York: Basic Books.
Perrow, C. 1994. “The Limits of Safety: The Enhancement of a Theory of Accidents.” Journal of Contingencies and Crisis Management 2 (4): 212-220.
Perrow, C. 2011. The Next Catastrophe: Reducing Our Vulnerabilities to Natural, Industrial, and Terrorist Disasters. Princeton, New Jersey: Princeton University Press.
Reason, J. 1997. Managing the Risks of Organizational Accidents. Aldershot: Ashgate. Accessed October 12, 2022.
Roberts, K. 1990. “Some Characteristics of One Type of High Reliability Organization.” Organization Science 1 (2): 160-176.
Rochlin, Gene I. 1993. “Defining ‘High Reliability’ Organizations in Practice: A Taxonomic Prologue.” In New Challenges to Understanding Organizations, edited by K. H. Roberts, 11-32. New York: Macmillan.
Weick, K E, K M Sutcliffe, and D Obstfeld. 1999. “Organizing for High Reliability: Processes of Collective Mindfulness.” Research in Organizational Behavior 21: 81-123.
Weick, Karl E, and Kathleen M Sutcliffe. 2001. Managing the Unexpected: Assuring High Performance in an Age of Complexity. San Francisco: Jossey-Bass.