Justifying IEC 61511 Spend
Business Development, Software
aeSolutions, Dallas, Texas, USA
EVP - Global Process Safety Technology
aeSolutions, Anchorage, Alaska, USA
Many companies treat completing the compliance documentation identified by IEC 61511 as the end goal. Anything beyond that is deemed too tedious and is seen as a substantial cost center. Unfortunately, documentation is just one aspect of the lifecycle, and on its own it does not make your assets safer from one day to the next. We believe the essence of the standard is not only to generate documentation, but also to monitor the performance of protection layers against the assumptions made at the front end of the lifecycle. As poor assumptions are identified, companies can sustain their business by eliminating the root cause and thereby removing previously invisible risk.
In this paper, we advocate generating compliance documentation as efficiently as possible while focusing on the impact of bad assumptions and putting a financial basis behind that impact. This information can then be benchmarked monthly to set company targets, monitor improvement, and understand the financial consequences.
Keywords: IEC 61511, Safety Instrumented Systems (SIS), Safety Instrumented Function (SIF), Independent Protection Layers (IPL), Process Hazard Analysis (PHA), Layer of Protection Analysis (LOPA), Safety Requirement Specification (SRS), Safety Lifecycle, Safety Integrity Level (SIL), SIL Calculations and Verification, Risk Reduction Factor (RRF), Financial Risk, Life Cycle Cost (LCC), Return On Investment (ROI), Key Performance Indicator (KPI), Functional Safety Index™ (FSI)
Unfortunately, process safety and protection layers are not treated like traditional equipment at a manufacturing facility. If a company wants to expand a process unit or purchase equipment for profitability, it will perform a complete lifecycle cost analysis, a Return On Investment (ROI) analysis, and whatever other economic analysis it takes to justify the investment. Process safety has no equivalent, which makes it difficult to compete for capital. Process safety does not generate more throughput, and many in management take the view that “we haven’t had a large accident at this facility in the last 30 years. Can’t we defer this cost until next year?” There has been no good approach for determining the right amount of spend, nor an effective way to communicate the necessity throughout the business. This paper attempts to fill that void by making the argument for performance-driven functional safety metrics.
As stated in the abstract, many people look at IEC 61511 and see creating the best PHA, LOPA, and SRS reports as the end goal of the safety lifecycle. Unfortunately, those reports are a snapshot in time and do not make your facility safer on a daily basis. We believe the essence of the lifecycle standard is that you need to monitor the performance of credited protection layers to validate the assumptions made at that snapshot in time. Traditionally, monitoring has been tedious because operational data must be reconciled with free text in reports that are buried in a stack of paper several inches thick.
Figure 1: Traditional Data Disconnect
Lifecycle tools on the market eliminate the disconnect illustrated in Figure 1, making it simple to identify bad actors, whether they are poor assumptions or poorly performing equipment. This visibility allows you to take action to remove risk and sustain your business.
Assumptions vs Actual Performance
So, let’s talk about assumptions surrounding an individual hazardous scenario. The PHA identified a hazardous scenario requiring a Layer of Protection Analysis (LOPA). The following assumptions form the basis for this scenario:
- Consequence severity of B (e.g., potential single fatality)
- Initiating cause frequency¹: a BPCS failure assumed to occur once every 10 years, which corresponds to the yellow circle between 4 and 5 on the likelihood scale of the Risk Matrix below
- Credit for an Independent Protection Layer (IPL): an alarm with operator response²
- Requirement to design a SIF³ so that the scenario meets the company’s tolerable mitigated event likelihood, which falls between 1 and 2 on the likelihood scale of the Risk Matrix below
Figure 2: LOPA Assumptions
Over time, the facility experiences trip/demand events that trigger incident investigations. The data points to an initiating cause frequency of 1/5 years instead of 1/10 years, as shown on the Risk Matrix in Figure 3. The starting point has therefore moved to the right, creating a gap from the outset. The maintenance records indicate that the alarm is tested as required and the operator is properly trained to respond, so its credit is validated. However, the maintenance records also show that the field devices associated with the SIF were tested much later than the test intervals defined in the SIL Verification Calculations, degrading the RRF to 75% of the value achieved by the design. This adds a further gap, measured as distance on the risk matrix, that can be tied to both time and dollars.
Figure 3: Actual Performance Based on Operational Data
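The arithmetic behind this scenario can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the alarm credit (RRF of 10), the SIF design RRF (100), and the tolerable frequency (1e-4 per year) are assumed values chosen for the example, and the function names are hypothetical. Only the 1/10 and 1/5 year initiating frequencies and the 75% degradation come from the scenario above.

```python
import math

def mitigated_frequency(initiating_freq_per_year, rrfs):
    """Mitigated event frequency: initiating frequency divided by the product of layer RRFs."""
    return initiating_freq_per_year / math.prod(rrfs)

def gap_in_decades(actual_freq, target_freq):
    """Orders of magnitude by which the actual frequency exceeds the target (0 if the target is met)."""
    return max(0.0, math.log10(actual_freq / target_freq))

# Assumed values for illustration only (not taken from the paper's footnotes).
TMEL = 1e-4              # tolerable mitigated event likelihood, events/year
ALARM_RRF = 10.0         # credit for the alarm with operator response
SIF_DESIGN_RRF = 100.0   # RRF achieved by the SIF design

# LOPA assumption: BPCS failure once every 10 years, full SIF performance.
assumed = mitigated_frequency(1 / 10, [ALARM_RRF, SIF_DESIGN_RRF])

# Operational data: once every 5 years, SIF degraded to 75% of its design RRF.
actual = mitigated_frequency(1 / 5, [ALARM_RRF, 0.75 * SIF_DESIGN_RRF])

gap = gap_in_decades(actual, TMEL)
print(f"assumed: {assumed:.2e}/yr, actual: {actual:.2e}/yr, gap: {gap:.2f} decades")
```

Under these assumed values the scenario meets its target on paper but runs roughly 0.43 orders of magnitude above it in practice, with both the higher initiating frequency and the late testing contributing.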
The Simple Math and Bad Actor Criteria
The fundamental building block for summarizing the overall strength of the assumptions in the LOPA is a single unit of risk. Because LOPA math works on a log scale, we kept the same approach, with the exception of SIL 0 layers (i.e., alarms, BPCS interlocks, etc.). The following table shows the risk units, or fractions of a unit, used in our math.
Table 1: Risk Units per SIL
The risk units in the table increase with the strength of the RRF, depending on the SIL calculation of the design in use. SIL 0 layers are assigned a fraction of a unit simply because the log of one equals zero, yet many credited alarms and interlocks are paramount in risk reduction and should be monitored as well.
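One way to express this log-scale convention in code is sketched below. It assumes that risk units for SIL 1 and above equal the base-10 logarithm of the RRF, and it uses a placeholder fraction of 0.5 for SIL 0 layers. Both the function name and the 0.5 value are hypothetical; the actual values belong to Table 1.

```python
import math

# Hypothetical placeholder: the real SIL 0 fraction comes from Table 1.
SIL0_RISK_UNITS = 0.5

def risk_units(rrf, sil):
    """Risk units for one protection layer.

    Assumption: log10(RRF) for SIL 1 and above, a fixed fraction for SIL 0
    layers such as alarms and BPCS interlocks.
    """
    if rrf <= 0:
        raise ValueError("rrf must be positive")
    if sil == 0:
        return SIL0_RISK_UNITS
    return math.log10(rrf)

for label, rrf, sil in [("alarm", 10, 0), ("SIL 1 SIF", 10, 1),
                        ("SIL 2 SIF", 100, 2), ("SIL 2 SIF, stronger design", 300, 2)]:
    print(f"{label}: {risk_units(rrf, sil):.2f} risk units")
```

Under this assumption a SIL 2 design with an RRF of 300 earns more risk units than one with an RRF of 100, which matches the statement that units increase with the strength of the RRF.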
The following variables (x) can impact an assumption in the LOPA and/or the design criteria in the SRS or Non-SIF IPL Requirement Specification:
- Initiating Cause Frequencies – can only be identified through incident investigations
- Demand Rates – compared to what is assumed in the LOPA based on the order of actuation of IPLs (historian)
- Excessive Bypass – aggregated time in bypass over a period of time compared to the integrity level of design (historian)
- On-Time Testing – devices are tested late compared to the test interval in the SIL calculation (CMMS)
- Frequent Failure – devices are failing more frequently than the failure rates (lambdas) used in the SIL calculations (CMMS and analytics)
Each of the variables above carries its own weight in the overall calculation, and those weights may differ. In addition, if a hazardous scenario is over-protected, degraded performance will not show up as a gap until the excess protection is used up and a true gap occurs. This means that facilities should also treat over-protection as an opportunity to reallocate resources to more critical scenarios.
Having defined all of these variables, we can now calculate what we call the Functional Safety Index™ (FSI). The FSI measures how accurate the LOPA is. For example, if a number of bad actors are identified, then the LOPA delivers only a fraction of the protection that was assumed (100%). The math for this is:
where n = the number of hazardous scenarios, and the actual risk units (see formula below) account for performance degradation. The minimum function ensures that over-protection in one scenario does not earn extra credit that would misrepresent the summation across scenarios.
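A form of the FSI equation consistent with this description is sketched below. It is an assumption reconstructed from the surrounding text, not the authors' published formula. The symbols RU_i^actual and RU_i^assumed (the actual and LOPA-assumed risk units for scenario i) are names introduced here for illustration.

```latex
\[
\mathrm{FSI} \;=\;
\frac{\displaystyle\sum_{i=1}^{n} \min\!\left(RU_i^{\text{actual}},\; RU_i^{\text{assumed}}\right)}
     {\displaystyle\sum_{i=1}^{n} RU_i^{\text{assumed}}}
\times 100\%
\]
```

In this form a facility whose layers all perform as assumed scores 100%, and a scenario with more protection than assumed is capped at its assumed value.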
In order to understand the impact on the financial risk profile of a facility, the FSI can be used to quantify the additional risk of the bad actors:
where n = the number of hazardous scenarios and the Targeted Mitigated Event Likelihood (TMEL) corresponds to the unmitigated consequence of the LOPA scenario.
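A form of the cost-impact equation consistent with this description is sketched below. It is an assumption, not the authors' published formula. C_i (the cost of the consequence for scenario i) is a symbol introduced here, and the (1/FSI - 1) scaling is chosen because it lands close to the figures quoted in the case study.

```latex
\[
\text{Additional risk (\$/yr)} \;=\;
\left(\frac{1}{\mathrm{FSI}} - 1\right)
\sum_{i=1}^{n} \mathrm{TMEL}_i \times C_i
\]
```

With an FSI of 100% the additional risk is zero. As the FSI falls, the additional annual risk grows relative to the baseline that the LOPA assumed.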
Generic Case Study
Based on the equations identified in the previous section, a facility must be able to account for the following information:
- Number of Hazardous Scenarios taken to LOPA
- Each scenario assumes a TMEL of 1e-4 for simplicity
- Initiating Cause Frequencies
- Number of LOPA credited IPLs (target RRF and credited RRF) (e.g., alarm with operator action, BPCS control loop, BPCS interlock, etc.)
- Number of SIL 1 SIFs (target RRF and achieved RRF)
- Number of SIL 2 SIFs (target RRF and achieved RRF)
- Number of SIL 3 SIFs (target RRF and achieved RRF)
- Average cost of a catastrophic event, based on the risk matrix
In order to simplify this example, we will assume that the initiating cause frequencies remain the same and that the average catastrophic event cost equals the figure recently published by the EPA ($265MM per event; see references).
After monitoring performance, the facility has identified:
- 10% of protection layers are being tested late.
- 3% of protection layers contain devices that fail more frequently than assumed in the failure rates used in the SIL calculation.
- 14% of protection layers were demanded more often than assumed.
- 8% of critical devices on protection layers were in bypass excessively.
It is important to note that even though the sources of data are database systems, an SIS-competent individual must review and validate the CMMS and historian events to ensure that they are truly bad actors. In some cases, training of maintenance technicians may be required to ensure the proper definition of failure is used and documented appropriately.
Based on the bad actors depicted in Figure 3 above, the facility’s FSI was 76%. In other words, the LOPA is only 76% as strong as assumed. The company should therefore target a higher FSI, which it can achieve by driving the identified poor performance to zero.
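To make the index concrete, the sketch below computes an FSI for four made-up scenarios. The scenario data, the class and function names, and the formula (capped actual risk units divided by assumed risk units) are illustrative assumptions based on the description of the minimum function, not the facility's data or the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    assumed_units: float  # risk units credited in the LOPA
    actual_units: float   # risk units supported by operational data

def functional_safety_index(scenarios):
    """Assumed form: sum(min(actual, assumed)) / sum(assumed), as a fraction of 1."""
    total_assumed = sum(s.assumed_units for s in scenarios)
    if total_assumed <= 0:
        raise ValueError("scenarios must carry positive assumed risk units")
    # Cap each scenario at its assumed value so over-protection earns no extra credit.
    credited = sum(min(s.actual_units, s.assumed_units) for s in scenarios)
    return credited / total_assumed

# Illustrative data only.
scenarios = [
    Scenario("S-01", assumed_units=3.0, actual_units=2.4),  # late testing
    Scenario("S-02", assumed_units=2.0, actual_units=2.0),  # performing as assumed
    Scenario("S-03", assumed_units=2.5, actual_units=3.1),  # over-protected, capped at 2.5
    Scenario("S-04", assumed_units=2.5, actual_units=1.2),  # excessive bypass
]

fsi = functional_safety_index(scenarios)
print(f"FSI = {fsi:.0%}")
```

For this made-up portfolio the index is 81%. Without the cap, the over-protected scenario S-03 would mask part of the shortfall in S-01 and S-04.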
Using the calculated FSI, the facility can now understand its financial risk profile based on the performance of its protection layers. With 400 hazardous scenarios taken to LOPA, the additional risk is $3.4MM per year, which translates to $85MM over the life of the asset (25 years).
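The order of magnitude of these figures can be checked with a short calculation. The sketch below assumes that the additional risk equals the baseline risk (number of scenarios times TMEL times cost per event) multiplied by (1/FSI - 1). That scaling is an assumption, not a formula confirmed by the paper, and the function name is hypothetical. The inputs (400 scenarios, a TMEL of 1e-4, $265MM per event, an FSI of 76%, a 25-year asset life) are the case-study values.

```python
def additional_annual_risk(n_scenarios, tmel_per_year, cost_per_event, fsi):
    """Extra risk in $/yr relative to the LOPA baseline, under the assumed (1/FSI - 1) scaling."""
    if not 0.0 < fsi <= 1.0:
        raise ValueError("fsi must be in (0, 1]")
    baseline = n_scenarios * tmel_per_year * cost_per_event  # $/yr if the LOPA holds
    return baseline * (1.0 / fsi - 1.0)

annual = additional_annual_risk(
    n_scenarios=400, tmel_per_year=1e-4, cost_per_event=265e6, fsi=0.76
)
lifetime = annual * 25  # 25-year asset life

print(f"${annual / 1e6:.1f}MM per year, ${lifetime / 1e6:.0f}MM over 25 years")
```

With an FSI of exactly 0.76 this gives about $3.3MM per year and $84MM over 25 years, close to the $3.4MM and $85MM quoted above. The small difference is what rounding the FSI to 76% would produce.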
Now, armed with a Functional Safety Index and a Functional Safety Cost Impact, facilities can communicate effectively with those who hold the purse strings about the right amount of spend on functional safety-related activities (e.g., functional testing, root cause analysis, SIF installation, etc.). This allows an end user to justify functional safety spend on a $ / Risk Unit basis and compete for capital on equal footing within the organization. By focusing on the holistic goal of the safety lifecycle (Execute, Monitor, and Sustain), one can use the lifecycle to remove risk from the business cost-effectively.
References
- ANSI/ISA-84.00.01-2004 Part 1 (IEC 61511-1 Mod). Functional Safety: Safety Instrumented Systems for the Process Industry Sector – Part 1: Framework, Definitions, System, Hardware and Software Requirements. The Instrumentation, Systems, and Automation Society. Research Triangle Park, NC.
- Environmental Protection Agency (EPA). 40 CFR Part 68 [EPA-HQ-OEM-2015-0725; FRL-9954-46-OLEM], RIN 2050-AG82. Accidental Release Prevention Requirements: Risk Management Programs under the Clean Air Act. Final rule.