
Safety System Bad Actor Identification – Frequent Failures

Your company has started the journey toward compliance with the IEC 61511 safety lifecycle, but it’s a long and arduous path to get organizational alignment around the benefits of the safety lifecycle. So, what’s a relatively quick win that will help demonstrate some of the end-game benefits to management? You’ve identified all of your SIFs and are testing them, and so far you’ve collected 3 failures. Is this good? Is this bad? How do we quickly assess? If we simply calculate the percentage of failures out of total devices tested, it’s a very small figure, and management might incorrectly jump to the mindset of “let’s start extending test intervals.” What if we could quickly and easily identify a simple pass/fail target for the number of failures per year for your SIF field devices?

Below are your simple steps to success:

  1. First, decide where you are going to collect information on detected and undetected failures from:
    1. Functional Test Plans [Undetected] – if so, you need to make sure your maintenance technicians can provide enough information to identify the source of the failure
    2. Historian Data [Detected] – if so, make sure you have the appropriate logic tags for what could show up as a failure (e.g., bad signal quality, stuck signal, failed automated partial stroke test)

Note: For the purposes of this blog, we will assume both. Initially, we are only looking at dangerous failures, as those are the ones that impact SIL calculations.
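For the historian route, a first-pass scan for two of the symptoms mentioned above (bad signal quality and a stuck signal) could look something like the sketch below. It is only an illustration: the function names, the use of `None` to mark a bad-quality sample, and the `min_run` and `tolerance` thresholds are all assumptions, not features of any particular historian product.

```python
from typing import List, Optional, Tuple

def find_bad_quality(values: List[Optional[float]]) -> List[int]:
    """Indices of samples flagged as bad quality (assumed to arrive as None)."""
    return [i for i, v in enumerate(values) if v is None]

def find_stuck_runs(values: List[Optional[float]],
                    min_run: int = 5,
                    tolerance: float = 1e-6) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs where consecutive samples did not move.

    A run is reported when at least `min_run` consecutive good samples each
    differ from the previous sample by no more than `tolerance`.
    Both thresholds are hypothetical and need tuning per measurement.
    """
    runs = []
    start = None  # index where the current flat run began
    for i, v in enumerate(values):
        prev = values[i - 1] if i > 0 else None
        same = (v is not None and prev is not None
                and abs(v - prev) <= tolerance)
        if same:
            if start is None:
                start = i - 1
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close out a run that extends to the end of the data
    if start is not None and len(values) - start >= min_run:
        runs.append((start, len(values) - 1))
    return runs

# Made-up samples for one transmitter tag
samples = [10.1, 10.2, 10.2, 10.2, 10.2, 10.2, 10.4, None, 10.5]
print("stuck runs:", find_stuck_runs(samples))
print("bad quality at:", find_bad_quality(samples))
```

Anything such a scan flags is only a candidate event; it still needs review before it is counted as a detected failure.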

  2. Select the largest population of similar instrument types at your facility (potentially including sibling assets) that use the same generic failure rate.

Note: For the purposes of this blog, we will assume 1,000 generic pressure transmitters. The round numbers are used to simplify the example.

  3. Back-calculate the expected number of failures from the generic failure rates used. Be sure your units are correct (e.g., failures per hour or FITs). These failure counts are what we assume to be “acceptable.” If we find noticeably more or fewer failures, we want to flag generic pressure transmitters as potential bad actors.

Figure 1 – Targeted Failures
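The back-calculation described above is simple arithmetic: failure rate (per hour) times hours in service times number of devices. The sketch below works it through for the 1,000 transmitters in this example. The two failure rates are hypothetical values chosen for illustration, not the rates behind Figure 1; substitute the generic rates used in your own SIL calculations.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours

def fits_to_per_hour(fits: float) -> float:
    """Convert FITs (failures per 1e9 device-hours) to failures per hour."""
    return fits * 1e-9

def expected_failures(rate_per_hour: float,
                      device_count: int,
                      years: float = 1.0) -> float:
    """Expected failure count for a population over the given period."""
    return rate_per_hour * HOURS_PER_YEAR * years * device_count

# Hypothetical generic failure rates in FITs (assumed for illustration only)
GENERIC_RATES_FITS = {
    "dangerous_undetected": 300.0,
    "dangerous_detected": 500.0,
}
DEVICE_COUNT = 1000  # the round number used in this blog's example

targets = {
    mode: expected_failures(fits_to_per_hour(fits), DEVICE_COUNT)
    for mode, fits in GENERIC_RATES_FITS.items()
}

for mode, count in targets.items():
    print(f"{mode}: about {count:.1f} expected failures per year")
```

With these assumed rates, the targets come out to roughly 2.6 dangerous undetected and 4.4 dangerous detected failures per year across the population.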

  4. Make sure your maintenance technicians fully understand the definition of a failure.
  5. Make sure to review historian records at least monthly to confirm that each captured event is a genuine failure that maintenance ultimately corrected, rather than bad data.

After a year, Figure 2 below shows a sample of what one might find.

Figure 2 – Simple Results after Analysis
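Because failure counts this small are noisy, it can help to ask how likely the observed count would be if the generic rate were actually correct. One common way to do that, which is our assumption here rather than something the original analysis prescribes, is to treat the yearly count as Poisson-distributed around the target. In the sketch below, the expected value of 2.628, the observed counts, and the 5% threshold `alpha` are all hypothetical.

```python
import math

def poisson_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Sum P(X = k) for k = 0 .. observed - 1, then take the complement
    cdf = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - cdf

def is_potential_bad_actor(observed: int,
                           expected: float,
                           alpha: float = 0.05) -> bool:
    """Flag the device type when a count this high would be unlikely
    (probability below alpha) under the generic failure rate."""
    return poisson_tail(observed, expected) < alpha

# Hypothetical target and observed counts for illustration
expected_du = 2.628
for observed_du in (3, 8):
    p = poisson_tail(observed_du, expected_du)
    flag = is_potential_bad_actor(observed_du, expected_du)
    print(f"observed={observed_du}, P(X >= observed)={p:.4f}, flag={flag}")
```

Under these assumptions, 3 failures against a target of about 2.6 is unremarkable, while 8 failures would be flagged. The same idea can be mirrored with the lower tail to spot device types that fail much less often than assumed.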

Figure 2 illustrates that we have more frequent dangerous undetected failures on our generic pressure transmitters than the failure rates used in the original SIL calculations would predict. This could mean that you are carrying more unaccounted-for risk than assumed. Having said that, there are a number of questions that you should be asking with these findings (see Figure 3 for advanced analytics to support the answers):

  • Were my SIFs overdesigned such that the gap introduced is acceptable?
  • How can I quickly generate a new failure rate for dangerous undetected failures?
  • How could I cost-effectively perform a sensitivity analysis on hundreds of SIFs?
  • Before adopting new failure rates as proven in use, do I need to take into consideration any of the following to split my devices and extend the analysis period?
    • Service or duty
    • Installation detail
    • Manufacturer
    • Make
    • Model number
  • What if the results were the opposite, meaning significantly lower failure counts? We should be able to do a similar analysis to help us extend/optimize manual testing intervals.

Figure 3 – Analytics to Further Support Business Decisions
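To give a feel for two of the questions above (a new dangerous undetected failure rate and a sensitivity check), here is a rough sketch. It uses a simple point estimate of the observed rate and the widely used simplified approximation for a single (1oo1) device, PFDavg ≈ λDU × TI / 2. The failure count, the generic rate, and the test intervals are hypothetical, the result covers the sensor contribution only rather than a complete SIF, and a single year of data gives a noisy estimate, so treat it as a screening aid rather than a proven-in-use rate.

```python
HOURS_PER_YEAR = 8760

def observed_rate(failures: int, device_count: int, years: float) -> float:
    """Point estimate of failure rate (per hour) from field data."""
    return failures / (device_count * years * HOURS_PER_YEAR)

def pfd_avg_1oo1(lambda_du: float, test_interval_hours: float) -> float:
    """Simplified average probability of failure on demand, single device."""
    return lambda_du * test_interval_hours / 2

# Hypothetical inputs for illustration only
generic_lambda_du = 300e-9                       # assumed generic rate, per hour
field_lambda_du = observed_rate(8, 1000, 1.0)    # assumed 8 DU failures in a year

# Compare sensor PFDavg under both rates for a few assumed test intervals
for interval_years in (1, 2, 5):
    ti_hours = interval_years * HOURS_PER_YEAR
    pfd_generic = pfd_avg_1oo1(generic_lambda_du, ti_hours)
    pfd_field = pfd_avg_1oo1(field_lambda_du, ti_hours)
    print(
        f"TI={interval_years} yr: generic PFDavg={pfd_generic:.2e}, "
        f"field PFDavg={pfd_field:.2e}, ratio={pfd_field / pfd_generic:.1f}"
    )
```

Running the same comparison across every SIF that uses the affected device type shows quickly which ones have enough design margin to absorb the higher rate and which need a closer look.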

All of these questions and ideas pose challenges that a safety lifecycle database, such as aeShield®, can simplify, reducing the overall analysis effort. aeShield is our own application, and we have participated in a number of projects to begin failure data collection. Every time we’ve been involved with the above review process, some eye-opening issues have been discovered, which helps focus awareness on the benefits of the safety lifecycle. By highlighting potential frequent-failure bad actors and showing how they financially impact the business, one can start to get management alignment and build an understanding of your IEC 61511 compliance efforts.
