Redundant Systems – Fully Redundant, No Single Point Failure, Seamlessly Switched

Application Note – 16


Introduction

In many systems, extremely high “up time” for the fully operational system is critical. Additionally, in many systems that use a precision frequency reference (telecommunications, satellite communications, radio and TV broadcasting, and many others), phase continuity of the frequency reference is also required, as a phase “jump” can cause client equipment to lose system lock, causing a temporary system outage.

This paper first addresses the reliability benefits of using a redundant system, and then describes a system solution that provides a fully redundant, seamlessly switched system with no single point of failure. A single 10MHz reference is addressed; however, the concept is expandable to include additional signals if required.

Mean Time Between Failures (MTBF)

The most common measure used for estimating how reliable a system will be is Mean Time Between Failures, more commonly referred to simply as MTBF. MTBF is the inverse of the failure rate, referred to as lambda (λ). MTBF for a system can be calculated by combining the individual MTBFs of all the system components. The way the system components are combined depends upon the configuration. In a normal (non-redundant) system, the individual components can be considered as being in series. In this case, system MTBF will be less than the MTBF of any individual component.

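As an example, consider a system with two components, A and B, with individual MTBFs of 1000 hrs and 2000 hrs respectively. Calculation of the system MTBF is as follows:

For component A, failure rate Fa = 1/1000 = 0.001 per hour

For component B, failure rate Fb = 1/2000 = 0.0005 per hour

Overall failure rate Ft = Fa + Fb = 0.001 + 0.0005 = 0.0015 per hour

MTBF = 1/Ft = 667 hrs (approximately)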
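The series calculation above can be reproduced with a short script. This is a minimal sketch for illustration only; the function name `series_mtbf` is an assumption and does not come from any particular reliability library.

```python
def series_mtbf(mtbfs_hours):
    """Return the MTBF (hrs) of components in series.

    In a series system any single failure stops the system,
    so the individual failure rates (1/MTBF) simply add.
    """
    total_failure_rate = sum(1.0 / m for m in mtbfs_hours)
    return 1.0 / total_failure_rate


# Component A: 1000 hrs, component B: 2000 hrs (values from the example above)
system_mtbf = series_mtbf([1000.0, 2000.0])
print(f"Series MTBF: {system_mtbf:.0f} hrs")  # prints "Series MTBF: 667 hrs"
```

Adding further components to the list can only lower the result, which is why a series system is always less reliable than its weakest component.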

Now consider the redundant solution, where the components are in parallel and both need to fail for the system to fail.

Again, for component A, failure rate Fa = 1/1000 = 0.001 per hour

For component B, failure rate Fb = 1/2000 = 0.0005 per hour

However, this time the overall failure rate is calculated as the product of the individual failure rates. This is a simplified estimate that treats each hourly failure rate as the independent probability of that component failing in a given hour:

Overall failure rate Ft = Fa x Fb = 0.001 x 0.0005 = 0.0000005

MTBF = 1/Ft = 2,000,000 hrs, or over 200 years, a significant difference!
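The parallel figure can be checked the same way. The sketch below follows the simplified product-of-rates model used in this note; a detailed reliability analysis would also account for factors such as repair time, which are not covered here. The function name `parallel_mtbf_simplified` and the constant `HOURS_PER_YEAR` are assumptions made for illustration.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hrs, ignoring leap years


def parallel_mtbf_simplified(mtbfs_hours):
    """Return the MTBF (hrs) of redundant components using the
    simplified model in this note: the overall failure rate is
    the product of the individual failure rates (1/MTBF)."""
    overall_rate = 1.0
    for m in mtbfs_hours:
        overall_rate *= 1.0 / m
    return 1.0 / overall_rate


# Same two components as before, now in parallel
mtbf_hours = parallel_mtbf_simplified([1000.0, 2000.0])
mtbf_years = mtbf_hours / HOURS_PER_YEAR
print(f"Parallel MTBF: {mtbf_hours:,.0f} hrs (about {mtbf_years:.0f} years)")
```

With the example values this gives 2,000,000 hrs, or roughly 228 years, compared with 667 hrs for the same two components in series.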

Practical Application

In order to take advantage of this improved MTBF through parallel systems, it is important to remember that a vastly improved overall system MTBF is ONLY realizable provided that there is NO SINGLE POINT FAILURE mechanism within the system, i.e. no single component whose failure can cause the system to fail.

If such a component does exist, the overall MTBF of the system can never be better than the MTBF of that single component, and in fact would be calculated as in the serial case above.
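To illustrate the effect of a single point failure, the sketch below places one non-redundant component C in series with the redundant pair A and B from the earlier example. The 5000 hr MTBF for component C is a hypothetical value chosen for illustration, and the redundant pair uses the simplified product-of-rates model from this note.

```python
def system_mtbf_with_single_point(mtbf_a, mtbf_b, mtbf_single):
    """MTBF (hrs) of a redundant pair (A, B) in series with one
    non-redundant component, using this note's simplified model."""
    pair_rate = (1.0 / mtbf_a) * (1.0 / mtbf_b)  # parallel: product of rates
    single_rate = 1.0 / mtbf_single              # single point failure
    return 1.0 / (pair_rate + single_rate)       # series: rates add


# A = 1000 hrs, B = 2000 hrs, C = 5000 hrs (C is a hypothetical value)
result = system_mtbf_with_single_point(1000.0, 2000.0, 5000.0)
print(f"System MTBF: {result:.1f} hrs")  # prints "System MTBF: 4987.5 hrs"
```

Although the redundant pair alone is good for 2,000,000 hrs in this model, the complete system comes out just below the 5000 hrs of the single component, which dominates the result.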

In the system described below, we consider two (i.e. redundant) 10MHz precision frequency sources, 10MHz source 1 and 10MHz source 2, feeding into a triple auto switch and dual PLL (phase-locked loop) system. Technically speaking, the output auto switch, AS 3, could be considered a single point failure. However, it is designed as a mechanically latching relay, so even if it fails it continues to pass the last selected signal and does not impede system operation. Similarly, the crystal filter, which is used to maintain the output during the switching time of AS 3, could also be considered a single point failure. However, it is a passive component, so its MTBF is extremely high and has a negligible impact on the overall system MTBF calculation.

Considering the system schematic below:

Both source 1 and source 2 have 2 x 10MHz outputs in order to drive the two auto switches AS 1 and AS 2. The outputs from source 1 are used to drive the A inputs on AS 1 and AS 2, while the outputs from source 2 are used to drive the B inputs on AS 1 and AS 2.

The AS 1 and AS 2 outputs are used to drive two phase-locked loops, PLL 1 and PLL 2. During operation AS 1 and AS 2 switch together, i.e. they will either both be using input A or both be using input B. Therefore, because the PLL 1 and PLL 2 inputs both come from the same source (either source 1 or source 2), the outputs from the PLLs that go to AS 3 will both be in phase.

The result of this is as follows:

Initially both AS 1 and AS 2 will select source 1 on the A inputs

PLL 1 and PLL 2 outputs will both be in phase (with source 1)

AS 3 will select input A, from PLL 1

If source 1 fails, both AS 1 and AS 2 will switch to input B, coming from source 2

PLL 1 and PLL 2 will now both slew to lock to source 2 (slew rate can be defined by system requirements)

AS 3 will continue to use input A, from PLL 1

If PLL 1 fails, AS 3 will switch to input B from PLL 2, which is already in phase with PLL 1, avoiding any phase discontinuity.
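The switching sequence above can be summarized in a small model. This is a sketch of the selection logic only, not the actual switch implementation: the class and attribute names are hypothetical, slewing and phase are not modeled, and the model is stateless, so it does not represent what happens when a failed unit recovers (which this note does not specify).

```python
from dataclasses import dataclass


@dataclass
class RedundantReference:
    """Hypothetical health flags for the two sources and two PLLs."""
    source1_ok: bool = True
    source2_ok: bool = True
    pll1_ok: bool = True
    pll2_ok: bool = True

    def input_switches(self):
        """AS 1 and AS 2 switch together: input A (source 1)
        unless source 1 has failed, then input B (source 2)."""
        return "A" if self.source1_ok else "B"

    def output_switch(self):
        """AS 3 uses input A (PLL 1) unless PLL 1 has failed,
        then input B (PLL 2)."""
        return "A" if self.pll1_ok else "B"

    def output_present(self):
        """True if the selected source and selected PLL are both healthy."""
        source_ok = self.source1_ok if self.input_switches() == "A" else self.source2_ok
        pll_ok = self.pll1_ok if self.output_switch() == "A" else self.pll2_ok
        return source_ok and pll_ok


system = RedundantReference()
print(system.input_switches(), system.output_switch(), system.output_present())  # A A True

system.source1_ok = False  # source 1 fails: AS 1 and AS 2 move to input B
print(system.input_switches(), system.output_switch(), system.output_present())  # B A True

system.pll1_ok = False     # PLL 1 fails: AS 3 moves to input B
print(system.input_switches(), system.output_switch(), system.output_present())  # B B True
```

In this model the output is lost only when both sources or both PLLs have failed, which reflects the no single point failure goal of the design.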

Finally, for completeness, the system power must also be redundant, as a power failure would inevitably stop system operation.

The above system describes a very robust solution to providing redundancy. Ideally, all elements of the parallel system paths would be housed in separate physical chassis, to avoid system disruption when a failed module is replaced. However, this can significantly increase system cost and size, both of which are important factors in determining an appropriate system solution.

Additionally, there are many other redundancy schemes available that provide significantly enhanced reliability without necessarily going to the ultimate solution, thus saving cost and space utilization.

Precise Time and Frequency, Inc. has the most comprehensive selection of redundancy and distribution instruments and modules for time and frequency applications in the world, and is continually providing custom solutions to meet system performance and budget requirements.
