Single event upset

A single event upset (SEU) is a change of state caused by one single ionizing particle (ions, electrons, photons...) striking a sensitive node in a micro-electronic device, such as in a microprocessor, semiconductor memory, or power transistors. The state change is a result of the free charge created by ionization in or close to an important node of a logic element (e.g. memory "bit"). The error in device output or operation caused as a result of the strike is called an SEU or a soft error.

The SEU itself is not considered permanently damaging to the transistor's or circuits' functionality unlike the case of single event latchup (SEL), single event gate rupture (SEGR), or single event burnout (SEB). These are all examples of a general class of radiation effects in electronic devices called single event effects.

History

Single event upsets were first described during above ground nuclear testing, from 1954 to 1957, when many anomalies were observed in electronic monitoring equipment. Further problems were observed in space electronics during the 1960s, although it was difficult to separate soft-failures from other forms of interference. In 1978, the first evidence of soft errors from alpha particles in packaging materials was described by Timothy C. May and M.H. Woods. In 1979, James Ziegler of IBM, along with W. Lanford of Yale, first described the mechanism whereby a sea level cosmic ray could cause a single event upset in electronics.

Cause

Terrestrial SEU arise due to cosmic particles colliding with atoms in the atmosphere, creating cascades or showers of neutrons and protons, which in turn may interact with electronic circuits. At deep sub-micron geometries, this affects semiconductor devices in the atmosphere.

In space, high energy ionizing particles exist as part of the natural background, referred to as galactic cosmic rays (GCR). Solar particle events and high energy protons trapped in the Earth's magnetosphere (Van Allen radiation belts) exacerbate this problem. The high energies associated with the phenomenon in the space particle environment generally render increased spacecraft shielding useless in terms of eliminating SEU and catastrophic single event phenomena (e.g. destructive latch-up). Secondary atmospheric neutrons generated by cosmic rays can also have sufficiently high energy for producing SEUs in electronics on aircraft flights over the poles or at high altitude. Trace amounts of radioactive elements in chip packages also lead to SEUs.

Testing for SEU sensitivity

The sensitivity of a device to SEU can be empirically estimated by placing a test device in a particle stream at a cyclotron or other particle accelerator facility. This particular test methodology is especially useful for predicting the SER (soft error rate) in known space environments, but can be problematic for estimating terrestrial SER from neutrons. In this case, a large number of parts must be evaluated, possibly at different altitudes, to find the actual rate of upset.

Another way to empirically estimate SEU tolerance is to use a chamber shielded for radiation, with a known radiation source, such as Caesium-137.

When testing microprocessors for SEU, the software used to exercise the device must also be evaluated to determine which sections of the device were activated when SEUs occurred.

SEUs and circuit design

By definition, SEUs are non-destructive events. However, under proper circumstances (of both circuit design, process design, and particle properties) a "parasitic" thyristor inherent to CMOS designs can be activated, effectively causing an apparent short-circuit from power to ground. This condition is referred to as latchup, and in absence of constructional countermeasures, often destroys the device due to thermal runaway. Most manufacturers design to prevent latch-up, and test their products to ensure that latch-up does not occur from atmospheric particle strikes. In order to prevent latch-up in space, epitaxial substrates, silicon on insulator (SOI) or silicon on sapphire (SOS) are often used to further reduce or eliminate the susceptibility.

In digital and analog circuits, a single event may cause one or more voltages pulses (i.e. glitches) to propagate through the circuit, in which case it is referred to as a single-event transients (SET). Since the propagating pulse is not technically a change of "state" as in a memory SEU, one should differentiate between SET and SEU. If a SET propagates through digital circuitry and results in an incorrect value being latched in a sequential logic unit, it is then considered an SEU.

In space-based microprocessors, one of the most vulnerable portions is often the 1st and 2nd-level cache memories, because these must be very small and have very high-speed, which means that they do not hold much charge.[1] Often these caches are disabled, if terrestrial designs are being configured to survive SEUs. Another point of vulnerability is the state machine in the microprocessor control, because of the risk of entering "dead" states (with no exits), however, these circuits must drive the entire processor and are not as vulnerable as one might think. Another vulnerable processor component is the RAM. To ensure resilience to SEUs, often an error correcting memory is used, together with circuitry to periodically read (leading to correction) or scrub (if reading does not lead to correction) the memory of errors, before the errors overwhelm the error-correcting circuitry.[1]

See also

References

  1. 1 2 "A Survey of Techniques for Modeling and Improving Reliability of Computing Systems", Mittal et al., IEEE TPDS, 2015

Further reading

General SEU
SEU in programmable logic devices
SEU in microprocessors
SEU related masters theses and doctoral dissertations
This article is issued from Wikipedia - version of the 11/13/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.