# Porting & Scaling Strategies for Nanoscale CMOS RHBD

Robert L. Shuler robert.l.shuler@nasa.gov or mc1soft@yahoo.com NASA Johnson Space Center, Houston, TX 77058

ABSTRACT: Techniques are described for minimizing the number of cells in a digital logic library, scaling and porting the cells to process nodes that do not nominally support scaling, and increasing the separation of critical node pairs without unduly disrupting the design process. A new compact modular 8T self-voting latch reduces circuitry by over half, allowing modular redundancy to approach theoretical efficiency limits. The result allows investment in low volume designs, such as, but not limited to, radiation hardened by design (RHBD) applications for mission critical components, to provide returns over decades-long time periods.

#### I. INTRODUCTION

This paper addresses the need for high speed, low power and highly complex integrated circuits to support manned missions assisted by intelligent subsystems that are able to find landing sites in real time and perform other mission critical applications. Current prototypes such as the Morpheus Lander developed at the Johnson Space Center incorporating Automated Landing and Hazard Avoidance Technology (ALHAT) developed as a multi-center collaboration are now proving the value and feasibility of such subsystems. The ability to implement them depends on large scale multicore integrated circuits. It is likely that there exist commercial applications as well which have similar problems, as foundries target more and more expensive processes toward the largest volume and most profitable mass consumer applications.

The design of complex subsystems must proceed incrementally over a longer period of time than in the past. In the case of space systems, this is due to government-level decisions to fund deep space exploration as a long-term incremental project rather than a crash effort as was Apollo. The long-term nature means that designs must remain valid for not just years but decades. In the case of specialized commercial systems, incremental development would also be desirable.

From the rise in significance of radiation induced single event effects (SEE) in circuits in the mid-1980s until the end of scaling at approximately the 180 nm node in the mid-2000s, three factors supported the availability of circuits: re-use from military and unmanned programs, qualification testing of commercial parts, and the porting or scaling of designs from one generation of integrated circuit technology to another. The first and last have nearly disappeared as viable options, while commercial part qualification has traditionally been, and still is, applicable mostly to non-critical systems. The large cost of newer processes, together with the inability to re-use existing designs, threatens the cost-effective availability of devices designed especially for use in space. Investment in a new design is made obsolete all too quickly by disappearing fabrication facilities. Even if scaling were still available, the prevalence of multi-node upset in nanoscale technologies prevents the use of older designs.

Commercial developers of high reliability and high altitude applications are already concerned with SEE, though they can use weaker techniques. At some point, with gate lengths perhaps just a few nanometers, it is likely that such applications will need stronger techniques, such as those described in this paper.

We describe five techniques for addressing these issues:

- 1. An extremely small cell count digital logic library with radiation hardened by design (RHBD) features.
- 2. Multi-parameter scaling techniques for porting between similar technology nodes.
- 3. RHBD basic cells designed to partition critical node pairs into different cells.
- 4. A new RHBD cell which minimizes the routing required for arbitrary node separation and full modular redundancy.
- 5. A block placement-routing strategy which allows efficient node pair separation.

Items 1 and 2 and the general parts of item 5 may be of interest to all application specific integrated circuit (ASIC) developers, whereas items 3 and 4 are targeted specifically at circuits that must operate in radiation environments. We consider mainly SEE and not total dose, though the library technique can certainly be used in conjunction with known total dose mitigation techniques such as annular gates (these can be used at nanoscale with design rule waivers). SEE include:

- 1. Single Event Latchup (SEL): the triggering of parasitic devices in bulk complementary metal oxide semiconductor (CMOS) circuits which act like a silicon controlled rectifier (SCR).
- 2. Single Event Upset (SEU): the unwanted change of state of a stored bit in registers, control flip flops, or memory.
- 3. Single Event Transient (SET): propagating transient waveforms which may become an upset if latched, which happens with much greater frequency in modern high speed designs than in previous eras.

## II. SEE/RHBD BACKGROUND

Dodd et al. outline current and future trends and challenges in radiation effects in CMOS. [1] While SET and SEU thresholds decrease as feature size decreases, they decrease relatively more slowly than feature size in bulk CMOS. Silicon on insulator (SOI) trends are better, but as SOI continues to be somewhat of a niche technology and less available than bulk, especially in leading technologies, we elect to focus on bulk CMOS in this paper.

Glorieux, Lin, Huang, et al. [2] [3] [4] [5] clearly show a trend in current RHBD research to focus on terrestrial and lower atmospheric effects in nanoscale technologies, approximately 40 nm to 20 nm [Ibid. 5] for RHBD published research at present. Modest improvements are featured, such as hysteresis latches, which offer less hardening than the standard dual interlocked cell (DICE) latch. These efforts are worthwhile, more easily funded, more easily gain access to ever more restrictive new processes, and provide career opportunities for the researchers, but do not address the focus of our paper.

We suggest that while the prediction that RHBD technology will make its way into the commercial, terrestrial world has come to pass, there will always be a strong differential between commercial applications and critical space applications that require special techniques. Very likely, if Moore's law continues and the commercial world moves into the sub-nanometer realm (not so very far away, as 7 nm processes have already come into production), new exotic techniques used to make nanoscale circuits suitable for critical space applications will also one day migrate into the commercial world.

Some RHBD efforts certainly attempt to make circuits that perform better than the reference DICE latch. D'Alessio et al. use extra transistors in the inter-stage connections, essentially resistors, reminiscent of much older hardening techniques used with single-string latches and having similar drawbacks, such as a 41% increase in delay at 32 nm. [6] All dual coupled designs, and some triple coupled designs such as the temporal latch [7], depend on a longer clock cycle to allow time for SETs to dissipate, even fully dual rail logic systems that do not insert explicit delays. [8]

More recent improvements over the basic DICE latch involve identification and separation of pairs of critical nodes which must both be affected to cause an SEU. Amusan et al. showed in 2008 for a 90 nm process, larger in dimension than our target process range, that such nodal separation offered an "order of magnitude decrease in upset cross section." [9] They also note that charge sharing is worse in the same well. This finding is only partly mitigated by Ahlbin et al., who note that sharing a well improves pulse quenching [10], because taking advantage of pulse quenching on a large scale, not just within a particular layout cell but in the general case of cell abutments, imposes further constraints that complicate the design process.

The DICE ("Dual Interlocked CEll") architecture is fundamentally separable even though the schematic is often not drawn that way. The author showed that certain other high performance RHBD latches, such as the Single Event Resistant Topology (SERT) and Dooley/TAG4 (Transition nAnd Gate 4) cells, share the same basic topology as the DICE, with the TAG4 being the "fully populated" version and the DICE having the maximum number of missing transistors. [Ibid. 8] [11] Haghi et al. take advantage of the separable aspect to interleave parts of two separate latches so that the "halves" of any one particular DICE are not adjacent. [12] Cabanas-Holmen et al. evaluate such an arrangement in a 32 nm SOI process [13] and find a two order-of-magnitude improvement over unprotected flip flops. This still might not meet the requirements of some space systems without further redundancy, and strays outside our bulk CMOS "goal." Shambhulingaiah et al. generalize the approach with methods to systematically identify node pairs which must be separated to implement interleaving. [14] Presumably these methods could be applied to improvements on the DICE such as the Quatro latch developed by Jagannathan et al. [15], which is not drawn such that it is obvious how to separate it, though the purpose of the Quatro appears to be to improve performance for a compact commercial layout rather than for critical space applications.

Several triple cross coupled latches have appeared, including one by Cameron et al. at ICS in 110 nm [16] and Tianwen Li's triple latch at 130 nm with an SEU threshold of LET 42 MeV cm<sup>2</sup>/mg [17], both of which appear to be intended for space critical applications. Unlike Triple Modular Redundancy (TMR) circuits, these still require the extra delay of similar dual coupled circuits. The advantage over TMR is therefore not clear to the author, and does not appear to be stated in the literature. Huang's survey of classic circuits [Ibid. 4] shows a count of 8T (8 transistors) per latch copy for a classic TMR, times 3 of course giving 24T, but not counting the actual voting logic. The DICE latch is an 8T circuit without input mux, or generally considered to be 12T with mux, without the need for voting since that is handled by the coupling. Jagannathan's improvement on the DICE is 26T.

Shiyanovskii et al. address "functional separation" for SRAM cells at 32 nm. [18] Interleaving in one way or another to alleviate multi-bit upsets is a long tradition in memory design, and further interleaving to separate critical nodes is a logical extension when applied to the highly regular structure of memories. Interleaving is not so readily extensible to non-regular logic circuits. It may work well at 32 nm, but the same circuit at 7 nm would bring the node pairs roughly 4.6 times (32/7) closer together.

Furthermore, the findings of Amusan et al. of directional sensitivity even at 90 nm [19] imply that in certain spacecraft orientations, exposure to a burst event might significantly exceed the designers' assumed conditions. For R&D runs in many state-of-the-art processes, adequate testing of angle-sensitive circuits is even more expensive than fabrication, making it a significant obstacle.

The author finds a promising direction in the work of Petrovic et al., who, rather than addressing the details of latch circuits, strive for a methodology for building dual coupled circuits which has less impact on the ordinary digital design flow, and which provides physically separate layout blocks for larger subcircuits, perhaps a latch and some associated circuitry. [20]

While triple modular redundancy with voting (TMR) has been used for many decades, and is used for some modern space parts (e.g. Leon2 and Leon3), and is popular on field programmable gate array (FPGA) platforms where the basic circuitry is already fixed, the goal of RHBD researchers has been largely to avoid this brute force method by all means, including now apparently in at least two cases using triple interlocked circuitry, or to apply it only selectively, usually in FPGAs. [21]

#### III. 10-CELL LIBRARY

An extremely small cell count RHBD library minimizes porting effort for basic layout. In cell library design, basic NAND/NOR cells follow a straightforward pattern. The cells which cost most of the designer's time are the MUX, XOR, and various latches and flip flops, especially when considering combinations of preset and clear inputs, and RHBD and unprotected versions. The principle of our minimal library, sometimes called 10-Cell though it can have one or two more cells in it, is to recognize the following:

- The latch is a MUX with a feedback loop
- The XOR or any arbitrary function can be implemented with a MUX if needed
- If the requirement for separating critical node pairs trumps layout efficiency, then critical nodes can be placed in different "half latches," reducing by half the layout effort of custom latch cells, and by ¼ the layout effort of dual-latch flip flop circuits
- Only a small layout penalty in comparison to cell porting costs is incurred for implementing latch type variations (preset, clear, etc.) with external gates.
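The first two observations in the list can be illustrated with a small behavioral model. This is a hypothetical Python sketch, not the library's netlist or the Fig. 2 topology: signals are 0/1 integers, timing and the RHBD variants are ignored, and the names `mux2`, `xor2`, `xnor2`, and `MuxLatch` are illustrative only.

```python
# Behavioral sketch only: a 2-input MUX is the sole primitive, and the
# latch and XOR/XNOR are derived from it. Timing and layout are ignored.

def mux2(sel: int, d0: int, d1: int) -> int:
    """Return d1 when sel is 1, else d0."""
    return d1 if sel else d0

def inv(a: int) -> int:
    """Logical inverter on 0/1 values."""
    return 1 - a

def xor2(a: int, b: int) -> int:
    # With A on the select line, the MUX passes B or its complement.
    return mux2(a, b, inv(b))

def xnor2(a: int, b: int) -> int:
    # Same structure with the data inputs swapped.
    return mux2(a, inv(b), b)

class MuxLatch:
    """Level-sensitive D latch: a MUX whose d0 input is its own output."""

    def __init__(self) -> None:
        self.q = 0

    def evaluate(self, enable: int, d: int) -> int:
        # enable=1 selects the data input (transparent);
        # enable=0 selects the feedback path (hold).
        self.q = mux2(enable, self.q, d)
        return self.q
```

For example, `MuxLatch().evaluate(1, 1)` returns 1 (transparent), and a following `evaluate(0, 0)` on the same latch still returns 1, because the MUX now selects its own output.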



**Fig. 1** – 10-cells plus delay RHBD library base cells

Figure 1 shows the basic 10-cell library symbols, plus delay if desired for SET filtering in dual input circuits. Inverter, buffer, strong buffer, and 4 variations of NAND account for 7 of the 10 cells. Only one variation of NOR is provided because NOR gates with more inputs suffer difficulties balancing rise/fall times (needed to prevent SETs from widening as they propagate), and become physically large. Of course these can be added, but often a multi-gate implementation will be faster. The remaining three cells include two types of 2-input MUX, one for combinational logic use and one for implementation of dual interlocked cells such as DICE, SERT or TAG4 (whichever the designer prefers, as discussed later). The final cell is a tri-state buffer, which also has multiple uses, including as a TAG or Guard Gate.



Fig. 2 – latch and XOR from base cells

Figure 2 shows implementations of latch (LAT) and XNOR2 from these basic cells.

#### IV. MULTI-PARAMETER SCALING

Older scaling techniques constructed most features out of multiples of one scaling parameter, lambda, which was half the minimum polysilicon line width. In newer processes the metal line widths often bear little relation to the poly line width. Spacing and surround requirements vary widely. Exact size contact and via requirements are arbitrary.

Furthermore, vendors have little interest in promoting scaling for two reasons. Internally, the enormous fab cost for each new generation dwarfs their library conversion costs, so they may as well go with new libraries. Externally, it is to their advantage for customers to invest heavily in tying their designs to a vendor's process. Vendors have imposed additional constraints on even accessing process specs, and in some cases require that users *not* also have access to other similar processes for comparison.

We use the following techniques to make designs portable between processes of the same technology node, and to reduce the re-design effort when migrating to a new node. So far the work has been used to scale from 90 nm to 65 nm, and to support two different 65 nm processes, with successful fabrications at 90 nm and 65 nm. The techniques have also been applied at 28 nm, but that design has not yet been fabricated.

- Use replaceable sub-cells for contacts and vias so these need to be changed only once.
- Use max of minimum space/extension, rounding up to nearest lambda.
- Pick a scaling factor convenient for drawing metal and diffusion rather than gates.
- With a limited number of cells in the library, the gate lengths can be manually adjusted if needed.
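The "max of minimum, then round up" rule above can be sketched in a few lines. The rule names and nanometer values below are hypothetical placeholders, not data from any vendor kit; the grid is a parameter so it can be lambda or, as in the example that follows, half lambda (20 nm).

```python
# Illustrative minimum-dimension rules (nm) for two processes at the same node.
# Rule names and values are hypothetical, not taken from any vendor kit.
RULES_A = {"contact_size": 90, "poly_space": 100, "metal1_width": 90, "diff_ext_past_poly": 110}
RULES_B = {"contact_size": 100, "poly_space": 110, "metal1_width": 95, "diff_ext_past_poly": 100}

def snap_up(value_nm: int, grid_nm: int) -> int:
    """Round a dimension up to the next multiple of the drawing grid."""
    return -(-value_nm // grid_nm) * grid_nm  # integer ceiling division

def nominal_rules(grid_nm: int, *rule_sets: dict) -> dict:
    """Take the worst case (max) of each rule over all processes, then snap up."""
    names = sorted(set().union(*rule_sets))
    return {n: snap_up(max(r[n] for r in rule_sets if n in r), grid_nm) for n in names}
```

With a 20 nm grid, `nominal_rules(20, RULES_A, RULES_B)` keeps the 100 nm contact and metal widths as-is and rounds the 110 nm spacing and extension rules up to 120 nm. Integer arithmetic is used so that grid snapping cannot suffer floating-point rounding.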

Often the process "name" such as 65 nm does not exactly describe gate length. It may be some other value such as 60 nm, or 70 nm or even larger. Non-disclosure agreements prevent us from naming which process is which (and otherwise inhibit the completeness of this section). But for example, a process called 65 nm but having 60 nm gate length and relatively larger dimensions for other objects might work well with a lambda of 40 nm instead of the 30 nm one might expect. Let's say there are two vendors, A and B. Only A has the 60 nm gate length, with B being slightly more, maybe 65 nm. Cells are drawn with most objects at multiples of 20 nm (half lambda). Contacts and vias are not directly drawn, but placed in lower level cells CONTACT and VIA1. The exact size of a contact to polysilicon or diffusion might be 90 nm in one process and 100 nm in the other. Design rules are adjusted to the most restrictive case. Gates are drawn to the longest required length. Then, to move from one process to the other, the gate lengths of the 10 cells are adjusted, which usually takes only a few minutes, and the basic cells CONTACT and VIA1 are replaced.
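The porting step relies on cells referring to CONTACT and VIA1 by name rather than by geometry. A minimal data-model sketch of that indirection follows; the dictionaries, cell contents, and leaf-cell sizes are hypothetical stand-ins for real layout hierarchy, chosen to match the vendor A/B example above.

```python
# Hypothetical leaf-cell sizes (nm) per process; values are illustrative only.
LEAF_CELLS = {
    "A": {"CONTACT": 90, "VIA1": 100},
    "B": {"CONTACT": 100, "VIA1": 100},
}

# Master library: each cell places leaf cells by name, never by geometry,
# and records a drawn gate length (the longest one required by any target).
LIBRARY = {
    "INV":   {"gate_length_nm": 65, "placements": ["CONTACT", "CONTACT", "VIA1"]},
    "NAND2": {"gate_length_nm": 65, "placements": ["CONTACT"] * 3 + ["VIA1"] * 2},
}

def port_library(library: dict, process: str, gate_length_nm: int) -> dict:
    """Return a ported copy of the library for one target process.

    Only two things change: leaf-cell references resolve to the target's
    CONTACT/VIA1 definitions, and each cell's gate length is reset.
    The master library is not modified.
    """
    leaf = LEAF_CELLS[process]
    return {
        name: {
            "gate_length_nm": gate_length_nm,
            "leaf_sizes_nm": [leaf[p] for p in cell["placements"]],
        }
        for name, cell in library.items()
    }
```

Porting to vendor A is then `port_library(LIBRARY, "A", 60)`: every contact resolves to the 90 nm definition without editing any logic cell's placements.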

For the 28 nm process considered, the actual drawn gate length was 30 nm. In most other respects also, required dimensions were half or less of the worst case 65 nm dimensions.

There are of course layer differences between processes. This type of porting is disadvantageous in one respect. It requires a nominal set of design rules which the user must define, to be used in the master library layout. However, the vendor supplied design rules can still be used by copying the design into the vendor supplied setup file, with vendor layers, and providing a layer map. Only about a dozen layers are used for the library-level cells. Once the library is verified, further development (routing, simulation, design checking) can be done entirely in the vendor design kit setup.



**Fig. 3** – NAND2 with row\_cap

Figure 3 shows the overall structure of a typical cell (NAND2). All cells must have Vdd and Gnd busses not only at top and bottom as usual, but also in the middle, for use in guard ring structures to prevent SEL. Sometimes you will see each cell's PFETs and NFETs separately enclosed in a "ring." However, this is not necessary to prevent latchup, and at nanoscale has little value in preventing charge sharing. Instead, the ends of each row are capped with the ROW\_CAP cell, shown at right. Use of a macro to automatically place these is suggested.
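A macro of the suggested kind might look like the sketch below. It is written against a hypothetical in-memory placement list rather than any particular layout tool's API, and the mirrored right-hand cap is an assumption about how ROW\_CAP would be oriented.

```python
# Hypothetical row-capping helper: names, widths and the mirroring flag are
# illustrative assumptions, not a specific layout tool's API.
ROW_CAP = "ROW_CAP"

def place_row(cells, cap_width_nm):
    """Place cells left to right in one row and cap both ends with ROW_CAP.

    cells: list of (cell_name, width_nm) tuples.
    Returns a list of (cell_name, x_nm, mirrored) placements.
    """
    placements = [(ROW_CAP, 0, False)]        # left end cap
    x = cap_width_nm
    for name, width_nm in cells:
        placements.append((name, x, False))
        x += width_nm
    placements.append((ROW_CAP, x, True))     # right end cap, mirrored in X
    return placements

def place_block(rows, cap_width_nm):
    """Apply place_row to every row of a block so no row is left uncapped."""
    return [place_row(row, cap_width_nm) for row in rows]
```

Running it over every row of a block guarantees that the middle Vdd/Gnd guard structures of each row are closed at both ends without the designer having to remember to place the caps by hand.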

The author fabricated a 65 nm chip in a non-epi process without the central guard structures and found that even with a 1 volt Vdd, it was still possible to get latch-up in core cells, even in bench testing. In a complex digital design, row spacing is likely to be determined by routing density, even with over-cell routing, so the extra cell height for the guard structures is not as large a penalty as it might appear.

With this set of strategies, bond pads will require more effort than the basic library. The same techniques can be applied to pad circuits, but due to thick oxide rules it is a completely new effort. Hopefully the bonding structure can be obtained from the vendor, but increasingly pads are treated as IP.

## V. CRITICAL NODE SEPARATION

A typical circuit diagram of a DICE cell, given by Menuoni et al. [22], is shown in Figure 4.



Fig. 4 – Typical DICE schematic

While the DICE was originally designed as a "dual interlocked" circuit, there is a long running trend to minimize its area by drawing both halves as one circuit, as in Figure 4. This is counter-productive at nanoscale as it brings the critical node pairs too close together, and in fact the point of Menuoni's investigation is that: "... the tolerance to SEU is affected by the charge sharing between sensitive nodes for DICE latches designed with highly scaled processes. ... By reorganizing the layout of the studied latch, we obtained an improvement of a factor 3 in the SEU tolerance showing that the layout has a great importance."

Methods are available for separating critical node pairs into separate cells, which increases the spacing. [Ibid. 8, 11, 12, 13] Figure 5, adapted from Shuler et al. [Ibid. 11], shows the DICE, SERT or Dooley-TAG4 cell drawn so that it can be split between a top and bottom section, with 4 interlock coupling connections between them. In the logic library described above, the RHBD MUX is one half of this diagram (without the feedback), in whichever version the designer chooses. In this way, critical node separation can be arbitrarily large.



Fig. 5 – DICE family drawn for node separation

The DICE cell is the weakest of this family, being populated with the smallest number of blocking transistors. However, the "impression" that the versions with more transistors are larger is not the reason for their infrequent use. Since the additional transistors are stacked in series without intermediate contacts, they add little area. The problem has been that only the DICE was IP-free. That situation changes in 2015 as the Dooley cell, a variation of the author's cell TAG4 tested and reported previously [Ibid. 8, 11], is now IP-free.

The Dooley cell was intended for SRAM, but multiplexers for use as a latch are easy to add. Previously the author used transconductance multiplexers, but below in Figure 6 is a variation we call the Compact TAG4 (CTAG4) which has conventional multiplexers.



Fig. 6 – ½ Compact TAG4 MUX

The CTAG4 is shown in MUX configuration, with connection according to Figure 2 (left side) required for latch function. It is also shown as only half the circuit. Another identical circuit is used to form a dual interlocked set, with the B2 and OB2 inputs coming from the B and OB outputs of the other half, and vice versa, for a total of 4 coupling lines for a latch, or 8 for a flip flop (two latches). (Dual latches, unlike TMR latches, require coupling in both stages of a flip flop.) The second half latch is either embedded in a second logic string for full dual-rail implementation, or more commonly fitted with an SET filter (delay) of appropriate length for the process used and reliability desired in the design environment. For commercial circuits, this is still viable.

### VI. COMPACT VOTING LATCH

At nanoscale, for use in space where SETs may be both frequent and of long duration compared to modern clock cycles, the dual interlocked strategies are starting to become irrelevant, even for dual-rail, which still requires timing margin for SETs to settle. Processor chips marketed for use in space typically use the less elegant technique of TMR (with voting).

The dual interlocked cells actually require more cross coupled connections than TMR architectures, at four per latch, or 8 per flip flop. TMR requires 3 per flip flop. A slave latch in a voting scheme doesn't need to be voted. The more node separation required, the greater the routing overhead of the dual interlock.
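The difference in routing overhead grows linearly with the number of flip flops. The sketch below uses only the per-flip-flop counts stated above (4 interlock lines per latch and two coupled latches for the dual interlocked case, 3 voted lines for TMR); the function and dictionary names are our own.

```python
# Coupling connections per flip flop, as stated in the text.
COUPLING_PER_FF = {
    "dual_interlocked": 2 * 4,  # 4 interlock lines per latch, both latches coupled
    "tmr": 3,                   # one vote set per flip flop; the slave is not voted
}

def coupling_connections(n_flip_flops: int, scheme: str) -> int:
    """Inter-copy wires that must cross the critical node separation distance."""
    return n_flip_flops * COUPLING_PER_FF[scheme]
```

For a 12 flip flop block, `coupling_connections(12, "dual_interlocked")` gives 96 wires versus 36 for `"tmr"`, and every one of them must span whatever separation is chosen.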

The cell design shown in Figure 7, not previously described in the literature as far as the author can discover, is based on the TAG or Guard Gate. [Ibid. 11] This circuit, which we call the Compact Voting Latch (CVL), already "votes" three things: its current output node and the two inputs. It always assumes a state in which at least two of those agree (taking into account the natural inversion of CMOS). By using a transconductance multiplexer to set the cell, we can take advantage of this property and obtain an 8T cell which includes input and self-voting.
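The voting property can be checked at the logic level. The sketch below is a purely behavioral abstraction, not a transistor model: it assumes each copy's two guard-gate inputs are the stored values of the other two modular copies, drops the CMOS inversion, and models the transconductance mux write as simply forcing all copies to D. The function names are hypothetical.

```python
def guard_gate_update(own: int, in_a: int, in_b: int) -> int:
    """Abstract TAG/guard-gate stage (inversion omitted): follow the inputs
    when they agree, otherwise keep the stored value."""
    return in_a if in_a == in_b else own

def majority(a: int, b: int, c: int) -> int:
    """Reference 2-of-3 vote used to check the abstraction."""
    return (a & b) | (b & c) | (a & c)

def clock_in(copies, d: int, enable: int):
    """Abstract transconductance-mux write: when enabled, every copy takes D."""
    return [d] * len(copies) if enable else list(copies)

def settle(copies):
    """One evaluation of three modular copies, each voting on the other two."""
    a, b, c = copies
    return [guard_gate_update(a, b, c),
            guard_gate_update(b, a, c),
            guard_gate_update(c, a, b)]
```

Because following agreeing inputs and otherwise holding is exactly a 2-of-3 vote that includes the stored value, a single upset (for example `[0, 1, 1]` after writing 1) settles back to `[1, 1, 1]`. Two simultaneous upsets still win the vote, as with any TMR scheme.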



**Fig. 7** –  $\frac{1}{3}$  Compact Voting Latch (CVL)

Figure 8 shows a conventional self-correcting latch. The three NAND2 cells in the voter take three times the area of the one TAG. The NAND3 is about twice the area of the inverter of Figure 7, and the mux in Figure 8 is a double gate. Total area reduction for the CVL is half or better. It is comparable to a conventional latch or even an SRAM cell. It allows a TMR design to approach the theoretical minimum of 3x the area of single string. For small memories it eliminates the need to refresh to purge accumulating errors, and the timing overhead of error correcting circuitry (ECC).



**Fig. 8** –  $\frac{1}{3}$  conventional Voting Latch (VL)

The use of an "apparent" node fight to set the latch does not dissipate power the way an SRAM write circuit does, because the unconventional TAG or Guard Gate circuit is simultaneously interrupted on its inputs and does not fight the node except in a fraction of the error cases. Simulations show that the latch nevertheless sets reliably and quickly. By comparison, setting a conventional latch circuit this way does not work.

#### VII. AUTOMATIC INTERLOCK PORTS

A modular block technique can be used for both schematic and layout blocks so that interlock connections are automatically made when blocks are abutted. The idea is to use identical blocks so that only one block is designed. The three signal wires for a voting interlock signal set are ported on both sides of the block, but shifted circularly one position on one side vs. the other. This is shown for both schematic and layout in Figure 9. Suppose signal 1 is tied to input logic in the block, and signals 2 and 3 are assumed to be from other blocks. By rotating in this way, signal 1 in the first block becomes 3 in the second and 2 in the third.
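The port rotation can be verified by computing which local signals end up on the same wire when identical blocks are abutted. This is a hypothetical union-find sketch; the rotation direction (right edge carries 2, 3, 1) is an assumption chosen to reproduce the stated mapping, and each local signal is assumed to be a single net reaching both block edges.

```python
def abutted_nets(n_blocks: int = 3, n_signals: int = 3) -> list:
    """Nets formed when identical blocks with rotated edge ports are abutted.

    Node (k, s) is local vote signal s inside block k. The left edge carries
    signals 1..n in order; the right edge carries the same signals rotated
    one position (2, 3, 1 for n = 3). Abutting block k's right edge to
    block k+1's left edge shorts ports at the same track position.
    """
    left = list(range(1, n_signals + 1))
    right = left[1:] + left[:1]
    parent = {(k, s): (k, s) for k in range(n_blocks) for s in left}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for k in range(n_blocks - 1):
        for track in range(n_signals):
            a, b = find((k, right[track])), find((k + 1, left[track]))
            parent[a] = b

    nets = {}
    for node in parent:
        nets.setdefault(find(node), []).append(node)
    return sorted(sorted(net) for net in nets.values())
```

With the defaults this returns `[[(0, 1), (1, 3), (2, 2)], [(0, 2), (1, 1), (2, 3)], [(0, 3), (1, 2), (2, 1)]]`: signal 1 of the first block is signal 3 of the second and signal 2 of the third, and each net touches every block once and is driven by exactly one block's signal 1.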

Routing internal to the blocks can be done with lower level routing layers, perhaps one beyond that used for cell layout to allow over-cell wires, leaving upper metal free for block routing. Depending on tool capabilities, it can require some tricks to obtain matching positions between the left and right ports. The author's method was to route the block with a small dummy pad ring, which caused the router to space the signals evenly, and then delete the dummy pad ring; the block is then ready to place beside itself three times to make a modular redundant block.



**Fig. 9** – Coupling by block adjacency, schematic (L) and layout (R).

The layout blocks in Figure 9 were quickly routed for demonstration, without using over-cell routing. A slightly larger block is more efficient. The separation of critical nodes in a group of three is exactly the block width. The narrow blocks shown are probably adequate for 65 nm very high reliability applications. For deeper nanoscale, the aspect ratio of the blocks can be changed by constraining the number of cell rows. In this way, the design can be ported to a new process by (1) replacing the basic cell library, and (2) re-routing with the blocks at a lower aspect ratio (i.e. wider).

Incidentally, 15 years of experience with this block routing methodology has produced many successful chips that route very quickly. Hand routed blocks, or blocks with special routing constraints for analog, can be easily mixed in. At the top level, one can either use a block router, or by making all the blocks the same height route them as just very large standard cells to make a very large design quickly. Detailed verification can also proceed block by block, with top level layout verification reduced mostly to verifying block connectivity.

## VIII. RESULTS

Texas A&M University (TAMU) heavy ion test results for 180 nm, shown in Figure 10, indicate that integrated design (i.e. using critical node pair separation by cells but not by blocks) at this process node gives roughly the same performance for the dual interlocked and TMR methods, though they have different angle sensitivities.



**Fig. 10** – Dual interlocked & triple voting vs. beam angle 180 nm heavy ion test results

Figure 11 shows the 10-cell library scaled to run on several 65 nm processes with only substitution of the appropriate contact and via cells.



Fig. 11 – 10-Cell library generic 65 nm

The row\_cap and various row crossers and tie\_low/high cells are shown on the upper left. The RHBD CTAG4 mux is in the lower right. It is shown with side spacing, which can be eliminated or increased as desired to control critical node separation in an integrated block route. The core of it is barely larger than the unprotected mux, which is the second cell from the lower left. However, of course, for RHBD one has to use two of them, and possibly an SET filter.

The new Compact Voting Latch design is shown below in Figure 12. It is not a from-scratch layout but uses a mux/latch cell (for 2<sup>nd</sup> stage), guard gate (for 1st stage), inverter and transconductance gate. With clocking and mux, it is estimated to be about the size of the unprotected mux/latch. No extra spacing and no delays for SET filtering are needed when it is used in a modular block design, depending on the blocks for separation. That means that for modest sized blocks, where most of the routing goes over cells, the layout area for a true TMR design will approach 3x the area of an unprotected single string design, the highest theoretical efficiency.



Fig. 12 – CVL in D-type flip flop layout

Figure 13, from Shuler et al. [23], provides a comparison of over-cell integrated routes of a 24-bit cascade counter, with an adder and a 4-to-1 mux per bit, used as a radiation test circuit. With ordinary voting overhead, TMR may take 4 times the area (and presumably 4 times the power) of unprotected logic.



Fig. 13 – Conventional TMR layout comparison

The author attempted to make a comparison of a single string layout of the previous test circuit limited to 12 flip flops, with the same combinational logic, vs. CVL TMR.

The author's router was not capable of a full comparison, since even with over-cell routing only about 60% of the available M3 (horizontal) routing channels are utilized. As shown in Figure 14, the single string route on left cannot be compacted much further due to cell abutment, but the CVL route on right could be compacted much further with a better router.

Both blocks have 16 rows. On the two rightmost segments, in which only M2 and M3 are shown, hand-drawn rectangles mark areas with no horizontal routing where routing could have been placed. The M2 routing (vertical and ports) illustrates the interlock connection ports. Even with the router issue, a longstanding problem with the old channel router in the author's Tanner Tools, overhead beyond the theoretical minimum for TMR has been reduced from 25% for the conventional approach to approximately 10%.
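The two overhead percentages can be related to area multiples of a single-string design. Assuming (our reading, not stated explicitly) that the 25% is the excess of the roughly 4x conventional TMR area over the 3x minimum, expressed as a share of the total area, the sketch below reproduces it and shows what about 10% overhead implies.

```python
def overhead_fraction(total_area: float, minimum_area: float) -> float:
    """Fraction of the total area that exceeds the theoretical minimum."""
    return (total_area - minimum_area) / total_area

def total_area_for(minimum_area: float, overhead: float) -> float:
    """Total area implied by a given overhead fraction (inverse of the above)."""
    return minimum_area / (1.0 - overhead)
```

Under that reading, `overhead_fraction(4.0, 3.0)` is 0.25, and `total_area_for(3.0, 0.10)` is about 3.33, i.e. the CVL TMR block would occupy roughly 3.3 times the single-string area. If the percentages were instead measured relative to the 3x minimum, the conventional figure would be about 33%, so the interpretation matters when comparing numbers.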



**Fig. 14** – Single string (L) vs. CVL TMR (R) (only M2, M3 & unused routing space shown)

A 27 µm critical node pair separation is obtained at 65 nm, implying about a 3 µm separation at 7 nm if separation shrinks linearly with feature size, which could easily be tripled to around 10 µm by reducing the block's aspect ratio.
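The separation estimate above is a linear shrink followed by block widening. A minimal sketch, assuming separation scales with the process name ratio (which, as noted earlier, only approximates real dimensions) and that separation equals block width:

```python
def scaled_separation_um(separation_um: float, from_node_nm: float,
                         to_node_nm: float, width_factor: float = 1.0) -> float:
    """Estimate critical node separation after a linear shrink.

    width_factor > 1 models re-routing the block at a lower aspect ratio
    (a wider block), since the separation equals the block width.
    """
    return separation_um * (to_node_nm / from_node_nm) * width_factor
```

`scaled_separation_um(27, 65, 7)` gives about 2.9 µm, and with `width_factor=3` about 8.7 µm, which the text rounds to 3 µm and roughly 10 µm respectively.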

Setting up the 36 voting ports on each side (12 flip flops x 3 each) took only about half an hour, but for complex designs a script or macro to do this would be advisable. Other than the ports issue, the design of the CVL block is identical to a single string block, i.e. simpler than a traditional TMR or dual-interlocked block. In fact, if the designer wished to separate blocks to obtain multi-core performance for non-critical tasks, this would be possible with the method described in the author's previous work on reconfigurable SEU tolerance. [Ibid. 23]
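A script for this step could be as simple as the sketch below. It assumes a hypothetical port-naming convention `ff<i>_v<j>` and the same one-position rotation used for block abutment; it only generates the per-edge port lists (index = track position on that edge) and does not drive any particular layout or schematic tool.

```python
def interlock_ports(n_flip_flops: int, copies: int = 3):
    """Return (left, right) port lists for one modular block.

    List index is the physical track position on that block edge; the value
    is a hypothetical local net name. Within each flip flop's group, the right
    edge carries the same vote signals rotated by one position.
    """
    left, right = [], []
    for i in range(n_flip_flops):
        for p in range(copies):
            left.append(f"ff{i}_v{p + 1}")
            right.append(f"ff{i}_v{(p + 1) % copies + 1}")
    return left, right
```

For the 12 flip flop test block, `interlock_ports(12)` yields 36 ports per side, with the first group on the right edge ordered `ff0_v2`, `ff0_v3`, `ff0_v1`.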

#### IX. CONCLUSIONS

The porting and scaling techniques we have discussed would allow designs to be ported to many processes with little change, and the automatic interlock routing combined with the Compact Voting Latch makes modular redundancy more efficient and easier than conventional TMR methods. Modular block designs provide an order of magnitude greater node pair separation. They can probably be used, with only appropriate library substitution, through or below the existing 7 nm processes over perhaps 20 years, preserving design investment.

#### REFERENCES

- [1] Dodd, P.E.; Shaneyfelt, M.R.; Schwank, J.R.; Felix, J.A., "Current and Future Challenges in Radiation Effects on CMOS Electronics," IEEE Trans. Nucl. Sci., vol. 57, no. 4, pp. 1747-1763, Aug. 2010.
- [2] Glorieux, M.; Clerc, S.; Gasiot, G.; Autran, J.-L.; Roche, P., "New D-Flip-Flop Design in 65 nm CMOS for Improved SEU and Low Power Overhead at System Level," IEEE Trans. Nucl. Sci., vol. 60, no. 6, pp. 4381-4386, Dec. 2013.
- [3] Lin, S.; Kim, Y.-B.; Lombardi, F., "Analysis and Design of Nanoscale CMOS Storage Elements for Single-Event Hardening With Multiple-Node Upset," IEEE Trans. Device Mater. Rel., vol. 12, no. 1, pp. 68-77, Mar. 2012.
- [4] Huang, Z., "A high performance SEU-tolerant latch for nanoscale CMOS technology," in Proc. Conf. Design, Automation & Test in Europe (DATE '14), European Design and Automation Association, 3001 Leuven, Belgium, 2014, Article 162, 5 pages.
- [5] Narasimham, B.; Chandrasekharan, K.; Wang, J.K.; Djaja, G.; Gaspard, N.J.; Mahatme, N.N.; Assis, T.R.; Bhuva, B.L., "High-speed pulsed-hysteresis-latch design for improved SER performance in 20 nm bulk CMOS process," in Proc. IEEE Int. Reliability Physics Symp., 1-5 June 2014, pp. 5F.4.1-5F.4.5.
- [6] D'Alessio, M.; Ottavi, M.; Lombardi, F., "Design of a Nanometric CMOS Memory Cell for Hardening to a Single Event With a Multiple-Node Upset," IEEE Trans. Device Mater. Rel., vol. 14, no. 1, pp. 127-132, Mar. 2014.
- [7] Eaton, P.; Benedetto, J.; Mavis, D.; Avery, K.; Sibley, M.; Gadlage, M.; Turflinger, T., "Single event transient pulsewidth measurements using a variable temporal latch technique," IEEE Trans. Nucl. Sci., vol. 51, no. 6, pp. 3365-3368, 2004.
- [8] Shuler, R.L.; Bhuva, B.L.; O'Neill, P.M.; Gambles, J.W.; Rezgui, S., "Comparison of Dual-Rail and TMR Logic Cost Effectiveness and Suitability for FPGAs With Reconfigurable SEU Tolerance," IEEE Trans. Nucl. Sci., vol. 56, no. 1, pp. 214-219, Feb. 2009.
- [9] Amusan, O.A.; Massengill, L.W.; Baze, M.P.; Sternberg, A.L.; Witulski, A.F.; Bhuva, B.L.; Black, J.D., "Single event upsets in deep-submicrometer technologies due to charge sharing," IEEE Trans. Device Mater. Rel., vol. 8, no. 3, pp. 582-589, Sep. 2008.
- [10] Ahlbin, J.R.; Gadlage, M.J.; Ball, D.R.; Witulski, A.W.; Bhuva, B.L.; Reed, R.A.; Vizkelethy, G.; Massengill, L.W., "The Effect of Layout Topology on Single-Event Transient Pulse Quenching in a 65 nm Bulk CMOS Process," IEEE Trans. Nucl. Sci., vol. 57, no. 6, pp. 3380-3385, Dec. 2010.
- [11] Shuler, R.L.; Balasubramanian, A.; Narasimham, B.; Bhuva, B.L.; O'Neill, P.M.; Kouba, C., "The Effectiveness of TAG or Guard-Gates in SET Suppression Using Delay and Dual-Rail Configurations at 0.35 $\mu$m," IEEE Transactions on Nuclear Science, vol. 53, no. 6, pp. 3428-3431, Dec. 2006.
- [12] Haghi, M.; Draper, J., "The 90 nm Double-DICE storage element to reduce Single-Event upsets," IEEE International Midwest Symposium on Circuits and Systems, Cancun, Mexico, 2009, pp. 463-466.
- [13] Cabanas-Holmen, M.; Cannon, E.H.; Rabaa, S.; Amort, T.; Ballast, J.; Carson, M.; Lam, D.; Brees, R., "Robust SEU Mitigation of 32 nm Dual Redundant Flip-Flops Through Interleaving and Sensitive Node-Pair Spacing," IEEE Transactions on Nuclear Science, vol. 60, no. 6, pp. 4374-4380, Dec. 2013.
- [14] Shambhulingaiah, S.; Chellappa, S.; Kumar, S.; Clark, L.T., "Methodology to optimize critical node separation in hardened flip-flops," 15th International Symposium on Quality Electronic Design (ISQED), 2014, pp. 486-493, March 2014.
- [15] Jagannathan, S.; Loveless, T.D.; Bhuva, B.L.; Wen, S.-J.; Wong, R.; Sachdev, M.; Rennie, D.; Massengill, L.W., "Single-Event Tolerant Flip-Flop Design in 40-nm Bulk CMOS Technology," IEEE Transactions on Nuclear Science, vol. 58, no. 6, pp. 3033-3037, Dec. 2011.
- [16] Cameron, E.; Miles, L.; Whitaker, S.; Maki, G.; Shreve, M., "Heavy ion test results of RHBD standard cells and memory in a 110nm bulk CMOS process," IEEE Aerospace Conference, 2014, pp. 1-7, March 2014.
- [17] Li, T.; Yang, H.; et al., "A CMOS Triple Inter-Locked Latch for SEU Insensitivity Design," IEEE Transactions on Nuclear Science, vol. 61, no. 6, Dec. 2014.
- [18] Shiyanovskii, Y.; Rajendran, A.; Papachristou, C., "A novel radiation tolerant SRAM design based on synergetic functional component separation for nanoscale CMOS," IEEE 17th International On-Line Testing Symposium (IOLTS), 2011, pp. 139-144, July 2011.
- [19] Amusan, O.A.; Massengill, L.W.; Baze, M.P.; Bhuva, B.L.; Witulski, A.F.; DasGupta, S.; Sternberg, A.L.; Fleming, P.R.; Heath, C.C.; Alles, M.L., "Directional Sensitivity of Single Event Upsets in 90 nm CMOS Due to Charge Sharing," IEEE Transactions on Nuclear Science, vol. 54, no. 6, pp. 2584-2589, Dec. 2007.
- [20] Petrovic, V.; Ilic, M.; Schoof, G.; Stamenkovic, Z., "Design methodology for fault tolerant ASICs," IEEE 15th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), 2012, pp. 8-11, April 2012.
- [21] Zheng, M.S. et al., Applied Mechanics and Materials, vols. 713-715, p. 1127, 2015.
- [22] Menouni, M. et al., "Design and measurements of SEU tolerant latches," Proceedings of the Topical Workshop on Electronics for Particle Physics 2008, Naxos, CERN-2008-008, pp. 402-405, 2008.
- [23] Shuler, R.L.; Bhuva, B.L.; O'Neill, P.M.; Gambles, J.W.; Rezgui, S., "Comparison of Dual-Rail and TMR Logic Cost Effectiveness and Suitability for FPGAs With Reconfigurable SEU Tolerance," IEEE Transactions on Nuclear Science, vol. 56, no. 1, pp. 214-219, Feb. 2009.