





FP7-ICT-2013-11-619871

## **BASTION**

Board and SoC Test Instrumentation for Ageing and No Failure Found

Instrument: Collaborative Project

Thematic Priority: Information and Communication Technologies

# Report on the NFF and ageing fault study (Deliverable D1.1)

Due date of intermediate deliverable: September 30, 2014 (M9)

Due date of final deliverable: June 30, 2015 (M18)

Actual submission date: June 30, 2015

Start date of project: January 1, 2014 Duration: Three years

Organization name of lead contractor for this deliverable: University of Twente

Revision v2.4 30<sup>th</sup> June 2015

| Project co-funded by the European Commission within the Seventh Framework Programme (2014-2016) |                                                                                       |   |  |  |  |
|-------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------|---|--|--|--|
|                                                                                                 | Dissemination Level                                                                   |   |  |  |  |
| PU                                                                                              | Public                                                                                | × |  |  |  |
| PP                                                                                              | Restricted to other program participants (including the Commission Services)          |   |  |  |  |
| RE                                                                                              | Restricted to a group specified by the consortium (including the Commission Services) |   |  |  |  |
| CO                                                                                              | Confidential, only for members of the consortium (including the Commission Services)  |   |  |  |  |

#### **Notices**

For more information, please contact Dr.ir. H.G. Kerkhoff, e-mail: <a href="mailto:h.g.kerkhoff@utwente.nl">h.g.kerkhoff@utwente.nl</a>

This document is intended to fulfill the contractual obligations of the BASTION project concerning deliverable D1.1 described in contract 619871.

© Copyright BASTION 2015. All rights reserved.

# **Table of Revisions**

| Version | Date                              | Description and reason                                                                                                                | Author                                    | Affected sections                             |
|---------|-----------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------|-----------------------------------------------|
| 0.1     | June 3, 2014                      | Initial document created                                                                                                              | H.G. Kerkhoff<br>UT                       | Contents                                      |
| 0.2     | June 16 <sup>th</sup> , 2014      | NFF part created                                                                                                                      | H. Ebrahimi<br>UT                         | First part                                    |
| 0.3     | June 19 <sup>th</sup> , 2014      | Aging part created                                                                                                                    | H. Ebrahimi &<br>H.G. Kerkhoff<br>UT      | Second part                                   |
| 0.4     | June 20 <sup>th</sup> , 2014      | Board level NFF and BASTION survey                                                                                                    | Christophe Lotz<br>ASTER                  | Third and fourth parts                        |
| 0.5     | July 12 <sup>th</sup> , 2014      | State-of-the-art on<br>Fault Models                                                                                                   | Erik Larsson,<br>Dimitar Nikolov<br>ULUND | Third part                                    |
| 0.5     | September 9 <sup>th</sup> , 2014  | Fusion of all contributions                                                                                                           | Christophe Lotz<br>ASTER                  | All sections                                  |
| 0.6     | September 11 <sup>th</sup> , 2014 | Integration                                                                                                                           | H. Ebrahimi &<br>H.G. Kerkhoff<br>UT      | All                                           |
| 0.7     | September 24 <sup>th</sup> , 2014 | Major updates on the topics of board-level NFF and the timing fault hypothesis                                                        | Artur Jutman, TL                          | Introduction;<br>Section 3;<br>Section 4.     |
| 0.8     | September 25 <sup>th</sup> , 2014 | Hierarchical NBTI analysis                                                                                                            | Jaan Raik<br>TUT                          | Sections 1, 2                                 |
| 0.9     | September 30 <sup>th</sup> , 2014 | Added ASTER results & Integration                                                                                                     | H. Ebrahimi &<br>H.G. Kerkhoff,<br>UT     | All                                           |
| 1.0     | October 3 <sup>rd</sup> , 2014    | Final check                                                                                                                           | H.G. Kerkhoff,<br>UT                      | All                                           |
| 1.1     | October 7 <sup>th</sup> , 2014    | Minor corrections                                                                                                                     | Artur Jutman, TL                          | Introduction;<br>Section 4;<br>Conclusions    |
| 2.1     | June 9 <sup>th</sup> , 2015       | Updated aging simulation, minor corrections                                                                                           | Jaan Raik, TUT                            | Section 2, all sections                       |
| 2.2     | June 19 <sup>th</sup> , 2015      | Updates on Aging & NFF parts                                                                                                          | H. Ebrahimi,<br>UT                        | Section 2, 3                                  |
| 2.3     | June 26 <sup>th</sup> , 2015      | Update on board NFF                                                                                                                   | Christophe Lotz<br>ASTER                  | Section 4                                     |
| 2.4     | June 30 <sup>th</sup> , 2015      | Update of the Executive Summary, Introduction, Conclusions. List of Abbreviations, review of the final document and minor corrections | Artur Jutman, TL                          | Mainly the introductory parts and Conclusions |

# **Authors, Beneficiaries**

Christophe Lotz, ASTER
Erik Larsson, Lund University
Dimitar Nikolov, Lund University
Jaan Raik, Tallinna Tehnikaülikool
Hans G. Kerkhoff, University of Twente
Hassan Ebrahimi, University of Twente
Artur Jutman, Testonica Lab

## **Executive Summary**

The document presents BASTION research results in the area of aging mechanisms at the IC level (task T1.2) and No Fault Found (NFF) at both IC and board level (task T1.1), and concludes activities in related tasks. First, we describe the results achieved in the aging fault study, including the link between low-level measurements and data usage at higher abstraction levels. The second part of the document concentrates on the NFF phenomenon at both IC and board level. Partners' contributions to the experimental analysis and industrial study of NFF faults, as well as an overview of prior work, are presented.

Intermittent faults have been identified as an important contributor to IC-level NFFs. Hence, a novel simulation model has been developed for this type of NFF with the aim of mitigating the issue. Section 4 presents a study that demonstrates a gap in state-of-the-art board-level fault coverage metrics, especially in the domain of timing faults. This hypothesis has been confirmed, among other results, by an extensive industrial study based on both a questionnaire and the extraction and analysis of actual traceability data (by QuadDPMO).

The questionnaire, developed by ASTER and BASTION partners, is included in the Appendix. Finally, conclusions regarding our research on aging faults and NFFs are provided.

# **List of Abbreviations**

| ADC              | - Analog-to-Digital Converter                        |  |  |  |
|------------------|------------------------------------------------------|--|--|--|
| AOI              | - Automated Optical Inspection                       |  |  |  |
| AXI              | - Automated X-ray Inspection                         |  |  |  |
| BERT             | - Bit Error Rate Testing                             |  |  |  |
| BST              | - Boundary-Scan Test                                 |  |  |  |
| BTI              | - Bias Temperature Instability                       |  |  |  |
| CHE              | - Channel Hot Electron                               |  |  |  |
| CISC             | - Complex Instruction Set Computing                  |  |  |  |
| CM               | - Contract manufacturer                              |  |  |  |
| CMOS             | - Complementary Metal-Oxide Semiconductor            |  |  |  |
| CRC              | - Cyclic Redundancy Check                            |  |  |  |
| DAC              | - Digital-to-Analog Converter                        |  |  |  |
| DAHC             | - Drain Avalanche Hot Carrier                        |  |  |  |
| dB               | - Decibel                                            |  |  |  |
| DPM              | - Defects Per Million                                |  |  |  |
| DPMO             | - Defects Per Million Opportunities                  |  |  |  |
| EI               | - Embedded Instrument                                |  |  |  |
| EM               | - Electro Migration                                  |  |  |  |
| EMS              | - Enhanced Manufacturing Services                    |  |  |  |
| ESD              | - Electro Static Discharge                           |  |  |  |
| FET              | - Field Effect Transistor                            |  |  |  |
| FinFET           | - Fin-based, multigate FET                           |  |  |  |
| FMEA             | - Failure Mode and Effect Analysis                   |  |  |  |
| FPGA             | - Field Programmable Gate Array                      |  |  |  |
| FP7              | - European Union's 7<sup>th</sup> Framework Programme |  |  |  |
| FPT              | - Flying Probe Test                                  |  |  |  |
| FPY              | - First Pass Yield                                   |  |  |  |
| FT               | - Functional Test                                    |  |  |  |
| HALT/HASS        | - Highly Accelerated Life Test / Stress Screen       |  |  |  |
| HCI              | - Hot Carrier Injection                              |  |  |  |
| IC               | - Integrated Circuit                                 |  |  |  |
| ICT              | - In-Circuit Test                                    |  |  |  |
| $I_{ddt}$        | - Transient power current                            |  |  |  |
| $I_{ddq}$        | - Quiescent power current                            |  |  |  |
| INEMI            | - International Electronic Manufacturing Initiative  |  |  |  |

| IPC                                                                                                | - Institute for Printed Circuits                                 |  |
|----------------------------------------------------------------------------------------------------|------------------------------------------------------------------|--|
| IRF                                                                                                | - Intermittent fault                                             |  |
| IST                                                                                                | - Information Society Technologies                               |  |
| IVF                                                                                                | - Intermittent Vulnerability Factor                              |  |
| JEDEC                                                                                              | - Joint Electron Device Engineering Council                      |  |
| JTAG                                                                                               | - Joint Test Action Group                                        |  |
| MOSFET                                                                                             | - Metal Oxide Semiconductor Field-Effect Transistor              |  |
| MPS                                                                                                | - (Coverage metrics based on) Material, Placement & Solder       |  |
| MTTF                                                                                               | - Mean Time To Failure                                           |  |
| NBTI                                                                                               | - Negative Bias Temperature Instability                          |  |
| NDF                                                                                                | - No Defect Found                                                |  |
| NFF                                                                                                | - No Fault Found, No Failure Found                               |  |
| NMOS                                                                                               | - n-type Metal Oxide Semiconductor                               |  |
| NMOSFET                                                                                            | - n-type MOSFET                                                  |  |
| NTF                                                                                                | - No Trouble Found                                               |  |
| OBD                                                                                                | - Oxide Breakdown                                                |  |
| PBTI                                                                                               | - Positive Bias Temperature Instability                          |  |
| PCOLA/SOQ                                                                                          | - (Coverage metrics based on) Presence, Correct, Orientation, Live, Alignment / Short, Open & Quality |  |
| PCBA                                                                                               | - Printed Circuit Board Assembly                                 |  |
| PMOS                                                                                               | - p-type Metal Oxide Semiconductor                               |  |
| PMOSFET                                                                                            | - p-type MOSFET                                                  |  |
| PPM                                                                                                | - Parts Per Million                                              |  |
| PPVS                                                                                               | - (Coverage metrics based on) Presence, Polarity, Value & Solder |  |
| PVT                                                                                                | - Process, Voltage, Temperature                                  |  |
| RISC                                                                                               | - Reduced Instruction Set Computing                              |  |
| RO                                                                                                 | - Ring Oscillator                                                |  |
| SBD                                                                                                | - Soft Breakdown                                                 |  |
| SoC                                                                                                | - System on Chip                                                 |  |
| SGHE                                                                                               | - Secondary Generated Hot Electron                               |  |
| TDDB                                                                                               | - Time Dependent Dielectric Breakdown                            |  |
| TSSOP                                                                                              | - Thin Shrink Small Outline Package                              |  |
| TSV                                                                                                | - Through Silicon Via                                            |  |
| URL                                                                                                | - Uniform Resource Locator                                       |  |
| UUT                                                                                                | - Unit Under Test                                                |  |
| VFIT                                                                                               | - VHDL-Based Fault Injection Tool                                |  |
| VHDL                                                                                               | - VHSIC Hardware Description Language                            |  |
| VHSIC                                                                                              | - Very High Speed Integrated Circuit                             |  |

# **Table of Contents**

| Section | Title                                                  |
|---------|--------------------------------------------------------|
|         | Table of Revisions                                     |
|         | Authors, Beneficiaries                                 |
|         | List of Abbreviations                                  |
|         | Table of Contents                                      |
| 1       | Introduction                                           |
| 1.1     | The structure of the report                            |
| 2       | Aging faults                                           |
| 2.1     | Introduction                                           |
| 2.2     | Classes of aging faults                                |
| 2.2.1   | Electro-migration                                      |
| 2.2.2   | NBTI-based aging faults                                |
| 2.2.3   | HCI-based aging faults                                 |
| 2.2.4   | Time-Dependent Dielectric Breakdown (TDDB) aging       |
| 2.3     | State-of-the-art in aging faults                       |
| 2.4     | BASTION contributions to aging fault study             |
| 2.4.1   | Aging Measurement Embedded Instruments                 |
| 2.4.2   | Using On-chip health monitors in SoCs                  |
| 2.4.3   | Hierarchical identification of NBTI-critical paths     |
| 2.4.4   | NBTI-critical path delay calculation                   |
| 2.5     | Conclusions on aging faults                            |
| 3       | NFF faults at the IC level                             |
| 3.1     | IC-level vs. board-level NFFs                          |
| 3.2     | Introduction to IC-level NFF                           |
| 3.3     | Classes of IC-level NFF faults                         |
| 3.3.1   | Intermittent resistive faults                          |
| 3.3.2   | Intermittent stuck-at faults                           |
| 3.3.3   | Intermittent short faults                              |
| 3.3.4   | Intermittent open faults                               |
| 3.3.5   | Intermittent timing faults                             |
| 3.4     | State-of-the-art in IC-level NFF faults                |
| 3.5     | BASTION contributions to NFF fault study               |
| 3.5.1   | Intermittent fault effects on analogue CMOS            |
| 3.5.2   | Intermittent fault effects on digital CMOS             |
| 3.6     | Conclusions on NFF (intermittent resistive) faults     |
| 4       | NFF faults at the *board* level                        |
| 4.1     | Introduction                                           |
| 4.2     | State of the art                                       |
| 4.2.1   | Defects, fault model and test coverage                 |
| 4.2.2   | Yield estimation: combined defect and test coverage    |
| 4.3     | Analysis of gaps in test coverage                      |
| 4.3.1   | Escape rate                                            |
| 4.3.2   | New classes of board-level faults                      |
| 4.3.3   | NFF faults due to Board-level test coverage weaknesses |
| 4.4     | BASTION survey                                         |
| 4.4.1   | Industrial partners                                    |
| 4.4.2   | What is NFF?                                           |
| 4.4.3   | Is NFF important?                                      |
| 4.4.4   | What is the reason of NFF?                             |
| 4.4.5   | Test Line                                              |
| 4.4.6   | The faults you know                                    |
| 4.5     | Tools for DPMO analysis and root-cause analysis        |
| 4.5.1   | Data collection                                        |
| 4.5.2   | Traceability & repair data                             |
| 4.5.3   | QuadDPMO: True Defect Opportunities                    |
| 4.5.4   | DPMO extraction                                        |
| 4.5.5   | Data preparation                                       |
| 4.5.6   | Intermittent faults                                    |
| 4.5.7   | Insufficient coverage                                  |
| 4.5.8   | Missing test method: Ageing                            |
| 4.5.9   | Lack of communication between design and test          |
| 4.6     | Conclusions on board-level NFF faults                  |
| 5       | Conclusions                                            |
| 6       | References                                             |
|         | Appendix I                                             |

## 1 Introduction

The document starts with an overview of the literature, briefly explaining the different classes and mechanisms of aging and NFF faults, and then reviews the state-of-the-art in the respective fields. Subsequently, the results of the joint BASTION investigation into aging faults and NFF are presented.

In the aging study, we are working in two directions. The first is the design of health monitors to observe the degradation of digital circuits. The second is the accurate modelling of aging faults, in order to enable the analysis of aging effects in large digital circuits.

Health monitoring can provide advance warning of malfunction and help to prevent catastrophic failures. Using on-chip health monitors, one can enhance the long-term dependability and extend the useful lifetime of MP-SoCs. The proposed approach addresses the design of aging monitors and enables a new generation of very-high-dependability many-processor SoCs for safety-critical applications. A process monitor, such as an NBTI monitor, has been shown to correlate with the propagation delay in an aged processor. Moreover, an embedded instrument has been designed to detect the performance degradation of digital CMOS circuits due to aging effects. This instrument can extract the changes in threshold voltage that are induced by aging. We also plan to investigate other monitors, such as HCI, $I_{\rm ddt}$, $I_{\rm ddq}$ and ring-oscillator (RO) monitors, for observing the degradation of digital CMOS circuits. The research on monitors is an important contribution that will serve as a basis in WP3 for developing in-field on-line error detection and monitoring solutions.
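As a rough illustration of how a ring-oscillator (RO) monitor reading could be turned into an advance warning, the following Python sketch infers the relative delay increase from the drop in oscillation frequency. It assumes the textbook relation $f = 1/(2 N t_d)$ for an $N$-stage ring oscillator with stage delay $t_d$; the function names and the 5 % warning margin are hypothetical and are not taken from the BASTION instruments.

```python
def relative_delay_increase(f_fresh_hz: float, f_aged_hz: float) -> float:
    """Relative increase in stage delay inferred from an RO frequency drop.

    For an N-stage ring oscillator f = 1 / (2 * N * t_d), so the stage delay
    is inversely proportional to the frequency: t_aged / t_fresh = f_fresh / f_aged.
    """
    if f_fresh_hz <= 0 or f_aged_hz <= 0:
        raise ValueError("frequencies must be positive")
    return f_fresh_hz / f_aged_hz - 1.0


def needs_warning(f_fresh_hz: float, f_aged_hz: float, margin: float = 0.05) -> bool:
    """Flag the circuit once the inferred delay increase reaches the margin.

    The 5 % default margin is an illustrative assumption, not a project value.
    """
    return relative_delay_increase(f_fresh_hz, f_aged_hz) >= margin


# Hypothetical readings: 100 MHz when fresh, 80 MHz after aging.
print(relative_delay_increase(100e6, 80e6))
print(needs_warning(100e6, 80e6))
```

In practice the margin would be derived from the timing slack of the monitored design, and the reading would need compensation for temperature and supply-voltage variations, which this sketch ignores.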

To enable the analysis of aging effects in large circuits, we have introduced a method for hierarchical modeling of dynamic NBTI aging. Preliminary experimental results show high scalability as well as a good match with SPICE simulation results. Work on the evaluation of the method continues, along with work on different kinds of aging monitors.

With respect to the *NFF study*, we have been working on the hypothesis that NFF faults can be divided into the following three categories:

- 1. NFF due to intermittent faults, both at the board level and inside ICs;
- 2. NFF due to insufficient traceability data analysis for process tuning;
- 3. NFF at the board level due to insufficient/unknown test coverage, especially in the domain of speed-related or timing faults (AC domain).

The three categories correspond to different research directions.

The first direction (intermittent faults), led by UT, has also been supported by several academic and industrial studies over the last decade. Intermittent faults are extremely difficult to test because the conditions for fault activation and manifestation cannot be created deterministically. This creates fertile ground for test escapes and the subsequent NTF problem. In BASTION, a novel simulation model has been developed for this type of NFF with the aim of mitigating the issue.

The second direction (data traceability) was identified by ASTER based on their long-term experience of working with large multinational corporations that run complex electronics manufacturing processes. DPMO is the key parameter that needs to be carefully calculated, in addition to knowing the test coverage under different coverage categories. Good control over combined test coverage and DPMO promises a reduction in the number of NFF cases. The second research direction yielded rich results from the industrial study organized by ASTER. The study was twofold: 1) a survey; and 2) automatic extraction and analysis of actual industrial traceability data using the new QuadDPMO tool. This experimental software tool has been developed specifically by ASTER. It facilitates data stratification, classification and unification in a single database in order to simplify the analysis of NFF cases and root-cause analysis. The data collection methods are implemented according to existing IPC standards.
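For reference, DPMO follows the usual definition: the number of defects divided by the total number of defect opportunities, scaled to one million. The sketch below is a minimal illustration in Python; the function name and the example figures are hypothetical, and it does not reproduce how QuadDPMO counts opportunities.

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities for a batch of inspected units."""
    total_opportunities = units * opportunities_per_unit
    if total_opportunities <= 0:
        raise ValueError("at least one defect opportunity is required")
    if defects < 0:
        raise ValueError("defect count cannot be negative")
    # Multiply before dividing to keep the arithmetic exact for integer inputs.
    return defects * 1_000_000 / total_opportunities


# Hypothetical batch: 12 defects found on 500 boards,
# each board offering 2,000 defect opportunities.
print(dpmo(defects=12, units=500, opportunities_per_unit=2000))
```

The result depends strongly on how opportunities are counted (for example per component, per placement or per solder joint), which is why the opportunity count is kept as an explicit parameter here.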

The third direction (insufficient/unknown test coverage) initiated by TL and ASTER is based on the analysis of the cumulative fault coverage achieved by the combination of state-of-the-art board-level test techniques. The weakest point is the coverage of timing faults or performance-related issues, e.g. delay faults, crosstalk, signal integrity, interconnect quality, communication link integrity, etc. State-of-the-art test methods for these fault classes (like BERT or at-speed test) do not provide proof of good or sufficient fault coverage. We expect to improve test coverage metrics based on this research, as well as provide techniques to achieve higher fault coverage (to be reported in D1.3 by M30).

The survey questionnaire used for the industrial study is attached as the Appendix. It has been developed by ASTER and partners, based on initial industrial experience, with the goal of obtaining a broader industrial opinion on our focus points.

## 1.1 The structure of the report

The report starts with the analysis of aging and NFF faults at the *IC-level*. After a study of the state-of-the-art, the results of the partners' joint BASTION investigation into aging faults are presented.

Then, the NFF phenomenon is studied at the IC-level as well as at board-level, and the BASTION partners' contributions with regard to NFF faults are presented.

In its second part, the report presents a study of NFF faults at *board-level*, including an analysis of the weaknesses of state-of-the-art test coverage metrics and best industrial practices. A board-level test coverage gap with respect to timing faults is demonstrated. Afterwards, we describe the results of the industrial study, which was based on a survey and automated traceability data analysis by the QuadDPMO software.

A reference list is provided with papers on aging and NFF. Subsequently, the report is summarized with main initial conclusions.

The Appendix shows the set-up of the BASTION questionnaire, which has been distributed to the main industrial contacts to enable a better understanding of the industrial view on aging and NFF.

## 2 Aging faults

In this section, the most important classes of aging faults are introduced, namely electromigration (EM), hot carrier injection (HCI), time dependent dielectric breakdown (TDDB), and bias temperature instability (BTI), together with state-of-the-art publications about aging faults. Then, the first investigations of aging faults by the partners are presented. The section closes with some initial conclusions.

#### 2.1 Introduction

As a result of the aggressive, non-constant-field scaling of technology in terms of device dimensions, the increasing electric fields and the use of new materials to meet the demands set by these technologies, the reliability of electronic systems fabricated in these technology nodes has become a very important concern. Various degradation mechanisms can worsen the performance of devices, circuits and their associated electronic systems as a result of this aggressive technology scaling. Although the research is mostly focused on CMOS, FinFETs also suffer from aging effects such as NBTI.

## 2.2 Classes of aging faults

The aging faults can be classified according to the main cause into the following categories: electro-migration, hot carrier injection, time dependent dielectric breakdown, and bias temperature instability. There are more aging phenomena, but the ones listed are the most prominent ones in CMOS related technologies.

## 2.2.1 Electro-migration

Electro-migration (EM) is the dominant failure mode of interconnects that results from aggressive interconnect scaling. As technology scales, the device density increases and, as a result, the interconnects that carry signals are reduced in size, specifically in height and cross section.

This leads to extremely high current densities, in the order of at least 10<sup>6</sup> A/cm<sup>2</sup> and associated thermal effects, which can cause reliability problems [1]. At these current densities, momentum transfer between electrons and metal atoms becomes important. The transfer, which is called the *electron-wind force*, results in a mass transport along the direction of electron movement.
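To give a sense of scale, the worked example below uses purely illustrative values that are assumptions and do not come from this study: a wire with a square cross section of 0.1 µm × 0.1 µm (that is, $10^{-5}$ cm per side) carrying a current of 0.1 mA. The current density $J$ is the current $I$ divided by the cross-sectional area $A$:

$$J = \frac{I}{A} = \frac{10^{-4}\,\mathrm{A}}{(10^{-5}\,\mathrm{cm})^{2}} = \frac{10^{-4}\,\mathrm{A}}{10^{-10}\,\mathrm{cm}^{2}} = 10^{6}\,\mathrm{A/cm^{2}}$$

Under these assumptions, even a modest current already reaches the quoted order of magnitude, and halving either wire dimension doubles $J$.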

Once the metal atoms are activated by the electron wind, they are subject to the electric fields that drive the current. Since the metal atoms are positively ionized, the electric field moves them against the electron wind once they have been activated. The interplay of these two phenomena determines the direction of net mass transfer. This mass transfer manifests itself in the movement of vacancies and interstitials. The vacancies coalesce into voids or micro-cracks, and interstitials become hillocks. The voids, in turn, decrease the cross-sectional area of the circuit metallization and increase the local resistance and current density at that point in the metallization. The increases in both local current density and temperature intensify the EM effects. This positive feedback cycle can eventually lead to thermal runaway and catastrophic failure, which degrades the system dependability.

#### 2.2.2 NBTI-based aging faults

Bias temperature instability (BTI) is a degradation mechanism that occurs in MOS devices as a result of interface traps between the gate oxide and the silicon substrate at elevated temperatures (30 to $200^{\circ}$C) [2], and hence degrades the dependability of the associated electronic devices. This degradation mechanism results in a shift of the device threshold voltage (V<sub>th</sub>) and a loss of drive current (I<sub>on</sub>). The BTI effect is more severe for PMOSFETs than for NMOSFETs due to the presence of holes in the PMOS inversion layer, which are known to interact with the oxide states.

The highest impact of BTI in PMOSFETs is observed when they are stressed with a high negative gate voltage at elevated temperatures [3]. It is referred to as *negative* BTI (NBTI) due to the negative gate-to-source voltage. In PMOSFETs, the channel holes interact with the passivated hydrogen bonds in the dielectric, resulting in the generation of traps and interface states. This leads to an increase in the absolute value of the threshold voltage (V<sub>th</sub>), and the effect increases at high temperatures. The introduction of new dielectric materials such as high-k has enabled the BTI effect in NMOSFETs, where it is referred to as *positive bias temperature instability* (PBTI) due to the positive gate-to-source voltage.

It has been noticed that BTI degradation starts relaxing very quickly after the removal of the stress. This recovery process is caused by de-trapping of charges when the stress signal is removed after a stress phase [4]. The stress signal causing BTI degradation can be of two types: static stress (DC stress) and dynamic stress (AC stress). AC stress is known to be beneficial for lifetime enhancement because it introduces the recovery process mentioned above [5]. Recovery after NBTI or PBTI stress in MOSFETs, and its dependence on gate voltage, temperature and frequency of the stress signal, has been a hot topic of research in the past decade [6].

Currently, BTI is one of the most serious and important reliability concerns for both digital and analog circuits. At advanced technology nodes this effect is enhanced due to reduced voltage headroom, high oxide electric fields resulting from non-constant field scaling, high temperatures due to higher power dissipation and introduction of new dielectric material.

## 2.2.3 HCI-based aging faults

The hot carrier injection (HCI) degradation mechanism has been an important failure mechanism for the last three decades and still remains important in new technologies. According to Takeda [7], there are three main types of hot carrier injection modes:

- Channel Hot Electron (CHE) injection.
- Drain Avalanche Hot Carrier (DAHC) injection.
- Secondary Generated Hot Electron (SGHE) injection.

CHE injection is due to the escape of "lucky" electrons from the channel, causing a significant degradation of the oxide and the Si-SiO<sub>2</sub> interface, especially at low temperatures [8].

On the other hand, DAHC injection results in both electron and hole gate currents due to impact ionization, giving rise to the most severe degradation at room temperature.

SGHE injection is a result of minority carriers from secondary impact ionization or, more likely, bremsstrahlung radiation, and becomes a problem in ultra-small metal oxide semiconductor (MOS) devices. In all three modes, HCI degrades the electrical characteristics of MOSFETs and hence the dependability of the associated electronic systems.

#### 2.2.4 Time-Dependent Dielectric Breakdown (TDDB) aging

The TDDB is a degradation phenomenon of SiO<sub>2</sub>, the thin insulating layer between the control "gate" and the conducting "channel" of the transistor. SiO<sub>2</sub> has a very high band gap (approximately 9 eV) and has excellent scaling and process integration capabilities, which makes it the key factor in the success of MOS technology.

Although SiO<sub>2</sub> has many extraordinary properties, it is not perfect and suffers degradation caused by stress factors, such as a high oxide field. The exact physical mechanism of TDDB is still an open question. The general belief is that TDDB of the gate insulating material results from the cumulative effect of trapped-charge build-up in the insulator during short-term and long-term high-field stress. The high local fields induced by the trapped charge within the insulator create defects in the volume of the oxide film. These defects accumulate with time and eventually reach a critical density, triggering a sudden loss of dielectric properties [9]. These defects also cause gate leakage and excess noise in MOSFETs. A surge of current produces a large localized rise in temperature, leading to permanent structural damage in the SiO<sub>2</sub>. One usually distinguishes between *soft-breakdown* (SBD), which is reversible, and *hard-breakdown*, which results in permanent damage. Both create failures in MOSFETs and hence degrade the dependability of the associated electronic systems.

## 2.3 State-of-the-art in aging faults

In [10], authors build a unified gate sizing algorithm which considers NBTI and OBD (Oxide Break Down) along with the traditional metrics (power and performance) of gate sizing. They have developed a static timing analysis engine and a discrete gate sizer which use an accurate delay model. Their model is embedded in their sizer to perform NBTI aware gate sizing which degrades circuit lifetime due to OBD. They have used a metric for OBD at the circuit level and performed reliability-aware gate sizing by modifying the cost/metric in the sizer algorithm.

In [80], the authors proposed a gate replacement technique to mitigate NBTI-induced circuit aging. Their technique identifies the critical gates by considering the impact of the types of input gates on their protectability, guaranteeing that all the critical gates can be protected from static NBTI fatigue. The experimental results show a fourfold improvement in NBTI-induced delay degradation compared to techniques that neglect the impact of the input gates' type.

Microprocessors at the nanoscale are exposed to various reliability issues, which include a more rapid aging of all components. This leads to increasing pipeline stage delays during the operational lifetime, resulting in designs that are imbalanced in terms of delay and Mean Time To Failure (MTTF) if the delays are balanced at design time. In [11], the authors proposed an aging-aware MTTF-balanced pipeline design scheme to replace the traditional delay-balanced paradigm. Using their approach, the imbalance during runtime is minimized, allowing better designs. Their experimental results show that for the so-called FabScalar microprocessor, the MTTF-balanced design yields a more than 2.3 times longer MTTF while maintaining the same performance as the delay-balanced design.

In order to predict a failure due to transistor aging, the authors of [77] proposed a scan-based on-line aging monitoring scheme which observes aging by capturing functional data at different times within the functional clock period and comparing the captured data. They added an early capture scan chain which captures the functional data earlier than the original scan chain. To decide the early capture timing, they used the so-called guard-band interval. The early capture scan chain was designed to capture functional data at the falling edges of the functional clock. An adjustable duty cycle clock generator was employed to shift the falling edges within a guard-band interval. Since all the aging monitoring operations are performed during system operation, there is no performance impact.

In [81], authors designed a digital sensor IP for in-situ timing slack monitoring on actual circuit paths. Their sensor extracts timing slack information from circuit paths in the post-silicon phase. The timing slack data reveals how much critical or near-critical paths changed because of reliability effects like process variation and aging.

## 2.4 BASTION contributions to aging fault study

Some of BASTION partners' activities on aging are focusing on the design of embedded instruments and monitors. These methods are able to detect the early effects of aging in processors. Some measurement results are already available on aging, and this data can be used in the next phase, where hierarchy is introduced and more complex digital systems are tackled.

### 2.4.1 Aging Measurement Embedded Instruments

In nanometer CMOS technologies, aging effects tend to increase the threshold-voltage of a single MOS transistor over time and hence reduce its drain current. The consequence is a certain performance reduction for analogue/mixed-signal IPs.

An existing solution for measuring the degraded performance is using an embedded-instrument (EI). These on-chip EIs are supposed to test basic physical parameters like voltages, currents, temperatures, as well as performance parameters of analogue/mixed-signal IPs.

In BASTION, we designed a threshold-voltage measurement EI for MOSFETs. It can measure the threshold-voltage of a MOS transistor periodically and extract the change in threshold-voltage which is caused by aging effects.

The threshold-voltage is one of the most sensitive parameters in MOS transistors in terms of reliability. Many reliability effects, like NBTI/PBTI, HCI, TDDB and CHC, tend to increase the absolute value of the threshold-voltage with stress time. Therefore, measuring the threshold-voltage repeatedly during a MOSFET's lifetime is highly desirable. Traditionally, the threshold-voltage is measured with complicated equipment such as probe stations and a semiconductor parameter analyzer. This increases the cost and limits the time period over which the threshold-voltage shift can be observed. The proposed EI provides a way to measure the MOSFET's threshold-voltage using on-chip ADCs and DACs.

The threshold-voltage $(V_{th0})$ cannot be measured directly. Normally, it is extracted from drain current measurements at various gate and drain voltages with compact models. The $V_{th0}$ shift caused by ageing can be measured as the difference in $V_{th0}$ between the reference transistor and the aged DUT transistor.
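The extraction step can be illustrated with a minimal sketch. It does not reproduce the compact-model extraction used by the EI; instead it substitutes the textbook linear-extrapolation method, in which the drain current in the linear region is assumed to follow $I_d = k\,(V_{gs} - V_{th} - V_{ds}/2)\,V_{ds}$, so that the x-intercept of a straight-line fit of $I_d$ over $V_{gs}$ equals $V_{th} + V_{ds}/2$. All names, the device model and the numerical values below are hypothetical, and voltages are treated as magnitudes so that the same code applies to PMOS devices.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extract_vth(vgs, ids, vds):
    """Linear extrapolation: x-intercept of the Id(Vgs) line minus Vds/2."""
    slope, intercept = fit_line(vgs, ids)
    return -intercept / slope - vds / 2.0


def synthetic_id(vgs, vth, vds, k=1e-4):
    """Hypothetical linear-region model used only to generate test data."""
    return k * (vgs - vth - vds / 2.0) * vds


VDS = 0.05                          # small drain bias (assumed), in volts
sweep = [0.6, 0.7, 0.8, 0.9, 1.0]   # gate voltages above threshold, in volts

# Reference (unstressed) transistor versus aged DUT (assumed 20 mV shift).
vth_ref = extract_vth(sweep, [synthetic_id(v, 0.300, VDS) for v in sweep], VDS)
vth_dut = extract_vth(sweep, [synthetic_id(v, 0.320, VDS) for v in sweep], VDS)

delta_vth = vth_dut - vth_ref       # ageing-induced threshold-voltage shift
print(f"Vth shift: {delta_vth * 1e3:.1f} mV")
```

Taking the difference against a reference transistor, as described above, has the practical benefit that systematic errors common to both measurements largely cancel.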

Figure 1 shows the proposed EI for measuring the  $V_{th0}$  ageing behavior inside a SoC.



**Figure 1:** The proposed EI for measuring the V<sub>th0</sub> ageing behaviour inside a SoC.

A long-term stress test has been carried out on a 90nm test chip. The test chip was placed in an oven and heated to 127°C. It contains 32 PMOS DUT transistors and was supplied by a 1.15V power supply during the stress test.

The change in threshold-voltage $V_{th0}$ due to reliability effects is plotted in Figure 2. The figure shows the $V_{th0}$ change obtained with three methods for the same group of DUTs after 1 week (167 hours) of stress at 127°C. It can be seen that the proposed EI can characterize the $V_{th0}$ shift with 3mV accuracy.



**Figure 2:** After stress for 167 hours, comparing the measured  $V_{th0}$  change with the EI and another two methods in 90nm PMOS DUT transistors

#### 2.4.2 Using On-chip health monitors in SoCs

The trend of downscaling and increased complexity of digital ICs has enabled the implementation of many processors in complex SoCs. Especially in the case of homogeneous many-processor SoCs (MP-SoCs), this turns out to be an extremely useful feature in terms of reliability. However, the downside of downscaling and complexity is the increase in variability and the decrease in reliability of the components. To counteract this loss, use can be made of the multiple processors. Our generic approach for implementing high-dependability SoCs uses on-chip health monitor (HM) tests or measurements on processor cores during their operational life to evaluate their health, followed by repair of faulty (or soon-to-be faulty) processors by remapping and rerouting to correct (spare) cores using run-time mapping software.

We have enhanced long-term dependability of MP-SoCs being used in high-level security and automotive applications. Our prognostic approach for life-time prediction of cores uses on-chip health monitors (e.g., supply-voltage monitoring) per core in combination with advanced prediction algorithms in software to ensure a high dependability. However, this approach assumes that there is a *close correlation* between the on-chip health monitor measurements and key core specification parameters, like for example the maximum operating clock speed or the dynamic power current, as function of time (aging).

In other words, the final goal is that only the on-chip set of HMs will accurately predict, together with embedded life-time prediction software, when cores are expected to fail in time. In this advanced way, a timely replacement can be made. It is much more efficient than an automatic scheduled repair action determined at design time.

For the following example, the target Xentium processor core for the BASTION aging fault study has been used for the measurement results. This target processor core is an (embedded) Xentium<sup>TM</sup> reconfigurable DSP core, implemented in 90nm CMOS technology which has been designed and implemented by Recore Systems.

An example of measurement results in terms of cell propagation delay, resulting from NBTI ($V_{th}$ shift) aging of an Inverter cell, is shown in Figure 3. The figure shows that the propagation delay increases by around 5.7% after a stress time of only 16 minutes at 125°C. The applied stress voltage is also shown.



**Figure 3:** Example of the increased propagation delay of an inverter versus stress based on NBTI Vth shift *measurements*. A pulse-wave stress signal has been used.

The expected increase in the propagation delay of a Xentium versus stress time is shown in Figure 4. Stress temperature, stress time, and voltage stress profile are indicated.



**Figure 4:** The increased delay (decrease in operating frequency) of the most critical path in the Xentium processor core obtained from *simulation*. This result will cause speed failure after some time (reduced dependability).

A basic requirement is that in our case the health monitoring data should sufficiently correlate with the key specifications of importance with respect to dependability/reliability. The previous NBTI delay-related data (Figure 3) is linked to the Xentium delay data (Figure 4), both under the same stress regime of voltage, temperature and stress time. This is depicted in Figure 5.



**Figure 5:** Delay obtained via NBTI health monitor (measurements) and delay (simulated) of the Xentium processor core under the same aging regime. Data points are in relation with the stress times (0-1000s).

Horizontal or vertical data lines in the graph would show that there is no correlation at all between the two; anything in between indicates some degree of correlation. How much correlation is required is often a subject of discussion; often it ranges from 85% up to 99%.

As can be seen from Figure 5, this correlation exists, and hence the concepts used in alternate testing, like deriving mapping functions, can be applied in principle. It can be further improved if multiple health sensors are incorporated which show correlations with the Xentium delay. The same principle also holds for other key parameters.
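The correlation check and the derivation of a mapping function can be sketched as follows. This is an illustration only: the data points are hypothetical and are not the measurements of Figure 5, the Pearson coefficient is merely one common way to quantify correlation, and a straight line is the simplest possible mapping function.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def linear_mapping(xs, ys):
    """Least-squares line ys ~ slope * xs + offset (the mapping function)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Hypothetical delay increases (%) at the same stress times:
monitor_delay = [0.0, 1.2, 2.5, 3.9, 5.0, 5.7]   # health monitor (measured)
core_delay = [0.0, 0.5, 1.1, 1.6, 2.1, 2.4]      # processor core (simulated)

r = pearson(monitor_delay, core_delay)
slope, offset = linear_mapping(monitor_delay, core_delay)

# Predict the core delay increase from a new monitor reading of 4.0 %.
predicted_core_delay = slope * 4.0 + offset
print(f"correlation = {r:.3f}, predicted core delay increase = "
      f"{predicted_core_delay:.2f} %")
```

With several health sensors, the same idea would extend to a multi-variable regression, which is one way the improvement mentioned above could be realized.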

The advantage of our reliability measurement program approach is that beside the information from the health monitors and Xentium parameters, also actual reliability measurements/calculations can be used to calibrate the life-time via really occurred failures. This is rather unique, and should significantly improve the life-time prediction accuracy.

#### 2.4.3 Hierarchical identification of NBTI-critical paths

The previous developments in (low-level) NBTI measurements from the University of Twente can be subsequently used by the work on hierarchy of aging faults of the Tallinn University of Technology. In order to enable analysis of aging effects in large circuits, we have introduced a concept of hierarchical modeling of dynamic NBTI aging. To allow calculation of NBTI-induced gate delay degradation, for each signal  $x_i$  in a net list, signal probabilities  $Pz(x_i)$  (i.e. the probability of signal being 0 over a functional test set) are calculated by gate-level logic simulation and the numbers of gate fanouts  $F(x_i)$  are derived by structural analysis at the gate netlist, respectively. These parameters together with the expected operation time  $\Phi$  in years are applied as an input to the NBTI-aware gate delay degradation analysis.

Figure 6 presents the proposed hierarchical NBTI-critical path analysis flow, which takes place as follows. As a preprocessing step, complex gates are flattened into NAND, NOR and inverter gates: e.g., an AND gate will be represented by a NAND gate followed by an inverter gate. Then, gate-level simulation calculating the signal probabilities  $Pz(x_i)$  for all inputs  $x_i$  of the stages is performed which is followed by structural analysis providing the number of fan outs  $F(x_i)$  for the stages that are driven by signals  $x_i$ .



**Figure 6:** NBTI-induced gate delay degradation analysis flow.

These parameters, together with the circuit lifetime $\Phi$ (in years), are used as input to the NBTI-induced gate delay degradation analysis. After the individual NBTI-degraded gate delays have been calculated, static timing analysis is performed to obtain the path delays, taking the effects of NBTI into account.
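The first two steps of the flow, computing the signal probabilities $Pz(x_i)$ by logic simulation and the fan-out counts $F(x_i)$ by structural analysis, can be sketched as follows. The netlist format, the gate names and the example circuit are assumptions made for illustration; the exhaustive input set merely stands in for a functional test set.

```python
from collections import Counter
from itertools import product

# Gate functions for the flattened library (NAND, NOR, inverter).
GATES = {
    "INV": lambda ins: 1 - ins[0],
    "NAND": lambda ins: 0 if all(ins) else 1,
    "NOR": lambda ins: 0 if any(ins) else 1,
}

# Hypothetical netlist, listed in topological order:
# output signal -> (gate type, input signals). Inputs a, b, c are primary.
NETLIST = {
    "n1": ("NAND", ["a", "b"]),
    "n2": ("INV", ["n1"]),        # n1 and n2 together form AND(a, b)
    "n3": ("NOR", ["n2", "c"]),
    "n4": ("NAND", ["n1", "c"]),
}


def simulate(netlist, vector):
    """Evaluate all signals for one input vector (dict: input -> 0/1)."""
    values = dict(vector)
    for out, (kind, inputs) in netlist.items():
        values[out] = GATES[kind]([values[s] for s in inputs])
    return values


def zero_probabilities(netlist, vectors):
    """Pz(x): fraction of test vectors for which signal x is 0."""
    zeros = Counter()
    for vector in vectors:
        for signal, value in simulate(netlist, vector).items():
            zeros[signal] += (value == 0)
    return {signal: count / len(vectors) for signal, count in zeros.items()}


def fanouts(netlist):
    """F(x): number of gate inputs driven by signal x."""
    counts = Counter()
    for _kind, inputs in netlist.values():
        counts.update(inputs)
    return counts


vectors = [dict(zip("abc", bits)) for bits in product((0, 1), repeat=3)]
pz = zero_probabilities(NETLIST, vectors)
fo = fanouts(NETLIST)
print(pz["n1"], pz["n3"], fo["n1"])
```

Relying on dictionary insertion order for the topological order keeps the sketch short; a real netlist would need an explicit levelization step first.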

The nominal delays  $d(G_k)$  in the logic gates  $G_k$  are taken from the technology library. Calculation of the NBTI-induced delay degradation  $\tau(G_k)$  is technology dependent. In BASTION, we applied the data for 65 nm technology from [76] as follows. Delay  $t_i$  was calculated for each input signal  $x_i$  separately. The voltage threshold shift  $\Delta V_{th}(x_i)$  was calculated as follows:

$$\Delta V_{th}(x_i) = (\alpha \cdot P_z(x_i))^{\beta}, \tag{1}$$

where $P_z(x_i)$ is the signal probability for input signal $x_i$, and $\alpha$ and $\beta$ are technology dependent constants that were set to generate a graph that matches the curves obtained by NBTI-aging analysis in [76] (see Figure 7). In our experiments, $\beta$ is set to 0.18868 and $\alpha$ is set to $15 \cdot 10^{-7}$ for 1 year of aging and to $115 \cdot 10^{-7}$ for 10 years of aging, respectively. In order to calculate the $\Delta V_{THp}(x_i)$ values for the static case where $P_z(x_i) = 1$, we assigned $\Delta V_{THp} = 0.18V$ for 1 year and $\Delta V_{THp} = 0.27V$ for 10 years of aging, respectively. Figure 7 shows the fitting of the mathematically convenient function (1) (the red and blue curves) to the comprehensive analysis of [76] (the white and black dots).



**Figure 7:** Threshold voltage shift  $\Delta V_{THp}$  as a function of signal probability  $P_z$ 
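Function (1) and the special treatment of the static case are simple enough to be written down directly. The sketch below uses the constants quoted in the text; the function and variable names are hypothetical, and the result is assumed to be in volts, in line with the static-case values.

```python
BETA = 0.18868
ALPHA = {1: 15e-7, 10: 115e-7}        # alpha per aging period (years)
STATIC_SHIFT = {1: 0.18, 10: 0.27}    # assigned shift for P_z = 1 (volts)


def delta_vth(p_zero, years):
    """Threshold voltage shift from function (1) for signal probability p_zero."""
    if not 0.0 <= p_zero <= 1.0:
        raise ValueError("signal probability must lie in [0, 1]")
    if p_zero == 1.0:
        # Static stress is not covered by the fitted curve; use the assigned value.
        return STATIC_SHIFT[years]
    return (ALPHA[years] * p_zero) ** BETA


for years in (1, 10):
    shifts = [round(delta_vth(p, years), 4) for p in (0.1, 0.5, 0.9, 1.0)]
    print(f"{years:>2} year(s): {shifts}")
```

Because the static case is assigned separately, the function has a step at $P_z = 1$; callers that interpolate over $P_z$ should keep this in mind.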

The *SPICE* simulation process consisted of simulating the basic cells of the technology library for different  $\Delta V_{THp}$  (*pMOS* transistor threshold voltage shift) values in order to capture the dependence of gate output delay on  $V_{THp}$ . Figure 8 displays the typical n- and p-networks displaying device interconnections for *INVERTER*, *NAND* and *NOR* gates considered for simulation.



**Figure 8:** Typical n- and p-networks displaying device interconnections for (a) Inverter, (b) NAND and (c) NOR gates considered in this work



**Figure 9:** Dependence of gate output delay on $V_{THp}$ shift for the rising transition $0 \rightarrow 1$ in SPICE (blue curve) and the curve function (dashed black) that matches the SPICE results for (a) Inverter, (b) NAND2 and (c) NOR2

Figure 9 shows the gate-aging characterization step in SPICE for Inverter, NAND2 and NOR2 gates. It captures the dependence of the gate output delay on $V_{THp}$ for the rising input transition $0\rightarrow 1$. The following mathematically convenient function (dashed black curve) was matched to the curves characterized with SPICE (blue curve) in order to calculate the percentage change in gate delay values:

$$\Delta t_{gate} = \lambda \cdot \Delta V_{THp}(x_i) + (\mu \cdot \Delta V_{THp}(x_i))^2 \tag{2}$$

where  $\Delta t_{gate}$  is the nominal gate delay increase in percent for the gate,  $\Delta V_{THp}(x_i)$  is the change of threshold voltage  $V_{THp}$  for *pMOS* transistors at the gate input  $x_i$  and  $\lambda$  and  $\mu$  are technology dependent constants. In our experiments  $\lambda$  and  $\mu$  are set to 3.1 and 2.7 for the NOR gate, to 2.05 and 2.85 for the NAND gate and to 2.9 and 1.5 for the Inverter, respectively.
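Function (2) with the per-gate constants quoted above can be sketched as follows. The names are hypothetical, and the unit of $\Delta V_{THp}$ expected by the fit is assumed here to be volts; the result is interpreted as stated in the text.

```python
# (lambda, mu) per gate type, as quoted in the text.
COEFFS = {
    "NOR": (3.1, 2.7),
    "NAND": (2.05, 2.85),
    "INV": (2.9, 1.5),
}


def delay_increase(gate_type, dvth):
    """Gate delay increase from function (2) for a threshold shift dvth."""
    lam, mu = COEFFS[gate_type]
    return lam * dvth + (mu * dvth) ** 2


# Compare the three gate types for the same (assumed) 0.1 V shift.
for gate_type in COEFFS:
    print(gate_type, round(delay_increase(gate_type, 0.1), 4))
```

Keeping the constants in a table makes it straightforward to extend the sketch once additional cells of the library have been characterized.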

Note that only the increased gate delay for the $0\rightarrow 1$ transition at gate inputs was characterized, as the *SPICE* experiments revealed no deviation, or at most a negligible deviation, in the input $1\rightarrow 0$ transition delay after aging. This can be explained by the fact that, as the *pMOS* device in the *p*-network ages, it facilitates the task of discharging the gate output capacitance by the *nMOS* device placed in the *n*-network. The exception is a *NOR* gate, especially a *NOR* with multiple inputs, e.g. *NOR4*, where the gate delay degradation for the input $1\rightarrow 0$ transition became slightly negative, i.e. the transition delay decreased compared to the nominal one.

Note that, in contrast to the *NOR* gate, the *NAND* gate does not display a reduction of the output delay because of the *n*- and *p*-network topologies: in the *NOR* gate, *pMOS* devices are connected in series (in parallel for the *NAND* gate), whereas *nMOS* devices are connected in parallel (in series for the *NAND* gate, see Figure 8). This device interconnection facilitates the discharge of the gate output capacitance by the *nMOS* devices, while making it more difficult for the *pMOS* devices to charge up the output capacitance in the case of *NOR* gates. This is the reason why we observed a reduction of the output delay for the falling edge. The proposed functions (1) and (2) closely match the *NBTI* data from [76] and the *SPICE* electrical characterization, respectively. These functions were implemented in gate-level aging simulation to provide extremely fast calculation of the NBTI-induced delay degradation.

#### 2.4.4 NBTI-critical path delay calculation

In the following, a method for fast calculation of the *NBTI*-induced delay degradation at paths of a gate-level circuit is proposed. In the calculation process we use the following notations:

- $d(G_k)$ is the nominal delay of the fresh gate $G_k$ without considering aging, i.e. its delay at time zero;
- $\tau(G_{k,i})$ is the increase in the delay of the gate $G_k$ from the *i*-th input $x_{k,i}$ to the output of the gate caused by NBTI-induced aging;
- $t(G_{k,i})$ is the total delay of the gate $G_k$ from the *i*-th input $x_{k,i}$ to the output of the gate, taking NBTI-induced aging into account,

$$t(G_{k,i}) = d(G_k) + \tau(G_{k,i});$$

- $t(G_k)$ is the total maximum delay of the gate $G_k$ over all its $m_k$ inputs, taking *NBTI*-induced aging into account,

$$t(G_k) = \max \{t(G_{k,1}), t(G_{k,2}), ..., t(G_{k,m_k})\};$$

- $D(G_k)$ is the delay calculated for the slowest signal path in the cone $C_{IN}(G_k)$ based on the values of $t(G_{k,i})$ for all gates on this path,

$$D(G_k) = \max \{ (D(G_i) + t(G_{k,i})) \mid G_i \in IN(G_k) \},$$

where $IN(G_k)$ is the set of input gates of $G_k$, and $t(G_{k,i})$ is the total delay, including aging, of the gate $G_k$ from the input driven by the gate $G_i$ to its output.

Consider a combinational circuit as a network of gates where all the gates have numbers which show the ranking of gates in a partial order such that:

- (1) all the input gates are numbered in an arbitrary order,
- (2) all other gates may get their numbers only if all their predecessor gates have already got their numbers.
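One way to obtain such a numbering is a standard topological sort. The sketch below uses Kahn's algorithm on a hypothetical fan-in description of the circuit; the data format and gate names are assumptions, and any other method that satisfies rules (1) and (2) would serve equally well.

```python
from collections import deque


def number_gates(fanin):
    """Assign ranks 1..NG so that every gate is numbered after its predecessors.

    fanin maps each gate to the list of gates driving it
    (an empty list marks an input gate).
    """
    pending = {gate: len(preds) for gate, preds in fanin.items()}
    successors = {gate: [] for gate in fanin}
    for gate, preds in fanin.items():
        for pred in preds:
            successors[pred].append(gate)

    ready = deque(gate for gate, count in pending.items() if count == 0)
    rank = {}
    while ready:
        gate = ready.popleft()
        rank[gate] = len(rank) + 1
        for succ in successors[gate]:
            pending[succ] -= 1
            if pending[succ] == 0:      # all predecessors are numbered
                ready.append(succ)

    if len(rank) != len(fanin):
        raise ValueError("netlist contains a combinational loop")
    return rank


# Hypothetical four-gate circuit: G1 and G2 are input gates.
FANIN = {"G1": [], "G2": [], "G3": ["G1", "G2"], "G4": ["G3", "G2"]}
rank = number_gates(FANIN)
print(rank)
```

Processing gates in increasing rank guarantees that the delays of all predecessors are available when a gate is evaluated.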

We present Algorithm 1 for calculating  $D(G_k)$ , which is based on processing the gate-level netlist, gate by gate, from inputs to outputs. The method calculates the maximal degraded path delay values  $D(G_k)$  for all the gates of the circuit based on the estimates of  $t(G_k)$ , where NG is the number of gates in the circuit.

**Algorithm 1**. NBTI-aware static timing analysis

```
FOR all gates G_k, k = 1, 2, ..., NG:
    FOR all inputs x_{k,i} of G_k:
        t'(G_{k,i}) = t(G_{k,i})   IF x_{k,i} = 0
        t'(G_{k,i}) = d(G_k)       IF x_{k,i} ≠ 0
    D(G_k) = max { D(G_i) + t'(G_{k,i}) | G_i ∈ IN(G_k) }
```

As a result, a fast and accurate calculation of NBTI-degraded paths is performed at the gate level.

Experimental results show a high scalability as well as good match with SPICE simulation results for the proposed method.
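Algorithm 1 can be sketched in a few lines. The data layout and names below are assumptions, as are the delay values; in particular, how the condition $x_{k,i} = 0$ is evaluated is not detailed in this section, so the sketch simply takes it as a per-input flag supplied by the caller. Primary inputs are assumed to have an arrival time of zero.

```python
def nbti_aware_sta(gates, order):
    """Compute the degraded path delay D(G_k) for every gate.

    gates: name -> {"d": nominal delay,
                    "inputs": [(source, tau, is_zero), ...]}
           source  : driving gate or primary input name
           tau     : NBTI-induced delay increase for this input
           is_zero : flag standing in for the condition x_{k,i} = 0
    order: gate names with all predecessors listed first
    """
    arrival = {}
    for name in order:
        gate = gates[name]
        candidates = []
        for source, tau, is_zero in gate["inputs"]:
            # t'(G_{k,i}): degraded delay if the condition holds, else nominal.
            t_prime = gate["d"] + tau if is_zero else gate["d"]
            # Sources without an entry are primary inputs (arrival time 0).
            candidates.append(arrival.get(source, 0.0) + t_prime)
        arrival[name] = max(candidates)
    return arrival


# Hypothetical three-gate example (delays in arbitrary time units).
GATES = {
    "G1": {"d": 10.0, "inputs": [("a", 1.0, True), ("b", 0.5, False)]},
    "G2": {"d": 12.0, "inputs": [("G1", 2.0, True), ("c", 1.5, True)]},
    "G3": {"d": 8.0, "inputs": [("G1", 0.8, False), ("G2", 1.2, True)]},
}
D = nbti_aware_sta(GATES, ["G1", "G2", "G3"])
print(D)
```

Each gate is visited once and each input is inspected once, so the run time grows linearly with the size of the netlist, which is consistent with the scalability claim above.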

## 2.5 Conclusions on aging faults

After an introduction to aging faults and state-of-the-art publications, we described our original contributions stemming from the BASTION project.

The proposed approach addresses the design of aging monitors, and enables a new generation of very high dependability many-processor SoCs in safety-critical applications. It consists of aging monitors whose readings can be related to the performance of processors. More specifically, an NBTI monitor has been shown to correlate with the propagation delay in the aged processor.

Moreover, an embedded instrument has been designed. This instrument is able to measure the change in threshold-voltages caused by aging effects. This instrument can provide detailed information with respect to the performances of digital systems.

In addition, the Tallinn University of Technology has introduced a method of hierarchical modeling of dynamic NBTI aging to enable analysis of aging effects in large circuits. This can use the results from the previous research of the University of Twente. Preliminary experimental results show high scalability as well as good match with SPICE simulation results. Work on the evaluation of the method continues, to validate these new results in practice.

#### 3 NFF faults at the IC level

#### 3.1 IC-level vs. board-level NFFs

No Fault (or Failure) Found (NFF) is a term used in various fields, especially in the electronics industry referring to a system or component that has been returned to the manufacturer or distributor for warranty replacement or service repair, but operates properly while being re-tested. This situation is also referred to as No Defect Found (NDF) and No Trouble Found (NTF) and it is closely related to test escapes.

The typical NFF symptoms at the product level are:

- System passes all tests in production
- System fails at the end customer
- Troubleshooting cannot repeat the failing condition

NFF returns can seriously erode profit margins for manufacturers and service providers. The time, material and shipping costs of exchanging hardware are enormous in relation to the cost of the item being replaced. Further, NFF returns can also indicate that customers' problems have not been resolved, thus implying reduced customer satisfaction and eroded brand value.

NFF can be considered at different system levels and different stages of the supplier-client value chain. In the context of BASTION, we consider two corner cases:

- a) NFF in the IC (component-level) and
- b) NFF at the **board** (product-level).

The latter case was described a few paragraphs above, while the former case is similar, but does not involve the end customer. Put simply, if board-level or product-level tests fail in a way that indicates a particular IC being faulty, but re-testing of that IC passes, it is considered to be an *IC-level* NFF problem.

According to our current hypothesis, the two cases should be considered separately and they may have different major NFF causes. The initial findings indicate that a realistic NFF contributor at the IC level is the intermittent fault, which is detailed in the following section. At the board level, the root cause of NFF (in addition to intermittent faults) could be the incompleteness of test coverage metrics due to missing fault models for certain types of defects (primarily timing faults) and, as a consequence, the absence of certain types of deterministic test sets.

In this section, NFF faults and different types of NFFs, mainly at IC level are introduced. The relation of NFF and intermittent faults and how intermittent faults can be distinguished from other kind of faults are explained. After a classification of NFFs, a study of state-of-the-art works in NFF faults is presented. In the last part, our contribution with regard to IC-level NFF faults is presented. The *board-level* NFF problem study is given in Chapter 4.

#### 3.2 Introduction to IC-level NFF

With the continuous decrease of CMOS feature size, there are more electrical products that show anomalous behavior in the field but function properly during testing. These products are known as NFF (No Fault Found). These products are also referred to as "could not duplicate", "trouble not identified", "retest OK", "no trouble found" and so on [12-14]. In order to improve product reliability and reduce return costs, manufacturers spend a significant amount of time and money to investigate causes of NFFs.

A potential important cause of NFFs corresponds to intermittent faults. Faults in semiconductor devices can be classified as permanent, transient and intermittent faults. Transient faults are induced by temporary environmental conditions such as neutrons from the atmosphere and energetic particles from packaging material. Hard (or permanent) faults reflect irreversible physical changes, mainly corresponding to manufacturing defects, such as contaminations in silicon devices or wear-out of materials. Intermittent faults occur due to unstable or marginal hardware, and they can sometimes be activated by an environmental change such as temperature or voltage alterations [15], [16].

Transient and intermittent faults manifest very similarly. However, an intermittent fault is distinguishable from a transient one by the following criteria:

First, replacement of the offending circuit removes the intermittent fault, by contrast with transients, which cannot be eliminated by repair.

Second, transient faults affect random locations while an intermittent fault occurs repeatedly at the same location.

Third, errors induced by intermittent faults tend to occur in bursts [17].

#### 3.3 Classes of IC-level NFF faults

A specific category of NFFs are intermittent faults, characterized by random low-level occurrences in time, randomly fixed in locations, but repairable (at least in PCBs and cabinets) if found. Especially in the space and avionic application fields, this category of faults ranks among the highest in terms of occurrence (>50%) as well as cost [18]. Different categories of intermittent faults can be identified. The next subsections address them.

#### 3.3.1 Intermittent resistive faults

Intermittent resistive faults usually are a prelude to permanent faults (in particular, open faults) in time (aging) [19]. It is important to note that intermittent faults are quite dependent on environmental conditions, like temperature or mechanical effects (e.g., vibration) [18].

There are several physical root causes of intermittent resistive faults. At PCB and cabinet level, cold solder contacts (see Figure 10a), damaged traces/wires, and loose connectors are the major reasons. In integrated circuits, the continued scaling of interconnections is likely to increase intermittent faults. Origins can be electromigration (EM), soft breakdown (SBD), material residuals, and induced voids (see Figure 10b) and cracks in 3D through-silicon vias (TSVs) within chips [20].



Figure 10: Several possible causes of intermittent faults: a) Cold (cracked) soldering joint on a PCB, b) voids and cracks in TSVs.

#### 3.3.2 Intermittent stuck-at faults

An intermittent stuck-at fault causes the value on the faulty signal line to be intermittently stuck at logic value "1" or "0". The structures most vulnerable to intermittent stuck-at faults are storage structures, such as memories and register files. Intermittent stuck-at faults are caused by residues left in storage cells or solder joints during manufacturing.

#### 3.3.3 Intermittent short faults

Intermittent short faults are shorts in wires or shorts in transistors. If an element is intermittently shorted to power or ground, it is equivalent to an intermittent stuck-at fault. If two signal wires are shorted together, an intermittent bridging fault occurs.

#### 3.3.4 Intermittent open faults

Intermittent open faults are breaks or imperfections in circuit interconnections such as wires, contacts, transistors and so forth. These faults are usually caused by electromigration, stress migration, or intermittent contacts.

#### 3.3.5 Intermittent timing faults

Intermittent timing faults will result in timing violations and affect data propagation when they occur. They usually lead to writing wrong data to storage cells. Intermittent timing faults can be broadly classified into intermittent path-delay faults and intermittent transition faults. Intermittent timing faults are mainly caused by inductive noises, aging, crosstalk, or process, voltage, temperature (PVT) variations.

#### 3.4 State-of-the-art in IC-level NFF faults

The effects of transient and permanent faults have been extensively analyzed [21, 22]. However, less attention has been given to the intermittent faults. Field collected data and failure analysis results from avionic instruments [23-25], and commercial electrical systems [14, 17, 26, 27] clearly show that intermittent faults are a major cause for field returns in modern integrated circuits.

Recently, a few publications have studied the impact of intermittent faults on microcontrollers [28] and microprocessors [29-31]. In [28], the authors generated fault models for intermittent faults at the logic and RTL abstraction levels, and used a tool called VFIT to inject these faults into the VHDL model of an 8051 microcontroller in order to study their impact on the system behavior. Their study focused on the buses. Their fault model included intermittent pulse, intermittent short, intermittent open, and intermittent delay faults. Their experimental results show that the percentage of failures when injecting *multiple* intermittent faults is between 60% and 80%, depending on the *workload*. In addition, the impacts of pulse, open and short intermittent faults are very similar, provoking a high percentage of failures, whereas the intermittent delay fault model has a lower impact, due to timing masking in synchronous components.

Gil-Tomás et al. [31] studied the impact of intermittent faults on the behavior of a reduced instruction set computing (RISC) microprocessor, using VHDL-based fault injection. The authors emphasize that this work complements their previous work on a commercial complex instruction set computing (CISC) microcontroller.

Another work that studies intermittent faults in microprocessors is [29]. In this paper, the authors propose a metric called the *intermittent vulnerability factor* (IVF) to characterize the vulnerability of microprocessor structures to intermittent faults. A structure's IVF is the probability that an intermittent fault in that structure causes an externally visible failure. They compute IVFs for register files and buffers considering three intermittent fault models: the intermittent stuck-at-1 and stuck-at-0 fault model, the intermittent open and short fault model, and the intermittent timing fault model.

Their experimental results show that among the three major types of intermittent faults, intermittent stuck-at faults have the most serious impact on program execution.

#### 3.5 BASTION contributions to NFF fault study

In [32][78], we have taken a first step in investigating the effects of a special category of NFF: intermittent resistive faults resulting from interconnection flaws, which are random in time but not in location. They are known to be extremely difficult to detect and diagnose and, in chips, to correct. They occur in PCBs as well as integrated circuits; with the arrival of 3D-TSV in future integrated systems they are likely to occur more frequently than nowadays, and ball-grid SoCs have been shown to be vulnerable to this type of fault.

To gain insight into intermittent resistive fault behavior, we emulated several loose connections and cold soldering / ball grids. We precisely measured the amount of resistance that loose connections or cold solder can induce on a given circuit. Figure 11 shows the *measured* intermittent resistive behavior of a loose connection. Thermal and vibration effects were generated using external stimulators.



Figure 11: Measured resistance of a damaged/cold soldering interconnection.

Based on our measurement experience (and that of others, such as Ridgetop), a simulation-based injection model for intermittent resistive faults has been developed (see Figure 12). The parameters of this fault injection model can be extended and changed at will, as they are probability functions.



Figure 12: Basic scheme of a programmable intermittent resistive fault injector for our simulator environment. Resistance, burst number, activation time, inactive time and delay are the input parameters.
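The injector parameters listed in the caption of Figure 12 can be illustrated with a small behavioral sketch. This is not the Verilog-A model used in the project; it is a minimal Python illustration that assumes uniform distributions for all random parameters, and the names `IRFParams`, `generate_burst` and `resistance_at` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IRFParams:
    """Input parameters of the sketch (names and units are assumptions)."""
    r_min: float                        # lower bound of fault resistance, ohms
    r_max: float                        # upper bound of fault resistance, ohms
    burst_number: int                   # number of events in one burst
    active_time: Tuple[float, float]    # (min, max) duration of one event, s
    inactive_time: Tuple[float, float]  # (min, max) gap between events, s
    delay: float                        # start time of the burst, s


def generate_burst(p: IRFParams, seed=None) -> List[Tuple[float, float, float]]:
    """Return a list of (start, end, resistance) events for one burst.

    Uniform distributions are an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    t = p.delay
    events = []
    for _ in range(p.burst_number):
        duration = rng.uniform(*p.active_time)
        resistance = rng.uniform(p.r_min, p.r_max)
        events.append((t, t + duration, resistance))
        # Advance past this event plus a random inactive gap.
        t += duration + rng.uniform(*p.inactive_time)
    return events


def resistance_at(events, t: float, r_nominal: float = 0.0) -> float:
    """Series resistance seen at time t; r_nominal outside any event."""
    for start, end, resistance in events:
        if start <= t < end:
            return resistance
    return r_nominal
```

A burst resembling the first example discussed in Section 3.5.1 could be requested with `generate_burst(IRFParams(1.0, 850.0, 10, (1e-7, 1e-6), (1e-7, 1e-6), 10e-6), seed=1)`, where the active and inactive time ranges are placeholder values chosen only for illustration.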

As intermittent resistive faults influence analogue and digital circuits differently, we have investigated their influence on analogue and digital CMOS circuits separately in [32] and [78].

#### 3.5.1 Intermittent fault effects on analogue CMOS

In order to get an idea of the influence of the fault parameters on analogue CMOS circuits, a number of experiments have been carried out.

Two examples of intermittent resistive faults generated by our fault injector are shown in Figure 13. The fault model has been written in Verilog-A and can be directly used in combination with Cadence Virtuoso tools.

The first fault has a resistance range of $1\Omega - 850\Omega$, a start time of $10\mu s$, minimum activation and inactivation times, a safe time equal to $10\mu s$, and 10 events in the burst.

The second fault has a resistance range of $1\Omega - 1k\Omega$, a start time of $5\mu s$, minimum activation and inactivation times, a safe time equal to $15\mu s$, and 8 events in the burst.



Figure 13: Two examples of an intermittent resistive fault generated by our injector: a) R: $1\Omega - 850\Omega$, b) R: $1\Omega - 1k\Omega$.

As an initial target circuit for our experiments, an analogue 65nm TSMC OpAmp has been chosen. This can easily degenerate into a digital inverter or buffer, but it has the advantage of providing more information on what really changes in the circuit, without the disturbance being immediately restored (or not), as happens in the purely digital case.

Our experimental (Cadence simulation) results show that intermittent resistive faults on the power signal line can have the largest influence. Figure 14 shows the simulated power-supply current signal of a component with an emulated intermittent resistive fault on the power-supply signal of the component.

Via a current monitor, this kind of behavior could be observed continuously.



Figure 14: Simulated power-current output signal.

Finally, as can be observed from Figure 15, the SNR decreases as the number of events increases. This figure shows the SNR of the output as a function of the number of events in a burst of an intermittent resistive fault in the input and power-supply lines.



Figure 15: Influence of sweeping the burst length from 1 to 10 on the output SNR of the example component (OpAmp).
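The trend in Figure 15 can be made plausible with a toy calculation of the SNR in decibels. The sketch below is not the Cadence simulation of the OpAmp: it assumes a sine-wave output, assumes that each fault event attenuates the output by 50% for a fixed number of samples, and uses hypothetical names and values throughout.

```python
import math


def snr_db(n_events, n_samples=1000, event_len=20, spacing=80, first=50):
    """SNR (dB) of a sine wave disturbed by n_events fault events (n_events >= 1).

    Assumption for this toy model: during an event the output is attenuated
    to half of its fault-free value; outside events it is undisturbed.
    """
    signal = [math.sin(2 * math.pi * 5 * n / n_samples) for n in range(n_samples)]
    output = list(signal)
    for k in range(n_events):
        start = first + k * spacing
        for n in range(start, start + event_len):
            output[n] = 0.5 * signal[n]
    p_signal = sum(s * s for s in signal)
    # Error power: squared difference between fault-free and disturbed output.
    p_error = sum((s - o) ** 2 for s, o in zip(signal, output))
    return 10 * math.log10(p_signal / p_error)


# Sweep the burst length from 1 to 10 events, as in the figure.
sweep = [snr_db(k) for k in range(1, 11)]
```

Each additional event adds error power while the signal power stays constant, so the values in `sweep` decrease monotonically in this model.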

#### 3.5.2 Intermittent fault effects on digital CMOS

In [78], we have investigated the impact of intermittent resistive faults on the behavior of a digital CMOS circuit via simulation. The interval between occurrences of this kind of defect can be very long, such as one month, whereas the duration of the defect can be as short as 50 nanoseconds. Therefore, evoking and detecting these faults is a huge scientific challenge. An on-chip data logging system that stores a time stamp and the environmental conditions along with each detection will drastically ease the maintenance of avionics and reduce the current high debugging costs.

To evaluate the influence of intermittent resistive faults (IRFs) on the electrical behavior of a digital circuit, a fault injector has been used, as shown in Figure 12. There are six parameters that can be set according to the specific application, with a minimum and maximum value and a certain (random) distribution.

As a simple example we have used a static CMOS full-adder circuit, with its sum and carry outputs latched in D-type flip-flops, in 45nm Nangate CMOS technology. The circuit operates at a clock frequency of $3.3 \mathrm{GHz}$ (a clock period of about 0.3ns). The logic scheme of the combinational part is depicted in Figure 16a; its transistor implementation is provided at a lower hierarchical level. The single (statistically) generated IRF at the carry input $C_{in}$ is shown in Figure 16b.



**Figure 16:** Simulation of a full adder under influence of an IRF in input Cin. a) logic scheme of a full adder in 45nm NAN CMOS library, b) used IRF input pattern at input Cin (red star shows under which condition a logic error occurred, value on top is the Iddt value), c) the simulated functional output voltages of the full adder, including the dynamic power current. Red stars indicate functional logic failures.

The flip-flop clock clk, inputs (X, Y, Cin) and output (S, S-FF, CO, C-FF) voltages, and dynamic supply current Iddt are shown in Figure 16c.

As can be seen in Figure 16b, the carry input Cin has been disturbed by the IRF (a burst of 20 pulses), but only in three cases (red star) did this translate into an incorrect logic output after the flip-flops. This is because digital CMOS circuits are very robust with regard to disturbances; or, in terms of testing, most of the faults (IRFs) are masked. However, in the analogue dynamic power current (Iddt), disturbances can be noticed (bottom of Figure 16c).
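The masking effect can be illustrated at the logic level. The sketch below is a strong simplification of the transistor-level simulation: it assumes that a disturbance inverts Cin while it is active, that the flip-flops sample exactly at the clock edge, and it ignores setup, hold and propagation times. Times are integers in picoseconds, and the pulse positions are hypothetical values that do not reproduce Figure 16.

```python
def full_adder(x, y, cin):
    """Sum and carry-out of a one-bit full adder."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def count_latched_errors(x, y, cin, pulses, clock_period, n_cycles):
    """Count clock edges at which a disturbed Cin is latched as a wrong result.

    pulses: list of (start, end) intervals during which Cin is assumed inverted.
    """
    errors = 0
    for k in range(1, n_cycles + 1):
        t_edge = k * clock_period
        disturbed = any(start <= t_edge < end for start, end in pulses)
        cin_seen = cin ^ 1 if disturbed else cin
        if full_adder(x, y, cin_seen) != full_adder(x, y, cin):
            errors += 1
    return errors


# A burst of 20 pulses, each 30 ps wide and 140 ps apart (hypothetical values).
burst = [(100 + 140 * i, 130 + 140 * i) for i in range(20)]
latched = count_latched_errors(1, 0, 0, burst, clock_period=300, n_cycles=10)
```

With these values only two of the 20 pulses coincide with a sampling edge, so `latched` is 2; the remaining pulses leave the latched outputs unchanged.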

As another experiment, an IRF has been inserted in the Vdd line; the generated IRF at Vdd is identical to that of Figure 16b. The resulting outputs are shown in Figure 17a, while a detail of Vdd and Iddt is shown in Figure 17b.

As can be concluded from the above simulation results, the impact of an IRF on Vdd (as well as Ground) is quite large. From these results, it becomes clear that analog data is the best way of monitoring IRFs in digital circuits, thus avoiding the logic masking of IRFs.



**Figure 17:** a) The resulting output caused by an IRF at the Vdd of the full adder, and b) detail of the simulated Vdd voltage of the full adder, including the Iddt current.

When the resistive pulses are much shorter than the clock period, detection becomes more difficult. Depending on the location in time of an IRF pulse/burst (four cases), no logic faults will occur. Only when a pulse is present during the rising clock edge is there a very small chance that a logic fault occurs. As the pulse duration is also shorter than the clock period, the RC time will not affect the circuit. It is not practical to detect such an IRF via a functional test, which makes IRF detection quite difficult. Therefore, one possible option is to detect short pulses in the analog value of a voltage (or current) in the circuit. A number of circuits have been suggested in the past to handle related tasks, such as late transition detection [79]. Figure 18 shows a circuit to detect small pulses generated by an IRF. It uses a number of flip-flops (say 10), which receive the same input in parallel but with different clock delays (using the internal clock of the system). The flip-flop outputs are all connected to a multi-input OR gate, basically detecting any output differences. Our simulations show that this technique increases the probability of detecting IRFs with small pulses.



Figure 18: A logic circuit to detect very small IRF-related pulses (smaller than the used clock duration).
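The idea behind the circuit of Figure 18 can be sketched behaviorally. The code below is an illustration under simplifying assumptions, not the implemented circuit: the monitored node is assumed to be nominally low with an IRF pulse driving it high, the flip-flops are ideal samplers with evenly spaced clock delays, and all names and timing values (integers in picoseconds) are hypothetical.

```python
def level_at(pulses, t):
    """Return 1 if time t falls inside any (start, end) pulse, else 0."""
    return int(any(start <= t < end for start, end in pulses))


def pulse_detected(pulses, clock_period, n_cycles, n_ffs):
    """Model n_ffs flip-flops sampling the same node with staggered clocks.

    Flip-flop i samples at k * clock_period + i * clock_period / n_ffs.
    The OR of all sampled values flags a pulse.
    """
    phase_step = clock_period / n_ffs
    for k in range(n_cycles):
        samples = [level_at(pulses, k * clock_period + i * phase_step)
                   for i in range(n_ffs)]
        if any(samples):  # multi-input OR gate
            return True
    return False


pulse = [(1010, 1045)]                        # 35 ps pulse between clock edges
single = pulse_detected(pulse, 300, 10, 1)    # one flip-flop: pulse is missed
staggered = pulse_detected(pulse, 300, 10, 10)  # ten flip-flops: pulse is caught
```

In this model the time resolution equals the clock period divided by the number of flip-flops, so a pulse shorter than that step (for example 8 ps with a 30 ps step) can still fall between two sampling instants.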

#### 3.6 Conclusions on NFF (intermittent resistive) faults

We presented a simulation model for a particular type of NFF, corresponding to intermittent resistive faults resulting from bad interconnections, as well as an environment which supports this model. The resulting behavior of a transistor-level circuit under different fault conditions has been examined for digital and analogue systems. The simulation results show how intermittent faults affect performance of analogue as well as digital systems. It is shown that because of the rare occurrences and short bursts, these faults are the most difficult ones to detect and diagnose by traditional test techniques. We proposed a design which is able to detect very small pulses caused by intermittent faults.

#### 4 NFF faults at the board level

The general introduction to the NFF phenomenon has been presented in Chapter 3, together with a detailed study of IC-level NFF. This chapter concentrates on *board*-level (or product-level) NFF, as the two levels should be considered separately: board-level NFF may have major causes that differ from those of its *IC*-level counterpart.

Our initial hypothesis here is based on the identified incompleteness of test coverage metrics at the board level, due to missing fault models for certain types of defects, primarily timing faults. As a consequence, certain types of deterministic test sets are missing. We also observed limitations in state-of-the-art industrial best practices for traceability data analysis and DPMO tracking and, hence, room for improvement in NFF case analysis. These *preliminary* results define our further research in WP1 and first inputs for WP4.

#### 4.1 Introduction

The NFF phenomenon is extremely hard to study, as we need to analyze faults which have not been detected. As they are not detected, there is no data from which to understand root causes or to derive adequate action. It is therefore difficult to anticipate defect occurrence and defect detection by advanced test functions.

#### "How to know what is unknown?"

Donald Rumsfeld said: "There are known knowns; there are things we know we know. We also know there are **known unknowns**; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know." This is the background of the BASTION research on NFF.

In order to overcome this difficulty, we decided to organize the *board-level* research on NFF along 3 axes:

#### 1. NFF known from state-of-the-art

Analyze the state-of-the-art in order to define the defect universe and more specifically, NFF faults. The international contributions have been reviewed to define the various defect classes that have been considered. Data stratification has been arbitrarily defined, to group the defects based on the product life cycle (design defects, supply chain-component defects, placement defects, soldering defects, functional defects and aging defects).

#### 2. NFF known from industrial partners

The BASTION consortium has been organizing a survey, which is presented in the form of a questionnaire, see Appendix I.

A number of industrial partners have already been, and will be, invited to participate in the survey. The analysis of the results will be a valuable source of information to subcategorize the defect categories and classes.

#### 3. NFF known from test data analysis

Supervise product life cycles in order to understand the defect universe. A product passes through a test line which, step by step, detects specific types of defects. Experiments will be run, with fault injection and subsequent test application, in order to identify potential/typical test escapes.

In the following, the state-of-the-art is presented with key definitions for defect, fault model, test coverage and product yield. All these concepts are tied together and affect the Slip, or Escape Rate, which is critical for the NFF phenomenon.

As a first result of the BASTION research program, the unified (IC & Board) defect stratification is proposed, based on the product-life cycle. We are demonstrating the weaknesses in existing board-level test coverage metrics and fault models, potentially leading to test escapes and, consequently, NFF.

As a second result, the QuadDPMO algorithm has been developed, which makes it possible to extract DPMO metrics from traceability and repair data.

Finally, as the DPMO extraction is tied to the traceability tool in terms of performance and availability, we decided to create a survey, which has two key benefits:

- It helps to get information from the industry on a short term basis
- It will give us references to compare to the QuadDPMO algorithm.
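For reference, DPMO (defects per million opportunities) is the number of observed defects divided by the total number of defect opportunities, scaled to one million. The sketch below shows only this basic definition applied to repair records grouped by defect category. It is not the QuadDPMO algorithm, whose details are not given here, and the record layout, function names and numbers are hypothetical.

```python
from collections import Counter


def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    return defects * 1_000_000 / (units * opportunities_per_unit)


def dpmo_by_category(repair_records, units, opportunities_per_unit):
    """DPMO per defect category.

    repair_records: iterable of dicts with a "category" key (assumed layout).
    opportunities_per_unit: dict mapping category to opportunities per board.
    """
    counts = Counter(record["category"] for record in repair_records)
    return {category: dpmo(n, units, opportunities_per_unit[category])
            for category, n in counts.items()}


# Hypothetical repair log for 1000 produced boards.
records = (
    [{"board": f"B{i}", "category": "Solder / Termination"} for i in range(3)]
    + [{"board": "B7", "category": "Placement"}]
)
opportunities = {"Solder / Termination": 600, "Placement": 200}
result = dpmo_by_category(records, units=1000, opportunities_per_unit=opportunities)
```

Here three solder defects over 600 000 solder opportunities and one placement defect over 200 000 placement opportunities both correspond to a DPMO of 5.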

#### 4.2 State of the art

#### 4.2.1 Defects, fault model and test coverage

Some definitions to ease the understanding:

A <u>defect</u> is an unacceptable deviation from the norm. A defect is therefore undesirable and requires some remedial action, such as discarding the board, or repairing it, or at the very least, fixing the process step responsible for it.

A <u>fault model</u> is the description of the electronic behavior in the presence of a defect. This behavior, often an abnormal signal, is what should be detected by a test.



Figure 19: Test strategy versus the defect universe.

The <u>test coverage</u> is a number (in percent) that measures the capability of a test to detect defects.

The <u>fault coverage</u> is similar to the test coverage but expresses the ability of a particular test to detect a particular fault.

The detection of all theoretically possible defects, in order to deliver defect-free products to the end-user, is a strong industry requirement due to market expectation.

#### 4.2.2 Yield estimation: combined defect and test coverage

Publications from the industry show how to define and combine defect and test coverage in a "production model" report (Figure 20). In this model, the so-called Slip/Escape Rate is studied in detail.

There are several models for defect categorization available, including MPS [34], PCOLA/SOQ [33] and PPVS [35] [36]. They simply differ in the number of facets (defect groups and coverage groups) which are considered during the analysis. Those models are *not* suitable for NFF analysis as new defect classes must be taken into consideration.

The objective of the yield estimation is to summarize the facets of the coverage and the defect opportunities, in a limited set of information, in order to guide the test strategy choice.



Figure 20: Production model - Yield estimation summary.

Yield estimation is performed referencing the following parameters (see Figure 20):

- *Production Yield* represents the probability that a board is good. It is calculated by the accumulation of the defect opportunities on the components and the pins.
- *Test Efficiency* is calculated using a weighted coverage. According to the level of analysis, it is possible to consider only one stage of test, or the whole range of tests on the production line.
- *Fall-Off Rate* corresponds to the boards that fail the test. This group includes truly defective boards as well as wrongly rejected boards. These boards will be repaired before being integrated at the following stage.
- *First-Pass Yield* corresponds to the boards that pass the test. This group includes good boards plus bad boards that passed the test because of insufficient test coverage. The latter boards constitute the *Slip*, which is an effective way to measure the product quality.



Figure 21: Yield estimation in a test line.

#### 4.3 Analysis of gaps in test coverage

Board-level fault coverage is a composite result of completely different test, measurement and inspection techniques. Aspects such as measured electrical parameters, Boolean values, logic behavior, error statistics, delay thresholds and even color and appearance are among those typically checked for on respective structural parts of the board. Hence, test techniques and test equipment have to be designed using knowledge from very different fields of science and technology. Collectively, all of these different techniques need to provide measurable combined test coverage of a good quality. The situation is similar to comparing apples and oranges, and creates a great challenge in developing good test coverage metrics.

#### 4.3.1 Escape rate

The BASTION research program is investigating the NFF phenomenon. The defects which are detected during board-level integration or system-level test should have only two sources:

- 1. The defect is already on the board prior to the integration level. The board test is not able to detect the defect, due to lack of coverage (*Slip*), wrong coverage metrics or a defect universe misaligned with the industrial reality.
- 2. The defect occurs later, after board test (aging) and/or is intermittent.

In addition, new coverage metrics are required to consider all new defects/faults. These will be investigated during task T1.4 (Improvement of Test Coverage Metrics).

The available sources of data are targeting, almost exclusively, manufacturing defects. We mention:

- IPC-9261A IPC Standard relating to In-Process DPMO and Estimated Yield for PCAs.
- iNemi Defect case study relating to manufacturing defects (www.inemi.org).
- **PPM Monitoring** Professional web site (www.ppm-monitoring.com), which is now closed. It helped CMs and EMS providers to compare manufacturing performance. Some data are still available at www.smartgroup.org/downloads/PPMandDefectMonitoringaProcess.pdf

In the next section, NFF faults are explored more thoroughly.

#### 4.3.2 New classes of board-level faults

An analysis of academic publications and industrial contributions has been launched in BASTION, to classify existing approaches and practices for defect/fault modeling.

At the beginning of the BASTION research program, as the defect occurrence was unknown, we started from the assumption that each defect occurrence is attached to a phase in the product life: Design, Manufacture, Function and Product lifecycle as a whole. Table 1 is a first result of the BASTION research program, presenting a set of defect categories/classes, which are discussed in detail subsequently.

| Step | Defect category      | Defect classes                                                      |
|------|----------------------|---------------------------------------------------------------------|
| 1    | Design               | Should be defined based on FMEA (Failure Mode and Effect Analysis). |
| 2    | Material / Component | Bad Part                                                            |
|      |                      | Electrically dead                                                   |
|      |                      | Functionally bad                                                    |
|      |                      | Tolerance defect                                                    |
| 3    | Placement            | Missing component                                                   |
|      |                      | Wrong component                                                     |
|      |                      | Misaligned component                                                |
|      |                      | Tombstone component                                                 |
|      |                      | Inverted component                                                  |
| 4    | Solder / Termination | Solder Bridge                                                       |
|      |                      | Insufficient Solder                                                 |
|      |                      | Open                                                                |
|      |                      | Excessive Solder                                                    |
|      |                      | Residue                                                             |
|      |                      | Grainy Solder                                                       |
|      |                      | Lifted leads                                                        |
|      |                      | Bent leads                                                          |
|      |                      | Cold solder                                                         |
|      |                      | Solder Voids                                                        |
| 5    | Function             | Dynamic (at-speed) defect                                           |
|      |                      | • Crosstalk                                                         |
|      |                      | • Jitter                                                            |
|      |                      | • Delays                                                            |
|      |                      | • Performance problems                                              |
|      |                      | • Bit-error rates high                                              |
|      |                      | Bad features                                                        |
|      |                      | Programming / software                                              |
| 6    | Product Lifetime     | Intermittent resistive                                              |
|      |                      | Intermittent stuck-at                                               |
|      |                      | Intermittent short                                                  |
|      |                      | Intermittent open                                                   |
|      |                      | Intermittent timing                                                 |
|      |                      | Ageing (wear-out) problems                                          |

Table 1: QuadDPMO defect categories.

#### 4.3.2.1 Design defects

When the overall PCB assembly creation process is examined, two main phases can be seen: design and production. Faults are introduced during both of these phases. First, we take a look at the design faults.

The common approach for predicting design faults is employing FMEA (Failure Mode and Effect Analysis). An FMEA procedure consists of two parts:

- Determine the failure modes (simulation, experiments, expert system, etc.)
- Determine the possible effect of each failure. This consists of two parts: the occurrence (how likely is it) and the severity (if it occurs, how serious is its effect).
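The two-part assessment above can be sketched as a simple ranking. Multiplying occurrence by severity to obtain a priority score is a common convention and an assumption of this sketch; the text above only names the two factors. The failure modes and the 1 to 10 scores are hypothetical.

```python
def rank_failure_modes(modes):
    """Sort failure modes by occurrence * severity, highest risk first.

    modes: list of dicts with "name", "occurrence" and "severity" keys,
    both scored on an assumed 1..10 scale.
    """
    return sorted(modes,
                  key=lambda m: m["occurrence"] * m["severity"],
                  reverse=True)


# Hypothetical failure modes identified in the first FMEA step.
modes = [
    {"name": "mode A", "occurrence": 2, "severity": 8},   # score 16
    {"name": "mode B", "occurrence": 6, "severity": 4},   # score 24
    {"name": "mode C", "occurrence": 3, "severity": 3},   # score 9
]
ranked = [m["name"] for m in rank_failure_modes(modes)]
```

The ranking places "mode B" first, so it would be the first candidate to be solved in the design phase.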

Most of the potential faults that are identified during an FMEA should be 'solved' in the design phase. A balanced prototype verification strategy follows such a 'correct by design' approach and the remaining number of design failures is zero, or at least fairly limited. Therefore, design failures are not targeted when testing for production defects.

#### 4.3.2.2 Material defects

In order to create a good product, all components should function correctly. Components, in the widest sense of the word, are the resistors, transistors, ICs and transformers, but also the bare PCB and the design. So all items used to assemble the product are included as components. In general, the component suppliers guarantee the quality of all incoming components. However, it is known from practice that component failures do occur and therefore need to be checked for, although such checks should be kept to the necessary minimum.

#### 4.3.2.3 Placement defects

A component should be placed correctly. Every placement machine has its tolerances but, sometimes a component is missing or simply drops out of the placement machine for some reason. So, presence and accurate placement are issues to deal with. In practice, two basic issues are checked:

- A component is placed to make contact. If a contact check fails, the expected signals will fail. This test is therefore more closely related to pin-level defects, described further on.
- A component is placed within accuracy limits. This check is most often done by the placement machine itself and makes it possible to check for cleanliness and wear of the machine, as well as for alignment within 6-sigma processes.

#### 4.3.2.4 Pin-level defects

Every connection should be made correctly. This is the most elementary test: if all incoming materials are correct, and one can prove that all connections are correct, then the circuit will work.

#### 4.3.2.5 Functional defects

Defects related to the functionality of an application are referred to as functional defects. Understanding how functional test can supplement upstream test stages becomes critical to implementing an end-to-end test plan.

The iNEMI Functional Test Coverage Assessment Project proposed adding functional-test-specific defects to a structural defect list:

- "Feature", an aggregate of any or all silicon, circuitry and software.
- "At-speed", testing the full range of functional speed from lowest to highest.
- "In parallel", creating system stress by running tests in parallel.
- "Measurement", discrete measurement (voltage, current, dB, etc.) or CRC, BER, etc.
- Other potential items:
  - Diagnosability: can the fault be deduced from the failure?
  - A criterion for system stress.

More details on these issues can be found in:

iNEMI Functional Test Coverage Assessment Project

#### 4.3.2.6 Product lifetime defects

This category includes IC-level ageing (wear-out) defects and intermittent defects which can be modeled as:

- Intermittent resistive faults
- Intermittent stuck-at faults
- Intermittent short faults
- Intermittent open faults
- Intermittent timing faults.

Some of them have been described in previous sections.

# 4.3.3 NFF faults due to Board-level test coverage weaknesses

This sub-section points out a weakness in the state-of-the-art methodology that leaves board-level test coverage lacking with respect to timing faults (performance/dynamic/at-speed faults). Potentially, this leads to insufficient test stimuli, test escapes and consequently to costly NFFs. This is one of the initial hypotheses targeted in BASTION's WP1.

Typically, each produced board assembly has to pass a sequence of different test phases before being qualified for shipping. For a modern complex electronic product these phases can be numerous, depending on the product's complexity and its quality/reliability requirements. The particular combination of test techniques is also dictated by economic feasibility [50][51]: each test type fits best a limited target class of defects, while covering additional defect classes is either impossible or requires costly extra effort. An efficient test strategy for a complex digital or mixed-signal product typically includes at least one technique for each category shown in Table 2.

#### 4.3.3.1 Classical PCBA test techniques

Inspection techniques help to check the general integrity of board assemblies including component presence, polarity, soldering quality, lifted leads, etc. Electrical test and measurement techniques are efficient when testing passive or analog components on the board by measuring their values, polarity and other parameters. Testing timing faults is not foreseen. Indirect testing for performance problems is possible by detecting structural defects affecting signal integrity (e.g., bad soldering, missing termination, etc.).

| Test Domain | Typical Test Techniques | Timing Faults (TF) | TF Coverage |
|---|---|---|---|
| Inspection | Pre-reflow: Solder Paste Inspection (SPI), Automated Optical Inspection (AOI)<br>Post-reflow: Visual inspection, Automated X-ray Inspection (AXI), AOI | Does not address | N/A |
| Electrical test | In-Circuit Test (ICT), Manufacturing Defect Analysis (MDA), Flying Probe Test (FPT) | No capability | N/A |
| Scan test | Boundary Scan (BS) and other test techniques based on IEEE 1149.1 and related standards (see Table 3) | Indirectly with IEEE 1149.6 only (see Table 3) | Indirect (IEEE 1149.6 only) |
| High-speed test & measurement | Processor-centric automated test solutions, FPGA-centric automated test solutions, Bit-Error Rate Test (BERT) | Capable with proper test stimuli | No universal metrics available |
| Embedded instrumentation | BIST instrumentation (fixed hardware), Synthetic instrumentation on FPGA (flexible hardware) | Capable with at-speed test mode | No universal metrics available |
| Functional test | Test of interfaces and basic behavior, test of main functions (fit-for-function test) | Capable with proper test stimuli | Unknown |

Table 2: Main categories of PCBA test and Timing Fault coverage gap

Scan test (such as JTAG / Boundary Scan) [52] is the industry's standard in board-level test today, providing an inexpensive yet efficient test technology, in terms of coverage and trouble-shooting capabilities. A brief summary of the standards from the Boundary Scan family and their target application purpose is given in Table 3.

| Standard | Main Target Application | Main Purpose | Essential Technology | Target Fault Classes | Timing Faults (TF) | TF Coverage |
|---|---|---|---|---|---|---|
| IEEE 1149.1 - Boundary Scan [65] | Manufacturing test of PCBA | Test access (TA) improvement | On-chip scan registers | Pin-level faults; net integrity | No capability | N/A |
| IEEE 1149.4 - Mixed-Signal Test Bus [66] | Measurement: analog values | TA improvement | On-chip switches | Parametric values | No capability | N/A |
| IEEE 1149.6 - BST of Advanced Digital Networks [67] | Testing LVDS high-speed nets | Test through AC-coupled nets | On-chip pulse generators | Net integrity: AC and DC | Indirectly (net integrity) | Unknown |
| IEEE 1149.7 - Reduced-pin and Enhanced TAP [68] | Board test; SW debug | Flexible 2-pin high-speed TA | SERDES, addressing | Same as all above | Not intended | N/A |
| IEEE 1149.8.1 - Pin Toggle and Contactless Sensing [69] | Interconnect test of PCBA | Links to passive components | Capacitive sense plate | Net opens: AC and DC | Not intended | N/A |
| IEEE P1149.10 - High Speed Test Access Port (TAP) [70] | All of the above | High-speed test data exchange | J | Same as all above | Not intended | N/A |
| IEEE 1687 - Embedded Instrumentation Access [71] | IC test, debug, diagnosis | Instrument access standard | Reconfigurable scan chains | Instrument-specific | Capable by instruments | No universal metrics available |

Table 3: Summary of IEEE standards from the Boundary-Scan family

The latest version of IEEE 1149.1 was issued in 2013 [53] with major updates incorporated, including a standardized means to control embedded instruments and pin-level electrical signal conditioning. Apart from the capability to provide access to BIST and other embedded instrumentation, BS-based test (BST) is not intended for testing timing-related faults.

The major limitation of classical BST is its inability to apply test patterns at-speed, which limits the covered fault spectrum to static (DC domain) faults. The classic industrial work-around has always been the use of carefully crafted functional tests. Leading companies have recently adopted emerging high-speed or at-speed test techniques based on the automated (re-)configuration or programming of on-board programmable devices. When FPGAs are used, this is referred to as FPGA-centric [54] or FPGA-controlled [55] test; when a processor is used, it is referred to as processor-emulation [56], processor-centric [57] or processor-controlled [58] test. These techniques rely on the JTAG infrastructure for test flow control and data exchange, and convert available on-board FPGA/CPU devices into embedded testers. Apart from the ability to cover timing faults (AC domain, delays, crosstalk, bad terminations), these techniques provide a very good degree of test access, because FPGAs/CPUs are typically backbone components of complex digital and mixed-signal devices by design [59]. When the test is done, the test configuration is erased and the board is configured into its normal functional mode. With these methods, no extra Design-for-Test (DfT) overhead is needed.

Another large class of the emerging board-level test techniques, the adoption of which is today still in its infancy, is *embedded instrumentation* [60]. In the context of PCBA test, two major sub-classes of embedded instruments can be named:

- a) Fixed built-in embedded circuits, mainly in ASICs;
- b) Synthetic reconfigurable multi-purpose instruments mainly in FPGAs.

Typical examples of the former class are Memory BIST, or Pseudo-Random Pattern Generators (PRPGs) and error counters for Bit-Error Rate Test (BERT) of a communication channel. The lack of standardization and common practices limits wide adoption and reuse of such fixed embedded instruments at the board level, although IC-level applications of various BIST solutions are blossoming. Conversely, FPGA-centric synthetic embedded instrumentation is a very promising emerging board-level test technique [59].
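The PRPG-plus-error-counter principle behind a BERT instrument can be sketched in software. The example below is an illustrative model only, not the implementation of any instrument discussed in this report: it assumes a 7-bit LFSR (commonly called PRBS-7), a receiver that is already synchronized to the transmitter seed, and a hypothetical `faulty_channel` that flips every N-th bit to emulate a defective link.

```python
from itertools import islice


def prbs7(seed: int = 0x7F):
    """Yield the bits of a PRBS-7 sequence from a 7-bit LFSR (period 127)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR locks up")
    while True:
        # Feedback is the XOR of the two oldest bits held in the register.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def faulty_channel(bits, flip_every: int):
    """Hypothetical channel model: invert every `flip_every`-th bit."""
    for index, bit in enumerate(bits, start=1):
        yield bit ^ 1 if index % flip_every == 0 else bit


def measure_ber(received, n_bits: int, seed: int = 0x7F):
    """Error counter: compare received bits with a local reference PRBS."""
    reference = islice(prbs7(seed), n_bits)
    errors = sum(rx != ref for rx, ref in zip(received, reference))
    return errors, errors / n_bits


n_bits = 10_000
transmitted = islice(prbs7(), n_bits)
errors, ber = measure_ber(faulty_channel(transmitted, flip_every=1000), n_bits)
print(f"errors={errors}, BER={ber:.1e}")
```

As the surrounding text notes, a BER figure of this kind characterizes channel quality statistically; it does not by itself identify which structural defect caused the errors.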

Being at the centre of a board's structure and allowing fully flexible reconfiguration and reuse, the FPGA becomes an excellent embedded tester. A few cutting-edge JTAG-based commercial test systems provide synthetic embedded instrumentation platforms for the following applications:

- Memory test and BIST (on board);
- Bit-Error Rate Test (BERT) on communication channels (gigabit links);
- Test of common buses (LAN, SATA, PCIe, USB, CAN, LIN, I2C, SPI, etc.) and UART;
- In-system test and programming of non-volatile memories (flash devices);
- User-defined instrumentation.

Embedded instrumentation opens an unprecedented potential in diagnostic access, monitoring and high-speed test. Studies show that industrial expectations of the benefits of adopting embedded instrumentation are currently very high [61]. Industrial research in this area is very active, with two main focus points:

- a) Automation [54];
- b) Fault coverage improvement [62].

The new IEEE 1687 (IJTAG) standard opens up the door towards seamless integration of tools, algorithms, instruments, IP cores and test patterns [63] across vendors and users.

Together with relevant test coverage metrics, both test techniques described above (high-speed test and embedded instrumentation) promise good timing fault coverage. If such metrics are missing, however, the coverage is accidental, because generating good-quality test stimuli becomes difficult.

#### 4.3.3.2 PCBA-level fault models and lack of testability metrics

There are several distinctive views on defect categorization, enumeration and coverage measurement at the board level, as shown in Table 4.

Modeling static (DC) faults (first two rows in Table 4) has long been the industrial standard, with minor updates following progress in mounting/integration and test technologies. The AC domain, speed-related faults (the last row in Table 4) represent a major research and standardization challenge today. Except for the BER measurement, which reflects channel quality (i.e. signal-to-noise ratio) more than the presence of a particular structural defect, there are no relevant industry-wide metrics used at the board level to measure the quality of high-speed or at-speed tests (e.g., run from embedded instrumentation).

| Approach to fault modeling | Level of Abstraction | Examples of Defects | Test Coverage Metrics | TF Coverage |
|---|---|---|---|---|
| Targeting defects in material and defects caused by assembly process | Structural faults at physical level | Bad soldering, lifted/bent leads, bad component, misalignment, tombstone, etc. | PPVS [63], MPS [64], PCOLA/SOQ [62] | PCOLA/SOQ/FAM extension provides non-quantitative "at-speed" measure |
| Targeting pin-level and net-level defects | Structural and behavioral faults at logic level | Opens, shorts, bad driver (pin logic / buffer) | stuck-at for opens; zero, one and net dominance for shorts; stuck-driving and -not driving for pins [65] | Not intended for TF |
| Functional problems caused by defects | System level malfunction (behavioral) | Booting failure, unstable operation, wrong behavior | | Accidental TF detection; coverage unknown |
| Performance-related faults mainly at interconnect lines, buses, interfaces, communication links | Mainly statistical (error rates); structural approaches are missing but needed | High error rate (slow performance), crosstalk, jitter, delay fault | Bit error rates at communication links, but no universal industry-wide structural fault coverage metric [60] | statistical measure; poor |

**Table 4: Different approaches to test coverage measurement** 

The three tables given in the above sub-sections summarize the weaknesses of common practices and state-of-the-art research in board-level test and fault modeling in general. The incompleteness of existing test coverage metrics has an important implication: fault coverage may be missing without being noticed. Because the fault coverage achieved by an at-speed test set with respect to timing faults cannot be measured adequately, test pattern sets remain incomplete.

In turn, this potentially unknown lack of test coverage contributes to the important and very costly No Failure Found (NFF) problem [26]. Hence, defect characterization and fault coverage metrics improvement are clearly topics for extensive research. This study, performed in the framework of BASTION, represents the first phase; the work continues with an empirical study intended to prove this hypothesis. The results of the latter will contribute to the development of extended test coverage metrics within T1.4.

# 4.4 BASTION survey

Prior to getting precise metrics from industrial sites, the BASTION consortium has organized a survey, which is presented in the form of a questionnaire in Appendix I.

A number of companies have already been invited to participate in the survey, and more will be. The initial expectation was to catch the attention of at least 50 companies. It was quickly discovered that NFF is a very interesting subject for all companies, but that most of them do not want to expose the defects that occur in the product life cycle.

The famous quote "Keep your secrets secret" may have become "Keep your defects secret" when applied to the electronics industry!

#### 4.4.1 Industrial partners

The survey has been distributed in PDF format through an email campaign to the ASTER installed base (about 800 industrial partners in Europe). Phone calls have also been organized to stimulate interest. The survey has been distributed in paper format during the Nordic Test Forum to around 50 companies. A Web version of the survey has been designed and released on the BASTION Web site.





















Figure 22: BASTION survey - Industry sector

Most of the electronics industry is represented.

#### 4.4.2 What is NFF?



Figure 23: BASTION survey - What is NFF?

The interpretation differs depending on the industry segment. The most popular definition is "A PASSED product which is FAILED at customer". Sometimes, the definition is reversed: "A FAILED product at customer, which is PASSED when back to production site".

#### 4.4.3 Is NFF important?



Figure 24: BASTION survey - Is NFF problem Important?

NFF is a critical subject for all industrial partners today, and is possibly going to be even more important in the future.

#### 4.4.4 What is the reason of NFF?



Figure 25: BASTION survey - What is the reason of NFF?

In order to determine the reasons behind NFF, especially the relation between faults, test and test coverage, several hypotheses are being pursued:

- Insufficient coverage: test available but test coverage is low
- Missing test method: test can target the fault, but it is not implemented on the test line
- Missing fault model: test can target the fault, but no coverage metrics
- No test on certain faults: Test cannot target certain faults

The analysis of the answers has revealed that NFF has four main sources:

- 1. Intermittent faults
- 2. Insufficient test coverage (escape rate/slip): a mismatch between the defect spectrum and the test coverage
- 3. No coverage for certain faults, such as timing-related faults
- 4. Ageing, which seems unimportant in the answers; however, since stress test is not always applied, it cannot be discarded (see section "Test Line")

#### 4.4.4.1 Intermittent faults

The most widely adopted NFF definition is: "A PASSED product which is FAILED at customer". From an analysis point of view, the product passes, then fails, then passes again...

This sounds like an intermittent fault. But is it really one? It may mask a more complex reality.

At board level, intermittent faults have a more open definition:

- IC level intermittent faults which are detectable at board level
- Design issues linked with the accumulation of tolerances (lack of worst case simulations)
- High speed signals or timing related defects
- Bad contact (connector, backplane)



#### 4.4.4.2 Insufficient test coverage, escape rate

Please refer to sections "Yield estimation: combined defect and test coverage", "NFF faults due to Board-level test coverage weaknesses" and "PCBA-level fault models and lack of testability metrics" for more details.

#### 4.4.4.3 Timing related faults

Dynamic (at-speed) defects include:

- Crosstalk
- Jitter
- Delay faults
- Performance problems
- High bit-error rates

These defects have been known for years. But at board level there are still no, or only limited, test instruments and no test coverage metrics.

At board level, digital worst-case simulation, as supported by LASAR, HILO or CATE, is no longer used. Its two key benefits, timing problem identification and documentation of fault simulation results, are lost as a consequence.

Within the BASTION consortium, TESTONICA and ASTER are working together to address this issue. TESTONICA is developing a new test strategy to catch such faults. ASTER is defining coverage metrics and a test coverage report format.

#### 4.4.5 Test Line



Figure 26: BASTION survey - Test Line

Although ICT is considered expensive because of the bed-of-nails constraints, it is still very popular. Unlike ICT, Functional Test does not rely on physical test access (test points), so it is not impacted by the increasing board population density. Stress test (HALT/HASS) is not popular. It is under evaluation as a means to explain the role of ageing in the causes of NFF. This point will be reviewed in more detail in "NFF known from Test data".

Boundary-Scan test is growing and becoming more and more popular as an increasing number of components are produced JTAG-compatible. Please refer to "Classical PCBA test techniques" for more details.



Figure 27: BASTION survey - The fault you know

Limited knowledge of the fault distribution increases NFF opportunities through high escape rates. When the defect spectrum is not known, it is difficult to select the appropriate test strategy.

The analysis of the results will be a valuable source of information to subcategorize the defect categories and classes.

The comparison of the survey with the initial results of the QuadDPMO is also very promising.

# 4.5 Tools for DPMO analysis and root-cause analysis

#### 4.5.1 Data collection

A simple data mining system, based on traceability and repair data, is not enough to study the NFF phenomenon. As the survey shows, the NFF is at the junction of defects and test coverage.

According to this evidence, QuadDPMO has been organized around 3 sets of data:

- Traceability and repair data: defect occurrences
- CAD data: opportunities of defects
- Test coverage data: which defects can be caught by each test station

#### 4.5.2 Traceability & repair data

Improving product yields does not simply mean better product quality at a lower cost, but also, acquiring a deeper understanding of the whole manufacturing process by making full use of manufacturing information. This can lead to improved product reliability, prevention of re-occurring defects, improved future product designs and increased competitiveness. Processes must be monitored before they can be controlled; this often means collecting data from a number of incompatible sources.

With the development of the TestWay tool, ASTER gained market recognition as the leading supplier for test coverage analysis. Since test coverage and DPMO are strongly linked through the yield estimation model, knowledge of probable defects is as important as knowledge of test coverage. ASTER has also been developing QUAD (QUalityADvisor) [37], a flexible and modular software tool, built around a centralized and open architecture database, that provides traceability of any PCBA production data. It helps to accurately retrieve data and convert it into meaningful information that can be used to fine-tune the product life cycle.



Figure 28: QUAD system overview.

The added value of QUAD is in the centralized database. This should be used on existing customer data to extract DPMO (Defects Per Million Opportunities) metrics from the traceability & repair database.

#### 4.5.3 QuadDPMO: True Defect Opportunities

Both DPPM and DPMO are used for determining the overall quality of the UUT produced in the sample quantity inspected. DPPM means Defective Parts Per Million; it is a measure of throughput (how many bad parts come out at the end). DPMO means Defects Per Million Opportunities; it is a measure of performance (how many times a mistake is made).

Six sigma experts use DPMO to find process sigma. DPPM levels are used commonly to monitor quality of UUT in production. DPPM determines the quality levels on the line. As DPMO is inversely proportional to the defect opportunities per unit, it can be manipulated by changing defect opportunities per unit. This is possible as defect opportunities per unit are defined by users before doing the DPMO calculation. For this reason, a link to NFFs at board-level is suspected.
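The two metrics, and the sensitivity of DPMO to the user-defined opportunity count, can be shown with a short sketch. The production figures are invented for illustration, and the 1.5 sigma long-term shift used in `process_sigma` is a common Six Sigma convention assumed here; it is not something stated in this report.

```python
from statistics import NormalDist


def dppm(defective_units: int, units_inspected: int) -> float:
    """Defective Parts Per Million: share of bad units, scaled to 1e6."""
    return defective_units / units_inspected * 1_000_000


def dpmo(defects: int, units_inspected: int, opportunities_per_unit: int) -> float:
    """Defects Per Million Opportunities."""
    return defects / (units_inspected * opportunities_per_unit) * 1_000_000


def process_sigma(dpmo_value: float, shift: float = 1.5) -> float:
    """Process sigma from DPMO, with the conventional (assumed) 1.5 sigma shift."""
    return NormalDist().inv_cdf(1 - dpmo_value / 1_000_000) + shift


# Hypothetical lot: 1000 boards, 30 of them defective, 45 defects in total.
lot_dppm = dppm(defective_units=30, units_inspected=1000)

# Same defects, two different user-defined opportunity counts per board.
dpmo_500 = dpmo(defects=45, units_inspected=1000, opportunities_per_unit=500)
dpmo_1000 = dpmo(defects=45, units_inspected=1000, opportunities_per_unit=1000)

print(f"DPPM={lot_dppm:.0f}")
print(f"DPMO with 500 opportunities/board={dpmo_500:.1f}")
print(f"DPMO with 1000 opportunities/board={dpmo_1000:.1f}")
print(f"process sigma at 500 opportunities/board={process_sigma(dpmo_500):.2f}")
```

Doubling the declared opportunities per board halves the DPMO while the DPPM stays unchanged, which is why a consistent definition of defect opportunities matters when comparing figures.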

Within the scope of the BASTION research program, ASTER is developing a new concept to analyze the NFF phenomenon via a software tool. The QuadDPMO algorithm application is under development to:

- Collect and understand the DPMO in real time,
- Group defect labels and root causes by defect class,
- Compute long-term, medium-term and short-term DPMO metrics,
- Investigate common areas of occurrence for each defect class.



Figure 29: QuadDPMO synopsis.

QuadDPMO produces many reports where the test data can be analyzed by site, period of time, test station, lot ID, product ID. It gives access to detailed reports where defect family, defect code or defect label, are sorted by pin count, pitch, mounting technology, mounting side, JEDEC shape, manufacturer, component function, board complexity, board type... The chart (pie or bar graph) colors are automatically transferred to the layout or schematic views in order to verify that the defect occurrence is linked with a physical or logical location (functional block).

#### 4.5.4 DPMO extraction

QuadDPMO development is organized to support the following objectives:

- Define data collection methods around existing IPC standards:
  - IPC-9261A In-Process DPMO and Estimated Yield.
  - IPC-7912A Calculation of DPMO and Manufacturing Indices for PCBAs.
- Define data stratification and classification methods.
- Combine the data into a single database:
  - DPMO for Design defect issues.
  - DPMO for Material (Part number).
  - DPMO for Placement (Package type).
  - DPMO for Soldering (Reflow & Wave).
  - DPMO for Function.
  - DPMO for Aging.
- Identify range and standard deviation for any DPMO.
- Compare actual yield to estimated yield:
  - By test step.
  - Full test line.
- Correlate test coverage/strategy with DPMO rates.
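Two of the objectives above, the range and standard deviation of a DPMO and the comparison of actual with estimated yield, can be sketched as follows. This is a generic illustration and not the QuadDPMO implementation: the estimated yield uses the common Poisson approximation Y = exp(-DPU), which is an assumption here, and all DPMO values, opportunity counts and test results are invented.

```python
import math
from statistics import mean, stdev


def dpmo_spread(values):
    """Range, mean and sample standard deviation of a series of DPMO values."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "stdev": stdev(values),
    }


def estimated_yield(dpmo_by_class, opportunities_by_class):
    """Estimated yield under an assumed Poisson model: Y = exp(-DPU)."""
    # DPU (defects per unit) sums the expected defects over all defect classes.
    dpu = sum(
        dpmo_by_class[name] * opportunities_by_class[name] / 1_000_000
        for name in dpmo_by_class
    )
    return math.exp(-dpu)


def actual_yield(passed: int, tested: int) -> float:
    """Observed first-pass yield at a test step or for the full line."""
    return passed / tested


# Hypothetical weekly placement DPMO values for one product.
spread = dpmo_spread([80, 100, 120])

# Hypothetical per-class DPMO and opportunities per board.
dpmo_by_class = {"placement": 100, "soldering": 50}
opportunities_by_class = {"placement": 400, "soldering": 2000}
estimate = estimated_yield(dpmo_by_class, opportunities_by_class)
observed = actual_yield(passed=830, tested=1000)

print(spread)
print(f"estimated yield={estimate:.3f}, actual yield={observed:.3f}")
print(f"gap={estimate - observed:+.3f}")
```

A persistent gap between the two yields would be one trigger for the correlation of test coverage with DPMO rates listed as the last objective.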

#### 4.5.5 Data preparation

In order to support the BASTION research program, ASTER has developed contacts with industrial partners in order to get access to real traceability & repair data, CAD data, and test coverage report information.

CAD data and test coverage reports have been analyzed using the TestWay test coverage tool from ASTER to compute the individual coverage per component/pin/test station in PCOLA/SOQ format.

Two QuadDPMO databases have been produced:

- 1. An unfiltered database, merging all partners' contributions even when Test coverage and CAD data are not available. This database could be used for defect universe analysis.
- 2. A filtered database including results linked exclusively with complete data sets (Traceability & repair, CAD and Test Coverage). Data anonymization has been applied on sensitive information which is protected by Non Disclosure Agreements.

#### 4.5.6 Intermittent faults

There is no test technique at board level, used for manufacturing, which specifically targets intermittent faults. Despite this, the QuadDPMO algorithm has highlighted an interesting case which has never before been published.

Security & mechanical devices are sensitive to the assembly process. On these devices, some defects are detected during structural electrical tests (ICT, BST), but not all of them are revealed there, and some show up again during the functional test.



Figure 30: Functional Test - Material defects by component

#### 4.5.7 Insufficient coverage

ASTER introduces a test coverage matrix to classify defect occurrences against test coverage. It is a typical managerial report in which all critical information is presented on one page, following the principle "convert data into information", since information helps in taking decisions.

There are 4 cases:

- Faults which have been detected and are covered: this number shows the effective test coverage, i.e. the coverage that is actually used. The difference between this number and the total test coverage highlights test coverage that has been developed but never used because no defect occurred. This opens an opportunity for cost savings, with test development driven by the defects that really occur.
- Faults which have not been detected and are not covered: this is the typical definition of insufficient coverage.
- Faults which are detected in contradiction with the coverage: the test coverage is clearly underestimated. The coverage report generated by the test equipment must be updated to reflect the true test efficiency.
- Faults which are not detected although they are covered: the test coverage is overestimated. This contributes to the escape rate and to NFF.
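The four cases can be expressed as a small classification over (detected, covered) pairs. The sketch below is illustrative only and does not reproduce the TestWay report: the record layout, the case labels and the sample data are assumptions.

```python
from collections import Counter

# Keys are (detected_by_station, covered_per_coverage_report).
CASES = {
    (True, True): "effective coverage",
    (False, False): "insufficient coverage",
    (True, False): "coverage underestimated",
    (False, True): "coverage overestimated",
}


def coverage_matrix(occurrences):
    """Count defect occurrences per case for one test station."""
    return Counter(
        CASES[(bool(rec["detected"]), bool(rec["covered"]))]
        for rec in occurrences
    )


# Hypothetical defect occurrences seen for an ICT station.
occurrences = [
    {"ref": "R12", "detected": True, "covered": True},
    {"ref": "C7", "detected": True, "covered": True},
    {"ref": "U3.14", "detected": False, "covered": False},
    {"ref": "J1.2", "detected": True, "covered": False},
    {"ref": "U5.8", "detected": False, "covered": True},
    {"ref": "U5.9", "detected": False, "covered": True},
]
matrix = coverage_matrix(occurrences)
for label in CASES.values():
    print(f"{label}: {matrix[label]}")
```

In this sketch the "coverage overestimated" count is the one that feeds the escape rate, matching the last case in the list above.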






Figure 31: Innovative TestWay Coverage Matrix

Looking at a test line, the AOI is positioned first:



Figure 32: Defects caught by AOI – Match with expectations

When looking at electrical tests like ICT, BST or even functional test, a significant number of manufacturing defects are still found.



Figure 33: Defects caught by Functional test – Assembly faults are slipping – Escape from AOI

This problem is the same across the product life cycle for all defect categories. All test strategies, even if they catch most of the targeted defects, leave some defects which may or may not be detected by subsequent test steps. When a defective product passes the last test, it is shipped and appears as a FAILED product at the customer site. The escape rate is a massive contributor to NFF.

#### 4.5.8 Missing test method: Ageing

Stress test is implemented by a limited number of industrial companies (about 20%, source: BASTION survey). Consequently, the defects that stress test would reveal, including ageing defects, go undetected and contribute to a high escape rate and an increasing NFF rate.

- Discrete analog component parameters tend to drift over time and can cause problems in sensitive designs. Integrated circuits can undergo electromigration. Furthermore, environmental effects such as corrosion, vibration and temperature are of extreme concern. Transient stresses such as electrostatic discharge (ESD) and lightning can also cause failures.
- Environmental degradation contributes heavily to the burden of NFF.



Figure 34: Stress test defect distribution

If no stress test is applied to the boards, they will fail at the customer's site, contributing to the escape rate. Stress testing must be implemented by most IC and board manufacturers in order to significantly reduce the NFF rate.

## 4.5.9 Lack of communication between design and test

Statistical analysis helps to identify abnormal events.



Figure 35: AOI – Drill-down on misalignment defects

Figure 35 seems to indicate poor placement accuracy or a reflow process issue on tantalum capacitor parts.



The bar graph indicates the defect count by component.

Figure 36: AOI – Drill-down on solder

Figure 36 seems to indicate poor placement accuracy or a reflow process issue on TSSOP parts.

For Figures 35 and 36, investigation showed that this was not the case. The root cause was a bad definition of the shape, so the defect should be reclassified as a design defect instead of a manufacturing defect.

QuadDPMO detected that many boards were failing at the functional test level. The problem was located in a band-pass filter functional block. The root cause analysis demonstrated that each component tested by ICT was good and within tolerance, but the functional block failed due to the accumulation of tolerances, which had never been simulated during the design phase. The problem should be classified as a design defect instead of a manufacturing defect.
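The mechanism of tolerance accumulation can be sketched with a worst-case corner analysis. The report does not give the filter topology or component values, so the example below substitutes a single first-order RC stage with corner frequency $f = 1/(2\pi RC)$ and invented nominal values, tolerances and block-level limit. It only illustrates how components that each pass ICT can still push a functional parameter out of specification.

```python
from itertools import product
from math import pi

# Hypothetical nominal values and tolerances; the report gives none.
R_NOM, R_TOL = 10e3, 0.05     # 10 kOhm, +/-5 %
C_NOM, C_TOL = 10e-9, 0.10    # 10 nF, +/-10 %
F_SPEC_TOL = 0.10             # assumed block-level limit: +/-10 % on frequency


def corner_frequency(r, c):
    """Corner frequency of a first-order RC stage, f = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * pi * r * c)


f_nom = corner_frequency(R_NOM, C_NOM)

# Evaluate every combination of tolerance extremes (worst-case corners).
# Each corner uses components that are individually within tolerance.
corners = [
    corner_frequency(R_NOM * (1 + sr * R_TOL), C_NOM * (1 + sc * C_TOL))
    for sr, sc in product((-1, 1), repeat=2)
]
worst_deviation = max(abs(f - f_nom) / f_nom for f in corners)

print(f"nominal corner frequency: {f_nom:.1f} Hz")
print(f"worst-case deviation: {worst_deviation:.1%}")
print("block within spec:", worst_deviation <= F_SPEC_TOL)
```

With these assumed numbers the worst-case frequency deviation exceeds the block-level limit although no single component is out of tolerance, which is the kind of interaction a design-phase simulation would have exposed.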

If a capability analysis (Cp and Cpk) yields a Cpk close to 1, the process is not capable of producing product to the required specifications on a routine basis. The net result is a high level of variation between output units, and intensive inspection is needed to control the output, with excessive rework, repair and scrap rates.

 $C_p$ = Process Capability - a simple and straightforward indicator of process capability.

 $C_{\rm pk}$ = Process Capability Index - Adjustment of  $C_{\rm p}$  for the effect of non-centered distribution.

The $C_{\rm p}$ index compares the specification width with the process spread but does not take into account where the process sits relative to the specification limits, whereas $C_{\rm pk}$ also considers the centering of the process distribution. Manufacturers must achieve a $C_{\rm pk}$ of 1.33 (4 sigma) or higher to satisfy end-customer quality expectations.
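The two indices are commonly computed as $C_{\rm p} = (USL - LSL)/(6\sigma)$ and $C_{\rm pk} = \min(USL - \mu,\ \mu - LSL)/(3\sigma)$, where $USL$ and $LSL$ are the upper and lower specification limits and $\mu$, $\sigma$ are the process mean and standard deviation. The sketch below applies these standard definitions to an invented set of measurements; the sample values and the specification limits are assumptions for illustration only.

```python
from statistics import mean, stdev


def cp(usl, lsl, sigma):
    """Cp = (USL - LSL) / (6 * sigma): spec width versus process spread."""
    return (usl - lsl) / (6.0 * sigma)


def cpk(usl, lsl, mu, sigma):
    """Cpk = min(USL - mu, mu - LSL) / (3 * sigma): penalises off-centre processes."""
    return min(usl - mu, mu - lsl) / (3.0 * sigma)


# Hypothetical measurements of a nominal 100-ohm part with +/-5 % limits.
samples = [101.3, 102.4, 102.9, 101.7, 103.5, 102.6, 102.0, 103.2, 101.5, 102.9]
LSL, USL = 95.0, 105.0

mu = mean(samples)
sigma = stdev(samples)  # sample standard deviation

cp_value = cp(USL, LSL, sigma)
cpk_value = cpk(USL, LSL, mu, sigma)

print(f"mean = {mu:.2f}, sigma = {sigma:.2f}")
print(f"Cp  = {cp_value:.2f}")
print(f"Cpk = {cpk_value:.2f}")
print("meets Cpk >= 1.33:", cpk_value >= 1.33)
```

In this invented data set the spread is small, so Cp comes out well above 1.33, but the mean is shifted toward the upper limit and Cpk falls below 1.33. This is the situation in which Cp alone would give a misleadingly favourable picture.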

On a high-speed digital board, QuadDPMO highlighted another design problem: insufficient margin to cope with clock jitter due to noise and crosstalk.

#### 4.6 Conclusions on board-level NFF faults

Our initial NFF hypothesis was based on the identified incomplete test coverage metrics at the board-level due to missing fault models for certain types of defects (primarily timing faults). As a consequence, certain types of deterministic test sets were also missing. We presented an extensive study of the state-of-the-art board-level test coverage metrics and industrial best practices, showing existing weaknesses and therefore, potential room for test escapes and product-level NFF. The BASTION survey and the test data analysis confirm this hypothesis.

This line of research will continue, by running experimental studies through fault injection and evaluating capabilities of existing board-level test approaches to detect target faults (primarily from the timing/performance domain).

A BASTION survey has also been prepared in order to adjust research directions in the area of NFF (see Appendix I). The answers have been collected and analyzed.

The limitations of state-of-the-art industrial practices for traceability data analysis and DPMO tracking are still being studied in order to find room for improvement in NFF case analysis. Regular contacts with industrial partners (Airbus, Ericsson, and Schneider) have demonstrated interest in the QuadDPMO initiative, launched through BASTION.

Combining the survey results and QuadDPMO experiments on different customer sites, we can conclude:

- Insufficient coverage/escape rate is the most important contributor to NFF.
- Ageing is the second contributor, as stress testing is not yet widely used by board assembly companies.
- Lack of communication between the design and manufacturing departments has a measurable influence on final product quality.
- Intermittent and timing faults are confirmed as NFF contributors, without any possibility to measure the proportions, due to lack of test technique or test coverage metrics.

These results will be further used in WP1 and WP4.

## **5 Conclusions**

An extensive study has been carried out of existing work on aging faults as well as on No Fault Found at IC and board level. In addition, new work carried out collaboratively by UT, TUT, ULUND, ASTER and TL within the BASTION framework in both areas has been presented and forms the foundation for further work on these subjects in other BASTION tasks.

With respect to the aging study, we have enhanced the long-term dependability of MP-SoCs using on-chip health monitors. The proposed approach addresses the design of aging monitors and enables a new generation of highly dependable many-processor SoCs for safety-critical applications. It consists of different kinds of aging monitors, which can be related to the performance parameters of logic and processors. More specifically, an NBTI monitor has been shown to correlate with the propagation delay in the aged processor. In addition, we designed an embedded instrument able to extract the change in threshold voltage caused by aging effects. This instrument makes it possible to detect the performance degradation of digital CMOS circuits due to aging.

To enable analysis of aging effects in large circuits, we have introduced a method of hierarchical modeling of dynamic NBTI aging. Experimental results show high scalability as well as good match with SPICE simulation results.

In summary, aging is becoming a major issue in safety-critical integrated systems, and countermeasures should be taken.

With respect to the IC-level NFF fault study, we presented a simulation model for a particular type of NFF, corresponding to intermittent resistive faults resulting from bad interconnections, as well as an environment supporting such a model. To obtain a precise model of intermittent faults, we emulated several loose connections and cold soldering / ball grids. We precisely measured the amount of resistance that loose connections or cold solder can induce on a given circuit. This model will form the basis of simulations, first for small and later for complex digital systems.

Based on the simulation model, we studied the behavior of transistor-level circuits under different fault conditions for digital and analogue systems. The simulation results show how intermittent faults affect the performance of analogue as well as digital systems. It is shown that the detection of intermittent faults in digital systems is more challenging than in analogue systems.

With respect to the board-level NFF fault study, it has become clear that the NFF issue has many causes, most of which are not well understood. In particular, the stimulation of NFFs is problematic and remains largely a statistical issue, which can nevertheless be influenced to some extent.

One of our initial NFF hypotheses developed in WP1 of BASTION was based on the identified incompleteness of test coverage metrics at the board level due to missing fault models for certain types of defects (primarily timing faults) and, as a consequence, the absence of certain types of deterministic test sets. This hypothesis has been confirmed, among other results, by an extensive industrial study based on a survey and on the extraction and analysis of actual traceability data.

The industrial study was facilitated first by the joint development of the questionnaire initiated by ASTER and secondly by the development of the QuadDPMO software. This modular, flexible software tool is built around a centralized, open-architecture database that provides traceability of any PCB electronic production data. It helps to retrieve data accurately and convert it into meaningful information that can be used to fine-tune the product life cycle.

Improving product yields does not simply mean better product quality at a lower cost, but also acquiring a deeper understanding of the whole manufacturing process by making full use of manufacturing information. This can lead to improved product reliability, prevention of recurring defects, improved future product designs and increased competitiveness. Processes must be monitored before they can be controlled, which often means collecting data from a number of incompatible sources.

After combining the survey results and the QuadDPMO experiments on several customer sites, we drew a few conclusions. The most important contributor to NFF was narrowed down to insufficient coverage, whereas intermittent and timing faults were confirmed to be challenging issues due to the lack of suitable test techniques and test coverage metrics. Ageing is the second contributor, as stress testing is not yet widely used by board assembly companies. Lack of communication between the design and manufacturing departments also has a measurable influence on final product quality. There is no systematic strategy for testing timing faults at the board level, nor for estimating test quality in terms of fault coverage.

# 6 References

- [1] A. Scorzoni, B. Neri, C. Caprile, and F. Fantini, "Electromigration in thin-film interconnection lines: models, methods and results," *Materials science reports*, vol. 7, pp. 143-220, 1991.
- [2] A. W. Strong, E. Y. Wu, R.-P. Vollertsen, J. Sune, G. La Rosa, T. D. Sullivan, et al., Reliability wearout mechanisms in advanced CMOS technologies vol. 12: John Wiley & Sons, 2009.
- [3] R. Entner, "Modeling and simulation of negative bias temperature instability," Ph.D. Dissertation, Vienna University of Technology, 2007.
- [4] T. Grasser, B. Kaczer, P. Hehenberger, W. Gos, R. O'Connor, H. Reisinger, *et al.*, "Simultaneous Extraction of Recoverable and Permanent Components Contributing to Bias-Temperature Instability," in *Proceedings of IEEE International Electron Devices Meeting (IEDM)*, 2007, pp. 801-804.
- [5] T. Nigam and E. B. Harris, "Lifetime Enhancement under High Frequency NBTI measured on Ring Oscillators," in *Proceedings of IEEE International Reliability Physics Symposium*, 2006, pp. 289-293.
- [6] H. Reisinger, T. Grasser, K. Hofmann, W. Gustin, and C. Schlunder, "The impact of recovery on BTI reliability assessments," in *Proceedings of IEEE International Integrated Reliability Workshop (IRW)*, 2010, pp. 12-16.
- [7] E. Takeda, C. Y.-W. Yang, and A. Miura-Hamada, *Hot-carrier effects in MOS devices*: Academic Press, 1995.
- [8] M. Song, K. P. MacWilliams, and J. C. Woo, "Comparison of NMOS and PMOS hot carrier effects from 300 to 77 K," *IEEE Transactions on Electron Devices*, vol. 44, pp. 268-276, 1997.
- [9] J. H. Stathis, "Physical and predictive models of ultrathin oxide reliability in CMOS devices and circuits," *IEEE Transactions on Device and Materials Reliability*, vol. 1, pp. 43-59, 2001.
- [10] S. Roy and D. Z. Pan, "Reliability Aware Gate Sizing Combating NBTI and Oxide Breakdown," in *International Conference on VLSI Design and International Conference on Embedded Systems*, 2014, pp. 38-43.
- [11] F. Oboril and M. B. Tahoori, "Aging-Aware Design of Microprocessor Instruction Pipelines," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 33, pp. 704-716, 2014.
- [12] S. Davidson, "Understanding NTF components from the field," in *Proceedings* of *IEEE International Test Conference (ITC)*, 2005, pp. 1-10.
- [13] P. Söderholm, "A system view of the no fault found (NFF) phenomenon," *Reliability Engineering & System Safety*, vol. 92, pp. 1-14, 2007.
- [14] S. Davidson, "Towards an understanding of no trouble found devices," in *Proceedings of IEEE VLSI Test Symposium (VTS)*, 2005, pp. 147-152.
- [15] C. Constantinescu, "Impact of Deep Submicron Technology on Dependability of VLSI Circuits," in *Proceedings of the International Conference on Dependable Systems and Networks (DSN)*, 2002, pp. 205-209.
- [16] P. M. Wells, K. Chakraborty, and G. S. Sohi, "Adapting to intermittent faults in multicore systems," *Acm Sigplan Notices*, vol. 43, pp. 255-264, 2008.
- [17] C. Constantinescu, "Intermittent faults and effects on reliability of integrated circuits," in *Proceedings of Reliability and Maintainability Symposium*, 2008, pp. 370-374.

- [18] B. A. S. C. S. Chambers and K. Anderson, "White Paper: The right stuff for aging electronics, intermittence, no fault found," 2010.
- [19] B. Steadman, F. Berghout, N. Olsen, and B. Sorensen, "Intermittent fault detection and isolation system," in *Proceedings of International Automatic Testing Conference (AUTOTESTCON)*, 2008, pp. 37-40.
- [20] E. J. Marinissen, "Testing TSV-based three-dimensional stacked ICs," in *Design, Automation & Test in Europe Conference & Exhibition (DATE)*, 2010, pp. 1689-1694.
- [21] C. Constantinescu, "Trends and Challenges in VLSI Circuit Reliability," *IEEE Micro*, vol. 23, pp. 14-19, 2003.
- [22] J. Srinivasan, S. V. Adve, P. Bose, and J. A. Rivers, "The impact of technology scaling on lifetime reliability," in *Proceedings of International Conference on Dependable Systems and Networks (DSN)*, 2004, pp. 177-186.
- [23] L. V. Kirkland, T. Pombo, K. Nelson, and F. Berghout, "Avionics health management: searching for the prognostics grail," in *Proceedings of IEEE Aerospace Conference*, 2004, pp. 3448-3454.
- [24] B. Steadman, S. Sievert, B. Sorensen, and F. Berghout, "Attacking "bad actor" and "no fault found" electronic boxes," in *Proceedings of International Automatic Testing Conference (AUTOTESTCON)*, 2005, pp. 821-824.
- [25] I. James, D. Lumbard, I. Willis, and J. Goble, "Investigating no fault found in the aerospace industry," in *Reliability and Maintainability Symposium*, 2003, pp. 441-446.
- [26] C. Constantinescu, "Intermittent faults in VLSI circuits," in *Proceedings of the IEEE Workshop on Silicon Errors in Logic-System Effects*, 2007.
- [27] C. Constantinescu, "Impact of intermittent faults on nanocomputing devices," *Proceedings of the International Conference on Dependable Systems and Networks (DSN) (Supplemental Volume)*, pp. 238-241, 2007.
- [28] D. Gil-Tomás, J. Gracia-Morán, J. Baraza-Calvo, L.-J. Saiz-Adalid, and P.-J. Gil-Vicente, "Studying the effects of intermittent faults on a microcontroller," *Microelectronics Reliability*, vol. 52, pp. 2837-2846, 2012.
- [29] S. J. Pan, Y. Hu, and X. W. Li, "IVF: Characterizing the Vulnerability of Microprocessor Structures to Intermittent Faults," *IEEE Transactions on Very Large Scale Integration (VISI) Systems*, vol. 20, pp. 777-790, 2012.
- [30] J. Gracia-Moran, D. Gil-Tomas, L. Saiz-Adalid, J. Baraza-Calvo, and P. Gil-Vicente, "Defining a Representative and Low Cost Fault Model Set for Intermittent Faults in Microprocessor Buses," in *Latin-American Symposium on Dependable Computing (LADC)*, 2013, pp. 98-103.
- [31] J. Gracia-Moran, J. C. Baraza-Calvo, D. Gil-Tomas, L. J. Saiz-Adalid, and P. J. Gil-Vicente, "Effects of Intermittent Faults on the Reliability of a Reduced Instruction Set Computing (RISC) Microprocessor," *IEEE Transactions on Reliability*, vol. 63, pp. 144-153, 2014.
- [32] J. Wan and H. G. Kerkhoff, "The Influence of No Fault Found in Analogue CMOS Circuits," in *submitted to International Mixed-Signal Workshop*, 2014.
- [33] K. Hird, K. P. Parker, and B. Follis, "Test coverage: what does it mean when a board test passes?," in *Proceedings of IEEE International Test Conference (ITC)*, 2002, pp. 1066-1074.

- [34] W. Rijckaert and F. De Jong, "Board test coverage: the value of prediction and how to compare numbers," in *Proceedings of IEEE International Test Conference (ITC)*, 2003, pp. 190-199.
- [35] C. Lotz, "Board defect coverage analysis from design to production," *Design Automation & Test Exhibition*, 2004.
- [36] C. Lotz, P. Collins, and D. Wiatrowski, "Functional Board Test–coverage analysis what does it mean when a functional test passes?," presented at *the European Board Test Workshop*, 2006, pp. 1-8.
- [37] Y.M. Le Donnant, "QUAD-Electronic Assembly Repair & Quality Management System," Aster Technologies, 2008.
- [38] S. Davidson, "Towards an Understanding of No Trouble Found Devices," in *Proceedings of IEEE VLSI Test Symposium (VTS)*, 2005, pp. 147-152.
- [39] S. Davidson, "Understanding NTF components from the field," in *Proceedings* of *IEEE International Test Conference (ITC)*, 2005, pp. 1-10.
- [40] T.A. Dawn, K. Ayers and M. Pecht, "The "trouble not identified" phenomenon in automotive electronics," Microelectronics Reliability, 2002, pp. 641-651.
- [41] R. Garcia, "Rethink fault models for submicron-IC test," *Test&Measurement World*, 2001, pp. 35-46.
- [42] K.P. Parker, "Defect Coverage of Boundary/Scan Tests: What does it mean when a Boundary-Scan test passes?," in *Proceedings of IEEE International Test Conference (ITC)*, 2003, pp. 181-189.
- [43] H. Qi, S. Ganesan, and M. Pecht, "No-fault-found and intermittent failures in electronic products," Microelectronics Reliability, 2008, pp. 663-674.
- [44] M. Santoro, "New Methodologies for Eliminating No Trouble Found, No Fault Found and other Non Repeatable Failures in Depot Settings," in *IEEE AUTOTESTCON*, 2008.
- [45] D. Santos, B. Rajkumar, R. Lane et al., "Defect reduction in PCB contract manufacturing operations," in *International Conference on Computers and Industrial Engineering*, 1997.
- [46] R. Soukup, "Defect level prediction of printed circuit board assembly manufacturing based on DPMO metric," in *Spanish Conference on Electron Devices*, 2011.
- [47] R. Soukup, "Yield and defect level prediction of designed printed circuit board assembly based on DPMO metric," in *International Spring Seminar on Electronics Technology*, 2011.
- [48] A. Söderholm, "A system view of the No Fault Found (NFF) phenomenon," in *Reliability Engineering & System Safety*, 2007.
- [49] X. Zhang, H. Zhen and S. Liangxing, "Process Quality Metrics for Mechanical and Electrical Production," *Lin. Procedia Engineering*, 2011.
- [50] B. Davis, The Economics of Automatic Testing, 2nd ed. McGraw-Hill, 1994, 416 p.
- [51] M. J. Smith, "The Real Cost of Not Testing!," Nordic Test Forum 2010.
- [52] K.P. Parker, The Boundary-Scan Handbook, Kluwer Academic Publishers, Boston, MA, USA, 2003, 373 p.
- [53] IEEE Standard test access port and boundary-scan architecture, IEEE Std. 1149.1-2001 (R2008), IEEE 2002, 212 p.

- [54] EEE Standard test access port and boundary-scan architecture, IEEE Std. 1149.1-2013. Working group URL: http://grouper.ieee.org/groups/1149/1/
- [55] I. Aleksejev, FPGA-Based Embedded Virtual Iinstrumentation, TUT Press, 2013, 155 p.
- [56] A. L. Crouch and S. A. Hack, "How P1687 Enables FPGA-Controlled Test," *IEEE International Board Test Workshop (BTW)*, 2011.
- [57] H. Ehrenberg and T. Wenzel "Combining Boundary Scan and JTAG Emulation for advanced structural test and diagnostics," White Paper, GOEPEL electronics, 2009, 12 p.
- [58] A. Tsertov et al., "SoC and Board Modeling for Processor-Centric Board Testing," in *Proceeding Euromicro Conference on Digital System Design* (DSD), pp 575-582.
- [59] J.A. Moore, "Processor-Controlled Test Enhances EMC's Test Effectiveness," in *IEEE International Board Test Workshop (BTW)*, 2009.
- [60] I. Aleksejev, A. Jutman, S. Devadze, S. Odintsov and T. Wenzel, "FPGA-Based Synthetic Instrumentation for Board Test," in *Proceeding of International Test Conference*, 2012.
- [61] "How to test high-speed memory with non-intrusive embedded instruments," ASSET InterTech, Whitepaper, 2012.
- [62] "Embedded Instrumentation: Its Importance and Adoption in the Test & Measurement Marketplace," Frost & Sullivan, Whitepaper, 2010, 20 p.
- [63] A. Jutman, "Fighting No Failure Found by Testing Dynamic Faults at Board Level," presented at Emerging Test Strategies, IEEE European Test Symposium (ETS), 2014.
- [64] A.L. Crouch, "IJTAG: The path to organized instrument connectivity," in *Proceeding International Test Conference (ITC)*, 2007. pp. 1-10.
- [65] S. Davidson. "Towards an understanding of no trouble found devices," in *Proceeding of VLSI Test Symposium (VTS)*, 2005, pp. 147-152.
- [66] IEEE Standard for a Mixed-Signal Test Bus, IEEE Std. 1149.4-2010, IEEE 2011. Working group URL: http://grouper.ieee.org/groups/1149/4/
- [67] IEEE Standard for Boundary-Scan Testing of Advanced Digital Networks, IEEE Std. 1149.6-2003, IEEE 2013. Working group URL: http://grouper.ieee.org/groups/1149/6/
- [68] IEEE Standard for Reduced-pin and Enhanced-functionality Test Access Port and Boundary Scan Architecture, IEEE Std. 1149.7, 2009. Working group URL: http://grouper.ieee.org/groups/1149/7/
- [69] IEEE Standard for Boundary-Scan-Based Stimulus of Interconnections to Passive and/or Active Components, IEEE Std. 1149.8.1-2012, IEEE 2012. Working group URL: http://grouper.ieee.org/groups/1149/atoggle/
- [70] High Speed Test Access Port and On-chip Distribution Architecture, IEEE P1149.10, Working group URL: http://grouper.ieee.org/groups/1149/10/
- [71] IEEE Standard for Access and Control of Instrumentation Embedded within a Semiconductor Device, IEEE Std. 1687, IEEE 2014. Working group URL: http://grouper.ieee.org/groups/1687/
- [72] T. Taylor, "Functional Test Coverage Assessment Project," in *IEEE International Board Test Workshop (BTW)*, 2009.
- [73] C. Lotz, "LeanTest key: Test coverage analysis powered by traceability," in *IEEE 11th International Board Test Workshop (BTW)*, 2012.

- [74] W. Rijckaert and F. Jong, "Board Test Coverage The value of prediction and how to compare numbers", in *Proceeding International Test Conference* (*ITC*), 2003, pp. 190-199.
- [75] W. Feng et al., "Fault detection in a tristate system environment" in *IEEE Micro* vol. 21, no. 5, 2001, pp. 77-85.
- [76] W. Wang, S. Yang, S. Bhardwaj, S. Vrudhula, F. Liu and Y. Cao, "The Impact of NBTI Effect on Combinational Circuit: Modeling, Simulation, and Analysis," in *IEEE Transaction On VLSI*, vol. 18, no. 2, 2010, pp. 173-183.
- [77] H. Yi, T. Yoneda, and M. Inoue, "A Scan-Based On-Line Aging Monitoring Scheme," *Journal of Semiconductor Technology and Science*, vol. 14, pp. 124-130, 2014.
- [78] H. G. Kerkhoff and H. Ebrahimi, "Intermittent Resistive Fault in Digital CMOS Circuits", in *IEEE international Symposium on Design and Diagnostics of Electronic Circuits & Systems(DDECS)*, pp. 221-216, 2015 (Best paper).
- [79] A. Amouri and M. Tahoori, "A Low-Cost Sensor for Aging and Location for Late Transitions Detection in Modern FPGAs", in *international Conference on Field Programmable Logic and Applications*, pp. 329-335, 2011.
- [80] G. Zhang, M. Yi, Y. Miao, D. Xu, and H. Liang, "NBTI-induced circuit aging optimization by protectability-aware gate replacement technique," in *Latin-American Test Symposium (LATS)*, 2015, pp. 1-4.
- [81] M. Sadi, L. Winemberg, and M. Tehranipoor, "A robust digital sensor IP and sensor insertion flow for in-situ path timing slack monitoring in SoCs," in *IEEE VLSI Test Symposium (VTS)*, 2015, pp. 1-6.

# **APPENDIX I**

# The BASTION survey form

|      | Industry sector (check all that apply)                                      |
|------|-----------------------------------------------------------------------------|
|      | Automotive                                                                  |
|      | Cell phone/tablet/ultra-book                                                |
|      | PCs: desktop/notebook                                                       |
|      | Medical electronics                                                         |
|      | Military/aerospace/Aeronautic                                               |
|      | Networking products: routers/switches                                       |
|      | Telecom                                                                     |
|      | E.M.S.                                                                      |
|      | Transport                                                                   |
|      | Servers/storage/data center products                                        |
|      | Consumer products (TV, home audio, game systems, DVRs, set-top boxes, etc.) |
|      | Industrial electronics                                                      |
|      | Test and measurement                                                        |
|      | Other (please specify)                                                      |

# **Board complexity**

Please complete the table, expressing the volume as a percentage of your production volume. Provide statistical information in terms of components, pins and nets, whichever is applicable.

|           | Percent of total in production | Average number of components per board | Average number of nets per board | Average number of pins or solder joints per board |
|-----------|--------------------------------|----------------------------------------|-----------------------------------------|-------------------------------------------------------|
| Digital   |                                |                                        |                                         |                                                       |
| Analog/RF |                                |                                        |                                         |                                                       |
| Mixed     |                                |                                        |                                         |                                                       |

BASTION Survey Appendix •1

|   | Test line (check all that apply)               |
|---|------------------------------------------------|
|   | Automated Optical Inspection – AOI Pre-reflow  |
|   | Automated Optical Inspection – AOI Post-reflow |
|   | Human Optical Inspection – HOI                 |
|   | Automated X-ray Inspection – AXI               |
|   | In-Circuit Test – ICT                          |
|   | Flying Probe Test – FPT                        |
|   | Boundary-Scan Test – BST                       |
|   | Embedded Instruments                           |
|   | Built In-System Test                           |
|   | Functional Test – FT                           |
|   | Power-On Self-Test – POST                      |
|   | Health monitoring or fault management          |
|   | HALT/HASS Test                                 |
|   | Bit-Error Rate Test – BERT                     |
|   | Other (please specify)                         |

|   | NFF (No Failure Found) - What is your NFF […] (check all that apply) |
|---|-----------------------------------------------------------------------|
|   | Return from customer under guarantee                                  |
|   | An intermittent problem                                               |
|   | A PASS test                                                           |
|   | A PASSED product which is FAILED at customer                          |
|   | Others, Specify:                                                      |

|   | Is the NFF problem important?              |
|---|--------------------------------------------|
|   | Important in general                       |
|   | Important for me (my product / production) |
|   | Importance increasing                      |
|   | Importance decreasing                      |
|   | Not important                              |


# The faults you don't know - Share your interpretation

In NFF, the fault origin is not known. If you nevertheless have an idea about the root cause, please share your views and fill in, per category and/or defect class, the applicable rows and columns with a percentage (%) or a PPM/DPMO value.

| Category    | Defect Class                                       | Defect Opportunities |        |        |
|-------------|----------------------------------------------------|----------------------|--------|--------|
|             |                                                    | Digital              | Analog | Hybrid |
| Design      |                                                    |                      |        |        |
| Material /  | Bad Part                                           |                      |        |        |
| Component   | Electrically dead                                  |                      |        |        |
|             | Functionally bad                                   |                      |        |        |
|             | Tolerance defect                                   |                      |        |        |
|             | Other [Specify]                                    |                      |        |        |
| Placement   | Missing component                                  |                      |        |        |
|             | Wrong component                                    |                      |        |        |
|             | Misaligned component                               |                      |        |        |
|             | Tombstone                                          |                      |        |        |
|             | Inverted component                                 |                      |        |        |
|             | Other [Specify]                                    |                      |        |        |
| Solder /    | Bridge                                             |                      |        |        |
| Termination | Insufficient                                       |                      |        |        |
|             | Open                                               |                      |        |        |
|             | Excessive solder                                   |                      |        |        |
|             | Solder residue                                     |                      |        |        |
|             | Grainy solder                                      |                      |        |        |
|             | Lifted leads                                       |                      |        |        |
|             | Bent leads                                         |                      |        |        |
|             | Cold solder                                        |                      |        |        |
|             | Solder voids                                       |                      |        |        |
|             | Other [Specify]                                    |                      |        |        |
| Function    | Dynamic (at-speed) defect                          |                      |        |        |
|             | • Crosstalk                                        |                      |        |        |
|             | • Jitter                                           |                      |        |        |
|             | • Delay faults                                     |                      |        |        |
|             | • Performance problems                             |                      |        |        |
|                    | Bit-error rates high                                         |                  |                |               |
|                    | Bad features                                                 |                  |                |               |
|                    | Programming / software                                       |                  |                |               |
|                    | Other [Specify]                                              |                  |                |               |
| Product            | Intermittent resistive                                       |                  |                |               |
| lifetime           | Intermittent stuck-at                                        |                  |                |               |
|                    | Intermittent short                                           |                  |                |               |
|                    | Intermittent open                                            |                  |                |               |
|                    | Intermittent timing                                          |                  |                |               |
|                    | Ageing (wear-out) problems                                   |                  |                |               |
|                    | Other [Specify]                                              |                  |                |               |
| Other<br>[Specify] | Other [Specify]                                              |                  |                |               |

# What is the reason for NFF?

- □ Insufficient coverage per test category
- □ Missing test method/technique/equipment (existing methods are insufficient)
- □ Missing fault models (incomplete coverage metrics)
- □ Intermittent faults
- □ Ageing (wear-out) problems
- □ No test possible for certain faults
- □ Lack of communication between designer-manufacturer-customer

# What are the most critical testability problems you /predict?

Describe below:

# The faults you know

If you have quality systems with traceability, could you share accurate information about category/defect class?

- ○ I don't know and I have not filled in the table.
- ○ I know partially. In this case I have specified a percentage (%) in the table.
- ○ I know exactly. In this case I have specified PPM or DPMO in the table.

| Category    | Defect Class              | Defect Opportunities |        |        |
|-------------|---------------------------|----------------------|--------|--------|
|             |                           | Digital              | Analog | Hybrid |
| Design      |                           |                      |        |        |
| Material /  | Bad Part                  |                      |        |        |
| Component   | Electrically dead         |                      |        |        |
|             | Functionally bad          |                      |        |        |
|             | Tolerance defect          |                      |        |        |
|             | Other [Specify]           |                      |        |        |
|             |                           |                      |        |        |
| Placement   | Missing component         |                      |        |        |
|             | Wrong component           |                      |        |        |
|             | Misaligned component      |                      |        |        |
|             | Tombstone                 |                      |        |        |
|             | Inverted component        |                      |        |        |
|             | Other [Specify]           |                      |        |        |
|             |                           |                      |        |        |
| Solder /    | Bridge                    |                      |        |        |
| Termination | Insufficient              |                      |        |        |
|             | Open                      |                      |        |        |
|             | Excessive solder          |                      |        |        |
|             | Solder residue            |                      |        |        |
|             | Grainy solder             |                      |        |        |
|             | Lifted leads              |                      |        |        |
|             | Bent leads                |                      |        |        |
|             | Cold solder               |                      |        |        |
|             | Solder voids              |                      |        |        |
|             | Other [Specify]           |                      |        |        |
|             |                           |                      |        |        |
| Function    | Dynamic (at-speed) defect |                      |        |        |
|             | • Crosstalk                |                      |        |        |
|             | • Jitter                   |                      |        |        |
|             | • Delay faults             |                      |        |        |
|             | • Performance problems     |                      |        |        |
|             | Bit-error rates high       |                      |        |        |
|             | Bad features               |                      |        |        |
|             | Programming / software     |                      |        |        |
|             | Other [Specify]            |                      |        |        |
|             |                            |                      |        |        |
| Product     | Intermittent resistive     |                      |        |        |
| lifetime    | Intermittent stuck-at      |                      |        |        |
|             | Intermittent short         |                      |        |        |
|             | Intermittent open          |                      |        |        |
|             | Intermittent timing        |                      |        |        |
|             | Ageing (wear-out) problems |                      |        |        |
|             | Other [Specify]            |                      |        |        |
|             |                            |                      |        |        |
| Other [Specify] | Other [Specify]        |                      |        |        |

# Please provide information about yourself and your company. (Optional)

Name:

Company:

Address 1:

Address 2:

City/Town:

State/Province:

ZIP/Postal Code:

Country:

Email Address:

Phone Number:

Do you want to receive the results of the survey?

- □ Yes
- □ No
