The silicon enigma: Bridging the gap between simulation and silicon
By Amitav Halder, Ankit Khandelwal, Deepak Mahajan (Freescale Semiconductor)
VLSI design teams eagerly await first silicon from the fab as the proof of months of hard work, while test teams are busy planning functional coverage to fill the holes left by scan (ATPG) patterns. More often than not, however, the unexpected happens, and both teams end up debugging functional cases during silicon bring-up. This paper highlights the seemingly innocuous issues that occur in the first few days of silicon bring-up and the proactive steps that help shorten these debug cycles.
Manufacturing is imperfect
When a design comes back as silicon, it is expected to be of the highest quality, meaning that yield is good and defects are few. Each part is tested to ensure that it works and is fit to ship to the customer. To this end, production tests are run for functional and scan coverage. These tester patterns run on every part, so they consume a large amount of tester time and are costly. Test time reduction is therefore an area where every organization strives hard, as it directly influences gross margins. Stabilizing these production patterns raises many challenges, which lead to many iterations and long debug times.
The paper takes as its reference an automotive silicon with both analog and digital components, and the functional issues that limited the progress of its test program. It does not deal with issues regarding voltages, tester channel assignments or layout, and it assumes that bring-up has been done properly and that most of the structural tests (scan, BIST, boundary scan) have reached sufficient coverage. The paper delves into the niche area of functional tester pattern debug on silicon. These patterns are required to augment the DFT scan tests, because scan coverage holes often exist in the digital interfaces of non-scan blocks, which include analog blocks and memories. Scan holes also exist in some of the pad interface logic.
AC IO characterization is another requirement that DFT does not completely address today.
Architecture of the automotive part
The architecture of an automotive chip is a mixture of analog and digital components. On-die flash meets the non-volatile memory requirement, and phase-locked loops generate high-frequency clocks for faster operation. Figure 1 below shows a typical architecture of an automotive SoC.
Figure 1
An automotive SoC contains a few masters [M] sitting on the north side of the interconnect fabric. The south side of the fabric houses the slaves. The periphery of the chip is covered by IO pads for external communication. The slaves housed on the interconnect can be categorized as:
- Digital IPs [D], mainly communication IPs, counters, clock and reset control units, fault detection units, etc.
- Analog IPs [A], such as non-volatile memories, RAMs, phase-locked loops, oscillators, etc.
Design for test aims to cover all digital components with scan, but it is not practical to expect 100% coverage. Moreover, in this kind of architecture, characterization of the high-speed communication interfaces is also required.
All of these are covered through functional test patterns, in which the chip is placed in one defined test mode and the critical interfaces are checked and characterized (CZ). Functional test patterns require an entirely different setup from the normal functional verification setup, taking into account:
- All control through external pins/pads only
- Controllability and observability of the different functional modes
- Software-based checks running in test modes
- Runs across the specified range of voltages and frequencies
Basic reset sequencing
Mixed analog and digital components impose different reset requirements. Resets are generally released in phases to the different components of the SoC. For example, after the power-on reset, an early reset is provided to the digital co-components of the analog blocks and to a few of the digital blocks, to allow for the power-up time of the analog blocks and to ensure that the chip powers up and initializes the required logic in the correct order. Clocking IPs, for instance, get an early reset because their clock is required for other state machines to proceed. After that, the resets to non-volatile memories such as flash and to the other analog IPs are lifted. The master cores and most of the SOG logic get the final reset.
Figure 2
This staggering of resets also poses a problem for functional testing: the start-up delay of the analog blocks cannot be predicted accurately, yet the entire software is loaded externally through general-purpose input/output pads and must stay cycle-accurate with the tester clock.
Clocking consideration
On the tester, the design needs a clock that is controlled from outside. An internal clock would introduce many uncertainties with respect to asynchronism, start-stop requirements, and so on. The logic should therefore run on an external clock so that the tester knows what is being driven or captured, and when. One EXTAL clock input is provided in the system for this purpose, but the EXTAL clock has frequency limitations and can only provide accurate clocks for the range of 40-50 MHz. To run the system at speed, the PLL clock present in the design is generally used. An external LVDS clock can also be used instead of the PLL, which gives the following advantages:
- Control from tester to drive/stop the clock
- Removal of the uncertainties associated with the PLL, which is an analog block with many non-deterministic parameters, such as its lock time.
Tester requirement
All the control programming for tester patterns is done through the Test Control Unit (TCU). When patterns are run on the tester, all the settings are generally made at the beginning of a pattern, and the same pattern is then run repeatedly at different voltages and frequencies; this sweep is called a shmoo. In this process, the power-on reset (POR) is applied only before the first run, and only non-POR resets are applied between the subsequent runs. If some bits in the TCU are resettable only by POR, they hold their reset value during the first run, and if the pattern changes them, they keep the changed value in the following runs. The pattern then passes on the first run and fails on the following runs, because these bits never return to their reset state. This should be taken care of either:
- On the testbench side, while generating the patterns, by masking the values on the affected pins at the appropriate times or by applying an appropriate reset at the end of the test case.
- Or on the tester, by masking the comparison on the pads for the corresponding mismatches.
As with the problem described above, there are many considerations in pattern development which, if properly taken care of, can significantly reduce the iterations and debug effort on the tester.
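The POR-only behaviour described above can be reproduced with a very small model. The following Python sketch is purely illustrative: the class `TcuBit`, the functions `run_pattern` and `shmoo_runs`, and the single-bit abstraction are hypothetical names introduced here, not part of any real TCU or tester API. It assumes the pattern expects the bit at its reset value on entry and changes it during the run.

```python
class TcuBit:
    """Hypothetical model of one TCU bit that only a power-on reset clears."""

    def __init__(self, reset_value=0):
        self.reset_value = reset_value
        self.value = reset_value

    def power_on_reset(self):
        self.value = self.reset_value

    def functional_reset(self):
        # A non-POR reset leaves a POR-only bit untouched.
        pass


def run_pattern(bit):
    """Pass only if the bit is at its reset value on entry, then modify it."""
    passed = bit.value == bit.reset_value
    bit.value = 1  # the pattern changes the bit during the run
    return passed


def shmoo_runs(n_runs, restore_at_end=False):
    """Run the same pattern n_runs times with a single POR up front."""
    bit = TcuBit()
    bit.power_on_reset()  # POR is applied only once, before the first run
    results = []
    for _ in range(n_runs):
        results.append(run_pattern(bit))
        if restore_at_end:
            # Testbench-side fix: the test case writes the reset value back.
            bit.value = bit.reset_value
        bit.functional_reset()  # only non-POR resets between runs
    return results


print(shmoo_runs(3))                       # [True, False, False]
print(shmoo_runs(3, restore_at_end=True))  # [True, True, True]
```

Without the restore step only the first run passes, which matches the symptom described; restoring the value at the end of the test case corresponds to the testbench-side option, while the tester-side option would instead mask the mismatching compares.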
1. An issue was seen in which the patterns ran well when EXTAL was programmed to <= 25 MHz and started failing at higher frequencies. According to the spec, EXTAL was to run at 40 MHz. Everything passed in simulation, but on the tester the JTAG output pin (TDO) started to fail at frequencies > 25 MHz. The shmoo for this can be seen below:
Figure 3
It was root-caused to the fact that security keys were written to the test control unit while the reset to the test control unit was a delayed reset. On the tester, at higher frequencies, the reset to the test control unit was de-asserting normally, but the writes of the security keys to the test control registers, which unlock test mode access, were happening too early. Sufficient time was therefore added to the sequence so that the delayed reset of the test control unit was properly de-asserted before the security keys were written. This resolved the issue, and a passing shmoo at the faster EXTAL clock was observed. However, the patterns were still failing within a particular frequency range at all voltage levels.
Figure 4
Further improvement was achieved by correctly programming the calibration parameter of the PLL. The incorrect value had not caused any surprises in digital simulation, but the correct one greatly improved the deterministic behaviour of the PLL.
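The reset-timing failure in this item can be illustrated with simple arithmetic. The sketch below assumes, which the paper does not state explicitly, that the delayed reset de-asserts after a fixed absolute time while the pattern waits a fixed number of EXTAL cycles before writing the keys. The function names and the numbers (a 400 ns delay and a 10-cycle wait) are hypothetical and were chosen only so that the pass/fail boundary falls at 25 MHz.

```python
def key_write_after_reset(freq_mhz, wait_cycles, reset_delay_ns):
    """True if the key write lands at or after de-assertion of the delayed reset.

    Elapsed time is wait_cycles * 1000 / freq_mhz nanoseconds; the comparison
    is cross-multiplied to stay in integer arithmetic.
    """
    return wait_cycles * 1000 >= reset_delay_ns * freq_mhz


def min_wait_cycles(freq_mhz, reset_delay_ns):
    """Smallest cycle count that covers reset_delay_ns at freq_mhz (integer ceil)."""
    return (reset_delay_ns * freq_mhz + 999) // 1000


# Hypothetical numbers: a fixed 10-cycle wait against a 400 ns reset delay.
passing = [f for f in (10, 20, 25, 30, 40)
           if key_write_after_reset(f, wait_cycles=10, reset_delay_ns=400)]
print(passing)                   # [10, 20, 25]
print(min_wait_cycles(40, 400))  # 16
```

Under these assumptions a wait sized for the slow clock silently becomes too short as the frequency rises, which is why a wait derived from the highest frequency in the shmoo is the safer choice.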
2. Code is downloaded through GPIOs on the tester. The number of pads used for this varies across NPIs. Generally the code is downloaded on a set of 16 pads, but other schemes, such as 8-bit and 1-bit download, should also be implemented. This becomes necessary when one or more pads fail boundary scan or have a stuck-at fault, so that the code can be downloaded using another scheme. Otherwise, if only the 16-bit download is implemented and one of the pads fails, the device cannot be tested.
3. The core should be held in a stopped or reset state when it is not needed for the test patterns. For this, either the reset of the core is asserted for the whole time or the clock to the core is kept gated. To gate the clock, control should be provided at the design top level on a GPIO, or on a register bit that can be controlled through the JTAG interface, so that the clock to the core can be controlled as desired.
4. Analog blocks such as the PLL have attributes such as lock time, for which a value is given in the datasheet. This lock time is defined over a range and can vary between runs of the same device under the same PVT conditions. Also, the samples are run under typical conditions, which may change when moving to the worst or best corners. The parameters coded in the behavioral models may not be accurate enough, which leads to a difference in lock time between silicon and simulation and, further, to poor yield on the tester. Such parameters should be corrected so that the patterns allow sufficient time for the analog characteristics, which also improves the tester results. They should also be checked in AMS simulations to get results close to silicon. An example can be seen below: Figure 5 shows a shmoo plot when sufficient lock time was missing.
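The fallback between download widths in item 2 might be selected along the following lines. This is a minimal sketch, not the scheme used on the part: the pad numbering in `DOWNLOAD_SCHEMES` (the 8-bit and 1-bit schemes reusing the lowest pads of the 16-bit bus) and the function `pick_download_width` are assumptions made for illustration.

```python
# Hypothetical mapping from download width to the pad indices that scheme uses.
DOWNLOAD_SCHEMES = {
    16: range(0, 16),
    8: range(0, 8),
    1: range(0, 1),
}


def pick_download_width(failing_pads, schemes=DOWNLOAD_SCHEMES):
    """Return the widest download scheme that avoids every failing pad.

    Returns None when no implemented scheme is usable.
    """
    bad = set(failing_pads)
    for width in sorted(schemes, reverse=True):
        if bad.isdisjoint(schemes[width]):
            return width
    return None


print(pick_download_width([]))    # 16
print(pick_download_width([12]))  # 8
print(pick_download_width([3]))   # 1
print(pick_download_width([0]))   # None
```

The last case shows a limit of this particular mapping: if every scheme shares pad 0, a fault on that pad still blocks the download, so placing the narrow schemes on different pads would give more real redundancy.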
Figure 5
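One way to size the PLL lock wait in a pattern is to start from the worst-case datasheet lock time rather than the value seen in the behavioral model, and add a margin. The sketch below only shows the arithmetic; the function name `lock_wait_cycles`, the percentage margin, and the example figures (a 100 µs maximum lock time and a 40 MHz reference) are hypothetical and not taken from any datasheet.

```python
def lock_wait_cycles(max_lock_time_us, ref_clk_mhz, margin_pct=20):
    """Reference-clock cycles to wait for PLL lock, rounded up.

    max_lock_time_us: worst-case lock time from the datasheet, in microseconds
    ref_clk_mhz:      frequency of the clock that counts the wait, in MHz
    margin_pct:       extra guard band in percent (an assumed policy)
    """
    scaled = max_lock_time_us * ref_clk_mhz * (100 + margin_pct)
    return -(-scaled // 100)  # integer ceiling division


print(lock_wait_cycles(100, 40))                # 4800
print(lock_wait_cycles(100, 40, margin_pct=0))  # 4000
```

The appropriate margin depends on how well the lock time has been correlated across PVT corners and AMS simulation; the 20% default here is only a placeholder.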
Figure 6 below shows the final shmoo plot after all the corrective measures mentioned in the paper were applied.
Figure 6
Takeaways
- During the architecture definition phase, judge carefully how much controllability and observability the design requires.
- Analog block parameters need to be correlated with AMS simulations and the datasheet, and used as an input to test pattern generation.
- Correlate design cyclization requirements with the actual reset and clock sequence to enhance testability.