SoC-Techno : February 2021

Monday, February 8, 2021

Special Cell Placement in Power-Gated Implementations: AON / LS / ISO / RET Cells

These cells (AON = always-on cells, LS = level shifters, ISO = isolation cells, RET = retention flops) are used in designs with multiple power domains.

LS/ISO cells are placed at the power domain crossings.
The tool can place these cells, provided the secondary power routes are drawn at regular intervals throughout the design.
If secondary power routes are not available throughout the design, create bounds/instance groups for these cells.
AON cells are needed if an always-on signal passes through the switchable region. If your design is congested and you don't want to pass these signals through the core logic, you can do the following:
- Create bounds for these cells around the edge of the switchable region every x microns, where x is the maximum length one AON cell can drive. Use this if you don't want these signals to go through the core.
- Create islands for the AON power inside the switchable domain every x microns, where x is the maximum length one buffer cell can drive. Use this if you have congestion and area concerns.
If you are creating a bound for the RET cells, finalize their placement after one placement trial.
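The "every x microns" rule above is simple arithmetic along one edge. Below is a minimal Python sketch, assuming a straight edge, a single max drive length x taken from your own library/characterization data, and a hypothetical helper name `bound_positions`. It only computes the offsets; the bounds themselves are still created with your P&R tool's own commands.

```python
import math


def bound_positions(edge_length_um: float, max_drive_um: float) -> list:
    """Offsets (microns) along one edge of the switchable region where an
    AON buffer bound would be created, one every max_drive_um.

    Assumes a straight edge with the driver at offset 0 and the receiver at
    edge_length_um; no segment between consecutive buffers exceeds max_drive_um.
    """
    if max_drive_um <= 0:
        raise ValueError("max_drive_um must be positive")
    segments = math.ceil(edge_length_um / max_drive_um)
    # One bound at the end of every segment except the last, which ends
    # at the receiver itself.
    return [i * max_drive_um for i in range(1, segments)]


# Example: a 1000 um edge with an assumed 300 um max drive length.
print(bound_positions(1000, 300))  # [300, 600, 900]
```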

PHYSICAL DESIGN


Physical design is the process of transforming a circuit-level description into a physical layout.
In this layout, the positions of the cells and the routes for the interconnections between them are described by a geometric representation. This geometric representation is called the integrated circuit (IC) layout.

Synthesis:

Translation of RTL (Register Transfer Level) code into a gate-level netlist for the target technology.
It mainly focuses on logic optimization.
It assumes a square block shape with no physical data and a lower cell density.
It assumes only macro placement, not power routing.
The main advantage of synthesis is the best-optimized logic.
Synthesis with Physical Data
The floorplan (die and core area, macro placement, power/ground routing) is provided for better accuracy and optimization.
It is more accurate than plain synthesis because the floorplan is also given as an input to the synthesis team.
The RTL database is given as input along with the floorplan, which comes from the floorplan stage as a DEF file.

PD ASIC Flow

The flow starts with database preparation (inputs), using the vendor libraries.
This database is given as an input to the synthesis team.
The synthesis team also requires the IP libraries along with the database.
After physical synthesis, PnR (Placement and Routing) is done.
After the routing stage, RC extraction is performed by the EDA tool and the parasitics are written out as a SPEF file.
The next stage is Static Timing Analysis (STA).
The layout is then written out as a GDSII file.
Physical verification, such as DRC and LVS, is run on it.
Tape-out is the final stage.

Common Questions:

Why is a DFF used in designs? -- An edge-triggered D flip-flop avoids the race-around condition.
What is meant by banking? -- Merging several standard cells into one standard cell of the same functionality (for example, single-bit flops into a multi-bit flop). Pros: saves power and area. Cons: it may cause congestion.
What is debanking? -- Splitting a banked standard cell into multiple cells of the same functionality.

Saturday, February 6, 2021

Input Files for Physical Design

Types of Files

.v / .vhd -- (Verilog / VHDL: RTL or gate-level netlist)
.lib -- (library file) -- Liberty timing file
.db -- timing information; some library files are provided in .db (compiled) format
.lef -- (Library Exchange Format)
.tf -- (Technology File)
.mw -- (Milkyway)
.def -- (Design Exchange Format)
.sdc -- (Synopsys / Standard Design Constraints)
.tlu+ -- (TLU+ library) -- models used to compute R and C
.itf -- (Interconnect Technology Format)
.upf -- (Unified Power Format)
.io -- (input/output)
.map -- mapping file
.spef -- (Standard Parasitic Exchange Format)
.sdf -- (Standard Delay Format) -- used for back-annotated timing, for example in gate-level simulation
.saif -- (Switching Activity Interchange Format)
.gds -- (GDSII stream format)
.tcl -- (Tool Command Language)

.v / .vhd files contain the logical information of the design (Verilog and VHDL representations).
They are used as input to synthesis tools such as Design Compiler or Genus.
The netlist obtained after synthesis is used as the input to floorplanning.
The .vg extension is often used to distinguish a gate-level netlist file.

NETLIST:

A netlist describes the cells (sequential and combinational) in the design and their logical connectivity.

Netlist contains-

Input and output information of the design.

Wire information.

Cell and instance information.

Module information.

Hierarchy information.

Port information.

LIB file:

The .lib file (library file / Liberty timing file) is an ASCII description of the timing and power parameters of all the standard cells for a particular technology node.

It contains standard cell information such as cell transition, cell delay, setup and hold time requirements, and the electrical and functional characteristics of each cell.

The PVT (process, voltage, temperature) conditions at which the cells are characterized are also defined in the library.

These files are supplied by the vendor or the foundry.

The .db file contains the same timing information in a compiled (binary) form.

LEF (Library Exchange Format) -- contains the physical (abstract) information of the technology and the cells.

There are two types:

Technology LEF

Cell/macro LEF

1. Technology LEF -- contains metal layer and via information such as:

Metal layer:
Direction
Pitch
Width
Area
Spacing table
Min enclosed area
Diag spacing
Diag min edge length
Resistance
Capacitance
Thickness
Antenna model and antenna area ratio
DC current density

VIA information:

Spacing

Width

Antenna model

Antenna area ratio

DC current Density

2. Cell/Macro lef

Class

Origin

Size

Symmetry

Pin:

Antenna gate area

Direction

Usage

Port

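SDC (Synopsys Design Constraints)

Clock definition

create_clock
Virtual clock (create_clock without a source pin/port)
create_generated_clock
Clock uncertainty (set_clock_uncertainty)

External delays

Input delays (set_input_delay)

Output delays (set_output_delay)

DRVs

Max transition, max capacitance, and max fanout (set_max_transition, set_max_capacitance, set_max_fanout)

Timing path exceptions

False path (set_false_path)
Multicycle path (set_multicycle_path)
Max delay (set_max_delay)
Min delay (set_min_delay)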

.mw / .FRAM -- the frame (FRAM) view is the abstract physical view; it contains physical information.
The .CEL view is the complete layout view of the design.
.sdc contains the timing constraints.
.upf contains the power intent.
.tlu+ is used to compute R and C values.

If the .lib and .db files are changed, the entire standard cell information changes with them.

Common questions

What are the inputs to a .lib delay table? -- Input transition and output load.
What are the outputs of a .lib delay table? -- Output transition and cell delay.
Drive strength? -- The maximum capacitive load a cell can drive. It is captured in the library file as max_capacitance.
Fanout? -- The number of gates a cell output can drive.
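Cell delay and output transition in a .lib are commonly stored as NLDM lookup tables indexed by input transition and output load, and tools interpolate between the characterized points. The sketch below shows plain bilinear interpolation under that assumption. The table values and function names are made up, which axis is index_1 depends on the table's lu_table_template, and real tools may interpolate or extrapolate differently.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Indices (i, i + 1) of the two axis points around x.
    Points outside the table use the end segment (linear extrapolation)."""
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation in a 2-D NLDM-style table.

    table[i][j] holds the value characterized at input transition slews[i]
    and output load loads[j]; both axes need at least two strictly
    increasing points.
    """
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    # Interpolate along the load axis for the two bracketing slew rows...
    low = table[i0][j0] + tl * (table[i0][j1] - table[i0][j0])
    high = table[i1][j0] + tl * (table[i1][j1] - table[i1][j0])
    # ...then along the slew axis.
    return low + ts * (high - low)


# Made-up 2x2 cell_rise table (ns) indexed by slew (ns) and load (pF).
slews = [0.0, 1.0]
loads = [0.0, 2.0]
cell_rise = [[1.0, 3.0],
             [5.0, 7.0]]
print(nldm_lookup(slews, loads, cell_rise, 0.5, 1.0))  # 4.0
```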

FLOORPLAN

The floorplan plays a critical role in chip design and must be given the utmost importance.

In the floorplan stage, we estimate:

Die area
Core area
Utilization ratio
The shape of the block
Multiple voltage domains and guard bands (in multi-voltage designs)
Flipping of the first site row, if required

In floorplanning, cells such as the following are added for yield:

IO buffers
Endcap cells
Decap cells
Tap cells
Tie cells (added during optimization, after power planning)
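Utilization ratio ties the cell area to the core area (utilization = cell area / core area). A minimal Python sketch for sizing a rectangular core from a target utilization and aspect ratio; the numbers are assumptions, "cell area" is whatever your flow counts toward utilization, and the core-to-die (IO) margin is not included.

```python
import math


def core_dimensions(cell_area_um2, target_util, aspect_ratio):
    """(width, height) of a rectangular core such that
    cell_area / core_area == target_util and height / width == aspect_ratio."""
    if not 0 < target_util <= 1:
        raise ValueError("target_util must be in (0, 1]")
    core_area = cell_area_um2 / target_util
    width = math.sqrt(core_area / aspect_ratio)
    return width, width * aspect_ratio


def utilization(cell_area_um2, core_w_um, core_h_um):
    """Utilization ratio = placed cell area / core area."""
    return cell_area_um2 / (core_w_um * core_h_um)


# Assumed numbers: 70,000 um^2 of cells, 70% target utilization, square core.
w, h = core_dimensions(70_000, 0.70, 1.0)
print(round(w, 1), round(h, 1), round(utilization(70_000, w, h), 3))
```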

Pin Placement:

Before starting macro placement, finalize the pin placement.
Consider the following points during pin placement:

The width and pitch of the pins should match those of the other blocks that interface with your block.
Make sure your block pins are on routing tracks.
Leave 2 routing tracks on both sides of the clock pins and of pins whose nets have NDRs (Non-Default Rules).
Place clock pins on higher layers.
Make sure none of the pins are overlapping.
Place IO buffers along with the pins if required.
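Keeping pins on routing tracks means each pin coordinate should equal the track offset plus an integer number of track pitches. A hypothetical Python sketch, assuming a single uniform track grid per layer (the offset and pitch would come from your technology LEF/track definitions), that snaps a coordinate to the grid and lists the neighboring tracks to keep free around a clock/NDR pin.

```python
def snap_to_track(coord_um, track_offset_um, track_pitch_um):
    """Snap a pin coordinate to the nearest routing track, assuming tracks
    at track_offset + k * track_pitch for integer k."""
    k = round((coord_um - track_offset_um) / track_pitch_um)
    return track_offset_um + k * track_pitch_um


def on_track(coord_um, track_offset_um, track_pitch_um, tol=1e-6):
    """True if the coordinate already lies on a routing track."""
    snapped = snap_to_track(coord_um, track_offset_um, track_pitch_um)
    return abs(snapped - coord_um) <= tol


def reserved_tracks(pin_track, spare=2):
    """Track indices to keep free on both sides of a clock/NDR pin."""
    return [pin_track + d for d in range(-spare, spare + 1) if d != 0]


# Assumed track grid: offset 0.05 um, pitch 0.1 um.
print(snap_to_track(1.02, 0.05, 0.1))
print(reserved_tracks(10))  # [8, 9, 11, 12]
```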

Channel Estimation:

At Block Level:

The basic macro channel calculation formula is the following:

channel = (no. of pins to be routed / no. of layers available) x layer pitch
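The block-level formula translates directly into code. A minimal Python sketch, assuming all available layers share one pitch and rounding the track count up; it deliberately ignores the power straps, NDR space, and buffer area from the additional points that follow, which must be added on top.

```python
import math


def macro_channel_width(num_pins, num_layers, layer_pitch_um):
    """channel = (pins to be routed / layers available) x layer pitch.

    The track count is rounded up (an assumption: a partial track cannot
    be used). Power straps, NDR space and buffer area are NOT included.
    """
    tracks = math.ceil(num_pins / num_layers)
    return tracks * layer_pitch_um


# Assumed example: 400 pins, 2 usable layers, 0.08 um pitch.
print(macro_channel_width(400, 2, 0.08))
```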

Apart from the above equation, we should consider the following additional points while estimating macro channels.

1. Channel should have at least one power and ground strap to power the cells within the channel.

2. NDR routes (Clock routes, Isolation controls, Power switch controls) in the channel have to be budgeted for additional space requirements.

3. Buffer count estimation in the channel for long route buffering and hold fixing.

At Full Chip Level:

The routing channel width depends not just on the two partitions but on all the signals that pass through, or can pass through (if it is the shortest route), across the full chip and all the partitions.

Based on the floorplan and the data flow, make a thorough estimate of the number of signals that can pass through the channel.

Of these signals, the approximate number of critical signals also needs to be estimated (if there is no data on this, a pessimistic estimate of 30% can be used).

Allowing additional spacing for the critical nets, the total channel width is estimated from the number of signals x (width + 2 x spacing), i.e. double spacing for the critical nets.
For horizontal or vertical channels, use the corresponding metal layers for the estimation.
For each of these metals, estimate how many tracks per unit width are used for power routing, and hence the width required for signal routing. Provide an additional 20-30% width as a buffer.
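One way to turn the full-chip guideline into numbers is sketched below in Python. It is an interpretation, not a fixed rule: it assumes non-critical nets take width + spacing, critical nets take width + 2 x spacing, a fraction of the channel is consumed by power tracks, and a 20-30% margin is added. All parameter names and values are hypothetical.

```python
def full_chip_channel_width(num_signals, wire_width_um, spacing_um,
                            critical_fraction=0.30, power_fraction=0.0,
                            margin=0.25):
    """Rough full-chip channel width estimate for one routing direction.

    Assumed interpretation of the guideline:
      * non-critical nets take width + spacing,
      * critical nets get double spacing: width + 2 * spacing,
      * power_fraction of the channel is consumed by power straps,
      * margin (0.20 to 0.30) is added on top as a buffer.
    """
    if not 0 <= power_fraction < 1:
        raise ValueError("power_fraction must be in [0, 1)")
    n_crit = round(num_signals * critical_fraction)
    n_norm = num_signals - n_crit
    signal_width = (n_norm * (wire_width_um + spacing_um)
                    + n_crit * (wire_width_um + 2 * spacing_um))
    # Scale up so the signal tracks fit in what the power tracks leave free.
    total = signal_width / (1.0 - power_fraction)
    return total * (1.0 + margin)


# Assumed: 100 signals, 0.1 um width/spacing, 30% critical, no power, 25% margin.
print(full_chip_channel_width(100, 0.1, 0.1))
```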

Macro Placement GuideLines--01:

Identify macros that are in the same hierarchy.
Place the macros by looking at the inputs/outputs they share. For example, memories are generally cascaded to get more storage: to increase the number of bits, memories are cascaded horizontally (they share the same address bits), and to increase the number of locations, memories are cascaded vertically (they share the same data bits). Stack the memories horizontally or vertically accordingly.
Place the macro pins facing the core if the connectivity to the standard cells is higher.
Group the Macros based on MBIST modules.
Stack the macros based on routing blockages of the macros.
For 14nm and below, take particular care with macro spacing. Abutted macros typically show no issues, but if you leave space between macros, make sure there are no base-layer DRCs, mainly on the RX layer.
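The cascading rule in the second guideline can be sized with a short calculation. A minimal Python sketch, assuming hypothetical memory instance and requirement sizes: columns widen the word, rows add locations, and rows need extra upper address bits to select an instance.

```python
import math


def memory_array(inst_bits, inst_words, req_bits, req_words):
    """Memory instances needed to build a req_bits x req_words memory.

    cols: instances placed side by side to widen the word (they share the
          address bits, each supplies a slice of the data bits).
    rows: instances stacked to add locations (they share the data bits and
          are selected by extra upper address bits).
    """
    cols = math.ceil(req_bits / inst_bits)
    rows = math.ceil(req_words / inst_words)
    select_bits = math.ceil(math.log2(rows)) if rows > 1 else 0
    return cols, rows, select_bits


# Hypothetical: 32-bit x 1024-word instances, 128-bit x 4096-word memory needed.
print(memory_array(32, 1024, 128, 4096))  # (4, 4, 2)
```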

Macro Placement GuideLines--02:

Place the Macros around the boundaries of the block by checking the connection to IO pins.
Avoid placing macros at the center of the block; macros placed at the center cause IR drop there.
Estimate the channel between macros and place them accordingly.
Check the orientation of the macros and align them.
Macro pins should face the core area; this ensures non-overlapping connectivity to the standard cells.
Avoid notches in macro placement.
Estimate the hotspots and apply blockages if required.
Place decap cells around the macros.
Avoid criss-cross connections.
Provide a keep-out margin or cell padding for macros with a high pin count.

Synthesis-- Summary

Synthesis

Synthesis is the process of converting RTL i.e. synthesizable Verilog code to technology-specific gate-level netlist. Netlist basically includes nets, sequential and combinational cells, and their connectivity information.

Goals of Synthesis -

To get a gate-level netlist
Inserting clock gates
Logic optimization
Inserting DFT logic
Logic equivalence between RTL and netlist should be maintained

Inputs for Synthesis Tool -

.tf- technology-related information.
.lib-timing info of standard cell & macros
.v- RTL code.
SDC- Timing constraints.
UPF - power intent of the design.
Scan config- Scan related info like scan chain length, scan IO, which flops are to be considered in the scan chains.

For Physical aware synthesis -

RC coefficient file (TLU+).
LEF/FRAM- abstract view of the cell.
Floorplan DEF- locations of IO ports and macros.

Synthesis Flow -
---------------------------

Analyze -
---------------
This step performs syntax checking on the RTL (Verilog) code.

Elaborate -
-----------------
First, all lower-level blocks are brought into the synthesis tool.
The Verilog code and arithmetic operators are converted into GTECH and DesignWare (DW) components. These are technology-independent libraries.

GTECH - contains basic logic gates and flops.
DesignWare - contains complex cells such as FIFOs and counters.

Analyzes the design hierarchy.
Removes empty switches and dead branches.
Detects asynchronous resets.
Converts decision trees to muxes.
Converts synchronous logic to D-latches/DFFs.

FSM pass:
Detects FSM logic and extracts the number of input bits, output bits, and state bits.
Converts the FSM logic to basic logic.

Memory pass:
Merges DFFs into memory write (memwr) and memory read (memrd) ports.
Consolidates memwr/memrd cells.
Generates memory (mem) cells.
Maps mem cells to basic logic.

Import timing and power constraints -
--------------------------------------------------------------
Once the design is extracted in the form of technology-independent cells, timing constraints are imported from the SDC file.

If the design has multiple power domains, isolation cells, level shifters, power switches, and retention flops are inserted according to the UPF power domains.

Clock gating -
----------------------
The clock has high switching activity, so a lot of dynamic power is consumed in the clock network. One technique to lower dynamic power is clock gating.
Clock gating is inserted into the existing design to reduce power.
(We will discuss clock gating in detail in another topic.)
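The benefit of clock gating follows from the usual switching-power expression P = alpha x C x Vdd^2 x f: gating lowers the activity (alpha) of the clock branch behind the gate. A minimal Python sketch with assumed values; conventions for alpha vary between texts (some include a factor of 1/2), and the clock gate's own power and the ungated upstream tree are ignored.

```python
def dynamic_power(alpha, cap_f, vdd, freq_hz):
    """Switching power P = alpha * C * Vdd^2 * f.
    alpha = average charge/discharge cycles of C per clock period."""
    return alpha * cap_f * vdd ** 2 * freq_hz


def gated_clock_power(clock_cap_f, vdd, freq_hz, enable_fraction):
    """Power of a clock branch before and after gating, assuming the clock
    net toggles every cycle (alpha = 1) and the gated branch only toggles
    while enabled. The clock gate's own power is ignored."""
    ungated = dynamic_power(1.0, clock_cap_f, vdd, freq_hz)
    gated = dynamic_power(enable_fraction, clock_cap_f, vdd, freq_hz)
    return ungated, gated


# Assumed: 10 pF of gated clock load, 0.8 V, 1 GHz, enabled 25% of the time.
before, after = gated_clock_power(10e-12, 0.8, 1e9, 0.25)
print(f"{before * 1e3:.2f} mW -> {after * 1e3:.2f} mW")  # 6.40 mW -> 1.60 mW
```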

Optimization -

Performs logic and design optimization.
Overall optimization can be categorized into logic optimization and design optimization.

Logic optimization:

Constant folding
Detect identical cells
Optimize mux (dead branches in mux)
Consolidate mux and reduce inputs (many to single)
Remove DFF with a constant value
Remove unused cells and wires
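Constant folding, the first item above, can be shown on a toy expression tree. The sketch below uses a hypothetical `Gate` type with only AND/OR/NOT, not the internal representation of any synthesis tool; it propagates constants, uses controlling values (0 for AND, 1 for OR), and drops identity inputs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Gate:
    op: str      # "AND", "OR" or "NOT" (hypothetical toy gate set)
    ins: tuple   # inputs: Gate, net name (str), or constant 0/1


Node = Union[Gate, str, int]


def fold(n: Node) -> Node:
    """Propagate constants through AND/OR/NOT and simplify."""
    if not isinstance(n, Gate):
        return n                          # net name or constant
    ins = [fold(i) for i in n.ins]
    if n.op == "NOT":
        a = ins[0]
        return 1 - a if isinstance(a, int) else Gate("NOT", (a,))
    ctrl, ident = (0, 1) if n.op == "AND" else (1, 0)
    if ctrl in ins:
        return ctrl                       # controlling value forces the output
    rest = [i for i in ins if i != ident]  # identity inputs can be dropped
    if not rest:
        return ident
    if len(rest) == 1:
        return rest[0]
    return Gate(n.op, tuple(rest))


# Hypothetical fragment: y = (a AND 1) OR (b AND NOT 1)
expr = Gate("OR", (Gate("AND", ("a", 1)), Gate("AND", ("b", Gate("NOT", (1,))))))
print(fold(expr))  # a
```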

Design optimization:

Reduce TNS (total negative slack) and WNS (worst negative slack)
Power optimization
Area optimization
Meet the DRV (design rule) constraints
Incremental clock gating

DFT insertion (Design for Testing) -
------------------------------------------------------
DFT circuits are used to test every node in the design.
The more nodes that can be tested with targeted patterns, the higher the coverage. Scan chains are inserted in scan-based designs.
(We will discuss scan-based design in detail in a later topic.)

Incremental Compile -
------------------------------------
Technology mapping of DFT circuit
Optimization of the design

Outputs of Synthesis -
------------------------------------

Netlist
SDC file
UPF file
Scan DEF

How to Qualify Synthesis Results?
--------------------------------------------------------------

Synthesis results can be qualified based on the following points:
Check if the RTL and netlist are logically equivalent (LEC/FM).
Check if the SDC and UPF are generated after synthesis, and check their completeness.
Check if there are any assign statements.
Checks related to timing:
Combinational loops
Unclocked registers
Unconstrained IOs
Missing IO delays
Unexpandable clocks
Master-slave separation
Multiple clocks
Checks related to design:
Floating pins
Multi-driven inputs
Undriven inputs
Undriven outputs
Normal (non-clock) cells in the clock path
Pin direction mismatches
Check for don't-use cells.
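The combinational-loop check is a cycle search in the cell connectivity graph. A minimal Python sketch using depth-first search, assuming a hypothetical `fanout` dictionary that contains only combinational cells (flops are excluded, since a path through a register is not a combinational loop); synthesis tools run their own equivalent check.

```python
def find_comb_loop(fanout):
    """Return one combinational loop as a list of nodes, or None.

    fanout maps each combinational node to the nodes it drives.
    Sequential elements are assumed to be excluded from the graph.
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current path / done
    color = {}
    stack = []

    def dfs(u):
        color[u] = GRAY
        stack.append(u)
        for v in fanout.get(u, ()):
            c = color.get(v, WHITE)
            if c == GRAY:               # back edge: v is on the current path
                return stack[stack.index(v):] + [v]
            if c == WHITE:
                loop = dfs(v)
                if loop:
                    return loop
        stack.pop()
        color[u] = BLACK
        return None

    for node in list(fanout):
        if color.get(node, WHITE) == WHITE:
            loop = dfs(node)
            if loop:
                return loop
    return None


# Hypothetical fanout graph: u1 -> u2 -> u3 -> u1 is a loop; u4 is not in it.
graph = {"u1": ["u2"], "u2": ["u3", "u4"], "u3": ["u1"], "u4": []}
print(find_comb_loop(graph))  # ['u1', 'u2', 'u3', 'u1']
```

The recursive search is fine for a sketch; a netlist with very deep logic would need an iterative version to stay within Python's recursion limit.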

Electromigration (EM)

Electromigration (EM) is the movement of material that results from the transfer of momentum between electrons and metal atoms under the influence of an applied electric field. This momentum transfer causes the metal atoms to be displaced from their original positions.

This effect increases with increasing current density in a wire, and at higher temperatures the momentum transfer becomes more severe. Thus in sub-100nm designs, with higher device currents, narrower wires, and increasing on-die temperatures, the reliability of interconnects and their possible degradation from EM is a serious concern.

The transfer of metal ions over time from EM can lead to either narrowing or hillocks (bumps) in the wires.

Narrowing of the wire can result in degradation of performance, or in extreme cases can result in the complete opening of the conduction path as shown in the picture below.

Widening and bumps in the wire can result in shorts to neighboring wires, especially if they are routed at the minimum pitch in newer technologies. Foundries typically specify the maximum allowed current density for wires in these newer technologies.

Broadly EM is classified as cell EM and Wire EM.

Cell EM

Cell EM rules address the EM caused by current within a cell. Cell EM rules operate on the principle that, although the currents within a cell cannot be calculated due to a lack of physical layout information, they can be controlled through external physical entities. The tool estimates the detrimental effects of currents within a cell as a function of its:

Output load
Input slew
Switching frequency
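As a rough illustration of why output load and switching frequency matter, the average current that charges a cell's output is, to first order, I_avg = activity x C_load x Vdd x f. The Python sketch below uses that approximation only; it ignores input slew (short-circuit current), which the tool's cell EM estimate also takes into account, and all values are assumptions.

```python
def cell_output_avg_current(c_load_f, vdd, freq_hz, activity=1.0):
    """First-order average supply current charging a cell's output load:
    I_avg = activity * C_load * Vdd * f (activity = rising transitions per cycle).
    Input slew / short-circuit current is NOT modeled here."""
    return activity * c_load_f * vdd * freq_hz


def max_toggle_freq(i_limit_a, c_load_f, vdd, activity=1.0):
    """Highest frequency that keeps the estimate under an assumed current limit."""
    return i_limit_a / (activity * c_load_f * vdd)


# Assumed: 50 fF load, 0.8 V, 1 GHz toggle rate.
print(cell_output_avg_current(50e-15, 0.8, 1e9))
# Halving the load or the frequency halves the estimated current.
```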

Wire EM
There are two types of wire EM:

Signal EM – It is performed net by net, simulating the charging and discharging for all possible paths to determine the worst-case average and RMS current for each wire segment. Once currents are determined, the current density is computed.

Self-heating – It occurs in the output nodes/interconnects of circuits that charge and discharge frequently. It leads to other heating-related problems, such as an increase in the resistance of the interconnect and hence an increase in the charging time of the node. It also causes thermal reliability issues.
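The signal EM flow above reduces to computing average and RMS currents per wire segment and converting them to current density. A minimal Python sketch, assuming a uniformly sampled current waveform and a rectangular wire cross-section; the sample values are made up, and your foundry's rule deck defines the actual limits (often per unit width rather than per area).

```python
import math


def avg_and_rms(samples_a):
    """Average and RMS of a uniformly sampled current waveform (A)."""
    n = len(samples_a)
    avg = sum(samples_a) / n
    rms = math.sqrt(sum(i * i for i in samples_a) / n)
    return avg, rms


def current_density_ma_per_um2(i_a, width_um, thickness_um):
    """J = I / (w * t), reported in mA/um^2."""
    return (i_a * 1e3) / (width_um * thickness_um)


# Made-up bidirectional signal current: charging, then discharging.
# Signal nets carry current in both directions, so the average can be
# near zero while the RMS (which drives self-heating) is not.
wave = [1e-3, 1e-3, -1e-3, -1e-3]
avg, rms = avg_and_rms(wave)
print(avg, rms)
print(current_density_ma_per_um2(rms, 0.1, 0.1))
```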

Techniques to solve EM:

Increasing the metal width to reduce the current density is a typical solution
For a via EM violation, you can increase the number of vias to fix potential EM issues
Additional straps for the current supply
Layer switching is another option; upper metal layers typically have higher current-carrying capability (due to their greater thickness)
Reduce the size of the cell driving the signal net if the path has positive slack
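The first two fixes are sized the same way: enough width or enough vias to bring the current per unit below the limit. A hypothetical Python sketch, assuming the EM limit is given per via and per micron of wire width (check how your rule deck actually specifies it).

```python
import math


def vias_needed(i_ma, i_max_per_via_ma):
    """Minimum number of vias so that no via exceeds its current limit."""
    return math.ceil(i_ma / i_max_per_via_ma)


def min_width_um(i_ma, j_max_ma_per_um):
    """Minimum wire width when the limit is given per micron of width."""
    return i_ma / j_max_ma_per_um


# Assumed limits: 1.0 mA per via, 2.0 mA per um of width on this layer.
print(vias_needed(2.5, 1.0), min_width_um(2.5, 2.0))  # 3 1.25
```

A thicker upper layer would simply have a larger per-micron limit, which is why layer switching reduces the required width.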

IR DROP

What is IR Drop?

IR drop is the voltage drop in the metal wires of the power grid before the supply reaches the power pins of the standard cells. It is important to limit IR drop because it affects the speed of the cells and the overall performance of the chip. There are two types of IR drop:

1. Static
2. Dynamic

Static IR Drop: Static IR drop is the average voltage drop in the design. It depends on the resistance of the power grid connecting the power supply to the standard cells. The average current depends on the time period over which it is averaged. Gate-channel leakage current is the major contributor to static IR drop.

Vstatic_drop = Iavg x Rwire (Iavg is dominated by leakage currents)

Dynamic IR Drop: Dynamic IR drop is the voltage drop caused by the high switching activity of transistors. It occurs when the demand for current from the power supply increases due to switching activity on the chip. Dynamic IR drop depends on the switching time of the logic and is less dependent on the clock period. Dynamic IR analysis evaluates the drop caused when a large amount of circuitry switches at the same time, causing a peak current demand. This demand can be highly localized and brief, within a single clock cycle (a few hundred ps), and can cause an IR drop that results in additional setup or hold-time violations. Typically, high IR drop on clock networks causes hold-time violations, while IR drop on data path nets causes setup-time violations. In such cases, you can spread the high-switching-activity standard cells apart, so that the burden on any single bump feeding many of them is reduced.

Vdynamic_drop = L x (di/dt) (L is the inductance of the supply path and di/dt is the rate of change of the switching current)
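Applied to a single rail, Vstatic_drop = Iavg x Rwire accumulates along the rail because each segment carries the current of every cell beyond it. A minimal Python sketch, assuming one rail fed from one end, equal segment resistance, and made-up average tap currents; real power grids are meshes solved by IR analysis tools.

```python
def rail_ir_drop(tap_currents_a, r_segment_ohm):
    """Static IR drop (V) at each tap of a rail fed from one end.

    The segment leading to tap k carries the current of tap k and of every
    tap beyond it, so V_k = V_(k-1) + R_seg * (I_k + ... + I_last).
    """
    drops, v = [], 0.0
    remaining = sum(tap_currents_a)
    for i_tap in tap_currents_a:
        v += r_segment_ohm * remaining  # drop across the segment to this tap
        drops.append(v)
        remaining -= i_tap              # this tap's current leaves the rail here
    return drops


# Assumed: 4 taps drawing 1 mA each (leakage-dominated average), 0.5 ohm per segment.
# Values printed in mV; the farthest tap sees the largest drop.
print([round(d * 1e3, 2) for d in rail_ir_drop([1e-3] * 4, 0.5)])
```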
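A worked number for the L x (di/dt) term, approximating di/dt by a current step over a short time. The Python sketch and its values (effective supply inductance, current step, rise time) are assumptions for illustration only.

```python
def inductive_drop(l_henry, delta_i_amp, delta_t_s):
    """V = L * di/dt, with di/dt approximated as delta_i / delta_t."""
    return l_henry * delta_i_amp / delta_t_s


# Assumed: 0.05 nH effective supply inductance, 0.2 A current step in 100 ps.
v = inductive_drop(0.05e-9, 0.2, 100e-12)
print(f"{v * 1e3:.1f} mV")  # 100.0 mV
```

The same current step spread over twice the time halves the drop, which is why simultaneous switching is the main concern.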

Electromigration Mitigation

1. Apply NDR (Non-default Rule) on the violated nets (vulnerable nets)

Once you have the EM results, take the net shapes and re-route those nets with the NDR. Applying an NDR means routing with double-wide or triple-wide metal and more spacing, as is done for clock nets. This quickly removes most of the violations. You can even predict which nets are more likely to have EM violations based on two parameters: 1) driver strength and 2) load.

Filter out nets with higher loads and strong drivers and move them to NDR.

Decide the threshold load for each drive strength based on project statistics.
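The filtering step can be scripted once the per-drive-strength thresholds are decided. A minimal Python sketch; the net records, drive-strength names, and threshold values are hypothetical, and in practice the load data would come from your tool's reports.

```python
def ndr_candidates(nets, load_threshold_ff):
    """Names of nets whose load exceeds the threshold for their driver strength.

    nets: iterable of (net_name, driver_strength, load_ff)
    load_threshold_ff: {driver_strength: max load in fF on default rules}
    Nets whose driver strength has no threshold entry are left alone.
    """
    flagged = []
    for name, drive, load_ff in nets:
        limit = load_threshold_ff.get(drive)
        if limit is not None and load_ff > limit:
            flagged.append(name)
    return flagged


# Hypothetical thresholds and net data.
thresholds = {"X1": 20.0, "X4": 40.0, "X8": 60.0}
nets = [("n1", "X8", 80.0), ("n2", "X1", 10.0), ("n3", "X4", 45.0)]
print(ndr_candidates(nets, thresholds))  # ['n1', 'n3']
```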

2. Restricting load target for nets

Restricting or reducing the load on the nets can also be helpful in preventing the occurrence of electromigration. For example, we saw 142fF as an average capacitance in the design. Based on the statistics of a few experiments, we restricted all nets to have a maximum 60fF of load. As a result, we saw a very good improvement in signal EM as well as on average net length.

IR Drop Mitigation

1. Padding clock cells: for IR drop issues, the clock structure is the primary culprit for power consumption because of its high switching activity. With the clock cell padding technique, clock buffers/inverters and clock gating cells are given extra area as keep-out regions, which prevents standard cells from being placed next to them and avoids excessive cell density around them. This helps to prevent dynamic IR drop.

2. Cell padding/decap insertion around cells within a dynamic IR hotspot region

Some cells with high drive strength create dynamic IR drop issues. You can give these cells padding, or insert decap cells around them or in the IR hotspot region, to prevent IR drop issues.

GUIDELINES FOR PLACING ANALOG BLOCK AT FLOOR PLAN STAGE

Analog blocks are quite sensitive to noise, and the following precautions need to be taken:

Provide sufficient isolation between the analog layout and the digital logic, including both the IOs and the core logic.

Normally, the IO power for the analog block needs to be isolated. Even if the power source is the same, it is important to isolate the power connections to the analog block IOs from the other IOs.

They need to be isolated at the package source. For very critical blocks, the source and the ground may need to be isolated completely. Providing an independent power source may not always be possible and needs to be looked at from the architecture side.

Providing an isolated ground can be even more difficult to implement at the system level. It is better to have a separate mesh/connection from the power/ground source itself.

The power for the analog IO interface and the block needs to be estimated separately, and the number of power and ground IOs/bump pads estimated accordingly. Since analog IOs are sensitive to noise, it is recommended to place a power/ground pad after each signal IO, alternating with the signals.

The ESD rail is generally common across all the IOs. However, in rare cases, a separate ESD rail may be needed for the analog IO interface (for very critical and sensitive IOs).

The analog block itself may contain digital logic, with only part of it being true analog logic and custom layout. Knowing this helps apply the isolation accordingly.

The core voltage may be used in the analog block, either for digital logic or for analog logic within the block. If it is used for analog logic, its supply needs to be isolated from the core mesh (used by the core digital logic).

One power source may be placed close to the analog block, and the power routed from there to the power pins of the block without being shared with other logic. This power routing needs sufficient width (based on the power requirement and the distance from the source) and 3-4 additional spacing.

The analog logic area of the block needs to be isolated from the digital logic placement area. The isolation area can be filled with decaps. Over an extended width beyond the isolation area, take care not to use high-drive cells.

All the power pins of the block must be connected with an appropriate width, at least the width of the power pin.

Congestion and Techniques to avoid it !!!

Congestion is a scenario in which the number of nets to be routed in a particular region of the design is higher than the routing resources available in that region. It can be caused by the following:

Higher standard cell utilization in certain pockets of the design
Clustering of higher pin-count cells
Wrong module placement resulting in crisscrossing routes
A higher number of routes in certain regions of the design
Limited Metal stack used in the design
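The definition above (routing demand higher than available resources) is usually evaluated per global-routing bin. A minimal Python sketch, assuming per-bin demand and capacity track counts (made-up numbers here), that reports the overflowing bins and the total overflow, which is what congestion reports track.

```python
def overflow_bins(demand, capacity):
    """Return ([(row, col, overflow), ...], total_overflow) for bins where
    routing demand (tracks needed) exceeds capacity (tracks available)."""
    hot, total = [], 0
    for r, (d_row, c_row) in enumerate(zip(demand, capacity)):
        for c, (d, cap) in enumerate(zip(d_row, c_row)):
            over = d - cap
            if over > 0:
                hot.append((r, c, over))
                total += over
    return hot, total


# Made-up 2x2 grid of bins (track counts).
demand = [[10, 12], [8, 20]]
capacity = [[10, 10], [10, 15]]
print(overflow_bins(demand, capacity))  # ([(0, 1, 2), (1, 1, 5)], 7)
```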

Ways to avoid congestion

Don't allow flops to be placed in channels; flops in channels can cause huge congestion inside the channels. If timing is critical, flops can be placed within channels with partial (percentage) blockages to restrict the usage of the channel.
Visually check the macro channels after routing. You might not see any congestion at placement or routing, but you should still check these channels visually and keep a margin for future routes.
Always keep track of cell and route congestion, and keep the number of bins with overflow low.
You can also provide max density constraints for placement; making them tighter gives better congestion optimization.
Check the placement of cells that are small but have many pins, such as AOI/OAI cells. Give these cells padding for better congestion and hold buffering.
Check the placement of the sequential cells and leave sufficient gaps around them for hold buffering. Padding the sequential cells also helps hold buffering.
If cells are clustered in certain pockets of the design, try to spread them by creating small placement blockages in a checkerboard (or similar) pattern in that region.
In deep sub-micron technologies, the lower layers are manufactured with multiple masks, and fixing DP (double-patterning) violations becomes challenging. Prevent these loop violations early by controlling the utilization of the lower layers; for example, use only 60% of the M2/M3 layers for routing.

Redistribution Layer (RDL)


The redistribution layer (RDL) is the interface between the chip and the package in flip-chip assembly (shown in fig below). An RDL is an extra metal layer of wiring on top of the core metals that makes the I/O pads of the die available for bonding out to other locations, such as bump pads. Bumps are usually placed in a grid pattern, and each one is molded with two pads (one on the top and one on the bottom) that are attached to the RDL and the package substrate, respectively.

Engineers use an RDL in flip-chip designs to redistribute I/O pads to bump pads without changing the I/O pad placement. The RDL therefore serves as the layer connecting I/O pads and bump pads. However, traditional routing capacity may be insufficient for sizable designs, in which the RDL may be very congested, especially when the I/O-bump assignment is less than optimal. As a result, routing may not be completed within a single layer even with manual routing.

As the demand for I/Os increases, traditional wire-bond packaging may not effectively support thousands of I/Os. Flip-chip assembly is commonly used instead of wire bond because it reduces chip area while supporting many more I/Os. It also greatly reduces inductance, supports high-speed signals, and has better heat conductivity. The flip-chip ball grid array (FCBGA) is also growing in popularity as an alternative for high-I/O-count chips.
