These are some raw notes I took while talking to someone about the end-to-end EDA workflow, specifically in the context of migrating EDA to cloud.

About EDA

The EDA workflow is broadly:

  1. Generate RTL
  2. Simulate RTL - logic-level or gate-level
  3. Synthesis - turns RTL into gates
  4. Place and route
  5. Post-route simulation, static timing analysis, design rules check (DRC)
  6. Tapeout
  7. Proteus (run at the fab)

Frontend

  • everything prior to place and route
  • hard to move into the cloud due to crazy I/O requirements

Backend

  • everything after place and route
  • mostly in the cloud already - the data is easier to deal with

At fab

Everything after tapeout (aka “signoff”) happens at the fab (TSMC, etc)

  1. send to the fab
  2. they run Synopsys Proteus (mask synthesis)
  3. CATS (separates out mask layers to inform mask creation)

History

EDA tools started in the 1970s

  • each tool came from different companies in the beginning
  • each tool would read data, then write data - this led to standardized formats
  • Synopsys, Cadence, Mentor started buying up all the companies
    • they wanted to break standards, but customers freaked
    • as a result, the file formats stayed the same, and any Synopsys tool can be yanked out and replaced with an equivalent tool from a competitor - the file interfaces are interchangeable

Started out as a workstation tool

  • give every element in a library its own directory, then put different types of data for a gate into that directory

Individual users became teams, and processing went from workstation into data center + NAS filer

  • when this happened, NetApp was effectively the only NFS filer vendor in town
  • NetApp grew with the environment and in many ways seems purpose built for EDA’s requirements
  • Isilon tried to break in during the mid-2000s but failed - they couldn’t handle the metadata load

Frontend

Simulation tools are often embarrassingly parallel

  • circuit simulation - for analog circuits - memory companies and analog design do most of this
  • logic simulation - more abstracted than low-level circuit simulation

Logic simulation

Logic simulation is simulating RTL (Synopsys VCS)

  • the worst offender for small-file I/O; it has two phases: compile and run
  • relies on two corpuses of data: design data & library data
    • design data - your RTL (I presume)
    • library data - catalog of circuit components
      • e.g., can have high performance AND gate, low-power AND gate, high-drive AND gate
      • contain electrical characteristics, capacitance, delay, process information
      • each characteristic is wrapped in a single tiny file, and each circuit component is a directory
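
A minimal sketch of the on-disk shape this describes, assuming a made-up library root and made-up cell/characteristic names (real libraries vary by vendor and PDK):

  from pathlib import Path

  # hypothetical library root; cell and characteristic names are invented for illustration
  LIB_ROOT = Path("stdcell_lib")
  cells = ["AND2_HP", "AND2_LP", "AND2_HD"]        # high-performance / low-power / high-drive AND gates
  characteristics = ["capacitance", "delay", "power", "process"]

  for cell in cells:
      cell_dir = LIB_ROOT / cell                   # each circuit component is a directory
      cell_dir.mkdir(parents=True, exist_ok=True)
      for ch in characteristics:
          # each characteristic is wrapped in its own kilobyte-scale file
          (cell_dir / f"{ch}.dat").write_text(f"{cell} {ch} data\n")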

Compile time

  • input: your RTL design, input test bench, and libraries
  • builds an executable using gcc that runs the test bench against your design and libraries
  • during compile, there’s a huge spike in reads as the tool pulls in all the bits of data across the directory tree
  • maybe have dozens or hundreds of processes traversing the library tree and compiling C code
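
A rough Python sketch of that compile-phase access pattern (not VCS itself): many concurrent workers each walk the library tree and read every tiny file, which is what produces the read and metadata spike on the filer. The directory name reuses the made-up library from the sketch above.

  import os
  from concurrent.futures import ProcessPoolExecutor

  LIB_ROOT = "stdcell_lib"                  # made-up library root from the sketch above

  def read_library(worker_id: int) -> int:
      """Walk the whole library tree and read every small file."""
      bytes_read = 0
      for dirpath, _dirs, files in os.walk(LIB_ROOT):
          for name in files:
              with open(os.path.join(dirpath, name), "rb") as f:
                  bytes_read += len(f.read())   # thousands of kilobyte-sized reads
      return bytes_read

  if __name__ == "__main__":
      # dozens to hundreds of compile processes hitting the same tree at once
      with ProcessPoolExecutor(max_workers=32) as pool:
          totals = list(pool.map(read_library, range(32)))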

Run time

Logic simulation executable (simv, in VCS terminology) runs

  • does out-of-core computation - dumps data in little files (kilobytes each) to disk that it may or may not need later
  • causes write contention on filers
  • can put these little out-of-core files into local scratch since you don’t really need this data to be shared
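
A minimal sketch of that local-scratch idea, assuming a generic compiled simulation binary; the paths, the results/ subdirectory, and the SIM_SCRATCH variable are illustrative assumptions, not documented VCS behavior:

  import os
  import subprocess
  import tempfile

  def run_sim(simv_path: str, shared_results_dir: str) -> None:
      # node-local scratch (e.g. local NVMe) - the out-of-core spill never needs to be shared
      local_scratch = tempfile.mkdtemp(prefix="sim_scratch_", dir="/local/scratch")
      env = dict(os.environ, TMPDIR=local_scratch, SIM_SCRATCH=local_scratch)
      # run from inside local scratch so incidental small writes land on local disk, not the filer
      subprocess.run([simv_path], cwd=local_scratch, env=env, check=True)
      # copy back only the outputs that actually need to live on shared storage
      subprocess.run(["rsync", "-a", os.path.join(local_scratch, "results/"), shared_results_dir], check=True)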

Circuit simulation

Circuit simulation is SPICE

  • instead of looking at logic level, it looks at transistor-level design
  • useful for analog circuits - think DRAM, not CPUs
  • fewer workloads doing this
  • scale isn’t so big
  • a little better behaved than logic simulation, but can still stress storage system
    • still traverses library
    • fewer concurrent processes doing it and fewer elements per job

Library characterization consists of short jobs that run SPICE under the hood

Simulating an ARM CPU would be thousands of simulation jobs - a full sized CPU would be tens of thousands of jobs.
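
A hedged sketch of that fan-out, assuming each test in a regression becomes its own independent simulation job; "./simv" and the +TEST= argument are placeholders, and a real flow would submit these to a batch scheduler or a cloud batch service rather than run them locally:

  import subprocess
  from concurrent.futures import ThreadPoolExecutor

  tests = [f"test_{i:05d}" for i in range(10_000)]   # tens of thousands for a full-sized CPU

  def run_one(test: str) -> int:
      # placeholder invocation - real flows hand this to a scheduler, not a local shell
      return subprocess.run(["./simv", f"+TEST={test}"]).returncode

  with ThreadPoolExecutor(max_workers=200) as pool:
      results = list(pool.map(run_one, tests))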

Place and route

Monolithic - relies on a single memory space

Backend

An ARM CPU would be high dozens to hundreds of jobs - a full CPU would be hundreds or thousands of jobs

Design Rules Check (DRC):

  • head node distributes data to worker nodes
  • parallel and most HPC-like
  • 10-100s of worker nodes - starting to see some breach 1000 for 5nm and 3nm
  • datasets are larger, so fs stress not as bad
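
A toy Python version of that head/worker pattern (not how real signoff DRC tools actually partition work): the head process splits the layout into tiles and farms each tile’s rule checks out to workers.

  from concurrent.futures import ProcessPoolExecutor

  def check_tile(tile_id: int) -> list:
      # placeholder for running the rule deck against one tile of the layout
      return []                                  # violations found in this tile

  def run_drc(num_tiles: int, num_workers: int) -> list:
      violations = []
      # the pool stands in for the worker nodes the head node distributes to
      with ProcessPoolExecutor(max_workers=num_workers) as pool:
          for tile_violations in pool.map(check_tile, range(num_tiles)):
              violations.extend(tile_violations)
      return violations

  if __name__ == "__main__":
      run_drc(num_tiles=4096, num_workers=256)   # 10s-100s of workers; some 3/5nm runs breach 1000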

Typically monolithic tools

  • Performance of the node is critical since you can’t scale out - need high-frequency CPUs and 32-2048 GB of DRAM
  • Not friendly to spot-priced VMs because backend jobs are longer-running and bigger
  • may take ~30 minutes to dump 2 TB of memory to disk before a spot shutdown (2 TB / 1,800 s ≈ 1.1 GB/s sustained)
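
Back-of-the-envelope arithmetic for that spot-shutdown problem, with purely illustrative numbers:

  def dump_minutes(memory_gb: float, write_gb_per_s: float) -> float:
      """Minutes needed to flush a memory image to disk at a sustained write rate."""
      return memory_gb / write_gb_per_s / 60

  # ~31 minutes to dump 2 TB at ~1.1 GB/s sustained - far longer than a typical spot reclaim notice
  print(dump_minutes(2048, 1.1))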

Storage problems moving to the cloud

Growing discomfort with storage in EDA because the relative cost keeps going up

  • even 10 years ago, people complained that storage was becoming a majority cost component
  • compute became commoditized - buy the cheapest possible servers
  • but storage has remained boutique (NetApp)
    • On-prem, the cost of a single expensive filer is sunk once, and then there’s no visible ongoing cost
    • in the cloud, customers get a monthly bill reminding them of the pain, so it stays top of mind

Quality of on-prem HPC infrastructure solutions is generally not great

  • VCS systems are purpose-built to run VCS and nothing else
  • highly cost optimized for the one workload
    • e.g., don’t need system security because it’s airgapped and only connected to the outside world via the file system
    • don’t need a lot of maintenance or support or anything that isn’t just running workloads
  • Cloud provides flexibility, monitoring, reliability - compute as a service - value that EDA largely doesn’t appreciate despite the increased cost
  • this means on-premise can still be cheaper than cloud, even at scale

Lustre worked for backend workloads as a cost savings measure in some scenarios

  • challenge: on-premise workflow has baked-in assumption that there is a single filer namespace connecting backend and frontend execution
    • traditional version control software may not be capable of targeting multiple file namespaces for different stages of the EDA workflow
    • if it can, a lot of user-interpreted outputs could go to a cheap-and-slow NFS because they don’t have the latency requirements imposed by machine-generated workloads
    • teams just haven’t considered whether version control software can handle multiple volumes - it’s simply not been an issue for people who’ve done this on-prem

Some silicon designers may burst a single workload to the cloud

  • e.g., run only static analysis component of the workflow in a cloud
  • challenge: have to make a mirror of data in cloud somehow
    • one option is FlexCache (NetApp), which is a read-through cache
    • may optionally warm the cache from on-prem datasets