These are some raw notes I took while talking to someone about the end-to-end EDA workflow, specifically in the context of migrating EDA to cloud.

About EDA

The EDA workflow is broadly:

  1. Generate RTL
  2. Simulate RTL - logic-level or gate-level
  3. Synthesis - turns RTL into gates
  4. Place and route
  5. Post-route simulation, static timing analysis, design rules check (DRC)
  6. Tapeout
  7. Proteus (run at the fab)

Frontend

  • everything prior to place and route
  • hard to move into the cloud due to crazy I/O requirements

Backend

  • everything after place and route
  • mostly in the cloud already - the data is easier to deal with

At fab

Everything after tapeout (aka “signoff”) happens at the fab (TSMC, etc)

  1. send to the fab
  2. they run Synopsys Proteus (mask synthesis)
  3. CATS (separates out mask layers to inform mask creation)

History

EDA tools started in the 1970s

  • each tool came from different companies in the beginning
  • each tool would read data, then write data - this led to standardized formats
  • Synopsys, Cadence, Mentor started buying up all the companies
    • they wanted to break standards, but customers freaked
    • as a result, the file formats stayed the same, and any Synopsys tool can be yanked out and replaced with an equivalent tool from a competitor - the file interfaces are interchangeable

Started out as a workstation tool

  • give every element in a library its own directory, then put different types of data for a gate into that directory

Individual users became teams, and processing went from workstation into data center + NAS filer

  • when this happened, NetApp was effectively the only NFS filer vendor in town
  • NetApp grew with the environment and in many ways seems purpose built for EDA’s requirements
  • Isilon tried to break in during the mid-2000s but failed - they couldn’t handle the metadata load

Frontend

Simulation tools are often embarrassingly parallel

  • circuit simulation - for analog circuits - memory companies and analog design do most of this
  • logic simulation - more abstracted than low-level circuit simulation

Logic simulation

Logic simulation is simulating RTL (Synopsys VCS)

  • the worst offender for small-file I/O; it has two phases: compile and run
  • relies on two corpuses of data: design data & library data
    • design data - your RTL (I presume)
    • library data - catalog of circuit components
      • e.g., can have high performance AND gate, low-power AND gate, high-drive AND gate
      • contain electrical characteristics, capacitance, delay, process information
      • each characteristic is wrapped in a single tiny file, and each circuit component is a directory
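
A minimal sketch of the on-disk shape this describes, assuming a made-up library root and made-up cell/characteristic names (real libraries vary by vendor and PDK):

  from pathlib import Path

  # hypothetical library root; cell and characteristic names are invented for illustration
  LIB_ROOT = Path("stdcell_lib")
  cells = ["AND2_HP", "AND2_LP", "AND2_HD"]        # high-performance / low-power / high-drive AND gates
  characteristics = ["capacitance", "delay", "power", "process"]

  for cell in cells:
      cell_dir = LIB_ROOT / cell                   # each circuit component is a directory
      cell_dir.mkdir(parents=True, exist_ok=True)
      for ch in characteristics:
          # each characteristic is wrapped in its own kilobyte-scale file
          (cell_dir / f"{ch}.dat").write_text(f"{cell} {ch} data\n")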

Compile time

  • input: your RTL design, input test bench, and libraries
  • builds an executable using gcc that runs the test bench against your design and libraries
  • during compile, there’s a huge spike in reads as the tool pulls in all the bits of data across the directory tree
  • maybe have dozens or hundreds of processes traversing the library tree and compiling C code
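
A rough Python sketch of that compile-phase access pattern (not VCS itself): many concurrent workers each walk the library tree and read every tiny file, which is what produces the read and metadata spike on the filer. The directory name reuses the made-up library from the sketch above.

  import os
  from concurrent.futures import ProcessPoolExecutor

  LIB_ROOT = "stdcell_lib"                  # made-up library root from the sketch above

  def read_library(worker_id: int) -> int:
      """Walk the whole library tree and read every small file."""
      bytes_read = 0
      for dirpath, _dirs, files in os.walk(LIB_ROOT):
          for name in files:
              with open(os.path.join(dirpath, name), "rb") as f:
                  bytes_read += len(f.read())   # thousands of kilobyte-sized reads
      return bytes_read

  if __name__ == "__main__":
      # dozens to hundreds of compile processes hitting the same tree at once
      with ProcessPoolExecutor(max_workers=32) as pool:
          totals = list(pool.map(read_library, range(32)))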

Run time

Logic simulation executable (simv, in VCS terminology) runs

  • does out-of-core computation - dumps data in little files (kilobytes each) to disk that it may or may not need later
  • causes write contention on filers
  • can put these little out-of-core files into local scratch since you don’t really need this data to be shared
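
A minimal sketch of that local-scratch idea, assuming a generic compiled simulation binary; the paths, the results/ subdirectory, and the SIM_SCRATCH variable are illustrative assumptions, not documented VCS behavior:

  import os
  import subprocess
  import tempfile

  def run_sim(simv_path: str, shared_results_dir: str) -> None:
      # node-local scratch (e.g. local NVMe) - the out-of-core spill never needs to be shared
      local_scratch = tempfile.mkdtemp(prefix="sim_scratch_", dir="/local/scratch")
      env = dict(os.environ, TMPDIR=local_scratch, SIM_SCRATCH=local_scratch)
      # run from inside local scratch so incidental small writes land on local disk, not the filer
      subprocess.run([simv_path], cwd=local_scratch, env=env, check=True)
      # copy back only the outputs that actually need to live on shared storage
      subprocess.run(["rsync", "-a", os.path.join(local_scratch, "results/"), shared_results_dir], check=True)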

Circuit simulation

Circuit simulation is SPICE

  • instead of looking at logic level, it looks at transistor-level design
  • useful for analog circuits - think DRAM, not CPUs
  • fewer workloads doing this
  • scale isn’t so big
  • a little better behaved than logic simulation, but can still stress storage system
    • still traverses library
    • fewer concurrent processes doing it and fewer elements per job

Library characterization consists of short jobs that run SPICE under the hood

Simulating an ARM CPU would be thousands of simulation jobs - a full sized CPU would be tens of thousands of jobs.
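
A hedged sketch of that fan-out, assuming each test in a regression becomes its own independent simulation job; "./simv" and the +TEST= argument are placeholders, and a real flow would submit these to a batch scheduler or a cloud batch service rather than run them locally:

  import subprocess
  from concurrent.futures import ThreadPoolExecutor

  tests = [f"test_{i:05d}" for i in range(10_000)]   # tens of thousands for a full-sized CPU

  def run_one(test: str) -> int:
      # placeholder invocation - real flows hand this to a scheduler, not a local shell
      return subprocess.run(["./simv", f"+TEST={test}"]).returncode

  with ThreadPoolExecutor(max_workers=200) as pool:
      results = list(pool.map(run_one, tests))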

Place and route

Monolithic - relies on a single memory space

Backend

An ARM CPU would be high dozens to hundreds of jobs - a full CPU would be hundreds or thousands of jobs

Design Rules Check (DRC):

  • head node distributes data to worker nodes
  • parallel and most HPC-like
  • 10-100s of worker nodes - starting to see some breach 1000 for 5nm and 3nm
  • datasets are larger, so fs stress not as bad
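
A toy Python version of that head/worker pattern (not how real signoff DRC tools actually partition work): the head process splits the layout into tiles and farms each tile’s rule checks out to workers.

  from concurrent.futures import ProcessPoolExecutor

  def check_tile(tile_id: int) -> list:
      # placeholder for running the rule deck against one tile of the layout
      return []                                  # violations found in this tile

  def run_drc(num_tiles: int, num_workers: int) -> list:
      violations = []
      # the pool stands in for the worker nodes the head node distributes to
      with ProcessPoolExecutor(max_workers=num_workers) as pool:
          for tile_violations in pool.map(check_tile, range(num_tiles)):
              violations.extend(tile_violations)
      return violations

  if __name__ == "__main__":
      run_drc(num_tiles=4096, num_workers=256)   # 10s-100s of workers; some 3/5nm runs breach 1000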

Typically monolithic tools

  • Performance of the node is critical since you can’t scale out - need high-frequency CPUs and 32-2048 GB of DRAM
  • Not friendly to spot-priced VMs because backend jobs are longer-running and bigger
  • may take ~30 minutes to dump 2 TB of memory to disk before a spot shutdown (2 TB / 1,800 s ≈ 1.1 GB/s sustained)
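
Back-of-the-envelope arithmetic for that spot-shutdown problem, with purely illustrative numbers:

  def dump_minutes(memory_gb: float, write_gb_per_s: float) -> float:
      """Minutes needed to flush a memory image to disk at a sustained write rate."""
      return memory_gb / write_gb_per_s / 60

  # ~31 minutes to dump 2 TB at ~1.1 GB/s sustained - far longer than a typical spot reclaim notice
  print(dump_minutes(2048, 1.1))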

Storage problems moving to the cloud

Growing discomfort with storage in EDA because the relative cost keeps going up

  • even 10 years ago, people complained that storage was becoming a majority cost component
  • compute became commoditized - buy the cheapest possible servers
  • but storage has remained boutique (NetApp)
    • On-prem, the cost of a single expensive filer is sunk once, and then there’s no visible ongoing cost
    • in the cloud, customers get a monthly bill reminding them of the pain, so it stays top of mind

Quality of on-prem HPC infrastructure solutions is generally not great

  • VCS systems are purpose-built to run VCS and nothing else
  • highly cost optimized for the one workload
    • e.g., don’t need system security because it’s airgapped and only connected to the outside world via the file system
    • don’t need a lot of maintenance or support or anything that isn’t just running workloads
  • Cloud provides flexibility, monitoring, reliability - compute as a service - value that EDA largely doesn’t appreciate despite the increased cost
  • this means on-premise can still be cheaper than cloud, even at scale

Lustre worked for backend workloads as a cost savings measure in some scenarios

  • challenge: on-premise workflow has baked-in assumption that there is a single filer namespace connecting backend and frontend execution
    • traditional version control software may not be capable of targeting multiple file namespaces for different stages of the EDA workflow
    • if it can, a lot of user-interpreted outputs could go to a cheap-and-slow NFS because they don’t have the latency requirements imposed by machine-generated workloads
    • teams just haven’t considered whether version control software can handle multiple volumes - it’s simply not been an issue for people who’ve done this on-prem

Some silicon designers may burst a single workload to the cloud

  • e.g., run only static analysis component of the workflow in a cloud
  • challenge: have to make a mirror of data in cloud somehow
    • one option is FlexCache (NetApp), which is a read-through cache
    • may optionally warm the cache from on-prem datasets