Glenn's Digital Garden
Tag: organization
3 items with this tag.
Jan 25, 2025: AWS (organization)
Jan 25, 2025: Meta (organization)
Jan 25, 2025: Microsoft (organization)