Found 42 total tags.

artificial-intelligence

Artificial intelligence (AI) encompasses a broad range of topics, but I use it as shorthand for generative artificial intelligence (Gen AI), which uses transformers to generate text (including chat), images, videos, audio, and other media.

This page is a collection point for all the pages I’ve written that are pertinent to AI and the AI industry.

canada

Government-sponsored Canadian HPC is funded through Innovation, Science and Economic Development Canada (ISED) and, as of 2025, has two major components:

  1. Digital Research Alliance of Canada, a non-profit which deploys national digital research infrastructure
  2. Canadian Sovereign AI Compute Strategy, a $2 billion effort to develop and promote domestic human and capital resources to support AI.

As of September 2024, Canada was the only G7 nation without a top-25 supercomputer, and Canada’s public supercomputers were not available for industrial use.[2]

Unlike U.S. national HPC efforts, Canada favors an arm’s-length nonprofit model for funding national cyberinfrastructure. Whereas DOE or NSF might sponsor FFRDCs or universities to deploy and manage HPC infrastructure, Canada sponsors nonprofits (like DRAC) that are established explicitly for this purpose. The Government of Canada also takes a less hands-on role in these nonprofits than DOE or NSF takes with their investments.

Organizations and efforts

The Government of Canada maintains a list of organizations that have signed its Voluntary Code of Conduct on the Responsible Development and Management of Advanced Generative AI Systems, which is a good starting point for identifying Canadian parties interested in AI.[1]

Digital Research Alliance of Canada

Formerly Compute Canada. See Digital Research Alliance of Canada.

Mitacs

Mitacs is a Canadian arm’s-length nonprofit that promotes partnerships between academia, industry, and government through technology. While it is not exclusively focused on HPC or AI, it appears to route government funding to industry to promote collaboration with public researchers.

Pan-Canadian AI Compute Environment (PAICE)

Established in 2023, PAICE is an initiative led by a coalition of the following organizations:[2]

  • Digital Research Alliance of Canada
  • Canadian Institute for Advanced Research (CIFAR)
  • Canadian National AI Institutes (Amii, Mila, and Vector)
  • Regional organizations (Calcul Québec and Compute Ontario)
  • National host sites (Université Laval, University of Alberta, and University of Toronto)

Like the Canadian sovereign AI efforts, PAICE is funded by ISED, and it is the steward of three AI supercomputers:

| Name | Location | Specifications |
|---|---|---|
| TamIA | Mila (Québec City) | 22x 4-way H100 |
| Killarney | Vector Institute (Toronto) | 168x 4-way L40 + 10x 8-way H100 |
| TBA | Amii (Edmonton) | |

Who is evaluating SCIP SOIs and proposals?

ISED is, though it has been learning how other nations conduct these evaluations.

What is the balance of public/private partnership expected for this effort?

SCIP is similar: companies like Hypertec or Denvr Cloud are unlikely to act as the prime contractor, since the structure will likely take the form of a nonprofit with a large university-led component. Expect the SCIP systems to be large, on-premises systems.

Footnotes

  1. Voluntary Code of Conduct on the Responsible Development and Management of Advanced Generative AI Systems

  2. Pan-Canadian AI Compute Environment (PAICE) | Digital Research Alliance of Canada

conference

4 items with this tag.

CPU

5 items with this tag.

europe

European HPC is divided into a few notable segments:

  • Programs sponsored as part of the European Union
  • Programs sponsored by non-EU states including Switzerland (e.g., at CSCS) and the UK (e.g., Bristol)
  • Industry programs such as those run by major European energy companies (e.g., TotalEnergies and Equinor).

GPU

20 items with this tag.

node

17 items with this tag.

organization

3 items with this tag.

paper

Let’s try keeping notes of the interesting papers and articles I read in Obsidian so that I can reference them in my other notes.

person

6 items with this tag.

personal

3 items with this tag.

place

1 item with this tag.

reliability

In HPC, there are two ways to look at the reliability of a supercomputer.

Top-down reliability starts with what a full-system job experiences in practice and breaks that down. Top-down reliability is governed by metrics that characterize job reliability.

Bottom-up reliability starts with individual components and builds a reliability model by connecting those components in series and in parallel. Bottom-up reliability is governed by metrics that characterize component reliability.

The statistics that govern how the reliability of individual parts determines the reliability of the whole system are described in MTBF.
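As a rough illustration of the bottom-up approach, here is a minimal Python sketch. It assumes components fail independently with constant failure rates (an exponential model), which real hardware does not always follow. The function names and the example numbers (1,000 nodes, a 50,000-hour node MTBF, two 90%-reliable power supplies) are hypothetical. The `observed_mtbf` helper is a simple top-down counterpart that divides observed job time by observed interruptions.

```python
import math
from typing import Iterable

# Bottom-up reliability sketch (hypothetical helpers). Assumes every component
# fails independently with a constant failure rate, so R(t) = exp(-t / MTBF).

def component_reliability(mtbf_hours: float, t_hours: float) -> float:
    """Probability that one component survives t_hours without failing."""
    return math.exp(-t_hours / mtbf_hours)

def series_reliability(reliabilities: Iterable[float]) -> float:
    """Series: the system survives only if every component survives."""
    return math.prod(reliabilities)

def parallel_reliability(reliabilities: Iterable[float]) -> float:
    """Parallel (redundant): the system fails only if every component fails."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

def series_mtbf(mtbfs_hours: Iterable[float]) -> float:
    """Under the exponential model, failure rates add in series,
    so system MTBF = 1 / sum(1 / MTBF_i)."""
    return 1.0 / sum(1.0 / m for m in mtbfs_hours)

def observed_mtbf(total_job_hours: float, interruptions: int) -> float:
    """Top-down counterpart: MTBF measured from job interruptions seen in practice."""
    return total_job_hours / interruptions

if __name__ == "__main__":
    # Hypothetical: a job spanning 1,000 nodes, each with a 50,000-hour MTBF.
    print(series_mtbf([50_000.0] * 1_000))  # about 50 hours
    # Hypothetical: two redundant power supplies, each 90% likely to survive the job.
    print(parallel_reliability([0.9, 0.9]))  # about 0.99
```

Series composition is what makes large jobs fragile: for N identical components, system MTBF falls as 1/N, while parallel redundancy recovers reliability only for the parts that actually have it.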

review

3 items with this tag.

seedling

Seedling is a tag I use to indicate that a page has not been tidied up. I likely just pasted in some notes while I was in the middle of something else and will come back to clean it up later.

Seedling pages do not show up in the Explorer sidebar, so if you stumble upon a page but can’t find a way to navigate back to it, it’s probably a seedling.

startup

1 item with this tag.

storage

I spent much of my career in HPC focused on storage and high-performance I/O. Specifically,

  • 2014-2015: At 10x Genomics, I managed the NetApp FAS filers that all of the company’s DNA sequencers wrote to and that all of its downstream processing relied upon.
  • 2015-2022: When I joined NERSC, I was 50% funded on the TOKIO project to build tools to understand I/O performance across the full stack. I went on to help lead the Perlmutter storage subsystem design and develop NERSC’s ten-year storage strategy, dubbed Storage 2020.
  • 2022-2024: I joined Microsoft as a product manager in Azure Storage, the unit of Azure responsible for Azure Blob, Azure Files, and all other first-party storage services.

Storage is a complicated and poorly understood aspect of HPC, so the community of researchers and practitioners around HPC storage is small and dedicated. I stopped working on storage when I joined Microsoft Azure’s AI infrastructure team, where the biggest challenges came from compute, not storage.

I don’t miss working on storage, as I felt I had caught up to its cutting edge and there wasn’t enough innovation to warrant my focusing on it as a full-time job. Being “the storage guy” also required that I pay attention to some uninteresting aspects of HPC storage, such as data management and compression. However, I do miss the community and the smart people who dedicated their careers to I/O.

13 items with this tag.

system/japan

2 items with this tag.

usa/doe

The U.S. Department of Energy is the largest sponsor of basic science research in the United States. It’s sometimes jokingly called the “Department of Everything” because of how broad its scope is.

usa/doe/ascr

ASCR (Advanced Scientific Computing Research, pronounced like the name “Oscar”) is the part of the U.S. Department of Energy dedicated to open-science research. By definition, ASCR does not touch nuclear weapons.

usa/doe/nnsa

NNSA (the National Nuclear Security Administration) is the part of the U.S. Department of Energy responsible for nuclear weapons.

usa/nsf

NSF is the National Science Foundation.

4 items with this tag.