I'm still figuring out techniques and best practices for using Azure coming from a background in HPC. A lot of the documentation takes a Windows/PowerShell/GUI-first approach which does not compute for me, so I'm taking notes here so I don't lose track of how to do common tasks.
So far I'm converging on a combination of Azure CLI and Bicep as the foundational tools for standing up infrastructure in Azure. I'm not sure this combination is the best way in for people who are new to both Azure itself and infrastructure-as-code, since the declarative approach to IaC adds extra verbosity and syntactic complexity to operations, so I think the imperative Azure CLI approach is a good place to start.
Getting instance metadata
Every VM can access the Azure Instance Metadata Service (IMDS), which is a magic REST endpoint that allows you to inspect properties of a VM from inside that VM. For example, this is how you can get an access token for the managed identity of a VM from within that VM:
curl 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fmanagement.azure.com%2F' -H Metadata:true -s
where 169.254.169.254 is the magical REST endpoint for the IMDS.
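The same endpoint also serves metadata about the VM itself under the instance path. Here is a minimal sketch, assuming API version 2021-02-01 is available to you; the helper names imds_url and imds_get are my own and not part of any Azure tool:

```shell
# Base address of the IMDS; only reachable from inside an Azure VM.
IMDS_BASE='http://169.254.169.254/metadata'
# Assumption: this API version is supported where the VM runs.
IMDS_API_VERSION='2021-02-01'

# Build the URL for a metadata path such as "instance/compute/vmSize".
# The optional second argument is the response format (json or text).
imds_url() {
    printf '%s/%s?api-version=%s&format=%s\n' \
        "$IMDS_BASE" "$1" "$IMDS_API_VERSION" "${2:-json}"
}

# Fetch a metadata path. IMDS rejects requests without the Metadata header,
# and --noproxy keeps the request from being sent through a web proxy.
imds_get() {
    curl -s --noproxy '*' -H Metadata:true "$(imds_url "$1" "${2:-json}")"
}

# Usage (from inside a VM):
#   imds_get instance                        # full JSON document
#   imds_get instance/compute/vmSize text    # a single leaf value as plain text
```

The text format only works for leaf values, so the sketch defaults to json.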
Getting physical server info
In addition to the Instance Metadata Service which lets you learn about the VM in which you are running, you can also query the hypervisor for information about the bare metal server on which you are running. This relies on a service referred to as KVP:
cat /var/lib/hyperv/.kvp_pool_3 | tr -s '\000' '\n'
HostName
CO2AA1050611004
HostingSystemEditionId
168
HostingSystemNestedLevel
0
HostingSystemOsMajor
10
HostingSystemOsMinor
0
HostingSystemProcessorArchitecture
9
HostingSystemProcessorIdleStateMax
0
HostingSystemProcessorThrottleMax
100
HostingSystemProcessorThrottleMin
100
HostingSystemSpMajor
0
HostingSystemSpMinor
0
PhysicalHostName
CO2AA1050611004
PhysicalHostNameFullyQualified
CO2AA1050611004
VirtualMachineDynamicMemoryBalancingEnabled
0
VirtualMachineId
670093CC-C644-4167-8076-E9678D22C459
VirtualMachineName
902d6ba5-9ac4-4223-a8d9-1c7ca05a9a2c
It is fully documented on Azure's KVP/Hyper-V Data Exchange page.
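Since the output above alternates between key lines and value lines, it can be reshaped into something easier to grep. This is a rough sketch with hypothetical helper names (kvp_pairs, kvp_get), and it assumes no value in the pool is empty: tr -s squeezes runs of NUL bytes, so an empty value would throw off the key/value pairing.

```shell
# Default pool file, as used in the command above.
KVP_POOL='/var/lib/hyperv/.kvp_pool_3'

# Print every record as KEY=VALUE by joining each pair of lines.
kvp_pairs() {  # usage: kvp_pairs [FILE]
    tr -s '\000' '\n' < "${1:-$KVP_POOL}" | paste -d '=' - -
}

# Print the value for a single key: find the key on an odd-numbered
# line and print the line that follows it.
kvp_get() {  # usage: kvp_get KEY [FILE]
    tr -s '\000' '\n' < "${2:-$KVP_POOL}" |
        awk -v key="$1" 'NR % 2 == 1 && $0 == key { getline; print; exit }'
}

# Usage (from inside a VM):
#   kvp_pairs
#   kvp_get PhysicalHostName
```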
VM SKU / hypervisor conflicts
I tried this:
$ az deployment group create --resource-group glock-rg \
--template-file template.bicep \
--parameters @parameters.json
The selected VM size 'Standard_A1_v2' cannot boot Hypervisor Generation '2'. If
this was a Create operation please check that the Hypervisor Generation of the
Image matches the Hypervisor Generation of the selected VM Size. If this was
an Update operation please select a Hypervisor Generation '2' VM Size.
It turns out this is because my properties.storageProfile.imageReference.sku was 18_04-lts-gen2, and the gen2 was the incompatible part: Standard_A1_v2 cannot boot generation 2 images. To find all the image SKUs available for an offer:
$ az vm image list-skus --location westcentralus --publisher Canonical --offer UbuntuServer --output table
Location Name
------------- --------------------
westcentralus 12.04.5-LTS
westcentralus 14.04.0-LTS
westcentralus 14.04.1-LTS
westcentralus 14.04.2-LTS
...
See this page for the full documentation.
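It should also be possible to check for this mismatch before deploying by comparing the generation an image requires with the generations a VM size supports. The following is an untested sketch: the function names are hypothetical, and it assumes that az vm image show reports a hyperVGeneration field and that az vm list-skus exposes a HyperVGenerations capability with a value like "V1" or "V1,V2".

```shell
# Print the Hyper-V generation ("V1" or "V2") that an image requires.
image_generation() {  # usage: image_generation PUBLISHER:OFFER:SKU:VERSION
    az vm image show --urn "$1" --query hyperVGeneration --output tsv
}

# Print the generations a VM size supports, e.g. "V1" or "V1,V2".
# --size matches partial names, so the query filters on the exact name too.
size_generations() {  # usage: size_generations LOCATION SIZE
    az vm list-skus --location "$1" --size "$2" --resource-type virtualMachines \
        --query "[?name=='$2'].capabilities[] | [?name=='HyperVGenerations'].value | [0]" \
        --output tsv
}

# Succeed if generation $1 appears in the comma-separated list $2.
generation_supported() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   gen=$(image_generation Canonical:UbuntuServer:18_04-lts-gen2:latest)
#   sizes=$(size_generations westcentralus Standard_A1_v2)
#   generation_supported "$gen" "$sizes" || echo "image and VM size do not match"
```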
Query filtering
Figuring out what you can do in a subscription often involves running a bunch of az XYZ list -o table commands and poring over the results. For example, you may wish to look up how much quota you have for H-series VMs:
az vm list-usage --location "East US" -o table | grep 'Standard H'
But a better way to do this may be to filter the query as such:
az vm list-usage --location "East US" --query '[?contains(name.localizedValue, `Standard H`)]' -o table
This works because the az vm list-usage command returns a list of dicts of the form
[
  {
    "currentValue": "0",
    "limit": "8",
    "localName": "Standard H Family vCPUs",
    "name": {
      "localizedValue": "Standard H Family vCPUs",
      "value": "standardHFamily"
    }
  },
  ...
]
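Knowing that structure, the query can also pick out just the fields of interest, which makes the output easy to post-process. Here is a small sketch with hypothetical helper names (family_usage, with_remaining) that appends the unused quota to each row; it assumes the pattern contains no single quotes, since it is pasted straight into the query:

```shell
# Print name, current usage, and limit as tab-separated columns for
# every quota whose display name contains the given pattern.
family_usage() {  # usage: family_usage LOCATION PATTERN
    az vm list-usage --location "$1" \
        --query "[?contains(name.localizedValue, '$2')].[name.localizedValue, currentValue, limit]" \
        --output tsv
}

# Read those tab-separated rows on stdin and append limit minus usage.
with_remaining() {
    awk -F '\t' -v OFS='\t' '{ print $1, $2, $3, $3 - $2 }'
}

# Usage:
#   family_usage "East US" "Standard H" | with_remaining
```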
Searching for capacity
Specialized VM types (like those used in HPC) are not available in all regions, so requesting quota for them requires knowing which region to target. I haven't figured out the API for this yet, but there is a website that lets you search for service availability by region. To search for NDv2 (8-way NVIDIA V100 instances), you can check out this example.