These are raw notes I’ve taken over the few years I spent supporting commercial HPC customers.

Seismic imaging

Seismic imaging, a.k.a. seismic processing; the heaviest kernel is usually reverse time migration (RTM)

  • converting raw recorded sound data into images in which a geoscientist can visually identify anomalies that imply the presence of oil or gas (see the sketch below)
  • 80-90% of the compute workload for oil & gas
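
At its core, RTM forward-propagates a source wavefield, backward-propagates the recorded receiver data, and cross-correlates the two at zero lag to build the image. A minimal numpy sketch of just the imaging condition, assuming both wavefields have already been computed and fit in memory (array shapes here are illustrative, not from any production code):

    import numpy as np

    def rtm_image(src_wavefield, rcv_wavefield):
        # Zero-lag cross-correlation imaging condition.
        # Both arrays have shape (nt, nz, nx): one 2D snapshot per time step.
        # In production these wavefields come from full wave-equation solves
        # and are far too large to hold in memory, hence the data intensity.
        image = np.zeros(src_wavefield.shape[1:])
        for s, r in zip(src_wavefield, rcv_wavefield):
            image += s * r  # accumulate where source and receiver energy coincide
        return image

    # Toy usage with random arrays standing in for real propagated wavefields.
    nt, nz, nx = 100, 64, 64
    img = rtm_image(np.random.rand(nt, nz, nx), np.random.rand(nt, nz, nx))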

Famously data-intensive, unlike reservoir modeling (covered below), which is not.

Becomes “easier” with more memory capacity.

Mostly not bulk-synchronous MPI. The original seismic imaging applications were MPI-based, but MPI was being used for task scheduling rather than tightly coupled computation.
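
What "MPI as a task scheduler" tends to look like in practice: rank 0 hands independent work items (e.g. shot gathers) to worker ranks and collects results, with no communication between workers. This is an illustrative mpi4py sketch, not the structure of any vendor code; process() and the task list are placeholders:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    def process(task):
        # Placeholder for real work, e.g. migrating one shot gather.
        return f"task {task} done by rank {rank}"

    if rank == 0:
        tasks = list(range(20))              # hypothetical work items (e.g. shot IDs)
        results, active = [], 0
        for w in range(1, size):             # seed every worker with one task
            if tasks:
                comm.send(tasks.pop(), dest=w)
                active += 1
            else:
                comm.send(None, dest=w)      # nothing left: release the worker
        while active:
            status = MPI.Status()
            results.append(comm.recv(source=MPI.ANY_SOURCE, status=status))
            if tasks:
                comm.send(tasks.pop(), dest=status.Get_source())   # keep it busy
            else:
                comm.send(None, dest=status.Get_source())          # release it
                active -= 1
        print(f"collected {len(results)} results")
    else:
        while True:
            task = comm.recv(source=0)
            if task is None:
                break
            comm.send(process(task), dest=0)

Run with something like mpiexec -n 4 python sketch.py; the point is that adding ranks adds throughput, not tighter coupling.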

Seismic imaging codes produce the images that drive investment decisions: where to drill, and whether to commit capex to the production facilities that generate the organization's revenue.

  • Costs roughly $100M USD per well for marine operations

  • A production marine platform is a capex investment of $7-10B USD and takes 7-10 years from investment to first revenue

  • Low-margin business so very cost-sensitive

  • One project typically takes about a year

  • Reprocessing can happen a few times over the 10-30 year lifetime of an oil field

    • Raw input data (from vessels using sonar) is highly valuable
    • Outputs are also valuable if cost to recompute is high
  • Input data can grow by ~10x during the workflow, but it doesn’t stay that large

  • After each phase of processing, a smaller data product must be shared for QA/QC and interpreted on Windows workstations

  • Compute requirements:

    • Jobs used to fit several to a single node and use node-local storage
    • Nowadays they run tightly coupled, MPI-style, across a few nodes
  • Accounts for 85-90% of computing cycles

Reservoir modeling

Also known as “reservoir simulation”

Reservoir modeling is not hugely data-intensive by geoscience standards, though it may look that way compared to other HPC workloads.

Example applications include SLB Intersect and Nexus

Characteristics:

  • memory bandwidth-limited: these are stencil codes (see the stencil sketch after this list)
  • not GPU-enabled
  • doesn’t need memory capacity, just memory bandwidth
  • only uses a subset of the data from seismic imaging
  • fundamentally an ensemble of scenarios (see the batch sketch after this list)
    • hundreds of wells, and many ways to extract oil from each, which in turn affect flow at other wells
    • most O&G companies now do batch-style computing
    • resolution is often downsampled so far that a run fits on a single core
    • a huge, tightly coupled (MPI), high-fidelity simulation (like full CFD) is possible, but typically not worth the cost/complexity
  • 10-15% of compute is for reservoir modeling
  • the intermediate data here is not backed up since it can be reproduced at low computational cost
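
To make the bandwidth point concrete, here is a toy 7-point stencil sweep of the kind a reservoir simulator's pressure/flow solver is built around (grid size and coefficient are made up for illustration). Each cell update streams ~7 neighbours from memory for only a handful of flops, so throughput is set by memory bandwidth rather than by the FPU:

    import numpy as np

    def seven_point_stencil(p, c=0.1):
        # One Jacobi-style sweep of a 7-point stencil over a 3D grid.
        # Roughly 7 reads + 1 write per ~10 flops per cell: arithmetic
        # intensity well below 1 flop/byte, hence memory-bandwidth-bound.
        out = p.copy()
        out[1:-1, 1:-1, 1:-1] = p[1:-1, 1:-1, 1:-1] + c * (
            p[2:, 1:-1, 1:-1] + p[:-2, 1:-1, 1:-1] +
            p[1:-1, 2:, 1:-1] + p[1:-1, :-2, 1:-1] +
            p[1:-1, 1:-1, 2:] + p[1:-1, 1:-1, :-2] -
            6.0 * p[1:-1, 1:-1, 1:-1]
        )
        return out

    grid = np.random.rand(128, 128, 128)   # toy grid; real model sizes vary per field
    grid = seven_point_stencil(grid)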
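
And a sketch of the ensemble/batch pattern described above: many independent, downsampled scenario runs farmed out with an ordinary process pool rather than tightly coupled MPI. The scenario knobs and the run_scenario body are placeholders for whatever the simulator actually takes, not any vendor's API:

    import itertools
    from concurrent.futures import ProcessPoolExecutor

    def run_scenario(params):
        # Placeholder for one downsampled reservoir-simulation run.
        well_count, injection_rate = params
        # ... launch the simulator here and return a summary metric ...
        return {"wells": well_count, "rate": injection_rate, "npv": 0.0}

    # Cartesian product of scenario knobs -> an ensemble of independent runs.
    scenarios = list(itertools.product([50, 100, 200], [1.0, 1.5, 2.0]))

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(run_scenario, scenarios))
        print(f"ran {len(results)} scenarios")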