These are raw notes I’ve taken over the few years I spent supporting commercial HPC customers.

Workflow

  1. Acquisition: Data is collected from a seismic survey (on a boat/vessel) by injecting sound into the ground using air guns and detecting whatever bounces back using arrays of receivers (hydrophones). One “shot” involves one firing of the seismic source followed by a couple seconds of listening.
  2. Data transfer: Data from the seismic survey is transferred to the datacenter via tape, USB drive, or some other method.
  3. Preprocessing: Shots are preprocessed in a trivially parallel fashion (see the sketch after this list). This is where data is filtered and cleaned up. It is I/O-intensive, but data is read sequentially. The outputs are cleaned-up seismic traces.
  4. Velocity model building: Full waveform inversion (FWI) generates a refined velocity model based on the preprocessed seismic traces. This is very computationally expensive, since it wraps many wave-equation solves inside an iterative optimization loop.
  5. Migration: Reverse time migration (RTM) is used to create spatial images based on the velocity model and the seismic traces. This is very computationally expensive, but not as much as velocity model building.
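Step 3 is the textbook embarrassingly parallel stage: one independent task per shot, no communication between tasks. Below is a minimal sketch of that pattern in Python; the raw_shots/ layout, the 2048-sample trace length, and the bandpass_and_denoise placeholder are hypothetical stand-ins for the real data format and filter chain.

```python
# Sketch of the embarrassingly parallel preprocessing stage: every shot file is
# filtered independently, so a per-node process pool (or a scheduler array job)
# is enough -- there is no inter-task communication.
# The raw_shots/ layout, 2048-sample traces, and bandpass_and_denoise() are
# hypothetical stand-ins for the real data format and filter chain.
from multiprocessing import Pool
from pathlib import Path

import numpy as np

def bandpass_and_denoise(traces: np.ndarray) -> np.ndarray:
    """Placeholder for the real filtering/muting/deconvolution chain;
    here it only removes the per-trace mean."""
    return traces - traces.mean(axis=1, keepdims=True)

def preprocess_shot(shot_path: Path) -> Path:
    # One large sequential read of a raw shot gather...
    traces = np.fromfile(shot_path, dtype=np.float32).reshape(-1, 2048)
    cleaned = bandpass_and_denoise(traces)
    # ...followed by a comparatively small write of the cleaned traces.
    out_path = shot_path.with_suffix(".npy")
    np.save(out_path, cleaned)
    return out_path

if __name__ == "__main__":
    shots = sorted(Path("raw_shots").glob("shot_*.bin"))
    with Pool() as pool:                # one worker per core on this node
        outputs = pool.map(preprocess_shot, shots)
    print(f"preprocessed {len(outputs)} shots")
```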

Seismic imaging

Seismic imaging, aka seismic processing, aka reverse time migration (RTM), is:

  • the conversion of raw sound recordings into useful results that a geoscientist can look at to visually identify anomalies that imply the presence of oil or gas (a minimal sketch follows below)
  • 80-90% of the compute workload for oil & gas
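To make the shape of the computation concrete, here is a heavily simplified single-shot RTM sketch in plain NumPy: forward-propagate the source wavefield, back-propagate the recorded traces, and cross-correlate the two. Production codes add absorbing boundaries, high-order stencils, anisotropic physics, checkpointing, and serious I/O handling; the constant-velocity model, Ricker wavelet, and random "recorded data" here are illustrative only.

```python
# Heavily simplified single-shot reverse time migration in plain NumPy.
# Constant velocity, periodic boundaries, random stand-in "field data":
# illustrative only, not a production formulation.
import numpy as np

def propagate(v, sources, nt, dt, dx):
    """2nd-order finite-difference acoustic propagation on a 2-D grid.

    sources: list of ((iz, ix), trace) pairs injected at each time step.
    Returns every time snapshot -- keeping both wavefields resident is what
    makes RTM hungry for memory capacity."""
    nz, nx = v.shape
    u_prev = np.zeros((nz, nx))
    u_curr = np.zeros((nz, nx))
    snaps = np.empty((nt, nz, nx))
    c = (v * dt / dx) ** 2
    for it in range(nt):
        lap = (np.roll(u_curr, 1, 0) + np.roll(u_curr, -1, 0) +
               np.roll(u_curr, 1, 1) + np.roll(u_curr, -1, 1) - 4.0 * u_curr)
        u_next = 2.0 * u_curr - u_prev + c * lap
        for (iz, ix), trace in sources:
            u_next[iz, ix] += trace[it]
        u_prev, u_curr = u_curr, u_next
        snaps[it] = u_curr
    return snaps

def rtm_shot(v, src_pos, wavelet, rec_pos, rec_data, dt, dx):
    """Zero-lag cross-correlation imaging condition for a single shot:
    image(x) = sum over t of S(x, t) * R(x, t), where S is the forward
    source wavefield and R the back-propagated receiver wavefield."""
    nt = len(wavelet)
    fwd = propagate(v, [(src_pos, wavelet)], nt, dt, dx)
    # Back-propagation: inject time-reversed recorded traces at the receivers.
    rev = [(pos, rec_data[:, i][::-1]) for i, pos in enumerate(rec_pos)]
    bwd = propagate(v, rev, nt, dt, dx)
    return np.sum(fwd * bwd[::-1], axis=0)

if __name__ == "__main__":
    nz, nx, nt, dt, dx = 101, 201, 800, 0.001, 10.0
    v = np.full((nz, nx), 2000.0)                        # toy velocity model (m/s)
    t = np.arange(nt) * dt
    arg = (np.pi * 15.0 * (t - 0.1)) ** 2
    wavelet = (1.0 - 2.0 * arg) * np.exp(-arg)           # 15 Hz Ricker wavelet
    rec_pos = [(2, ix) for ix in range(0, nx, 10)]       # near-surface receiver line
    rec_data = 1e-3 * np.random.randn(nt, len(rec_pos))  # stand-in for field traces
    image = rtm_shot(v, (2, nx // 2), wavelet, rec_pos, rec_data, dt, dx)
    print("image grid:", image.shape)
```

Note how both wavefields are kept resident for the imaging step: that is the memory-capacity pressure mentioned below.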

It is famously data-intensive, in contrast to reservoir modeling (covered below), which is not.

  • 90% reads, 10% writes; reflects the significant data reduction happening
  • sequential access, which has historically mapped well to parallel file systems like Lustre
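A rough sketch of that access pattern, assuming the traces sit in a flat float32 file (a stand-in for the real SEG-Y or vendor formats): the job streams through the input once in large sequential reads and writes a far smaller derived product at the end.

```python
# Illustration of the read-heavy, sequential access pattern: stream once
# through a large trace file and write a much smaller derived product.
# The flat float32 file with 2048-sample traces is an assumed stand-in for
# the real SEG-Y / vendor formats.
import numpy as np

N_SAMPLES = 2048
CHUNK = 4096                                     # traces per sequential read

traces = np.memmap("survey_traces.f32", dtype=np.float32,
                   mode="r").reshape(-1, N_SAMPLES)
rms = np.empty(traces.shape[0], dtype=np.float32)

for start in range(0, traces.shape[0], CHUNK):
    block = np.asarray(traces[start:start + CHUNK])    # large sequential read
    rms[start:start + CHUNK] = np.sqrt((block ** 2).mean(axis=1))

np.save("trace_rms.npy", rms)                    # the small (~10%) write side
```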

Becomes “easier” with more memory capacity, since the propagated wavefields have to be held in memory (or checkpointed) for the imaging step.

Mostly not bulk-synchronous MPI: the original seismic imaging applications were MPI-based, but MPI was being used for task scheduling rather than for tightly coupled computation.
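A sketch of that pattern, assuming mpi4py is available: rank 0 acts as a work queue handing out independent per-shot tasks, and every other rank just pulls work until it receives a stop sentinel. MPI here is only a convenient task scheduler, not a bulk-synchronous compute harness; migrate_shot() is a hypothetical stand-in for the real per-shot kernel.

```python
# MPI used as a task scheduler rather than for bulk-synchronous computation:
# rank 0 hands out independent shot indices, workers pull them until a stop
# sentinel arrives. Assumes mpi4py; migrate_shot() is a hypothetical stand-in.
from mpi4py import MPI

def migrate_shot(shot_id: int) -> str:
    # Placeholder for the real per-shot kernel (hours of compute in practice).
    return f"image_{shot_id}"

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
STOP = -1

if rank == 0:
    shots = list(range(1000))           # hypothetical shot queue
    status = MPI.Status()
    active = size - 1
    while active > 0:
        # Wait for any worker to ask for work, then answer that worker.
        comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        worker = status.Get_source()
        if shots:
            comm.send(shots.pop(), dest=worker)
        else:
            comm.send(STOP, dest=worker)
            active -= 1
else:
    while True:
        comm.send(None, dest=0)         # "ready for work" request
        shot = comm.recv(source=0)
        if shot == STOP:
            break
        migrate_shot(shot)
```

Launched with something like mpiexec -n <ranks> python script.py; the point is that the ranks barely talk to each other.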

Seismic imaging codes produce the images required to make investment decisions for drilling operations and for the capex investments in production facilities that drive the organization's revenue streams.

  • Costs roughly $100M USD per well for marine drilling

  • Production capex: a marine platform investment of $7-10B USD, which takes 7-10 years from investment to first revenue

  • Low-margin business so very cost-sensitive

  • One project typically takes about a year

  • Reprocessing can happen a few times over the 10-30 year lifetime of an oil field

    • Raw input data (from vessels using sonar) is highly valuable
    • Outputs are also valuable if cost to recompute is high
  • Input data can grow by ~10x during the workflow, but it doesn't stay that large

  • After each phase of processing, a smaller amount of data must be shared for QA/QC and interpreted on Windows workstations

  • Compute requirements:

    • It used to be possible to fit multiple jobs on a single node and use node-local storage
    • Nowadays jobs run tightly coupled, MPI-style, across a few nodes
  • Seismic imaging accounts for 85-90% of computing cycles

Reservoir modeling

Also known as “reservoir simulation”

Reservoir modeling is not hugely data-intensive by geoscience standards, though it may be when compared to other HPC workloads.

Example applications include SLB Intersect and Nexus.

Characteristics:

  • memory bandwidth-limited: these are stencil codes (see the sketch at the end of these notes)
  • not GPU-enabled
  • needs memory bandwidth rather than memory capacity
  • uses only a subset of the data from seismic imaging
  • fundamentally an ensemble of scenarios
    • hundreds of wells, each with many possible extraction strategies that affect flow at other wells
    • most O&G companies now do batch-style computing
    • models are often downsampled so much that a scenario can run on a single core
    • a huge, tightly coupled (MPI), high-fidelity simulation (essentially full CFD) is possible, but it's typically not worth the cost/complexity
  • 10-15% of compute is for reservoir modeling
  • the intermediate data here is not backed up since it can be reproduced at low computational cost
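To ground the two main points above (bandwidth-bound stencil kernels, run as an ensemble of independent scenarios), here is a minimal sketch using a toy explicit 7-point diffusion-style update on a 3-D grid; it is not any vendor's formulation. Each cell update touches only its neighbours and does a handful of flops, so the sweep streams the whole grid through memory and performance tracks memory bandwidth; the ensemble is just an outer loop of independent runs.

```python
# Minimal sketch of why reservoir simulators behave like memory-bandwidth-bound
# stencil codes: each update reads a cell plus its six neighbours and does only
# a few flops, so the sweep streams the whole grid through memory.
# The toy 7-point diffusion update and the scenario parameters are illustrative
# only, not any vendor's formulation.
import numpy as np

def stencil_sweep(p, alpha, n_steps):
    """Explicit 7-point update on a 3-D pressure-like field (interior cells only)."""
    for _ in range(n_steps):
        interior = (
            p[:-2, 1:-1, 1:-1] + p[2:, 1:-1, 1:-1] +
            p[1:-1, :-2, 1:-1] + p[1:-1, 2:, 1:-1] +
            p[1:-1, 1:-1, :-2] + p[1:-1, 1:-1, 2:] -
            6.0 * p[1:-1, 1:-1, 1:-1]
        )
        p[1:-1, 1:-1, 1:-1] += alpha * interior
    return p

def run_scenario(seed, shape=(64, 64, 64), n_steps=200, alpha=0.1):
    """One ensemble member: a downsampled model small enough for a single core."""
    rng = np.random.default_rng(seed)
    p = rng.random(shape)               # stand-in initial reservoir state
    return stencil_sweep(p, alpha, n_steps).mean()

if __name__ == "__main__":
    # The ensemble is embarrassingly parallel batch work: each scenario
    # (well placement, extraction schedule, ...) runs independently.
    results = [run_scenario(seed) for seed in range(8)]
    print(results)
```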