These are raw notes I’ve taken over the few years I spent supporting commercial HPC customers.

Workflow

  1. Acquisition: Data is collected from a seismic survey (on a boat/vessel) by injecting sound into the ground using air guns and detecting whatever bounces back using arrays of receivers (hydrophones). One “shot” involves one firing of the seismic source followed by a couple seconds of listening.
  2. Data transfer: Data from the seismic survey is transferred to the datacenter via tape, USB drive, or some other method.
  3. Preprocessing: Shots are preprocessed in a trivially parallel fashion (see the sketch after this list). This is where data is filtered and cleaned up. It is I/O-intensive, but data is read sequentially. The outputs are cleaned-up seismic traces.
  4. Velocity model building: Full waveform inversion (FWI) generates a refined velocity model based on the preprocessed seismic traces. This is very computationally expensive, since it wraps many wave-equation solves inside an iterative optimization loop.
  5. Migration: Reverse time migration (RTM) is used to create spatial images based on the velocity model and the seismic traces. This is very computationally expensive, but not as much as velocity model building.
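Step 3 is the textbook embarrassingly parallel stage: one independent task per shot, no communication between tasks. Below is a minimal sketch of that pattern in Python; the raw_shots/ layout, the 2048-sample trace length, and the bandpass_and_denoise placeholder are hypothetical stand-ins for the real data format and filter chain.

```python
# Sketch of the embarrassingly parallel preprocessing stage: every shot file is
# filtered independently, so a per-node process pool (or a scheduler array job)
# is enough -- there is no inter-task communication.
# The raw_shots/ layout, 2048-sample traces, and bandpass_and_denoise() are
# hypothetical stand-ins for the real data format and filter chain.
from multiprocessing import Pool
from pathlib import Path

import numpy as np

def bandpass_and_denoise(traces: np.ndarray) -> np.ndarray:
    """Placeholder for the real filtering/muting/deconvolution chain;
    here it only removes the per-trace mean."""
    return traces - traces.mean(axis=1, keepdims=True)

def preprocess_shot(shot_path: Path) -> Path:
    # One large sequential read of a raw shot gather...
    traces = np.fromfile(shot_path, dtype=np.float32).reshape(-1, 2048)
    cleaned = bandpass_and_denoise(traces)
    # ...followed by a comparatively small write of the cleaned traces.
    out_path = shot_path.with_suffix(".npy")
    np.save(out_path, cleaned)
    return out_path

if __name__ == "__main__":
    shots = sorted(Path("raw_shots").glob("shot_*.bin"))
    with Pool() as pool:                # one worker per core on this node
        outputs = pool.map(preprocess_shot, shots)
    print(f"preprocessed {len(outputs)} shots")
```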

Seismic imaging

Seismic imaging, aka seismic processing, aka reverse time migration (RTM), is:

  • the conversion of raw sound recordings into useful results that a geoscientist can look at to visually identify anomalies that imply the presence of oil or gas (a minimal sketch follows below)
  • 80-90% of the compute workload for oil & gas
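To make the shape of the computation concrete, here is a heavily simplified single-shot RTM sketch in plain NumPy: forward-propagate the source wavefield, back-propagate the recorded traces, and cross-correlate the two. Production codes add absorbing boundaries, high-order stencils, anisotropic physics, checkpointing, and serious I/O handling; the constant-velocity model, Ricker wavelet, and random "recorded data" here are illustrative only.

```python
# Heavily simplified single-shot reverse time migration in plain NumPy.
# Constant velocity, periodic boundaries, random stand-in "field data":
# illustrative only, not a production formulation.
import numpy as np

def propagate(v, sources, nt, dt, dx):
    """2nd-order finite-difference acoustic propagation on a 2-D grid.

    sources: list of ((iz, ix), trace) pairs injected at each time step.
    Returns every time snapshot -- keeping both wavefields resident is what
    makes RTM hungry for memory capacity."""
    nz, nx = v.shape
    u_prev = np.zeros((nz, nx))
    u_curr = np.zeros((nz, nx))
    snaps = np.empty((nt, nz, nx))
    c = (v * dt / dx) ** 2
    for it in range(nt):
        lap = (np.roll(u_curr, 1, 0) + np.roll(u_curr, -1, 0) +
               np.roll(u_curr, 1, 1) + np.roll(u_curr, -1, 1) - 4.0 * u_curr)
        u_next = 2.0 * u_curr - u_prev + c * lap
        for (iz, ix), trace in sources:
            u_next[iz, ix] += trace[it]
        u_prev, u_curr = u_curr, u_next
        snaps[it] = u_curr
    return snaps

def rtm_shot(v, src_pos, wavelet, rec_pos, rec_data, dt, dx):
    """Zero-lag cross-correlation imaging condition for a single shot:
    image(x) = sum over t of S(x, t) * R(x, t), where S is the forward
    source wavefield and R the back-propagated receiver wavefield."""
    nt = len(wavelet)
    fwd = propagate(v, [(src_pos, wavelet)], nt, dt, dx)
    # Back-propagation: inject time-reversed recorded traces at the receivers.
    rev = [(pos, rec_data[:, i][::-1]) for i, pos in enumerate(rec_pos)]
    bwd = propagate(v, rev, nt, dt, dx)
    return np.sum(fwd * bwd[::-1], axis=0)

if __name__ == "__main__":
    nz, nx, nt, dt, dx = 101, 201, 800, 0.001, 10.0
    v = np.full((nz, nx), 2000.0)                        # toy velocity model (m/s)
    t = np.arange(nt) * dt
    arg = (np.pi * 15.0 * (t - 0.1)) ** 2
    wavelet = (1.0 - 2.0 * arg) * np.exp(-arg)           # 15 Hz Ricker wavelet
    rec_pos = [(2, ix) for ix in range(0, nx, 10)]       # near-surface receiver line
    rec_data = 1e-3 * np.random.randn(nt, len(rec_pos))  # stand-in for field traces
    image = rtm_shot(v, (2, nx // 2), wavelet, rec_pos, rec_data, dt, dx)
    print("image grid:", image.shape)
```

Note how both wavefields are kept resident for the imaging step: that is the memory-capacity pressure mentioned below.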

It is famously data-intensive, in contrast to reservoir modeling (covered below), which is not.

  • 90% reads, 10% writes; reflects the significant data reduction happening
  • sequential access, which has historically mapped well to parallel file systems like Lustre
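A rough sketch of that access pattern, assuming the traces sit in a flat float32 file (a stand-in for the real SEG-Y or vendor formats): the job streams through the input once in large sequential reads and writes a far smaller derived product at the end.

```python
# Illustration of the read-heavy, sequential access pattern: stream once
# through a large trace file and write a much smaller derived product.
# The flat float32 file with 2048-sample traces is an assumed stand-in for
# the real SEG-Y / vendor formats.
import numpy as np

N_SAMPLES = 2048
CHUNK = 4096                                     # traces per sequential read

traces = np.memmap("survey_traces.f32", dtype=np.float32,
                   mode="r").reshape(-1, N_SAMPLES)
rms = np.empty(traces.shape[0], dtype=np.float32)

for start in range(0, traces.shape[0], CHUNK):
    block = np.asarray(traces[start:start + CHUNK])    # large sequential read
    rms[start:start + CHUNK] = np.sqrt((block ** 2).mean(axis=1))

np.save("trace_rms.npy", rms)                    # the small (~10%) write side
```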

Becomes “easier” with more memory capacity, since the propagated wavefields have to be held in memory (or checkpointed) for the imaging step.

Mostly not bulk-synchronous MPI: the original seismic imaging applications were MPI-based, but MPI was being used for task scheduling rather than for tightly coupled computation.
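A sketch of that pattern, assuming mpi4py is available: rank 0 acts as a work queue handing out independent per-shot tasks, and every other rank just pulls work until it receives a stop sentinel. MPI here is only a convenient task scheduler, not a bulk-synchronous compute harness; migrate_shot() is a hypothetical stand-in for the real per-shot kernel.

```python
# MPI used as a task scheduler rather than for bulk-synchronous computation:
# rank 0 hands out independent shot indices, workers pull them until a stop
# sentinel arrives. Assumes mpi4py; migrate_shot() is a hypothetical stand-in.
from mpi4py import MPI

def migrate_shot(shot_id: int) -> str:
    # Placeholder for the real per-shot kernel (hours of compute in practice).
    return f"image_{shot_id}"

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
STOP = -1

if rank == 0:
    shots = list(range(1000))           # hypothetical shot queue
    status = MPI.Status()
    active = size - 1
    while active > 0:
        # Wait for any worker to ask for work, then answer that worker.
        comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)
        worker = status.Get_source()
        if shots:
            comm.send(shots.pop(), dest=worker)
        else:
            comm.send(STOP, dest=worker)
            active -= 1
else:
    while True:
        comm.send(None, dest=0)         # "ready for work" request
        shot = comm.recv(source=0)
        if shot == STOP:
            break
        migrate_shot(shot)
```

Launched with something like mpiexec -n <ranks> python script.py; the point is that the ranks barely talk to each other.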

Seismic imaging codes produce the images required to make investment decisions for drilling operations and for the capex investments in production facilities that drive the organization's revenue streams.

  • Costs roughly $100M USD per well for marine drilling

  • Production capex: a marine platform investment of $7-10B USD, which takes 7-10 years from investment to first revenue

  • Low-margin business so very cost-sensitive

  • One project typically takes about a year

  • Reprocessing can happen a few times over the 10-30 year lifetime of an oil field

    • Raw input data (from vessels using sonar) is highly valuable
    • Outputs are also valuable if cost to recompute is high
  • Input data can grow by ~10x during the workflow, but it doesn't stay that large

  • After each phase of processing, a smaller amount of data must be shared for QA/QC and interpreted on Windows workstations

  • Compute requirements:

    • It used to be possible to fit multiple jobs on a single node and use node-local storage
    • Nowadays jobs run tightly coupled, MPI-style, across a few nodes
  • Seismic imaging accounts for 85-90% of computing cycles

Reservoir modeling

Also known as “reservoir simulation”

Reservoir modeling is not hugely data-intensive by geoscience standards, though it may be when compared to other HPC workloads.

Example applications include SLB Intersect and Nexus.

Characteristics:

  • memory bandwidth-limited: these are stencil codes (see the sketch at the end of these notes)
  • not GPU-enabled
  • needs memory bandwidth rather than memory capacity
  • uses only a subset of the data from seismic imaging
  • fundamentally an ensemble of scenarios
    • hundreds of wells, each with many possible extraction strategies that affect flow at other wells
    • most O&G companies now do batch-style computing
    • models are often downsampled so much that a scenario can run on a single core
    • a huge, tightly coupled (MPI), high-fidelity simulation (essentially full CFD) is possible, but it's typically not worth the cost/complexity
  • 10-15% of compute is for reservoir modeling
  • the intermediate data here is not backed up since it can be reproduced at low computational cost
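To ground the two main points above (bandwidth-bound stencil kernels, run as an ensemble of independent scenarios), here is a minimal sketch using a toy explicit 7-point diffusion-style update on a 3-D grid; it is not any vendor's formulation. Each cell update touches only its neighbours and does a handful of flops, so the sweep streams the whole grid through memory and performance tracks memory bandwidth; the ensemble is just an outer loop of independent runs.

```python
# Minimal sketch of why reservoir simulators behave like memory-bandwidth-bound
# stencil codes: each update reads a cell plus its six neighbours and does only
# a few flops, so the sweep streams the whole grid through memory.
# The toy 7-point diffusion update and the scenario parameters are illustrative
# only, not any vendor's formulation.
import numpy as np

def stencil_sweep(p, alpha, n_steps):
    """Explicit 7-point update on a 3-D pressure-like field (interior cells only)."""
    for _ in range(n_steps):
        interior = (
            p[:-2, 1:-1, 1:-1] + p[2:, 1:-1, 1:-1] +
            p[1:-1, :-2, 1:-1] + p[1:-1, 2:, 1:-1] +
            p[1:-1, 1:-1, :-2] + p[1:-1, 1:-1, 2:] -
            6.0 * p[1:-1, 1:-1, 1:-1]
        )
        p[1:-1, 1:-1, 1:-1] += alpha * interior
    return p

def run_scenario(seed, shape=(64, 64, 64), n_steps=200, alpha=0.1):
    """One ensemble member: a downsampled model small enough for a single core."""
    rng = np.random.default_rng(seed)
    p = rng.random(shape)               # stand-in initial reservoir state
    return stencil_sweep(p, alpha, n_steps).mean()

if __name__ == "__main__":
    # The ensemble is embarrassingly parallel batch work: each scenario
    # (well placement, extraction schedule, ...) runs independently.
    results = [run_scenario(seed) for seed in range(8)]
    print(results)
```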