Although I work for VAST Data, these notes are my own personal notes and are not authoritative. They may be wrong.

VAST DataEngine is the serverless runtime that is integrated into VAST that allows you to define

  • serverless functions, written in Python, that execute in response to events
  • triggers, events to which functions react, and events emitted by functions
  • pipelines, directed graphs of triggers and functions that define what executes when

Compared to workflow frameworks like Apache Airflow (or job schedulers like Slurm), which rely on a centralized orchestrator to coordinate tasks, DataEngine is entirely event-driven, which lets it operate at any scale. This avoids common failure modes such as one user submitting a thousand jobs at once, or having to constantly poll a resource to determine when the next step should kick off.

Functions

Functions are Python scripts that use the VAST SDK. Here’s an example:1

from vast_runtime.vast_event import VastEvent
 
def init(ctx):
    with ctx.tracer.start_as_current_span("Init"):
        ctx.logger.info(f"secrets available: {list(ctx.secrets.keys())}")
        ctx.my_value = ctx.secrets.get("MY_SECRET", "default")
 
def handler(ctx, event: VastEvent):
    with ctx.tracer.start_as_current_span("Handler") as span:
        data = event.get_data()
        span.set_attribute("data_length", len(data))
        ctx.logger.info(f"handling event with secret={ctx.my_value}")
        return {"status": "ok", "data": data}
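
To poke at a function like this before deploying it, you can stub out the runtime context locally. The `make_fake_ctx`/`FakeEvent` stubs below are my own invention, not part of the VAST SDK; the `init`/`handler` bodies are inlined copies of the snippet above (minus the `VastEvent` type hint):

```python
# Hypothetical local harness; nothing here is a real VAST API.
from contextlib import contextmanager
from types import SimpleNamespace
import logging

@contextmanager
def _noop_span(name):
    # Stand-in for a tracer span: accepts set_attribute() and does nothing.
    yield SimpleNamespace(set_attribute=lambda key, value: None)

def make_fake_ctx(secrets=None):
    return SimpleNamespace(
        tracer=SimpleNamespace(start_as_current_span=_noop_span),
        logger=logging.getLogger("fake"),
        secrets=secrets or {},
    )

class FakeEvent:
    def __init__(self, data):
        self._data = data
    def get_data(self):
        return self._data

# Inlined copies of the example's init/handler:
def init(ctx):
    with ctx.tracer.start_as_current_span("Init"):
        ctx.my_value = ctx.secrets.get("MY_SECRET", "default")

def handler(ctx, event):
    with ctx.tracer.start_as_current_span("Handler") as span:
        data = event.get_data()
        span.set_attribute("data_length", len(data))
        return {"status": "ok", "data": data}

ctx = make_fake_ctx({"MY_SECRET": "s3cret"})
init(ctx)
result = handler(ctx, FakeEvent("hello"))
```

This only exercises the function's logic, not its interaction with the real broker or tracer.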

Once a function’s code is written, it has to be packaged into a container image and registered as a serverless function. This means…2

# building the image
vastde functions build myfunction -t /myfunction
 
# push it to dockerhub or some other registry
docker tag myfunction:latest docker.io/me/myfunction:1.0
docker push docker.io/me/myfunction:1.0
 
# register the function with DataEngine
vastde functions create \
  --name myfunction \
  --container-registry docker.io \
  --artifact-source me/myfunction \
  --image-tag "1.0"

Triggers

Triggers encapsulate messages that appear on a Kafka-compatible broker (such as the VAST Event Broker, a part of the VAST DataBase). They are formatted as CloudEvents and emitted by VAST itself. As of version 5.4, VAST emits events related to S3 object creation, tagging, and deletion.3
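
For a sense of what a consumer sees, here is a sketch of parsing a CloudEvents envelope for an object-creation event. The required attributes (`specversion`, `id`, `source`, `type`) come from the CloudEvents spec, but the particular `type` string and `data` shape below are my own guesses, not what VAST actually emits:

```python
import json

# Hypothetical event payload; type/source/data fields are illustrative only.
raw = json.dumps({
    "specversion": "1.0",                 # required CloudEvents attribute
    "id": "evt-0001",                     # required
    "source": "/buckets/mybucket",        # required; value hypothetical
    "type": "ObjectCreated:Put",          # hypothetical event type name
    "data": {"bucket": "mybucket", "key": "incoming/file.parquet"},
})

event = json.loads(raw)
if event["type"].startswith("ObjectCreated"):
    bucket, key = event["data"]["bucket"], event["data"]["key"]
```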

Creating a trigger requires defining the following parameters:4

parameter      purpose
name           a unique name for this trigger
type           type of trigger; Element is a trigger that reacts to a file/object
source bucket  only trigger off of events originating on this S3 bucket
events         only trigger off of this subset of S3 events
broker name    listen to this Kafka-compatible broker
broker type    Internal to use VAST’s built-in broker; something else to use an external Kafka bus
topic          listen for events on this Kafka topic
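
Collected as a plain Python dict for reference, a trigger definition might look like the following. This is purely illustrative (not the real schema), and the field values are hypothetical:

```python
# Illustrative only: the trigger parameters from the table above as a dict.
trigger = {
    "name": "sometrigger",            # unique name for this trigger
    "type": "Element",                # reacts to a file/object
    "source_bucket": "mybucket",      # only events from this S3 bucket
    "events": ["ObjectCreated:Put"],  # subset of S3 events; value hypothetical
    "broker_name": "mybroker",        # Kafka-compatible broker to listen on
    "broker_type": "Internal",        # Internal = VAST's built-in broker
    "topic": "mytopic",               # Kafka topic to listen on
}
```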

Pipelines

Pipelines are the glue that connect triggers to functions. These are specified as YAML documents that look vaguely like this:5

kubernetes_cluster_vrn: vast:dataengine:kubernetes-clusters:somecluster
namespace: default
manifest:
  config:
    environment_variables: []
    secrets:
      - somesecret
  function_deployments:
    - function_vrn: vast:dataengine:functions:somefunction
      name: somefunction
# ...
  links:
    - source:
        - sometrigger
      destination:
        - somefunction
      topic: vast:dataengine:topics:mybroker/mytopic
      config:
        events_order: unordered
        retries: 3
# ...
  triggers:
    - name: sometrigger
      vrn: vast:dataengine:triggers:sometrigger
# ...

With this manifest in hand, you then have to register and deploy the pipeline:

vastde pipelines create \
    --name mypipeline \
    --config @mypipeline.yaml \
    --secret-file mypipeline-secrets.yaml \
    --deploy

Footnotes

  1. cosmos-labs/dataengine-vss-blueprint/source-code/ingest/vastdb-writer/main.py at 36075580a12c1487d63d48cd78bde695a9c85368 · vast-data/cosmos-labs

  2. cosmos-labs/dataengine-vss-blueprint/deployments/dataengine-vss-ingest-pipeline/README.md at 36075580a12c1487d63d48cd78bde695a9c85368 · vast-data/cosmos-labs

  3. Event Publishing

  4. cosmos-labs/dataengine-vss-blueprint/deployments/dataengine-vss-ingest-pipeline/README.md at main · vast-data/cosmos-labs

  5. cosmos-labs/dataengine-vss-blueprint/deployments/dataengine-vss-ingest-pipeline/vss-ingest-pipeline-file.yaml at 1bdf2b6e5873fdc6d945161531c704e71d6f9b4f · vast-data/cosmos-labs