# Welcome to the dagster-ray docs

Ray integration for Dagster.

`dagster-ray` enables working with distributed Ray compute from Dagster pipelines, combining Dagster's excellent orchestration capabilities with Ray's distributed computing power.
> **Note:** This project is ready for production use, but some APIs may change between minor releases.
## Key Features

- **Run Launchers & Executors**: Submit entire Dagster runs or individual steps as Ray jobs
- **Ray Resources**: Automatically create and destroy ephemeral Ray clusters and connect to them in client mode
- **Dagster Pipes Integration**: Submit external scripts as Ray jobs and stream logs and rich Dagster metadata back to Dagster
- **KubeRay Support**: Utilize `RayJob` and `RayCluster` custom resources in client or job submission mode (see the tutorial)
- **Production Ready**: Tested against a matrix of core dependencies and integrated with Dagster+
## Quick Start

### Installation
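`dagster-ray` is distributed on PyPI. A minimal install sketch is shown below; the `kuberay` extra name for the Kubernetes integration is an assumption, so check the tutorials for the exact extras:

```shell
pip install dagster-ray

# with KubeRay support (the `kuberay` extra name is assumed)
pip install "dagster-ray[kuberay]"
```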
> **Tip:** See the KubeRay tutorial.
### Basic Usage

#### Execute Dagster steps on an existing Ray cluster
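A minimal sketch of attaching `ray_executor` so that each step of a run is submitted to an existing Ray cluster. The top-level import path and the way cluster options are supplied are assumptions; see the API reference for the exact configuration schema:

```python
import dagster as dg

from dagster_ray import ray_executor  # submits each Dagster step as a Ray job


@dg.asset
def my_asset() -> int:
    return 42


# Attaching the executor to Definitions makes every run execute its steps on Ray.
# The cluster address and per-step resources are assumed to come from executor
# configuration (see the API reference for the exact fields).
definitions = dg.Definitions(
    assets=[my_asset],
    executor=ray_executor,
)
```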
#### Execute an asset on Ray in client mode
Define a Dagster asset that uses Ray in client mode:

```python
import dagster as dg
import ray

from dagster_ray import RayResource


@ray.remote
def compute_square(x: int) -> int:
    return x**2


@dg.asset
def my_distributed_computation(ray_cluster: RayResource) -> int:
    # RayResource is a type annotation that provides a common interface for Ray resources
    futures = [compute_square.remote(i) for i in range(10)]  # I am already running in Ray!
    return sum(ray.get(futures))
```
Now use `LocalRay` for development and swap it for a full-size cluster in Kubernetes!
```python
import dagster as dg

from dagster_ray import LocalRay
from dagster_ray.kuberay import KubeRayInteractiveJob, in_k8s

# Use a local Ray instance during development and an ephemeral RayJob when running in Kubernetes
ray_cluster = LocalRay() if not in_k8s else KubeRayInteractiveJob()

definitions = dg.Definitions(
    assets=[my_distributed_computation],
    resources={"ray_cluster": ray_cluster},
)
```
`KubeRayInteractiveJob` will create a `RayJob`, connect to it, and optionally perform cleanup according to the configured policy.
Learn more by reading the tutorials.
## Choosing Your Integration

`dagster-ray` offers multiple ways to integrate Ray with your Dagster pipelines. The right choice depends on your deployment setup and use case:
### Key Questions to Consider

- **Do you want to manage Ray clusters automatically?** If yes, use the KubeRay components
- **Do you prefer to submit external scripts or run code directly?** External scripts offer better separation of concerns and environments, but interactive code is more convenient
- **Do you need per-asset configuration?** Some components allow fine-grained control per asset
### Feature Comparison

| Feature | `RayRunLauncher` | `ray_executor` | `PipesRayJobClient` | `PipesKubeRayJobClient` | `KubeRayCluster` | `KubeRayInteractiveJob` |
|---|---|---|---|---|---|---|
| Manages the cluster | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Uses Ray Jobs API | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Enabled per-asset | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Configurable per-asset | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| No external script needed | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| No Dagster DB access needed | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
### Which One Should You Use?

**You have a Ray cluster already running**

- Use `RayRunLauncher` to run the entire Dagster deployment on Ray
- Use `ray_executor` to run specific jobs on Ray
- Use `PipesRayJobClient` to submit external Python scripts as Ray jobs (see the sketch below)
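A rough sketch of the Pipes flow with `PipesRayJobClient`. The import path, constructor, and `run()` parameters shown here (`client`, `submit_job_params`) are assumptions based on the usual Dagster Pipes client pattern; check the API reference for the exact signature:

```python
import dagster as dg
from ray.job_submission import JobSubmissionClient

from dagster_ray import PipesRayJobClient


@dg.asset
def external_ray_job(
    context: dg.AssetExecutionContext, pipes_ray_job_client: PipesRayJobClient
) -> dg.MaterializeResult:
    # Submit an external script as a Ray job and stream logs/metadata back to Dagster.
    # The parameter names below are assumptions; see the API reference.
    return pipes_ray_job_client.run(
        context=context,
        submit_job_params={"entrypoint": "python my_script.py"},
    ).get_materialize_result()


definitions = dg.Definitions(
    assets=[external_ray_job],
    resources={
        # Points at the Ray cluster's job submission endpoint (address is illustrative)
        "pipes_ray_job_client": PipesRayJobClient(client=JobSubmissionClient("http://localhost:8265")),
    },
)
```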
**You want `dagster-ray` to handle the cluster lifecycle**

`dagster-ray` supports running Ray on Kubernetes with KubeRay.

- Use `KubeRayInteractiveJob` to create a `RayJob` and connect to it in client mode
- Use `PipesKubeRayJobClient` to submit external scripts as `RayJob`s (see the sketch below)
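And a rough sketch for `PipesKubeRayJobClient`, which submits an external script by creating a `RayJob` custom resource. The `ray_job` manifest parameter and the field values shown are assumptions; consult the KubeRay tutorial and API reference for the real schema:

```python
import dagster as dg

from dagster_ray.kuberay import PipesKubeRayJobClient


@dg.asset
def external_kuberay_job(
    context: dg.AssetExecutionContext, pipes_kuberay_client: PipesKubeRayJobClient
) -> dg.MaterializeResult:
    # The client is assumed to accept a (partial) RayJob custom-resource manifest;
    # dagster-ray fills in the rest and streams logs/metadata back via Dagster Pipes.
    return pipes_kuberay_client.run(
        context=context,
        ray_job={
            "metadata": {"namespace": "ray"},  # illustrative values
            "spec": {"entrypoint": "python my_script.py"},
        },
    ).get_materialize_result()
```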
> **Tip:** See the KubeRay tutorial.
## What's Next?

- **Tutorials**: Step-by-step guides with practical examples to get you started with `dagster-ray`
- **API Reference**: Complete documentation of all classes, methods, and configuration options