Welcome to dagster-ray docs¶
Ray integration for Dagster.
dagster-ray enables working with distributed Ray compute from Dagster pipelines, combining Dagster's excellent orchestration capabilities and Ray's distributed computing power together.
Note
This project is ready for production use, but some APIs may change between minor releases.
π Key Features¶
- π― Run Launchers & Executors: Submit Dagster runs or individual steps by submitting Ray jobs
- π§ Ray Resources: Automatically create and destroy ephemeral Ray clusters and connect to them in client mode
- π‘ Dagster Pipes Integration: Submit external scripts as Ray jobs, stream back logs and rich Dagster metadata
- βΈοΈ KubeRay Support: Utilize
RayJobandRayClustercustom resources in client or job submission mode (tutorial) - π Production Ready: Tested against a matrix of core dependencies, integrated with Dagster+
- β‘ Instant Startup: Leverage
RayClusterwith Cluster Sharing for lightning-fast development cycles with zero cold start times (intended for development environments)
β‘ Quick Start¶
Installation¶
Tip
Tip
See KubeRay tutorial
Basic Usage¶
Execute Dagster steps on an existing Ray cluster¶
Example
Execute an asset on Ray in client mode¶
Example
Define a Dagster asset that uses Ray in client mode
import dagster as dg
from dagster_ray import RayResource
import ray
@ray.remote
def compute_square(x: int) -> int:
return x**2
@dg.asset
def my_distributed_computation(ray_cluster: RayResource) -> int: # (2)!
futures = [compute_square.remote(i) for i in range(10)] # (1)!
return sum(ray.get(futures))
I am already running in Ray!
RayResourceis a type annotation that provides a common interface for Ray resources
Now use LocalRay for development and swap it with a thick cluster in Kubernetes!
from dagster_ray import LocalRay
from dagster_ray.kuberay import in_k8s, KubeRayInteractiveJob
ray_cluster = LocalRay() if not in_k8s else KubeRayInteractiveJob()
definitions = dg.Definitions(
assets=[my_distributed_computation],
resources={"ray_cluster": ray_cluster},
)
KubeRayInteractiveJob will create a RayJob, connect to it, and optionally perform cleanup according to the configured policy.
Learn more by reading the tutorials.
π οΈ Choosing Your Integration¶
dagster-ray offers multiple ways to integrate Ray with your Dagster pipelines. The right choice depends on your deployment setup and use case:
π€ Key Questions to Consider¶
- Do you want to manage Ray clusters automatically? If yes, use KubeRay components
- Do you prefer to submit external scripts or run code directly? External scripts offer better separation of concerns and environments, but interactive code is more convenient
- Do you need per-asset configuration? Some components allow fine-grained control per asset
π Feature Comparison¶
| Feature | RayRunLauncher |
ray_executor |
PipesRayJobClient |
PipesKubeRayJobClient |
KubeRayCluster |
KubeRayInteractiveJob |
|---|---|---|---|---|---|---|
| Manages the cluster | β | β | β | β | β | β |
| Uses Ray Jobs API | β | β | β | β | β | β |
| Enabled per-asset | β | β | β | β | β | β |
| Configurable per-asset | β | β | β | β | β | β |
| No external script needed | β | β | β | β | β | β |
| No Dagster DB access needed | β | β | β | β | β | β |
π― Which One Should You Use?¶
You have a Ray cluster already running
- Use
RayRunLauncherto run the entire Dagster deployment on Ray - Use
ray_executorto run specific jobs on Ray - Use
PipesRayJobClientto submit external Python scripts as Ray jobs
Tip
You want dagster-ray to handle cluster lifecycle
dagster-ray supports running Ray on Kubernetes with KubeRay.
- Use
KubeRayInteractiveJobto create aRayJoband connect in client mode (recommended for production) - Use
KubeRayClusterwith cluster sharing enabled to create a newRayClusteror immediately connect to an existing one (recommended for dev environments) - Use
PipesKubeRayJobClientto submit external scripts asRayJob
Tip
See KubeRay tutorial
π What's Next?¶
-
Step-by-step guide with practical examples to get you started with
dagster-ray -
Complete documentation of all classes, methods, and configuration options