
Changelog

All notable user-facing changes to dagster-ray will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

0.4.0

This release introduces Cluster Sharing, a feature that is particularly useful in dev environments. Cluster Sharing allows reusing existing RayCluster resources created by previous Dagster steps. It's implemented for the KubeRayCluster Dagster resource and enables faster iteration and reduced infrastructure costs (at the expense of job isolation). KubeRayCluster is therefore now recommended over KubeRayInteractiveJob for dev environments.

Learn more in the Cluster Sharing docs.
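
Below is a minimal sketch of opting into the feature; the exact value accepted by cluster_sharing is an assumption here, so consult the Cluster Sharing docs for the supported configuration.

```python
# Hypothetical sketch: enabling cluster sharing on KubeRayCluster.
from dagster import Definitions, asset

from dagster_ray.kuberay import KubeRayCluster


@asset
def my_ray_asset(ray_cluster: KubeRayCluster) -> None:
    ...  # Ray is already initialized against the (possibly reused) cluster


defs = Definitions(
    assets=[my_ray_asset],
    resources={
        "ray_cluster": KubeRayCluster(
            cluster_sharing=True,  # assumed value; see the docs for the actual schema
        ),
    },
)
```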

Added

  • KubeRayCluster.cluster_sharing parameter that controls cluster sharing behavior.
  • dagster_ray.kuberay.sensors.cleanup_expired_kuberay_clusters sensor that cleans up expired clusters (both shared and non-shared); see the registration sketch after this list. Learn more in the docs.
  • dagster-ray entry now appears in the Dagster libraries list in the web UI.
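
The cleanup sensor mentioned above can be registered like any other sensor; a minimal sketch:

```python
# Minimal sketch: registering the cleanup sensor in a code location.
from dagster import Definitions

from dagster_ray.kuberay.sensors import cleanup_expired_kuberay_clusters

defs = Definitions(
    sensors=[cleanup_expired_kuberay_clusters],
)
```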

Changed

  • [💣 breaking] removed cleanup_kuberay_clusters_op and its associated definitions in favor of the more flexible dagster_ray.kuberay.sensors.cleanup_expired_kuberay_clusters sensor.

0.3.1

Added

  • failure_tolerance_timeout configuration parameter for KubeRayInteractiveJob and KubeRayCluster. It can be set to a positive value to give the cluster some time to transition out of a failed state (which can be transient in some scenarios) before raising an error.
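
A minimal sketch of setting it, assuming the timeout is specified in seconds:

```python
# Hypothetical sketch: tolerate a (possibly transient) failed cluster state
# for up to 60 seconds before raising an error.
from dagster_ray.kuberay import KubeRayInteractiveJob

ray_job = KubeRayInteractiveJob(
    failure_tolerance_timeout=60,  # assumed to be seconds; pick a value for your setup
)
```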

Fixes

  • ensure both .head.serviceIP and .head.serviceName are set on the RayCluster while waiting for cluster readiness.

0.3.0

This release includes massive docs improvements and drops support for Python 3.9.

Changed

  • [💣 breaking] dropped Python 3.9 support (EOL October 2025).
  • [internal] most of the general, backend-agnostic code has been moved to dagster_ray.core (top-level imports still work).

0.2.1

Fixes

  • Fixed broken wheel on PyPI.

0.2.0

Changed

  • KubeRayInteractiveJob.deletion_strategy now defaults to DeleteCluster for both successful and failed executions. This is a reasonable default for the interactive use case.
  • KubeRayInteractiveJob.ttl_seconds_after_finished now defaults to 600 seconds.
  • KubeRayCluster.lifecycle.cleanup now defaults to always.
  • [💣 breaking] the Kubernetes init parameters of the RayJob and RayCluster clients and resources have been renamed to kube_config and kube_context.
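
A minimal sketch of the renamed parameters, with illustrative values only:

```python
# Hypothetical sketch: the Kubernetes init parameters after the rename.
from dagster_ray.kuberay import KubeRayInteractiveJob

ray_job = KubeRayInteractiveJob(
    kube_config="~/.kube/config",  # illustrative path
    kube_context="dev-cluster",    # illustrative context name
    # deletion_strategy now defaults to DeleteCluster for both outcomes,
    # and ttl_seconds_after_finished defaults to 600.
)
```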

Added

  • enable_legacy_debugger configuration parameter to subclasses of RayResource.
  • on_exception option for the lifecycle.cleanup policy. It triggers cleanup on exceptions raised during resource setup/cleanup (including KeyboardInterrupt), but not on exceptions raised by user @op/@asset code.
  • KubeRayInteractiveJob now respects lifecycle.cleanup. It defaults to on_exception. Users are advised to rely on built-in RayJob cleanup mechanisms, such as ttlSecondsAfterFinished and deletionStrategy.
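
A minimal sketch combining the options above, assuming the nested lifecycle config accepts a plain dict (it may be a dedicated config class in practice):

```python
# Hypothetical sketch: legacy debugger plus exception-only cleanup.
from dagster_ray.kuberay import KubeRayCluster

ray_cluster = KubeRayCluster(
    enable_legacy_debugger=True,            # enable the legacy Ray debugger
    lifecycle={"cleanup": "on_exception"},  # assumed shape; clean up only on setup/teardown exceptions
)
```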

Fixes

  • removed ignore_reinit_error from RayResource init options: it's potentially dangerous, for example when the user has accidentally connected to another Ray cluster (including a local one) before initializing the resource.

0.1.0

Changed

  • [💣 breaking] RayResource: top-level skip_init and skip_setup configuration parameters have been removed. The lifecycle field is the new way of configuring steps performed during resource initialization. KubeRayCluster's skip_cleanup has been moved to lifecycle as well.
  • [💣 breaking] injected dagster.io/run_id Kubernetes label has been renamed to dagster/run-id. Keys starting with dagster.io/ have been converted to dagster/ to match how dagster-k8s does it.
  • [💣 breaking] dagster_ray.kuberay configurations have been unified with the KubeRay APIs.
  • dagster-ray now populates Kubernetes labels with more values (including some useful Dagster Cloud values such as git-sha).

Added

  • KubeRayInteractiveJob -- a resource that utilizes the new InteractiveMode for RayJob. It can be used to connect to Ray in Client mode -- like KubeRayCluster -- but gives access to RayJob features, such as automatic cleanup (ttlSecondsAfterFinished), retries (backoffLimit) and timeouts (activeDeadlineSeconds).
  • The RayResource setup lifecycle has been overhauled: resources now have an actions parameter with three configuration options (create, wait and connect). Users can disable them and run .create(), .wait() and .connect() manually if needed.
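
A minimal sketch of driving the setup steps manually; the exact placement of the actions options in the resource config is an assumption here:

```python
# Hypothetical sketch: disable automatic setup and run the steps explicitly.
from dagster import asset

from dagster_ray.kuberay import KubeRayCluster


@asset
def manually_managed_ray_asset(ray_cluster: KubeRayCluster) -> None:
    # With create/wait/connect disabled via the `actions` configuration,
    # the lifecycle steps can be invoked directly:
    ray_cluster.create()
    ray_cluster.wait()
    ray_cluster.connect()
```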