dbt cloud is unreachable - 502 bad gateway on US MT

Write-up

Summary

On April 9, 2026, between approximately 10:04 and 11:13 UTC, the dbt Platform experienced a full outage in the AWS US MT environment. During this 69-minute window, customers were unable to access the dbt Platform, trigger jobs, or run scheduled tasks.

The outage was caused by a configuration error during a routine internal infrastructure change.

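Timeline (UTC)

All times are approximate.
- 10:04: Outage begins in the AWS US MT environment.
- 11:06: The missing authentication resource is manually restored and worker nodes begin rejoining the cluster.
- 11:13: All services are fully recovered.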

Impact

Duration: Approximately 69 minutes (10:04–11:13 UTC)
Affected environment: AWS US MT
Job execution during the outage window:
- Delayed runs: Scheduled jobs that would have triggered during the outage were delayed until services were restored. If a job had multiple trigger times during the window, only one run was triggered post-recovery.
- Canceled runs: Runs that were in progress or started during the outage failed to maintain heartbeat status and were eventually canceled, even in cases where the underlying dbt execution completed successfully. Affected runs displayed the message: "This run timed out after 10 minutes of inactivity." All logs and artifacts from these runs were preserved.
- Chained jobs: Downstream jobs triggered by upstream job completion did not fire during the outage window, requiring customers to manually re-trigger affected pipelines.
- After recovery, normal job scheduling and execution resumed without further intervention.
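The criteria above can be turned into a quick filter for runs worth reviewing or re-triggering. The following is a minimal sketch, not an official tool: the run records and their field names (`id`, `status`, `started_at`, `ended_at`, `message`) are hypothetical and would need to be mapped to whatever your orchestrator or run export provides. Only the outage window and the timeout message come from this write-up.

```python
from datetime import datetime, timezone

# Outage window from this write-up (UTC).
OUTAGE_START = datetime(2026, 4, 9, 10, 4, tzinfo=timezone.utc)
OUTAGE_END = datetime(2026, 4, 9, 11, 13, tzinfo=timezone.utc)

# Message shown on affected runs, quoted from this write-up.
TIMEOUT_MESSAGE = "This run timed out after 10 minutes of inactivity."


def runs_to_review(runs):
    """Return canceled runs that overlap the outage window and show the timeout message."""
    flagged = []
    for run in runs:
        overlaps = run["started_at"] <= OUTAGE_END and run["ended_at"] >= OUTAGE_START
        timed_out = TIMEOUT_MESSAGE in run.get("message", "")
        if run["status"] == "canceled" and overlaps and timed_out:
            flagged.append(run)
    return flagged


def utc(hour, minute):
    """Helper for building sample timestamps on the day of the incident."""
    return datetime(2026, 4, 9, hour, minute, tzinfo=timezone.utc)


# Hypothetical sample data for illustration only.
sample_runs = [
    {"id": 101, "status": "canceled", "started_at": utc(9, 50), "ended_at": utc(10, 20), "message": TIMEOUT_MESSAGE},
    {"id": 102, "status": "canceled", "started_at": utc(10, 30), "ended_at": utc(10, 45), "message": TIMEOUT_MESSAGE},
    {"id": 103, "status": "success", "started_at": utc(11, 30), "ended_at": utc(11, 40), "message": ""},
    {"id": 104, "status": "canceled", "started_at": utc(8, 0), "ended_at": utc(8, 10), "message": TIMEOUT_MESSAGE},
]

flagged_ids = [run["id"] for run in runs_to_review(sample_runs)]
print(flagged_ids)
```

Because a canceled run may still have completed its underlying dbt execution, a flagged run is a candidate for review rather than proof that the work needs to be repeated.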

Root Cause

During a routine infrastructure deployment, an unintended configuration change caused a critical internal authentication dependency to be removed from the AWS US MT production environment. This authentication resource controls how compute nodes register with and join the cluster. Without it, no nodes can operate, resulting in zero capacity to serve requests. This prevented core services from communicating with one another until the resource was restored.

As a result, customers were unable to access the dbt Platform, trigger jobs, or run scheduled tasks in the AWS US MT environment for approximately 69 minutes.

Mitigation and Recovery

Our engineering team identified the root cause and manually restored the missing authentication resource in the AWS US MT environment at approximately 11:06 UTC (7:06 AM ET). Worker nodes began rejoining the cluster immediately, and all services were fully recovered by 11:13 UTC (7:13 AM ET).
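The recovery times can be lined up against the UTC outage window reported elsewhere in this write-up. The sketch below assumes that "ET" on April 9, 2026 means Eastern Daylight Time (UTC-4), which is modeled as a fixed offset rather than looked up from a time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: ET on April 9, 2026 is Eastern Daylight Time, a fixed UTC-4 offset.
EDT = timezone(timedelta(hours=-4), "EDT")

outage_start_utc = datetime(2026, 4, 9, 10, 4, tzinfo=timezone.utc)
restored_et = datetime(2026, 4, 9, 7, 6, tzinfo=EDT)    # authentication resource restored
recovered_et = datetime(2026, 4, 9, 7, 13, tzinfo=EDT)  # all services recovered

# Convert the ET timestamps to UTC.
restored_utc = restored_et.astimezone(timezone.utc)
recovered_utc = recovered_et.astimezone(timezone.utc)

# Total outage duration in whole minutes.
duration_minutes = int((recovered_utc - outage_start_utc).total_seconds() // 60)

print(restored_utc.strftime("%H:%M"), recovered_utc.strftime("%H:%M"), duration_minutes)
# prints: 11:06 11:13 69
```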

Impact on Migrated Customers Using Legacy URLs

Some customers who had previously migrated from the AWS US MT environment to dedicated cells were also affected by this outage. These customers were still routing API traffic through the legacy cloud.getdbt.com URL, which proxies requests through the AWS US MT environment before forwarding them to the destination cell. When the AWS US MT environment became unavailable, this proxy path failed, causing API-triggered operations — such as jobs initiated by external orchestrators — to fail despite the destination cell environment itself being fully operational.

Customers who had updated their integrations to use their cell-specific URL (e.g., ACCOUNT_PREFIX.us1.dbt.com) were not affected. As noted in our post-migration documentation, we recommend that all migrated customers update their API integrations and orchestrators to use their account-specific URL. The legacy cloud.getdbt.com URL is scheduled for deprecation in November 2026.
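For integrations that store full API URLs, the host swap can be scripted. The following is a minimal sketch under stated assumptions: the helper name `to_cell_url`, the account prefix `ab123`, and the example path are illustrative, and the cell name defaults to `us1` only because that is the example given above. Confirm your actual account-specific URL in your account settings before changing any integration.

```python
from urllib.parse import urlsplit, urlunsplit

LEGACY_HOST = "cloud.getdbt.com"


def to_cell_url(url, account_prefix, cell="us1"):
    """Swap the legacy host for a cell-specific host, leaving path and query unchanged."""
    parts = urlsplit(url)
    if parts.hostname != LEGACY_HOST:
        # Already cell-specific (or not a legacy URL): leave it untouched.
        return url
    # Note: any explicit port on the legacy URL is dropped by this sketch.
    new_host = f"{account_prefix}.{cell}.dbt.com"
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))


# Hypothetical account prefix and job-trigger path, for illustration only.
legacy = "https://cloud.getdbt.com/api/v2/accounts/1/jobs/2/run/"
migrated = to_cell_url(legacy, "ab123")
print(migrated)
# prints: https://ab123.us1.dbt.com/api/v2/accounts/1/jobs/2/run/
```

Running the function on an already migrated URL returns it unchanged, so it is safe to apply across a mixed list of endpoints.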

We are proactively identifying migrated accounts that are still routing traffic through the legacy URL and will be reaching out with guidance to complete their URL migration.

Preventative Measures

Protective safeguards implemented:

- Infrastructure-as-code delete protection has been added to prevent this specific resource from being removed in future deployments.
- Two-person approval is now required for the deletion of any cloud resource in production environments.
- A runtime security policy has been added to detect and prevent the deletion of this resource at the cluster level, providing an additional layer of protection independent of the deployment pipeline.
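To illustrate how the first two safeguards combine, here is a minimal sketch of a pre-deployment check. It is not the implementation used by dbt Labs, which this write-up does not describe. The change-list format, the resource name `cluster-node-auth`, and the function `review_plan` are all hypothetical.

```python
# Hypothetical name standing in for the protected authentication resource.
PROTECTED = {"cluster-node-auth"}
REQUIRED_APPROVERS = 2


def review_plan(changes, approvers):
    """Return a list of reasons to block the deployment; an empty list means it may proceed."""
    problems = []
    deletions = [change for change in changes if change["action"] == "delete"]

    # Safeguard 1: protected resources may never be deleted by a deployment.
    for change in deletions:
        if change["resource"] in PROTECTED:
            problems.append(f"{change['resource']}: protected resource cannot be deleted")

    # Safeguard 2: any deletion needs two distinct approvers.
    if deletions and len(set(approvers)) < REQUIRED_APPROVERS:
        problems.append("deletions require two distinct approvers")

    return problems


# Hypothetical planned changes, for illustration only.
plan = [
    {"action": "update", "resource": "web-service"},
    {"action": "delete", "resource": "cluster-node-auth"},
]
print(review_plan(plan, ["alice"]))
```

A check like this lives in the deployment pipeline, which is why the third safeguard matters: a runtime policy at the cluster level still applies if a change bypasses the pipeline.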

Questions?

If you have any questions about this incident or its impact on your account, please reach out to your dbt Labs account team or contact dbt Labs Support.
