Investigating API Failures
Incident Report for dbt Cloud
Postmortem

Summary

On Monday, April 22nd, between 6:47 PM UTC and 7:25 PM UTC, an issue with internal database connections prevented some of our internal services from connecting to our internal database.

Impact

During the outage, API requests returned an error status and dbt Cloud was effectively unavailable across the United States region. This also caused Semantic Layer connections to error out.

We apologize to every customer affected by the outage. We are making improvements to our configuration and deployment process to make a similar outage unlikely in the future.

Root Cause

Configuration Change

The root cause of the issue was a configuration change that altered the way dbt Cloud and a handful of other services connected to our internal database. The intent of the change was to force services to connect to our internal database through a connection pooler. A preexisting configuration value in the United States region conflicted with the change and caused the configuration to be applied incorrectly. As a result, the internal database connection was attempted with an invalid set of credentials.
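
As a purely hypothetical illustration of this class of failure (the report does not describe the actual configuration system), the sketch below assumes a layered setup in which region-specific values are applied on top of a newer change. The key names, hostnames, and the `merge_config` and `find_conflicts` helpers are all invented for the example.

```python
# Hypothetical illustration only: the key names and the merge rule are
# assumptions, not dbt Cloud's actual configuration system.


def merge_config(base: dict, region_override: dict) -> dict:
    """Return base settings with region-specific values layered on top."""
    merged = dict(base)
    merged.update(region_override)
    return merged


def find_conflicts(change: dict, region_override: dict) -> list[str]:
    """List keys where a region override would shadow a value set by the change."""
    return sorted(
        key
        for key in change
        if key in region_override and region_override[key] != change[key]
    )


# The new change routes connections through a pooler with pooler credentials.
change = {"db_host": "pooler.internal", "db_user": "pooler_user"}
# A preexisting regional value that predates the change.
us_override = {"db_user": "legacy_user"}

# The pooler host ends up paired with the older user name.
effective = merge_config(change, us_override)
# Keys that should be reviewed before the change ships.
conflicts = find_conflicts(change, us_override)
```

Under these assumptions, the merged result pairs the new host with the old user, which is one way a single leftover value could produce an invalid credential set in only one region. Running a conflict check like `find_conflicts` per region before rollout would surface the mismatch early.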

Mitigation

Our existing alerting helped us identify an increased error rate on our internal services. We identified the recent deployment that triggered the outage and began the rollback procedure. As the rollback was deployed, the error rate dropped and services resumed normal operation.

Next Steps or Lessons Learned

Planned Remediation

  • We are adding a check to our deployment process to ensure our internal database connections are accessible before deployment.
  • We are working to reduce the set of configuration parameters around database access.
  • We are improving automation in our release process to reduce the time it takes to complete a rollback.
Posted Apr 25, 2024 - 12:50 EDT

Resolved
This incident has been resolved. Please reach out to Support if you continue to notice issues accessing dbt Cloud.
Posted Apr 22, 2024 - 16:02 EDT
Monitoring
We've implemented a fix for this issue and are continuing to monitor dbt Cloud's performance. If you are still having trouble accessing dbt Cloud, please reach out to Support at support@getdbt.com.
Posted Apr 22, 2024 - 15:29 EDT
Identified
We have identified an issue with dbt Cloud that resulted in API call failures. A fix is being implemented, and we will provide an update shortly. Thanks for your patience.
Posted Apr 22, 2024 - 15:14 EDT
Investigating
We're investigating an issue that is causing API calls to fail, resulting in customers being unable to access dbt Cloud. This is impacting calls in US regions any time after 6:58PM UTC. The team is working on a resolution and we will provide updates at approximately 30-minute intervals or as soon as new information becomes available.
Posted Apr 22, 2024 - 15:06 EDT