US accounts - Editor/Insights/Console is down - Partial Workstation and ActionBot outage
Incident Report for WalkMe US Status
Postmortem

Description of Incident

  • On Mar 6, 2024, at 14:09 UTC, WalkMe experienced an elevated level of service errors in our Design Time API Gateways.
  • This issue primarily affected our WalkMe builders, who may have experienced difficulties when connecting to the WalkMe Editor, Console, or Insights.
  • After urgent investigation by the WalkMe engineering team, and in order to avoid further disruption, a set of recovery steps was performed on the databases and load balancers.
  • Once the recovery steps were completed, the Gateway services returned to normal, and the Design Time APIs were fully functional by Mar 6, 2024, 20:42 UTC.

Scope of Incident

  • This issue primarily affected our WalkMe builders, who may have experienced difficulties when connecting to the WalkMe Editor, Console, or Insights.

Root Cause Analysis

  • A new product configuration caused some of our underlying services to enter an unstable state. This triggered a ripple effect on additional internal services.

WalkMe Corrective Action

  • WalkMe performed a rollback from the latest backup (17:40 UTC) to the database snapshot taken at 13:20 UTC.

Ongoing Commitments

  • To uphold WalkMe's commitment to providing reliable and uninterrupted service, we are actively monitoring our systems to ensure this issue does not recur:

    • WalkMe will add designated tests for these specific components and configurations.
    • WalkMe will increase the observability, monitoring and alerting on these specific components and configurations.
    • WalkMe will apply additional load protection layers on core services.
Posted Mar 19, 2024 - 20:00 UTC

Resolved
All WalkMe components impacted by today's outage have been restored. Thank you for your patience as our team worked to ensure WalkMe functionality was fully restored.

Our initial findings indicate that an issue with the WalkMe Authorization and Authentication endpoints in our US services was the cause. Please expect this incident to be updated in the coming days with a root cause analysis including all relevant details.
Posted Mar 06, 2024 - 20:42 UTC
Monitoring
WalkMe Services have been restored and our team is monitoring to ensure all items are resolved.
Posted Mar 06, 2024 - 20:24 UTC
Update
Our Development team has been able to restore most services; however, the Editor's publish functionality is still impacted. We are continuing to work hard to resolve this and will have an update soon.
Posted Mar 06, 2024 - 19:48 UTC
Update
We are continuing to investigate this issue.
Posted Mar 06, 2024 - 19:32 UTC
Update
We are continuing to investigate this issue.
Posted Mar 06, 2024 - 18:22 UTC
Update
We are continuing to investigate this issue.
Posted Mar 06, 2024 - 17:31 UTC
Update
Our Development team is investigating and working on resolving the issue as soon as possible. We will report back with an update shortly.
Posted Mar 06, 2024 - 16:47 UTC
Update
We are continuing to investigate this issue.
Posted Mar 06, 2024 - 15:54 UTC
Update
We are continuing to investigate this issue.
Posted Mar 06, 2024 - 15:23 UTC
Update
We are continuing to investigate this issue.
Posted Mar 06, 2024 - 14:41 UTC
Investigating
Editor/Insights/Console is down for US accounts.
We are investigating and will update shortly.
Posted Mar 06, 2024 - 14:09 UTC
This incident affected: Content Platform (Designer (Editor), DAP Console), Admin Space, Applications (ActionBot), End User Experience (Workstation), and Data (Insights dashboards).