GitHub header
Disruption with some GitHub services
Incident Report for GitHub
Resolved
On October 11, 2024, starting at 05:59 UTC, DNS infrastructure in one of our sites started to fail to resolve lookups following a database migration. Attempts to recover the database led to cascading failures that impacted the DNS systems for that site. The team worked to restore the infrastructure and there was no customer impact until 17:31 UTC.

During the incident, impact to the following services could be observed:

- Copilot: Degradation in IDE code completions for 4% of active users during the incident from 17:31 UTC to 21:45 UTC.
- Actions: Workflow runs delay (25% of runs delayed by over 5 minutes) and errors (1%) between 20:28 UTC and 21:30 UTC. Errors while creating Artifact Attestations.
- Customer migrations: From 18:16 UTC to 23:12 UTC running migrations stopped and new ones were not able to start.
- Support: support.github.com was unavailable from 19:28 UTC to 22:14 UTC.
- Code search: 100% of queries failed between 2024-10-11 20:16 UTC and 2024-10-12 00:46 UTC.

Starting at 18:05 UTC, engineering attempted to repoint the degraded site DNS to a different site to restore DNS functionality. At 18:26 UTC the test system had validated this approach and a progressive rollout to the affected hosts proceeded over the next hour. While this mitigation was effective at restoring connectivity within the site, it caused issues with connectivity from healthy sites back to the degraded site, and the team proceeded to plan out a different remediation effort.

At 20:52 UTC, the team finalized a remediation plan and began the next phase of mitigation by deploying temporary DNS resolution capabilities to the degraded site. At 21:46 UTC, DNS resolution in the degraded site began to recover and was fully healthy at 22:16 UTC. Lingering issues with code search were resolved at 01:11 UTC on October 12.

The team continued to restore the original functionality within the site after public service functionality was restored. GitHub is working to harden our resiliency and automation processes around this infrastructure to make diagnosing and resolving issues like this faster in the future.
Posted Oct 12, 2024 - 01:11 UTC
Update
We’re continuing to work towards recovery of code search service.
Posted Oct 12, 2024 - 00:46 UTC
Update
We’ve identified the issue with code search and are working towards recovery of service.
Posted Oct 12, 2024 - 00:14 UTC
Update
We’re continuing to investigate issues with code search.
Posted Oct 11, 2024 - 23:31 UTC
Update
We’re continuing to investigate issues with code search. Copilot and Actions services are recovered and operating normally.
Posted Oct 11, 2024 - 22:57 UTC
Update
Copilot is operating normally.
Posted Oct 11, 2024 - 22:16 UTC
Update
We are rolling out a fix to address the network connectivity issues. Copilot is seeing recovery. support.github.com is recovered.
Posted Oct 11, 2024 - 22:14 UTC
Update
Actions is operating normally.
Posted Oct 11, 2024 - 21:46 UTC
Update
We continue to work on mitigations. Actions is starting to see recovery.
Posted Oct 11, 2024 - 21:28 UTC
Update
The mitigation attempt did not resolve the issue and we are working on a different resolution path. In addition to the previously listed impacts, some Actions runs will see delays in starting.
Posted Oct 11, 2024 - 20:52 UTC
Update
Actions is experiencing degraded performance. We are continuing to investigate.
Posted Oct 11, 2024 - 20:48 UTC
Update
We continue to work on mitigations. In addition to previously listed impact, code search is also unavailable.
Posted Oct 11, 2024 - 20:15 UTC
Update
A mitigation for the network connectivity issues is being tested.
Posted Oct 11, 2024 - 20:05 UTC
Update
We continue to work on mitigations to restore network connectivity. In addition to the previously listed impact, access to support.github.com is also impacted.
Posted Oct 11, 2024 - 19:28 UTC
Update
We have identified the problem and are working on mitigations. In addition to previously listed impact, new Artifact Attestations cannot be created.
Posted Oct 11, 2024 - 19:05 UTC
Update
We have identified the problem is related to maintenance performed in our networking infrastructure. We are working to bring back the connectivity.

Copilot users in organizations or enterprises that have opted into the Content Exclusions feature will experience disabled completions in their editors.

Customer migrations remain paused as well.
Posted Oct 11, 2024 - 18:41 UTC
Update
We are investigating network connectivity issues. Some Copilot customers will see errors on API calls and experiences. We have also paused the remaining customer migration queue while we investigate due to an increase in errors.
Posted Oct 11, 2024 - 18:25 UTC
Update
We are investigating reports of issues with service(s): Copilot. We will continue to keep users updated on progress towards mitigation.
Posted Oct 11, 2024 - 17:58 UTC
Update
Copilot is experiencing degraded availability. We are continuing to investigate.
Posted Oct 11, 2024 - 17:56 UTC
Investigating
We are currently investigating this issue.
Posted Oct 11, 2024 - 17:53 UTC
This incident affected: Actions and Copilot.