Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
The current restart capability works as follows:
1. Try to acquire resources on a host.
2. If the resources are acquired, stop the active container.
3. Try to start the container on the host where the resources were acquired.
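The three steps above can be modeled in a few lines to show why a fully packed host rejects the restart. This is a minimal sketch, assuming a single scalar resource (e.g. memory) per host; `Host`, `Container`, `try_acquire`, `release` and `restart` are hypothetical names for illustration, not ATC / Concourse or CDP APIs.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical host model with one scalar resource (e.g. memory in MiB)."""
    capacity: int
    allocated: int = 0

    def try_acquire(self, amount: int) -> bool:
        """Reserve `amount` only if it fits in the remaining headroom."""
        if self.allocated + amount > self.capacity:
            return False
        self.allocated += amount
        return True

    def release(self, amount: int) -> None:
        """Return `amount` to the host's free pool."""
        self.allocated -= amount


@dataclass
class Container:
    """Hypothetical container; its `request` is already counted in host.allocated while running."""
    name: str
    request: int
    running: bool = True


def restart(host: Host, container: Container) -> bool:
    """Current behavior (assumed): acquire first, then stop, then start."""
    # 1. Try to acquire resources for the new instance while the old one still runs.
    if not host.try_acquire(container.request):
        # No headroom: the restart fails and the old container keeps running.
        return False
    # 2. Resources acquired: stop the active container and free its share.
    container.running = False
    host.release(container.request)
    # 3. Start the container using the reservation made in step 1.
    container.running = True
    return True


if __name__ == "__main__":
    # Host sized for the job's peak usage, so there is no headroom left.
    host = Host(capacity=100, allocated=100)
    job = Container("cdp-job", request=100)
    print(restart(host, job))  # prints False: the restart is rejected
```

Because step 1 needs a second full reservation while the original container still holds its own, a restart can only succeed when the host has at least one container's worth of spare capacity.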
In production, we have observed the following with ATC / Concourse under this approach:
1. CDP jobs are configured to request resources for peak usage, which leaves no headroom on the host for additional resource requests.
2. As a result, restart requests fail because the required resources cannot be acquired on that host.
One fix is to implement a force-restart utility for CDP. In this version, we stop the container first and then acquire resources. The upside is that we at least free the container's resources on the host before issuing the resource request; the downside is that bringing the container back up on that host becomes best effort.
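A minimal sketch of the proposed stop-then-acquire order, under the same assumptions as a single-resource host model. `Host`, `Container`, `force_restart` and the `on_stopped` hook are hypothetical; the hook only exists to simulate another workload claiming the freed capacity in the window between stop and acquire, which is what makes the operation best effort.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Host:
    """Hypothetical host model with one scalar resource (e.g. memory in MiB)."""
    capacity: int
    allocated: int = 0

    def try_acquire(self, amount: int) -> bool:
        """Reserve `amount` only if it fits in the remaining headroom."""
        if self.allocated + amount > self.capacity:
            return False
        self.allocated += amount
        return True

    def release(self, amount: int) -> None:
        """Return `amount` to the host's free pool."""
        self.allocated -= amount


@dataclass
class Container:
    """Hypothetical container; its `request` is counted in host.allocated while running."""
    name: str
    request: int
    running: bool = True


def force_restart(
    host: Host,
    container: Container,
    on_stopped: Optional[Callable[[], object]] = None,
) -> bool:
    """Proposed behavior: stop first, then acquire. May leave the container stopped."""
    # 1. Stop the active container and free its resources on the host.
    if container.running:
        container.running = False
        host.release(container.request)
    # Hypothetical hook: lets a caller simulate concurrent scheduling on the host.
    if on_stopped is not None:
        on_stopped()
    # 2. Request resources now that the host has headroom again.
    if not host.try_acquire(container.request):
        # The freed capacity was claimed elsewhere: the container stays down.
        return False
    # 3. Start the container on the same host.
    container.running = True
    return True


if __name__ == "__main__":
    host = Host(capacity=100, allocated=100)
    job = Container("cdp-job", request=100)
    print(force_restart(host, job))  # prints True: the freed resources are reused
```

On the same fully packed host where the acquire-first flow fails, this order succeeds as long as nothing else claims the freed capacity; if something does, `force_restart` returns `False` and the container is left stopped rather than rescheduled elsewhere.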