[Bug] Blue-Green Cluster Migration results in migration of healthcheck topic which causes timeout on GET /admin/v2/brokers/health #23572
Labels
type/bug
Version
Latest on master, i.e. 4.1.0
Minimal reproduce step
What did you expect to see?
The original "blue" cluster should appear to be healthy after cluster migration.
What did you see instead?
"Blue" cluster reports as not healthy.
The impact in my system is that we have microservices whose readiness and liveness checks depend on the Pulsar broker health check (via the admin client / API). Since that health check fails, the microservices start to fail even though the blue cluster is technically still operational.
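A readiness probe of this kind can be sketched roughly as follows. This is a minimal illustration, not the actual service code: it calls the GET /admin/v2/brokers/health endpoint with the JDK HttpClient instead of the Pulsar admin client, and the class name, method names, and the assumption that a healthy broker answers HTTP 200 with the body "ok" are all mine.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of Pulsar or of the reporter's services.
final class BrokerHealthProbe {

    // Assumption: a healthy broker replies 200 with the plain-text body "ok".
    static boolean isHealthyResponse(int statusCode, String body) {
        return statusCode == 200 && body != null && "ok".equalsIgnoreCase(body.trim());
    }

    // Returns false on any I/O failure or timeout, which is what makes the
    // dependent readiness check fail once the healthcheck topic is migrated.
    static boolean probe(String adminBaseUrl, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(adminBaseUrl + "/admin/v2/brokers/health"))
                .timeout(timeout)
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return isHealthyResponse(response.statusCode(), response.body());
        } catch (IOException e) {
            // Includes HttpTimeoutException, i.e. the timeout described in the title.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a probe like this, a timeout on the health endpoint is indistinguishable from a dead broker, so the service is marked not ready even though the broker is still serving traffic.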
Anything else?
I'm not sure it even makes sense logically to migrate the healthcheck.
Looking here, it is surprising that the healthcheck topic does not register as an internal topic:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java
Lines 2923 to 2926 in c266db2
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/extensions/ExtensibleLoadManagerImpl.java
Lines 822 to 827 in c266db2
The healthcheck (heartbeat) topic name is defined here
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/BrokersBase.java
Lines 422 to 427 in c266db2
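To make the idea concrete, here is a rough sketch of a predicate that recognises the healthcheck topic by name. It is not the code at the lines referenced above. The class and method names are hypothetical, and it assumes the topic lives under the pulsar tenant with the local name healthcheck; the real format should be taken from BrokersBase.java.

```java
// Hypothetical helper for illustration only; not existing Pulsar code.
final class HealthCheckTopics {

    // Assumptions: tenant and local topic name used by the broker healthcheck.
    private static final String HEARTBEAT_TENANT = "pulsar";
    private static final String HEALTH_CHECK_SUFFIX = "healthcheck";

    // Returns true for names shaped like
    // persistent://pulsar/<namespace parts>/healthcheck.
    static boolean isHealthCheckTopic(String topic) {
        if (topic == null) {
            return false;
        }
        int schemeEnd = topic.indexOf("://");
        if (schemeEnd < 0) {
            return false;
        }
        String[] parts = topic.substring(schemeEnd + 3).split("/");
        // Need at least tenant, one namespace part, and the local topic name.
        if (parts.length < 3) {
            return false;
        }
        return HEARTBEAT_TENANT.equals(parts[0])
                && HEALTH_CHECK_SUFFIX.equals(parts[parts.length - 1]);
    }
}
```

A check along these lines could be consulted either before a topic is migrated or from an internal-topic test, which are the two options I am asking about below.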
I might be interested in submitting a PR but would need some clarity on whether to change the logic in PersistentTopic.java e.g.
or if it makes more sense that ExtensibleLoadManagerImpl.isInternalTopic returns true for the healthcheck topic also.
Are you willing to submit a PR?