
[Bug] managedLedgerMaxReadsInFlightSizeInMB limit doesn't work properly if managedLedgerReadEntryTimeoutSeconds isn't set #23506

Open
lhotari opened this issue Oct 23, 2024 · 2 comments
Labels: type/bug

lhotari commented Oct 23, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

all released versions

Minimal reproduce step

Problem description:
When the system is loaded, an exception such as "Time-out elapsed while acquiring enough permits on the memory limiter to read from ledger [ledgerid], [topic], estimated read size [read size] bytes for [dispatcherMaxReadBatchSize] entries (check managedLedgerMaxReadsInFlightSizeInMB)" will occur when managedLedgerReadEntryTimeoutSeconds isn't set.
The current solution expects that managedLedgerReadEntryTimeoutSeconds has been set.
In addition, the solution is inefficient since retries happen in a tight loop. It is also possible that the ordering of reads gets mixed up, since whichever call happens to come next will get the permits for the next read.
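
For illustration, here is a simplified sketch of the pattern described above (hypothetical code, not the actual Pulsar implementation; class and method names are made up). The point it shows: the deadline check compares the elapsed time directly against the configured managedLedgerReadEntryTimeoutSeconds value, so with the setting unset (0) any wait fails, and failed attempts are simply re-submitted in a tight loop with no ordering between waiting reads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RetryingAcquireSketch {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final long readEntryTimeoutMillis; // 0 when managedLedgerReadEntryTimeoutSeconds isn't set
    private long remainingPermits;

    RetryingAcquireSketch(long maxReadsInFlightBytes, long readEntryTimeoutMillis) {
        this.remainingPermits = maxReadsInFlightBytes;
        this.readEntryTimeoutMillis = readEntryTimeoutMillis;
    }

    void readWithLimits(long estimatedReadSize, long startNanos, Runnable doRead, Runnable failRead) {
        if (tryAcquire(estimatedReadSize)) {
            doRead.run();
            return;
        }
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        if (elapsedMillis > readEntryTimeoutMillis) {
            // With managedLedgerReadEntryTimeoutSeconds unset, readEntryTimeoutMillis is 0,
            // so any wait at all trips this check and the read fails with the
            // "Time-out elapsed while acquiring enough permits ..." style error.
            failRead.run();
            return;
        }
        // Otherwise the same attempt is immediately re-submitted: a tight retry loop in which
        // whichever retry or brand-new caller runs first grabs the freed permits, so the
        // ordering of waiting reads is not preserved.
        executor.execute(() -> readWithLimits(estimatedReadSize, startNanos, doRead, failRead));
    }

    private synchronized boolean tryAcquire(long size) {
        if (remainingPermits >= size) {
            remainingPermits -= size;
            return true;
        }
        return false;
    }
}
```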

What did you expect to see?

managedLedgerMaxReadsInFlightSizeInMB limit should also work without setting managedLedgerReadEntryTimeoutSeconds

What did you see instead?

Timeouts could happen when managedLedgerReadEntryTimeoutSeconds isn't set. "Fairness" is also missing for the waiting read requests, which could cause a starvation-type issue when the system is overloaded.

Anything else?

InflightReadsLimiter should be refactored to return a CompletableFuture so that the logic can be handled asynchronously and reactively. There should be a queue so that a waiting acquire call prevents subsequent calls from acquiring permits until enough permits become available for it.
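
A rough sketch of that idea (not the actual Pulsar API; names and structure are made up): acquire() returns a CompletableFuture that completes once permits are reserved, and a FIFO queue ensures that a waiting request blocks newer requests until it has been satisfied.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

class AsyncInflightReadsLimiterSketch {
    private final long maxReadsInFlightBytes; // managedLedgerMaxReadsInFlightSizeInMB in bytes
    private long remainingPermits;
    private final Queue<PendingAcquire> queue = new ArrayDeque<>();

    private record PendingAcquire(long permits, CompletableFuture<Void> future) {}

    AsyncInflightReadsLimiterSketch(long maxReadsInFlightBytes) {
        this.maxReadsInFlightBytes = maxReadsInFlightBytes;
        this.remainingPermits = maxReadsInFlightBytes;
    }

    /** Completes when the requested permits have been reserved, in FIFO order. */
    synchronized CompletableFuture<Void> acquire(long permits) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        // Newer requests must not jump ahead of requests that are already waiting.
        if (queue.isEmpty() && permits <= remainingPermits) {
            remainingPermits -= permits;
            future.complete(null);
        } else {
            queue.add(new PendingAcquire(permits, future));
        }
        return future;
    }

    /** Returns permits and wakes up queued requests in order. */
    synchronized void release(long permits) {
        remainingPermits = Math.min(maxReadsInFlightBytes, remainingPermits + permits);
        while (!queue.isEmpty() && queue.peek().permits() <= remainingPermits) {
            PendingAcquire head = queue.poll();
            remainingPermits -= head.permits();
            // A real implementation would complete the future outside the lock or on an executor.
            head.future().complete(null);
        }
    }
}
```

A caller would then chain the read onto the returned future, e.g. limiter.acquire(estimatedReadSize).thenRun(() -> issueRead(...)), and call release(estimatedReadSize) once the read completes (issueRead here is a hypothetical call site).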

Are you willing to submit a PR?

  • I'm willing to submit a PR!
lhotari commented Nov 1, 2024

I also noticed multiple other issues in InflightReadsLimiter / PendingReadsManager:

  • PendingReadsManager will acquire permits for partial reads during de-duplication, which is wrong
    • This results in acquiring duplicate permits

lhotari commented Nov 1, 2024

Possible issue with PendingReadsManager:

  • When reads are de-duplicated, the same entries are returned to all callers in case of an exact match. This could be problematic since the readerIndex is shared state of the ByteBuffer and gets mutated in some cases.
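
A minimal sketch of the hazard described above, assuming the buffers in question are Netty ByteBuf instances (which expose readerIndex). Two callers handed the same buffer instance share its readerIndex, so one caller's read moves the other's position; handing each caller a retainedDuplicate() gives independent indices over the same memory. This is only an illustration, not a claim about how PendingReadsManager should be fixed.

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

public class SharedReaderIndexSketch {
    public static void main(String[] args) {
        ByteBuf entryPayload = Unpooled.wrappedBuffer(new byte[]{1, 2, 3, 4});

        // Problematic: both "callers" get the exact same instance.
        ByteBuf callerA = entryPayload;
        ByteBuf callerB = entryPayload;
        callerA.readByte();                        // advances the shared readerIndex
        System.out.println(callerB.readerIndex()); // prints 1 -- caller B sees caller A's read

        // Safer: each caller gets its own duplicate with independent reader/writer indices
        // (retainedDuplicate() also bumps the reference count so each can be released independently).
        ByteBuf callerC = entryPayload.retainedDuplicate();
        ByteBuf callerD = entryPayload.retainedDuplicate();
        callerC.readByte();
        System.out.println(callerD.readerIndex()); // still 1 -- unaffected by caller C's read
        callerC.release();
        callerD.release();
        entryPayload.release();
    }
}
```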
