Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Optimize Pulsar Java client zlib compression performance on Java 11+ by passing direct buffers #23586

Open
1 of 2 tasks
lhotari opened this issue Nov 11, 2024 · 1 comment
Labels
type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Comments

@lhotari
Copy link
Member

lhotari commented Nov 11, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Here's an example of CompressionCodeZLib which has several opportunities for optimizations:

public ByteBuf encode(ByteBuf source) {
byte[] array;
int length = source.readableBytes();
int sizeEstimate = (int) Math.ceil(source.readableBytes() * 1.001) + 14;
ByteBuf compressed = PulsarByteBufAllocator.DEFAULT.heapBuffer(sizeEstimate);
int offset = 0;
if (source.hasArray()) {
array = source.array();
offset = source.arrayOffset() + source.readerIndex();
} else {
// If it's a direct buffer, we need to copy it
array = new byte[length];
source.getBytes(source.readerIndex(), array);
}
Deflater deflater = this.deflater.get();
deflater.reset();
deflater.setInput(array, offset, length);
while (!deflater.needsInput()) {
deflate(deflater, compressed);
}
return compressed;
}

Solution

The java.util.zip.Deflater class has contained methods for using ByteBuffer input and output since Java 11.

In the case of Java 11+, the code could be optimized.
Since the Pulsar Java client is Java 8+, using the ByteBuffer methods would require the use of reflection (unless a multi-release jar file is used with separate classes for Java 8 and Java 11). There's a reflection example in different situation in BookKeeper's Java9IntHash class.

Regarding performance on Java 11+, the first problem is that it's using a heap buffer for the compressed buffer. A direct buffer would be more optimal when using the ByteBuffer methods with Deflater.
For Netty ByteBuf input, it's possible to achieve zero copy in most cases by using Netty ByteBuf's nioBuffer method. It's notable that using nioBuffer method will cause copies when the Netty ByteBuf input is a CompositeByteBuf. Netty doesn't have a good way for zero copy of CompositeByteBuf input. In BookKeeper, there's a solution for checksum calculation in the https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/util/ByteBufVisitor.java class, which can visit all buffer parts to avoid extra copies. A similar solution would be applicable to compression.

Alternatives

No response

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@lhotari lhotari added the type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages label Nov 11, 2024
@liangyepianzhou
Copy link
Contributor

Since the Pulsar Java client is Java 8+, using the ByteBuffer methods would require the use of reflection (unless a multi-release jar file is used with separate classes for Java 8 and Java 11). There's a reflection example in different situation in BookKeeper's Java9IntHash class.

This is indeed an optimization direction, but I am worried whether upgrading the JDK version of the client will cause trouble for users to upgrade.

For Netty ByteBuf input, it's possible to achieve zero copy in most cases by using Netty ByteBuf's nioBuffer method. It's notable that using nioBuffer method will cause copies when the Netty ByteBuf input is a CompositeByteBuf. Netty doesn't have a good way for zero copy of CompositeByteBuf input. In BookKeeper, there's a solution for checksum calculation in the https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/util/ByteBufVisitor.java class, which can visit all buffer parts to avoid extra copies. A similar solution would be applicable to compression.

My concern is, does the Pulsar client really use CompositeByteBuf to send messages?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
Development

No branches or pull requests

2 participants