
More flexible extension support #192

Open
@crabmusket


TLDR:

I'd like to suggest a tweak to the extension API. In addition to the current process, where an extension returns a blob that the encoder copies, I propose allowing an extension to use the encoder's "private" API to perform writes directly.

I have prototyped this extension mechanism in a fork. The changes are minimal. See comparison here.

If you agree that this idea is desirable I'm happy to clean up my branch, add tests and make a PR.


I came across this need when I was trying to design an extension to handle TypedArrays natively, but also to align them to the correct byte indexes so that they can be used efficiently after decoding. You can see my extension here - the readme explains the alignment situation. (And my original comment here.)

// Encoding and decoding a TypedArray with no boilerplate:
const floatArray = new Float32Array([1, 2, 3, 4, 5]);
const encoded = encode({ floatArray }, { extensionCodec });
assert.deepStrictEqual(decode(encoded, { extensionCodec }), { floatArray });
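
To illustrate why alignment matters after decoding, here is a minimal sketch using only standard TypedArray behaviour (it does not use the library or my extension). A `Float32Array` view can only be created over an existing buffer when the byte offset is a multiple of 4; otherwise the bytes have to be copied into a fresh buffer first.

```typescript
// A stand-in for a decoded MessagePack payload.
const payload = new Uint8Array(16);

// Offset 4 is a multiple of Float32Array.BYTES_PER_ELEMENT (4),
// so this is a zero-copy view over the payload's buffer.
const alignedView = new Float32Array(payload.buffer, 4, 2);

// Offset 3 is not a multiple of 4, so the constructor throws a RangeError.
let misalignedThrew = false;
try {
  new Float32Array(payload.buffer, 3, 2);
} catch (e) {
  misalignedThrew = e instanceof RangeError;
}

// Fallback for misaligned data: copy the 8 bytes into a new buffer,
// which starts at offset 0 and is therefore aligned.
const copied = new Float32Array(payload.slice(3, 11).buffer);
```

The aligned case shares memory with the payload, while the fallback allocates and copies, which is what the alignment bytes in the extension are meant to avoid.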

Using the current extension API to do this, I would need to:

  1. Do an extra copy, from the TypedArray to a Uint8Array containing my extra wrapper and alignment bytes
  2. Predict which of MsgPack's extension headers the base encoder will use when writing my extension data, and adjust alignment accordingly
  3. Use a reference to the encoder anyway, to get the current value of pos. I'd have to send this circular reference into the extension codec's context object

Problem 1 isn't such a big deal; there's a lot of copying during encoding anyway, though I minimise it by choosing appropriate initial buffer sizes. And if copying can be avoided, why not?

Problem 2 is the main issue I am interested in; I was frankly too lazy to do the maths required to predict the ext header size and then modify the alignment accordingly. My fork allows extensions to directly call the encoder's write methods, which means that I was able to always choose an ext 32 header. This makes it easy to predict the alignment requirements. A few bytes are wasted, but my use case is for large data arrays, where a handful of bytes makes almost no difference to the final payload size.
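
For reference, the maths in question can be sketched roughly as below. The header sizes follow the ext format family in the MessagePack spec; the function names are hypothetical, and the sketch assumes the encoder always picks the smallest ext format that fits and that padding bytes come directly after the header (a simplification of my actual layout).

```typescript
// Size in bytes of the ext header for a payload of the given length,
// assuming the smallest format that fits is chosen.
function extHeaderSize(payloadLength: number): number {
  const isFixExt =
    payloadLength === 1 || payloadLength === 2 || payloadLength === 4 ||
    payloadLength === 8 || payloadLength === 16;
  if (isFixExt) return 2;                  // fixext: format byte + type byte
  if (payloadLength < 0x100) return 3;     // ext 8: format + 1-byte length + type
  if (payloadLength < 0x10000) return 4;   // ext 16: format + 2-byte length + type
  return 6;                                // ext 32: format + 4-byte length + type
}

// Number of padding bytes needed to move `offset` up to a multiple of `alignment`.
function alignmentPadding(offset: number, alignment: number): number {
  return (alignment - (offset % alignment)) % alignment;
}

// With a fixed ext 32 header, the padding depends only on the write position.
const EXT32_HEADER_SIZE = 6;
function paddingWithExt32(pos: number, alignment: number): number {
  return alignmentPadding(pos + EXT32_HEADER_SIZE, alignment);
}
```

With the variable-size headers, the padding changes the payload length, which can in turn change the header size and therefore the padding. Always writing ext 32 removes that circularity.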


In order to implement this extension efficiently, I added a write method to ExtData. My plugin returns a subclass of ExtData which has its own implementation of write.

This was just a rough cut to see if the concept worked. If I were designing this properly, I'd replace the use of the ExtData class with an interface to avoid mandatory inheritance hierarchies. You can see in my TypedArrayExtData I'm already being silly by calling the parent constructor with a new Uint8Array() which is completely unused.
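
A rough sketch of what the interface-based version could look like. Everything here is hypothetical and not the library's API: `ExtWriter`, `WritableExtension`, `BufferWriter` and `AlignedFloat32Ext` are illustrative names, and the payload layout (one padding-count byte, then the padding, then the raw floats) is an assumption made for the example.

```typescript
// Hypothetical minimal view of the encoder's write methods.
interface ExtWriter {
  pos: number;
  writeU8(value: number): void;
  writeU32(value: number): void;
  writeU8a(bytes: Uint8Array): void;
}

// Hypothetical replacement for subclassing ExtData.
interface WritableExtension {
  write(writer: ExtWriter): void;
}

// Toy fixed-size writer standing in for the real encoder.
class BufferWriter implements ExtWriter {
  readonly bytes: Uint8Array;
  private readonly view: DataView;
  pos = 0;
  constructor(size: number) {
    this.bytes = new Uint8Array(size);
    this.view = new DataView(this.bytes.buffer);
  }
  writeU8(value: number): void {
    this.view.setUint8(this.pos, value);
    this.pos += 1;
  }
  writeU32(value: number): void {
    this.view.setUint32(this.pos, value); // big-endian, as MessagePack requires
    this.pos += 4;
  }
  writeU8a(bytes: Uint8Array): void {
    this.bytes.set(bytes, this.pos);
    this.pos += bytes.length;
  }
}

class AlignedFloat32Ext implements WritableExtension {
  readonly extType: number;
  readonly data: Float32Array;
  constructor(extType: number, data: Float32Array) {
    this.extType = extType;
    this.data = data;
  }
  write(writer: ExtWriter): void {
    const headerSize = 6; // always ext 32: format + 4-byte length + type
    const afterPadCount = writer.pos + headerSize + 1; // +1 for the padding-count byte
    const padding = (4 - (afterPadCount % 4)) % 4;
    writer.writeU8(0xc9); // ext 32 format byte
    writer.writeU32(1 + padding + this.data.byteLength);
    writer.writeU8(this.extType);
    writer.writeU8(padding);
    for (let i = 0; i < padding; i++) writer.writeU8(0);
    // A byte view over the floats: written straight to the output, no intermediate copy.
    writer.writeU8a(new Uint8Array(this.data.buffer, this.data.byteOffset, this.data.byteLength));
  }
}
```

The extension object no longer needs a dummy `Uint8Array` or a parent constructor; it only has to provide `write`.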


Pros of this idea:

  • Allows fewer copies during encoding
  • Allows me to implement alignment much more easily by controlling which ext header is written

Cons of this idea:

  • Introduces an additional method call during the encoding process: extData.write, which currently calls straight back to encoder.encodeExtension. (I think this is an example of "double dispatch" in OOP parlance?)
  • Extensions using this approach rely on the private API of the Encoder
  • I haven't thought about recursive encoding 🤔

For my use case, I didn't mind using the private API; this library seems stable enough. Even with the current extension API, I would still have to depend on internal implementation details when guessing alignment: if the library ever used different ext headers for the same data size, my alignment guesses would be wrong. I see no reason why that logic would ever change, but it's still an implementation detail.

However, you might not want to allow clients to depend on this behaviour. It could be worked around, e.g. by providing an adapter with stable public methods which call the encoder's write methods.
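
As a rough illustration of the adapter idea, the sketch below wraps the encoder's internals behind a small public surface. All names are hypothetical: `EncoderInternals` stands in for whatever private members the real encoder has, and `ToyEncoder` exists only to make the example self-contained.

```typescript
// Stand-in for the encoder's private members (names are assumptions).
interface EncoderInternals {
  pos: number;
  writeU8(value: number): void;
  writeU8a(bytes: Uint8Array): void;
}

// Hypothetical public adapter: extensions see only these methods, so the
// encoder's internals can be renamed or restructured without breaking them.
class ExtensionWriter {
  private readonly encoder: EncoderInternals;
  constructor(encoder: EncoderInternals) {
    this.encoder = encoder;
  }
  // Current write position in the output, needed for alignment calculations.
  get position(): number {
    return this.encoder.pos;
  }
  writeByte(value: number): void {
    this.encoder.writeU8(value & 0xff); // keep only the low 8 bits
  }
  writeBytes(bytes: Uint8Array): void {
    this.encoder.writeU8a(bytes);
  }
}

// Toy encoder used only to exercise the adapter.
class ToyEncoder implements EncoderInternals {
  pos = 0;
  readonly out: number[] = [];
  writeU8(value: number): void {
    this.out.push(value);
    this.pos += 1;
  }
  writeU8a(bytes: Uint8Array): void {
    for (let i = 0; i < bytes.length; i++) this.writeU8(bytes[i]);
  }
}
```

An extension would then receive an `ExtensionWriter` rather than the encoder itself, which keeps the circular reference and the private API out of client code.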
