Using html5ever in wasm package for an isomorphic html sanitizer #497

Open
@dejang

Description

Hello,

I am looking for a way to build an HTML5 sanitizer that runs in browser, Node.js and Java environments, with Java being the lowest priority at the moment. The most important requirement is that it must not rely on a DOM being available in these environments. I came across html5ever and it looks like the perfect tool for my scenario, with the added benefit that it is part of the Servo project.

For both browser and Node.js environments I would have to produce WASM artifacts. In Node.js this simplifies dealing with multiple platforms, and it also covers environments where I may not be able to load native Node.js plugins. For browsers and mobile WebViews there is no option other than a WASM artifact. These are the constraints on distribution, and I am fine with them.

I am writing the sanitizer in Rust, which keeps things easy to manage because I stay in the same programming language throughout development.

Currently, compiling html5ever to WASM gives me an output of 450 KB, even after running it through wasm-opt with very aggressive size optimizations. Unfortunately, that is far too large for the Web. Ideally it would be around 50 KB, which would make html5ever a much more attractive alternative to existing JavaScript sanitizers for the browser.

I would like to ask whether there is a way to either compile html5ever to WASM so that it reaches my target size or, alternatively, use only the parser features I actually need, in the hope that this shaves off a considerable amount of code.
My main scenario is the following: given a string containing HTML, produce a DOM tree that can be traversed to identify the tags, attributes and attribute values that should be eliminated, then return the sanitized result as a string.
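
To make the scenario concrete, here is a minimal sketch of the traversal and serialization step only. It does not use html5ever's API: the `Node` enum, `Policy` struct and `sanitize` function are hypothetical names I made up for illustration, standing in for whatever tree the parser would produce. It also assumes a plain allow-list policy and ignores void elements and URL scheme checks.

```rust
use std::collections::HashSet;

// Hypothetical minimal tree, standing in for the DOM a parser would build.
// This is NOT html5ever's node type.
#[derive(Debug, Clone)]
enum Node {
    Text(String),
    Element {
        tag: String,
        attrs: Vec<(String, String)>,
        children: Vec<Node>,
    },
}

// Hypothetical allow-list policy: anything not listed is dropped.
struct Policy {
    allowed_tags: HashSet<&'static str>,
    allowed_attrs: HashSet<&'static str>,
}

// Escape the characters that are significant in HTML text and attribute values.
fn escape(input: &str, out: &mut String) {
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
}

// Walk the tree depth-first, writing only allowed tags and attributes.
fn sanitize_into(node: &Node, policy: &Policy, out: &mut String) {
    match node {
        Node::Text(text) => escape(text, out),
        Node::Element { tag, attrs, children } => {
            if !policy.allowed_tags.contains(tag.as_str()) {
                // Drop a disallowed element together with its whole subtree.
                return;
            }
            out.push('<');
            out.push_str(tag);
            for (name, value) in attrs {
                if policy.allowed_attrs.contains(name.as_str()) {
                    out.push(' ');
                    out.push_str(name);
                    out.push_str("=\"");
                    escape(value, out);
                    out.push('"');
                }
            }
            out.push('>');
            for child in children {
                sanitize_into(child, policy, out);
            }
            // Simplification: every element gets a closing tag (no void elements).
            out.push_str("</");
            out.push_str(tag);
            out.push('>');
        }
    }
}

// Entry point: tree in, sanitized HTML string out.
fn sanitize(root: &Node, policy: &Policy) -> String {
    let mut out = String::new();
    sanitize_into(root, policy, &mut out);
    out
}
```

With a policy that allows only the `p` and `a` tags and the `href` attribute, a `p` element carrying an `onclick` attribute and a `script` child would come back as a bare `<p>…</p>` with the script subtree removed. The real version would need the tree to come from the parser, which is the part that currently drives the binary size.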

Thank you for taking the time to read this issue. Hopefully, with your help, I'll be able to use html5ever to achieve my goals.
