It's great that the Rust community is finding ways to improve the performance of decoding strings from WASM to JS; it's one of the major performance holes you hit when using WASM.
The issue comes down to the fact that even if your WASM code can produce a UTF-16 buffer, the engine needs to make a copy at some point to use it as a string in JS. The TextDecoder API does a good job of making this efficient, ensuring there is just a single copy, but it's still overhead.
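Roughly, that single-copy path looks like this (a sketch; the guest exports memory, get_str_ptr, and get_str_len are hypothetical names):

    // Decoding a string straight out of WASM linear memory.
    const decoder = new TextDecoder(); // a "utf-16le" decoder works the same way

    function readWasmString(instance: WebAssembly.Instance): string {
      const { memory, get_str_ptr, get_str_len } = instance.exports as {
        memory: WebAssembly.Memory;
        get_str_ptr: () => number;
        get_str_len: () => number;
      };
      // The Uint8Array is a zero-copy view into linear memory...
      const bytes = new Uint8Array(memory.buffer, get_str_ptr(), get_str_len());
      // ...but decode() must copy: JS strings are immutable, so the engine
      // materialises a fresh string from the bytes. One copy, but a copy.
      return decoder.decode(bytes);
    }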
Ideally there would be a way to wrap an ArrayBuffer in a "string view", offloading the responsibility of ensuring it's valid UTF-16 to the WASM code, with no copy made at all. But that brings a ton of complexity: strings need to be immutable in JS, while the underlying buffer could still be changed.
The JS string built-ins proposal for WebAssembly:
https://github.com/WebAssembly/js-string-builtins/blob/main/...
Personally I feel this is backwards: I don't want access to JS literals and objects from WASM, I just want a way to wrap an arbitrary ArrayBuffer containing a UTF-16 string as a JS string.
It keeps WASM simple and provides a thin layer as an optimisation.
> It keeps WASM simple
At the cost of complicating JS string implementations, probably to the point of undoing the benefits.
Currently JS strings are immutable objects, allowing for all kinds of optimization tricks (interning, ropes, etc.). Having a string backed by a mutable ArrayBuffer messes with that.
There are probably also security concerns with allowing mutable access to string internals inside the JS engine.
So the simple-appearing solution you suggested would be rejected by all the major browser vendors who back the various WASM and JS engines.
Immutable access to constant JS strings is the only realistic option for reading strings from WASM, and creating constant strings is the only realistic option for sending them back.
The whole UTF-8 vs UTF-16 thing makes this way more messy than it should be.
I'd love for some native way of handling UTF-8 in JavaScript and the DOM (no, TextEncoder/TextDecoder do not count). Even a kind of "mode" you could choose for the whole page would be a huge step forward for the "compile native language to WASM + web" thing.
The TC39 proposals for resizable ArrayBuffers and String.prototype.isWellFormed are steps in this direction, though we still need proper zero-copy UTF-8 string views.
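Both of those have since been finalized in ES2024. A quick sketch of what they give you:

    // Well-formed Unicode strings: detect lone surrogates before handing a
    // string to an API that expects valid UTF-16 or UTF-8.
    console.log("hello".isWellFormed());  // true
    console.log("\uD800".isWellFormed()); // false (lone surrogate)
    console.log("\uD800".toWellFormed()); // "\uFFFD" (replacement character)

    // Resizable ArrayBuffer: grow a buffer in place instead of
    // allocate-and-copy.
    const buf = new ArrayBuffer(1024, { maxByteLength: 1 << 20 });
    buf.resize(4096); // length-tracking views observe the new size

Neither is a UTF-8 string view, though; they just make the existing copy-based workflow safer and cheaper.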
100%. If we could get a DOMString8 (8-bit encoded) interface in addition to the existing DOMString (16-bit encoded), plus a way to wrap a buffer in a DOMString8, we could have convenient and reasonably performant interfaces between WASM and the DOM.
The extra DOM complexity that would entail seems like a loss for the existing web.
The current situation is that we have limited uptake of WASM. This is due, in part, to lack of DOM access. We could solve that but we would have to complicate WASM or complicate the DOM. Complicating WASM would seem to undermine its purpose, burdening it forever with the complexity of the browser. The DOM, on the other hand, is already quite complex. But providing a fresh interface to the DOM would make it possible to bypass some of the accretions of time and complexity. The majority of the cost would be to browser implementors as opposed to web developers.
At least some of the implementation complexity is already there under the hood: WebKit and Blink have an optimization that stores strings consisting only of Latin-1 characters as 8-bit characters.
This is a pressing concern for anyone trying to move large amounts of text data in and out of WASM. The batching makes good sense; I should try this for SQLite result sets!
> Wasm-bindgen calls TextDecoder.decode for every string. Sledgehammer only calls TextDecoder.decode once per batch.
So they decode one long concatenated string and then on the JS side split it into substrings? I wonder if that messes with the GC on the JS side of things.
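Something like this, I'd guess; a minimal sketch of the pattern (not Sledgehammer's actual wire format):

    // One TextDecoder.decode call for a whole batch of concatenated strings,
    // then substring() to slice out the individual values. Assumes `lengths`
    // counts UTF-16 code units of the decoded text, not bytes.
    const decoder = new TextDecoder();

    function splitBatch(bytes: Uint8Array, lengths: number[]): string[] {
      const batch = decoder.decode(bytes); // single decode for the batch
      const out: string[] = [];
      let offset = 0;
      for (const len of lengths) {
        // Engines may implement this as a slice over the parent string's
        // storage, which is exactly what the GC question is about.
        out.push(batch.substring(offset, offset + len));
        offset += len;
      }
      return out;
    }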
How would splitting it into substrings be different from decoding individual strings, from an allocation/GC perspective? If anything I'd assume splitting a substring is more efficient; I expect there are a ton of optimisations in JS engines for sliced strings and the like, as they've been around for ages.
I imagine it's faster during creation because there are fewer allocations for the backing array of the string content (one, basically, unless they move things around). But that can also mean holding on to the entire backing array even if only one of the strings is still "alive", unless there are optimizations for reclaiming memory in those situations too.
I'm pretty sure TurboFan handles this; you might need to do a little hoisting or tagging.
I wrote a while back about a somewhat related issue:
https://www.nhatcher.com/post/should_i_import_or_should_i_ro...
The code is a bit outdated, but the principle of linking against the browser implementation stands
Sad that this isn’t natively in browsers…
How does the performance compare to projects like Wasmtime?
The two projects have different use cases, so they can't be directly compared. Sledgehammer bindgen makes calling JavaScript from Rust faster in the browser; Wasmtime is a native runtime for WASM outside the browser.
I think there's a ton of room for innovation left on the table here.
Context: as far as I know Electron is still the king if you want to do (unsafe but performant) "IPC/RPC" between native and a webview.
All of the other options that exist in other languages (Deno, Rust, you name it) do the same "stringified JSON back and forth" which really isn't great for performance in my opinion.
It'd be cool if you could opt in (obviously in a sandboxed or otherwise secure way) to something admittedly a bit reckless: a way to provide native methods for the WASM side of V8 and its WebView (thinking Electron-esque here) to call.
I'm not sure if I'm understanding you correctly, but vanilla WASM IPC works by sharing linear memory, where it's up to the implementation to choose the data encoding (Arrow/proto/whatever). In the case of wasm-bindgen's DOM manipulation API, the implementation serialises individual commands and sends them over the boundary, with the string params for each command deserialised individually; this project improves on that by batching them all into one big string, reducing the deserialisation overhead. However, that string encoding is specific to this use case; it's not a general WASM IPC mechanism.
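A sketch of that plain shared-linear-memory pattern, copying a string into guest memory and handing over a (ptr, len) pair (alloc and process_str are hypothetical guest exports):

    const encoder = new TextEncoder();

    function passString(instance: WebAssembly.Instance, s: string): void {
      const { memory, alloc, process_str } = instance.exports as {
        memory: WebAssembly.Memory;
        alloc: (size: number) => number;
        process_str: (ptr: number, len: number) => void;
      };
      // Worst case for UTF-8: 3 bytes per UTF-16 code unit.
      const cap = s.length * 3;
      const ptr = alloc(cap);
      const view = new Uint8Array(memory.buffer, ptr, cap);
      // encodeInto writes directly into linear memory: one copy, no
      // intermediate Uint8Array allocation.
      const { written } = encoder.encodeInto(s, view);
      process_str(ptr, written ?? 0);
    }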
VSCode IPC is kinda similar, as it's designed to facilitate comms over an enforced process-isolation barrier to protect the main thread from slow extensions etc., but it's actually IPC there (as in, there are multiple processes at the OS level). The WASM/JS stuff is handled within the same V8 context; it's not actually IPC.
(Happy to be corrected here, but this is my understanding)
https://github.com/webview/webview_deno
Tell me how you'd do "native C/C++ FFI (to a .so, .dylib, or .dll)" from the webview, using WASM or anything other than WebKit's built-in JSON-string-based IPC.
Like a <button> that triggers a DLL call. How would you achieve it with WASM? How does WASM act as the bridge to the DOM and/or native? It doesn't, right?
Most of the optimizations apply equally to calling into webview JS from a native application for something like dom manipulation. There is a wry example that uses the system webview in the repo: https://github.com/ealmloff/sledgehammer_bindgen/blob/master...
This is fetch(), which I have to imagine is much less performant than Electron with a require('native-module').
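For comparison, the Electron pattern being referenced looks roughly like this (addon.node and doWork are hypothetical; the renderer call would go through a preload/contextBridge in real code):

    // main process
    import { ipcMain } from "electron";
    const native = require("./build/Release/addon.node"); // N-API addon (.so/.dylib/.dll)
    ipcMain.handle("do-work", (_event, arg: string) => native.doWork(arg));

    // renderer process: e.g. a <button> handler triggering the native call
    // const result = await ipcRenderer.invoke("do-work", "payload");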
We're building a brand new webview that has a native code (Rust) API to the DOM. That way your native extension doesn't have to go through JavaScript at all. Currently it doesn't have any JS support, but it could be added.
How could/would you do "DOM event -> native FFI dlsym-type call"?