• samwillis 20 hours ago

    It's great that the rust community are finding ways to improve the performance of decoding strings from WASM to js, it's one of the major performance holes you hit when using WASM.

    The issue comes down to the fact that even if your WASM code can return a utf16 buffer, to use it as a string in JS code the engine needs to make a copy at some point. The TextDecoder API does a good first job of making this efficient, ensuring there is just a single copy, but it's still overhead.

    Ideally there should be a way to wrap an array buffer with a "String View", offloading the responsibility of ensuring it's utf16 to the WASM code, with no copy being made. But that brings a ton of complexities, as strings need to be immutable in JS while the underlying buffer could still be changed.
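
    A minimal sketch of the copy described above, assuming a plain ArrayBuffer as a stand-in for a module's linear memory (in a real module this would be `memory.buffer`) and a made-up offset `ptr`. It only illustrates that TextDecoder produces an independent string, so later writes to the buffer are not visible through it:

    ```typescript
    // Stand-in for WASM linear memory; a real module would expose `memory.buffer`.
    const memory = new ArrayBuffer(64);

    // Hypothetical layout: the "WASM side" wrote a UTF-16LE string at byte offset `ptr`.
    const ptr = 16;
    const text = "héllo";
    const view = new DataView(memory);
    for (let i = 0; i < text.length; i++) {
      view.setUint16(ptr + i * 2, text.charCodeAt(i), true); // true = little-endian
    }
    const byteLen = text.length * 2;

    // Decoding copies the bytes into a new, immutable JS string.
    const decoder = new TextDecoder("utf-16le");
    const decoded = decoder.decode(new Uint8Array(memory, ptr, byteLen));

    // Overwrite the first code unit in the buffer after the string was created.
    view.setUint16(ptr, "j".charCodeAt(0), true);
    const redecoded = decoder.decode(new Uint8Array(memory, ptr, byteLen));

    console.log(decoded);   // the earlier string is unchanged
    console.log(redecoded); // a fresh decode sees the mutated bytes
    ```

    A zero-copy "String View" would have to choose between these two behaviours, which is where the immutability problem comes from.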

    • breve 18 hours ago

      The JS string built-ins proposal for WebAssembly:

      https://github.com/WebAssembly/js-string-builtins/blob/main/...

      • samwillis 16 hours ago

        Personally I feel this is backwards - I don't want access to js literals and objects from WASM, I just want a way to wrap an arbitrary array buffer that contains a utf16 string as a js string.

        It keeps WASM simple and provides a thin layer as an optimisation.

        • vanderZwan 15 hours ago

          > It keeps WASM simple

          At the cost of complicating JS string implementations, probably to the point of undoing the benefits.

          Currently JS strings are immutable objects, allowing for all kinds of optimization tricks (interning, ropes, etc.). Having one string represented by a mutable arraybuffer messes with that.

          There's probably also security concerns with allowing mutable access to string internals inside the JS engine side.

          So the simple-appearing solution you suggested would be rejected by all major browser vendors who back the various WASM and JS engines.

          Access to constant JS strings without any form of mutability is the only realistic option for accessing JS strings. And creating constant strings is the only one for sending them back.

    • andyferris 20 hours ago

      The whole UTF-8 vs UTF-16 thing makes this way more messy than it should be.

      I'd love for some native way of handling UTF-8 in JavaScript and the DOM (no, TextEncoder/TextDecoder do not count). Even a kind of "mode" you could choose for the whole page would be a huge step forward for the "compile native language to WASM + web" thing.

      • ethan_smith 18 hours ago

        The TC39 proposals for "Resizable ArrayBuffer" and "String.prototype.isWellFormed" are steps in this direction, though we still need proper zero-copy UTF-8 string views.

        • theSherwood 19 hours ago

          100%. If we could get a DomString8 (8-bit encoded) interface in addition to the existing DomString (16-bit encoded) and a way to wrap a buffer in a DomString8, we could have convenient and reasonably performant interfaces between WASM and the DOM.

          • continuational 18 hours ago

            The extra DOM complexity that would entail seems like a loss for the existing web.

            • theSherwood 9 hours ago

              The current situation is that we have limited uptake of WASM. This is due, in part, to lack of DOM access. We could solve that but we would have to complicate WASM or complicate the DOM. Complicating WASM would seem to undermine its purpose, burdening it forever with the complexity of the browser. The DOM, on the other hand, is already quite complex. But providing a fresh interface to the DOM would make it possible to bypass some of the accretions of time and complexity. The majority of the cost would be to browser implementors as opposed to web developers.

              • zetafunction 10 hours ago

                At least some of the implementation complexity is already there under the hood. WebKit/Blink have an optimization to use 8-bit characters for strings that consist only of latin1 characters.

          • jitl 9 hours ago

            This is a pressing concern for anyone trying to move large amounts of text data in and out of WASM. The batching makes good sense, I should try this for SQLite result sets!

            • vanderZwan 15 hours ago

              > Wasm-bindgen calls TextDecoder.decode for every string. Sledgehammer only calls TextDecoder.decode once per batch.

              So they decode one long concatenated string and then on the JS side split it into substrings? I wonder if that messes with the GC on the JS side of things.
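
              A rough sketch of that "decode once, then split" idea. This is not Sledgehammer's actual wire format: `packBatch`, `unpackBatch`, and the separate list of lengths are hypothetical, and the packing step here runs in JS as a stand-in for what the WASM side would do:

              ```typescript
              const encoder = new TextEncoder();
              const decoder = new TextDecoder(); // UTF-8 by default

              // Stand-in for the WASM side: one UTF-8 buffer for all strings, plus their lengths.
              function packBatch(strings: string[]): { bytes: Uint8Array; lengths: number[] } {
                // Lengths are in UTF-16 code units so the JS side can slice the decoded string directly.
                const lengths = strings.map((s) => s.length);
                return { bytes: encoder.encode(strings.join("")), lengths };
              }

              // JS side: a single decode call for the whole batch, then slicing into substrings.
              function unpackBatch(bytes: Uint8Array, lengths: number[]): string[] {
                const whole = decoder.decode(bytes);
                const out: string[] = [];
                let offset = 0;
                for (const len of lengths) {
                  out.push(whole.slice(offset, offset + len));
                  offset += len;
                }
                return out;
              }

              const batch = packBatch(["div", "class", "héllo wörld", "🙂"]);
              const strings = unpackBatch(batch.bytes, batch.lengths);
              console.log(strings);
              ```

              Whether each slice keeps the whole decoded string alive is up to the engine's string representation, which is the GC question here.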

              • boomskats 14 hours ago

                How would splitting it into substrings be different from decoding individual strings from an allocation/gc perspective? If anything I'd assume splitting off a substring is more efficient - I expect there's a ton of optimisations in js for sliced strings or whatever, as it's been around for ages.

                • vanderZwan 13 hours ago

                  I imagine it's faster during creation because there's fewer allocations for a backing array for the string content (one, basically, unless they move stuff around). But then that can also mean holding on to the entire backing array even if only one of the strings is still "alive", unless there are optimizations for reclaiming memory in those situations too.

                  • monster_truck 9 hours ago

                    I'm pretty sure turbofan handles this, you might need to do a little hoisting or tagging

              • nhatcher 19 hours ago

                I wrote a while back about a somewhat related issue:

                https://www.nhatcher.com/post/should_i_import_or_should_i_ro...

                The code is a bit outdated, but the principle of linking against the browser implementation stands

                • CyanLite2 19 hours ago

                  Sad that this isn’t natively in browsers…

                  • bcardarella 19 hours ago

                    How does the performance compare to projects like Wasmtime?

                    • Evan-Almloff 19 hours ago

                      The two projects have different use cases so they can't be directly compared. Sledgehammer bindgen makes calling JavaScript from Rust faster in the browser. Wasmtime is a native runtime for WASM outside of the browser.

                    • MuffinFlavored 14 hours ago

                      I think there is a ton of room left on the table here for innovation.

                      Context: as far as I know Electron is still the king if you want to do (unsafe but performant) "IPC/RPC" between native and a webview.

                      All of the other options that exist in other languages (Deno, Rust, you name it) do the same "stringified JSON back and forth" which really isn't great for performance in my opinion.

                      It'd be cool if (obviously in a sandboxed or secure way) you could opt in to something, albeit a bit reckless, that lets you provide native methods for the WASM part of V8 and its WebView (thinking Electron-esque here) to call.

                      • boomskats 14 hours ago

                        I'm not sure if I'm understanding you correctly, but vanilla wasm ipc works by sharing linear memory, where it's up to the implementation to choose the data encoding (arrow/proto/whatever). In the case of wasm-bindgen's dom manipulation api, the implementation serialises individual commands and sends them over the boundary, with any string params for each command being deserialised individually, and this project improves on that by batching them all into one big string thus reducing the deserialisation overhead. However, the string encoding is specific to that use case - it's not a general wasm ipc mechanism.

                        VSCode IPC is kinda similar as it's designed to facilitate comms over an enforced process isolation barrier to protect the main thread from slow extensions etc. but it's actually IPC there (as in, there are multiple processes at the os level). The wasm/js stuff is handled within the same v8 context - it's not actually ipc.

                        (Happy to be corrected here, but this is my understanding)

                        • MuffinFlavored 13 hours ago

                          https://github.com/webview/webview_deno

                          Tell me how you'd do "native C/C++ FFI (to like a .so or .dylib or .dll)" between the webview using WASM or anything other than "WebKit's built in JSON-string based IPC"

                          Like a <button> that triggers a DLL call. How would you achieve it with WASM? How does WASM act as the bridge to the DOM and/or native? It doesn't, right?

                      • nicoburns 10 hours ago

                        We're building a brand new webview that has a native code (Rust) API to the DOM. That way your native extension doesn't have to go through JavaScript at all. Currently it doesn't have any JS support, but it could be added.

                        https://github.com/DioxusLabs/blitz/

                        • MuffinFlavored 7 hours ago

                          How could/would you do "DOM event -> native FFI dlsym-type call"?