First WAMR function call takes significantly longer time due to stack boundary calculation #3966
Comments
I think the problem is that this check in glibc pthreads returns |
Would the WAMR project be amenable to moving the cost of stack boundary computation earlier in the process, e.g. in exec env creation? This is done in #3967.
For boundary checking, WAMR calls `pthread_getattr_np`, which is unfortunately quite slow on Linux when called on the main thread (see bytecodealliance#3966 for discussion). This change moves the cost of stack bounds checking earlier, into the `wasm_exec_env` creation process. The idea is that it is perhaps better to pay the price when creating the execution environment than in the first function call. The original code is left in place inside `call_wasm_with_hw_bound_check` in case the `wasm_exec_env` is created via `wasm_runtime_spawn_exec_env`.
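The eager-with-fallback idea described in that change can be sketched roughly as follows. This is not WAMR's code: every `demo_*` name is hypothetical, and the expensive OS query is replaced by a stub that returns a dummy value and counts how often it runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an execution environment; not WAMR's struct. */
typedef struct demo_exec_env {
    uintptr_t stack_boundary; /* cached stack boundary */
    bool boundary_ready;      /* true once stack_boundary is valid */
} demo_exec_env;

/* Counts how often the expensive query runs (for demonstration only). */
static int demo_query_count = 0;

/* Stub for the expensive OS query; the returned value is a dummy. */
static uintptr_t
demo_query_stack_boundary(void)
{
    demo_query_count++;
    return (uintptr_t)0x1000;
}

/* Run the expensive query only if no boundary has been cached yet. */
static void
demo_ensure_boundary(demo_exec_env *env)
{
    if (!env->boundary_ready) {
        env->stack_boundary = demo_query_stack_boundary();
        env->boundary_ready = true;
    }
}

/* Eager path: pay the cost while creating the environment. */
static void
demo_exec_env_init(demo_exec_env *env)
{
    env->stack_boundary = 0;
    env->boundary_ready = false;
    demo_ensure_boundary(env);
}

/* Creation path that skips the eager query (assumed analogue of an
   environment set up some other way). */
static void
demo_exec_env_init_deferred(demo_exec_env *env)
{
    env->stack_boundary = 0;
    env->boundary_ready = false;
}

/* Call path: the fallback check stays in place, so an environment
   created through the deferred path still gets a boundary. */
static uintptr_t
demo_call(demo_exec_env *env)
{
    demo_ensure_boundary(env);
    return env->stack_boundary;
}
```

With this shape, an environment set up through `demo_exec_env_init` never runs the query in `demo_call`, while one set up through `demo_exec_env_init_deferred` pays for it on its first call only.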
Just changed the title, because this happens with hardware bounds checking disabled as well.
Have you (or someone) reported the issue to glibc?
There is a discussion of a proposed fix here: https://sourceware.org/pipermail/libc-alpha/2022-September/141932.html, in response to https://internals.rust-lang.org/t/who-is-doing-read-proc-self-maps-1024-at-startup/17348/9. So the issue is known, but I don't think a solution has been agreed on, and there hasn't been any discussion for a while.
Engine: WAMR Fast Interpreter
On the first call to `wasm_runtime_call_wasm` with HW bounds checking enabled, WAMR ultimately ends up calling `call_wasm_with_hw_bound_check`, which computes the native stack boundary via `pthread_getattr_np`.

Unfortunately this call to `pthread_getattr_np` is very slow on the main thread, an issue that has been noted in other projects (e.g. golang/go#68587).

In a particular environment we have, the first call takes 9 ms, while the second and subsequent calls take 0.5 ms.
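For readers unfamiliar with the call, a minimal sketch of querying the current thread's stack bounds on Linux with glibc might look like the following. It is not the WAMR code referred to above; `query_stack_bounds` is a hypothetical helper, and the explicit prototype is only a guard for builds where `_GNU_SOURCE` was not in effect when `<pthread.h>` was first included.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes pthread_getattr_np in glibc headers */
#endif
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Explicit prototype (matches glibc's) so the sketch still compiles if
   <pthread.h> was already included without _GNU_SOURCE in effect. */
extern int pthread_getattr_np(pthread_t thread, pthread_attr_t *attr);

/* Hypothetical helper: query the calling thread's stack.
   On success returns 0, stores the lowest stack address in *low and
   the address one past the highest in *high. */
static int
query_stack_bounds(uintptr_t *low, uintptr_t *high)
{
    pthread_attr_t attr;
    void *addr = NULL;
    size_t size = 0;
    int ret;

    /* Fills attr with the attributes of the running thread. */
    ret = pthread_getattr_np(pthread_self(), &attr);
    if (ret != 0)
        return ret;

    /* Reports the lowest stack address and the stack size. */
    ret = pthread_attr_getstack(&attr, &addr, &size);
    pthread_attr_destroy(&attr);
    if (ret != 0)
        return ret;

    *low = (uintptr_t)addr;
    *high = (uintptr_t)addr + size;
    return 0;
}
```

Timing the first invocation of such a helper on the main thread against a cached result is one way to reproduce the gap described above.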