Always emit constants in bytecode #3976
base: main
Conversation
Currently, the bytecode generator adds constants to a const_buf. In doing so, it searches the const_buf for an existing const of the same value, which ends up being quadratic in the number of constants. I considered adding a hashmap to make the lookup quicker, but perhaps we could simply emit an instruction to push the constant onto the stack (which is what already happens when the const_buf is full). This increases the generated bytecode size, but it makes the bytecode generator a lot faster (e.g. I saw get_const_offset taking 25% of the loading time on a 12 KB module).
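For context, here is a minimal sketch (not WAMR's actual code; the struct layout and helper name are illustrative) of why the current scheme is quadratic: every lookup scans the whole const_buf, so N distinct constants cost O(N^2) comparisons overall. The change in this PR effectively skips this lookup and always emits a push-constant instruction instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shape of a const_buf entry; the real WAMR struct differs. */
typedef struct Const {
    uint64_t value;      /* bit pattern of the constant */
    uint8_t value_type;  /* i32/i64/f32/f64 tag */
    uint32_t slot_index; /* offset assigned in the constant area */
} Const;

/* Linear scan: O(N) per lookup, O(N^2) over a module with N constants. */
static bool
find_const_offset(const Const *const_buf, uint32_t const_count,
                  uint64_t value, uint8_t value_type, uint32_t *p_offset)
{
    for (uint32_t i = 0; i < const_count; i++) {
        if (const_buf[i].value_type == value_type
            && const_buf[i].value == value) {
            *p_offset = const_buf[i].slot_index;
            return true;
        }
    }
    return false; /* not found: the caller appends a new Const entry */
}
```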
I just wanted to open this PR to start a discussion about the tradeoffs between having this constant list and emitting loads vs. simply pushing the value onto the stack.
I haven't looked at how much this increases module size, but it definitely decreases bytecode generation time (previously it was like 25% of the time in get_const_offset).
The data will be helpful, since using constant space is a trade-off. @xujuntwt95329 @wenyongh please join the discussion.
If we didn't want to pay the price in bytecode size, we could use a hashmap of value->offset to do the searches in ~constant time, at the cost of O(n) memory.
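As an illustration only (not WAMR's bh_hashmap API; the types and names are made up for the example), such a value -> offset map kept alongside const_buf could look like this:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define CONST_MAP_BUCKETS 256 /* power of two so masking works below */

typedef struct ConstMapNode {
    uint64_t value;      /* bit pattern of the constant */
    uint8_t value_type;  /* i32/i64/f32/f64 tag */
    uint32_t offset;     /* slot index assigned in const_buf */
    struct ConstMapNode *next;
} ConstMapNode;

typedef struct ConstMap {
    ConstMapNode *buckets[CONST_MAP_BUCKETS];
} ConstMap;

static uint32_t
const_hash(uint64_t value, uint8_t value_type)
{
    uint64_t h = value * 0x9E3779B97F4A7C15ULL + value_type;
    return (uint32_t)(h >> 32) & (CONST_MAP_BUCKETS - 1);
}

/* Look up a constant; returns true and fills *p_offset if already present. */
static bool
const_map_find(const ConstMap *map, uint64_t value, uint8_t value_type,
               uint32_t *p_offset)
{
    ConstMapNode *node = map->buckets[const_hash(value, value_type)];
    for (; node; node = node->next) {
        if (node->value == value && node->value_type == value_type) {
            *p_offset = node->offset;
            return true;
        }
    }
    return false;
}

/* Record a newly appended constant and the const_buf offset it received. */
static bool
const_map_insert(ConstMap *map, uint64_t value, uint8_t value_type,
                 uint32_t offset)
{
    uint32_t idx = const_hash(value, value_type);
    ConstMapNode *node = malloc(sizeof(ConstMapNode));
    if (!node)
        return false;
    node->value = value;
    node->value_type = value_type;
    node->offset = offset;
    node->next = map->buckets[idx];
    map->buckets[idx] = node;
    return true;
}
```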
The bytecode size and memory consumption are really important for us, since fast-interp already increases the footprint; the main design goal of constant offsets is to reduce the footprint, as there may be many duplicated constants. Using a hashmap is one choice; another may be to just sort the const array so lookups can use binary search.
Would this approach require a second pass over the module data to collect all the constants? If we've already emitted slot indexes, we cannot move a const from its slot (e.g. to insert a const before it) without rewriting the bytecode that was previously emitted. Also, insertion sort like this has O(n^2) average performance. Am I missing something? So I think the choices are either a separate pass to collect all the constants, then sort them and binary search, or a hashmap of value -> index maintained alongside the const_buf to speed up searches.
I mean that the sort/search operations are on the array of Const entries (the const_buf), not on the emitted bytecode.
In fact the prepare_bytecode of fast-interp already scans the bytecode twice; it seems the two passes currently do the same work for the constants. Using a hashmap is a good way; my understanding is that the Const struct array is kept and you would create a new hashmap alongside it?
A different process for the second traversal is a good idea, but it may waste some space when there are many duplicated values. Maybe there could be some special handling for common values (e.g. 0, 1) to reduce the waste. But anyway this is a tradeoff; it seems there isn't a single best way for all scenarios.
Agree, there is really not a best way. Since there are already two traversals in the loader of fast-interp, my suggestion is not to introduce an extra traversal, which also takes time and degrades performance; instead, it would be better to leverage the first traversal to do some of the work. One way I can think of is to record the bytecode positions and count the total i64/f64 and i32/f32 constants in the first traversal. For example, we add two linked lists to the loader context, one for i64/f64 constants and one for i32/f32 constants, with nodes like:

```c
typedef struct I64Node {
    uint8 *bytecode;
    struct I64Node *next;
} I64Node;

typedef struct I32Node {
    uint8 *bytecode;
    struct I32Node *next;
} I32Node;

struct WASMLoaderContext {
    ...
    I64Node *i64_node_list;
    I32Node *i32_node_list;
    uint32 i64_node_count;
    uint32 i32_node_count;
    ...
};
```

Then we add the i64/f64.const bytecodes one by one to the first list and the i32/f32.const bytecodes one by one to the second list, increasing i64_node_count and i32_node_count as we go.

After the first traversal, we allocate a big buffer (an int64 array) for the i64 nodes, traverse i64_node_list and copy each i64/f64 const into the array, qsort the array buffer, and then remove duplicated consts. The same is done for i32_node_list. After that we can allocate a merged buffer for the two arrays, copy the data into the new buffer, and then free all the previous buffers (the two lists and the two array buffers). Then in the second traversal, we can just use binary search (bsearch) to get the const offset when emitting bytecode.
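A minimal sketch of that sort/dedup/bsearch flow, assuming the i64/f64 constants are compared by their 64-bit bit patterns (the function names and signatures are illustrative, not actual loader code):

```c
#include <stdint.h>
#include <stdlib.h>

static int
cmp_uint64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return x < y ? -1 : (x > y ? 1 : 0);
}

/* Sort the constants collected in the first traversal and drop duplicates
 * in place. Returns the number of unique constants left in the array. */
static uint32_t
sort_and_dedup_consts(uint64_t *consts, uint32_t count)
{
    if (count == 0)
        return 0;
    qsort(consts, count, sizeof(uint64_t), cmp_uint64);
    uint32_t unique = 1;
    for (uint32_t i = 1; i < count; i++) {
        if (consts[i] != consts[unique - 1])
            consts[unique++] = consts[i];
    }
    return unique;
}

/* In the second traversal, the const offset is recovered with a binary
 * search over the sorted, de-duplicated array. Returns -1 if not found. */
static int32_t
get_const_offset_sorted(const uint64_t *consts, uint32_t count, uint64_t value)
{
    const uint64_t *found =
        bsearch(&value, consts, count, sizeof(uint64_t), cmp_uint64);
    return found ? (int32_t)(found - consts) : -1;
}
```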