@laurenthuberdeau laurenthuberdeau commented Mar 17, 2025

Opening this PR to see whether it reduces execution time on dash and zsh. It may or may not be merged.

Context

On certain shells, scripts slow down as the number of variables in the environment grows. The usual cause is that the data structure storing the environment is not designed to hold that many variables (e.g. dash's 37-entry hash table). Each variable lookup then takes linear time, which can make execution time quadratic even when the algorithm is linear. As a result, small memory reductions can significantly reduce execution time, as demonstrated in #77.
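
To make the scaling argument concrete, here is a minimal C sketch. It is not dash's actual code: it models a chained hash table with a fixed 37 buckets (the figure quoted above), and `hash_name`, `total_lookup_cost` and the `_0`, `_1`, ... variable names are hypothetical. It counts the string comparisons needed to look up every variable once.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: bucket count taken from the PR description; the rest is illustrative. */
#define NBUCKETS 37

struct var {
  char name[16];
  struct var *next; /* next entry in the same bucket */
};

/* Simple multiplicative string hash, reduced to a bucket index. */
static unsigned hash_name(const char *s) {
  unsigned h = 0;
  while (*s) h = h * 31 + (unsigned char)*s++;
  return h % NBUCKETS;
}

/* Inserts n_vars names ("_0", "_1", ...) and then looks each one up once.
   Returns the total number of string comparisons performed by the lookups. */
unsigned long total_lookup_cost(unsigned n_vars) {
  struct var *buckets[NBUCKETS] = {0};
  struct var *pool = calloc(n_vars, sizeof *pool);
  unsigned long comparisons = 0;
  unsigned i;

  if (pool == NULL) return 0;

  for (i = 0; i < n_vars; i++) {
    unsigned b;
    snprintf(pool[i].name, sizeof pool[i].name, "_%u", i);
    b = hash_name(pool[i].name);
    pool[i].next = buckets[b]; /* push at the head of the chain */
    buckets[b] = &pool[i];
  }

  for (i = 0; i < n_vars; i++) {
    char key[16];
    struct var *v;
    snprintf(key, sizeof key, "_%u", i);
    for (v = buckets[hash_name(key)]; v != NULL; v = v->next) {
      comparisons++;
      if (strcmp(v->name, key) == 0) break;
    }
  }

  free(pool);
  return comparisons;
}
```

With n variables in 37 buckets, the total is at least n(n/37 + 1)/2 comparisons whatever the hash function, so once the chains are long, ten times more variables costs roughly a hundred times more work.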

This PR makes the encoding of common instructions more compact:

  • Using 8-bit displacements for memory operands when possible
  • Zeroing a register with xor on its 32-bit form, since 32-bit operations clear the upper 32 bits of the 64-bit register
  • Encoding immediates as bytes (most literals are small) or as sign-extended 32-bit literals
  • Removing redundant mov operations
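
As an illustration of the displacement and immediate items above, here is a hedged C sketch of how an encoder might choose the short form. It is not the code from this PR: `fits_int8`, `emit_i32`, `emit_mov_reg_mem` and `emit_add_reg_imm` are hypothetical helpers, and only the 32-bit `mov r32, [base + disp]` and `add r32, imm` forms are covered (ESP as a base needs a SIB byte and is left out).

```c
#include <stddef.h>
#include <stdint.h>

/* Register numbers as used in the ModRM byte. */
enum { EAX = 0, ECX = 1, EDX = 2, EBX = 3, ESP = 4, EBP = 5, ESI = 6, EDI = 7 };

/* True when v can be encoded as a sign-extended 8-bit value. */
static int fits_int8(int32_t v) { return v >= -128 && v <= 127; }

/* Writes v as 4 little-endian bytes and returns the byte count. */
static size_t emit_i32(uint8_t *out, int32_t v) {
  uint32_t u = (uint32_t)v;
  out[0] = (uint8_t)(u & 0xFF);
  out[1] = (uint8_t)((u >> 8) & 0xFF);
  out[2] = (uint8_t)((u >> 16) & 0xFF);
  out[3] = (uint8_t)((u >> 24) & 0xFF);
  return 4;
}

/* mov dst, [base + disp] (opcode 8B /r). base must not be ESP.
   Uses mod=01 with a 1-byte displacement when it fits, else mod=10 with 4 bytes. */
size_t emit_mov_reg_mem(uint8_t *out, int dst, int base, int32_t disp) {
  size_t n = 0;
  out[n++] = 0x8B;
  if (fits_int8(disp)) {
    out[n++] = (uint8_t)(0x40 | (dst << 3) | base);
    out[n++] = (uint8_t)disp;
  } else {
    out[n++] = (uint8_t)(0x80 | (dst << 3) | base);
    n += emit_i32(out + n, disp);
  }
  return n;
}

/* add dst, imm. Uses opcode 83 /0 with a sign-extended byte when it fits,
   else opcode 81 /0 with a 32-bit immediate. */
size_t emit_add_reg_imm(uint8_t *out, int dst, int32_t imm) {
  size_t n = 0;
  if (fits_int8(imm)) {
    out[n++] = 0x83;
    out[n++] = (uint8_t)(0xC0 | dst);
    out[n++] = (uint8_t)imm;
  } else {
    out[n++] = 0x81;
    out[n++] = (uint8_t)(0xC0 | dst);
    n += emit_i32(out + n, imm);
  }
  return n;
}
```

For example, `emit_mov_reg_mem(buf, EAX, EBP, -8)` produces the 3 bytes `8B 45 F8`, while a displacement of 1000 needs 6 bytes. Since most stack offsets and literals are small, the short forms apply often.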

This PR doesn't touch the exe code generator, where a lot more optimizations are possible but would increase code complexity.

Results

Methodology:

> ./bootstrap-pnut-exe.sh --backend <backend>
> wc --bytes build/pnut-x86-by-pnut-x86-by-gcc.exe
| Backend | Size before (bytes) | Size after (bytes) | pnut-sh.sh pnut-exe.c | pnut-exe.sh pnut-exe.c |
| --- | --- | --- | --- | --- |
| i386_linux | 180826 | 146726 | TODO | TODO |
| x86_64_linux | 243114 | 178599 | TODO | TODO |

Other possible memory optimizations

The code buffer stores bytes of machine code but is typed int[]. Using all 32 bits of each int would cut the number of environment variables used to store machine code by a factor of 4. However, labels rely on the extra space to store metadata used to resolve addresses that are not yet known.
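
A rough sketch of what the packing could look like, assuming a little-endian layout of 4 code bytes per 32-bit word. The names `packed_code`, `put_byte`, `get_byte` and `words_used` are hypothetical and not part of pnut; the buffer is declared unsigned here only to keep the shifts well-defined.

```c
#include <stdint.h>

#define CODE_MAX_BYTES 4096

/* One array slot holds 4 bytes of machine code instead of 1. */
static uint32_t packed_code[CODE_MAX_BYTES / 4];

/* Stores byte b at byte offset pos, leaving the 3 neighbouring bytes intact. */
void put_byte(int pos, uint8_t b) {
  int shift = (pos & 3) * 8;
  uint32_t word = packed_code[pos >> 2];
  word &= ~((uint32_t)0xFF << shift); /* clear the target byte */
  word |= (uint32_t)b << shift;       /* write the new value */
  packed_code[pos >> 2] = word;
}

/* Reads back the byte at byte offset pos. */
uint8_t get_byte(int pos) {
  return (uint8_t)(packed_code[pos >> 2] >> ((pos & 3) * 8));
}

/* Number of array slots needed for n_bytes of code. */
int words_used(int n_bytes) { return (n_bytes + 3) / 4; }
```

The trade-off is visible in the sketch: every bit of a slot now holds code, so there is no room left in a slot for the label metadata mentioned above, and that metadata would have to live somewhere else.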

Base automatically changed from laurent/support-fixed-width-data-types to main April 2, 2025 16:28

Copilot AI left a comment

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

x86.c:263

  • Using -large_imm for non-positive values in mov_reg_large_imm is incorrect because it reverses the sign of negative immediates, potentially moving a different value than intended. Consider passing large_imm directly or handling negative values appropriately.
if (large_imm <= 0) { mov_reg_imm(dst, -large_imm); }

On x86-64, instructions that write to a 32-bit register
zero out the upper 32 bits of the corresponding 64-bit register.
Since xoring a register with itself is a common way to zero it out,
we can use the 32-bit instruction to save the REX prefix and get a
smaller instruction.
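
For reference, a small C sketch of the two encodings side by side. `emit_zero_reg32` and `emit_zero_reg64` are hypothetical names, not functions from x86.c; registers are numbered 0 to 15 (rax = 0, rcx = 1, ..., r8 = 8).

```c
#include <stddef.h>
#include <stdint.h>

/* xor r32, r32 (opcode 31 /r). No REX prefix is needed for registers 0-7;
   r8d-r15d still need REX.R and REX.B (0x45), but never REX.W. */
size_t emit_zero_reg32(uint8_t *out, int reg) {
  size_t n = 0;
  int lo = reg & 7;
  if (reg >= 8) out[n++] = 0x45;
  out[n++] = 0x31;
  out[n++] = (uint8_t)(0xC0 | (lo << 3) | lo); /* mod=11, reg=rm=lo */
  return n;
}

/* xor r64, r64: same opcode, but always needs a REX prefix with REX.W set. */
size_t emit_zero_reg64(uint8_t *out, int reg) {
  size_t n = 0;
  int lo = reg & 7;
  out[n++] = (uint8_t)(0x48 | (reg >= 8 ? 0x05 : 0x00));
  out[n++] = 0x31;
  out[n++] = (uint8_t)(0xC0 | (lo << 3) | lo);
  return n;
}
```

Zeroing rax takes 2 bytes (`31 C0`) in the 32-bit form against 3 bytes (`48 31 C0`) in the 64-bit form. For r8 to r15 both forms take 3 bytes, so the saving only applies to the first eight registers.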
@laurenthuberdeau laurenthuberdeau force-pushed the laurent/optimize-exe-size branch from cb408cd to 6e1d544 Compare December 28, 2025 04:49