Added AVX1 support for salsa and chacha rounds by kangaderoo · Pull Request #1 · ghostlander/cpuminer-neoscrypt

kangaderoo · 2015-02-15T15:36:35Z

The code is written in C for better maintainability. Assembly derived from these files
might increase speed slightly.
The current speed increase over the SSE routines is about 10%.


Make the config work with the new files

ghostlander · 2015-02-17T16:14:06Z

Thanks, I plan to add the AVX/XOP assembly code in the future and may use your inline assembly as a reference. SSE2 4-way is also going to be improved.

kangaderoo · 2015-02-18T10:45:21Z

I was wondering where your speed increase from the 4-way code
comes from.
I guess I still have to rewrite the KDF compress function in inline assembly.
That function looks like a good candidate for a 4-way, or maybe
8-way, optimization, depending on the XMM register requirements.

The original cpuminer had a 3-way scrypt and a 4-way SHA256, giving
the best result when running 12-way on AVX1.
The 3-way scrypt kept 3 'matrices' in XMM registers, leaving 4 XMM
registers free for calculations.
XMM-to-XMM operations seem to run about 3 times faster than XMM-to-memory
operations.

Because of the mixing behavior of NeoScrypt (4 times a 4x4 matrix), it looks
like a 1-way implementation of Salsa and ChaCha needs the fewest
memory moves.
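
To illustrate the 1-way layout described above, here is a minimal sketch of one ChaCha double round with the 4x4 state held in four XMM registers. It uses SSE2 intrinsics and the standard ChaCha quarter-round, and it is not taken from the code in this pull request. The function names `chacha_doubleround_sse2` and `chacha_doubleround_ref` are hypothetical. The state is loaded once, all mixing happens register to register, and the diagonal round only needs three lane shuffles instead of memory moves.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate every 32-bit lane of v left by n bits (0 < n < 32). */
#define ROTL32_VEC(v, n) \
    _mm_or_si128(_mm_slli_epi32((v), (n)), _mm_srli_epi32((v), 32 - (n)))

/* Four ChaCha quarter-rounds at once: lane i of a, b, c, d is one quarter-round. */
#define QROUND_X4(a, b, c, d) do {                                           \
    a = _mm_add_epi32(a, b); d = _mm_xor_si128(d, a); d = ROTL32_VEC(d, 16); \
    c = _mm_add_epi32(c, d); b = _mm_xor_si128(b, c); b = ROTL32_VEC(b, 12); \
    a = _mm_add_epi32(a, b); d = _mm_xor_si128(d, a); d = ROTL32_VEC(d, 8);  \
    c = _mm_add_epi32(c, d); b = _mm_xor_si128(b, c); b = ROTL32_VEC(b, 7);  \
} while (0)

/* One double round (column round + diagonal round) on a 16-word state. */
void chacha_doubleround_sse2(uint32_t state[16])
{
    /* Each row of the 4x4 matrix lives in one XMM register. */
    __m128i a = _mm_loadu_si128((const __m128i *)&state[0]);
    __m128i b = _mm_loadu_si128((const __m128i *)&state[4]);
    __m128i c = _mm_loadu_si128((const __m128i *)&state[8]);
    __m128i d = _mm_loadu_si128((const __m128i *)&state[12]);

    QROUND_X4(a, b, c, d); /* column round */

    /* Rotate rows 1..3 so that the diagonals line up as columns. */
    b = _mm_shuffle_epi32(b, _MM_SHUFFLE(0, 3, 2, 1));
    c = _mm_shuffle_epi32(c, _MM_SHUFFLE(1, 0, 3, 2));
    d = _mm_shuffle_epi32(d, _MM_SHUFFLE(2, 1, 0, 3));

    QROUND_X4(a, b, c, d); /* diagonal round */

    /* Undo the row rotation. */
    b = _mm_shuffle_epi32(b, _MM_SHUFFLE(2, 1, 0, 3));
    c = _mm_shuffle_epi32(c, _MM_SHUFFLE(1, 0, 3, 2));
    d = _mm_shuffle_epi32(d, _MM_SHUFFLE(0, 3, 2, 1));

    _mm_storeu_si128((__m128i *)&state[0], a);
    _mm_storeu_si128((__m128i *)&state[4], b);
    _mm_storeu_si128((__m128i *)&state[8], c);
    _mm_storeu_si128((__m128i *)&state[12], d);
}

/* Scalar reference used to check the vector version. */
static uint32_t rotl32(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

static void quarter_round(uint32_t *s, int a, int b, int c, int d)
{
    s[a] += s[b]; s[d] = rotl32(s[d] ^ s[a], 16);
    s[c] += s[d]; s[b] = rotl32(s[b] ^ s[c], 12);
    s[a] += s[b]; s[d] = rotl32(s[d] ^ s[a], 8);
    s[c] += s[d]; s[b] = rotl32(s[b] ^ s[c], 7);
}

void chacha_doubleround_ref(uint32_t s[16])
{
    quarter_round(s, 0, 4, 8, 12);  quarter_round(s, 1, 5, 9, 13);
    quarter_round(s, 2, 6, 10, 14); quarter_round(s, 3, 7, 11, 15);
    quarter_round(s, 0, 5, 10, 15); quarter_round(s, 1, 6, 11, 12);
    quarter_round(s, 2, 7, 8, 13);  quarter_round(s, 3, 4, 9, 14);
}
```

Calling both functions on copies of the same state should give identical results, which is a quick way to validate any hand-tuned variant against the plain C version.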

Unfortunately my development environment doesn't have AVX2, but the
inline assembly code could easily be rewritten to
support the 256-bit YMM registers.

John Doering wrote on 2/17/2015 at 5:14 PM:

Thanks, I plan to add the AVX/XOP assembly code in the future and may
use your inline assembly as a reference. SSE2 4-way is also going to
be improved.



kangaderoo added 2 commits February 15, 2015 16:33

Added AVX1 support for salsa and chacha rounds

8fb5263

The code is written in C for better maintainability. Assembly derived from these files might increase speed slightly. The current speed increase over the SSE routines is about 10%.

Merge the new files and the build env.

2ae7b82

Make the config work with the new files

kangaderoo added 5 commits March 8, 2015 16:58

use a 128 bit xor with sse/avx

f534bfc
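
As an illustration of what a 128-bit XOR can look like, here is a minimal sketch using the SSE2 intrinsic `_mm_xor_si128`. It is not the code from this commit. The function name `xor_blocks_128` is hypothetical, and the sketch assumes the length is a multiple of 16 bytes. Unaligned loads and stores are used so that it works on any buffer.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* XOR len bytes of src into dst, 16 bytes per step.
 * Assumption: len is a multiple of 16. */
void xor_blocks_128(void *dst, const void *src, size_t len)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t i;

    for (i = 0; i < len / 16; i++) {
        __m128i x = _mm_loadu_si128(&d[i]);
        __m128i y = _mm_loadu_si128(&s[i]);
        _mm_storeu_si128(&d[i], _mm_xor_si128(x, y));
    }
}
```

Each iteration replaces four 32-bit XORs with a single 128-bit one. If the buffers are known to be 16-byte aligned, the aligned `_mm_load_si128` and `_mm_store_si128` can be used instead.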

use blake2 avx code

1f59593

memory alloc alignment for avx/sse and clean-up

3b69b45
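
Alignment matters because aligned SSE loads and stores such as `_mm_load_si128` require 16-byte aligned addresses, and the aligned 256-bit AVX variants require 32 bytes. Below is a minimal sketch of an aligned scratch buffer, assuming the `_mm_malloc` and `_mm_free` helpers that GCC and Clang provide through their intrinsics headers. It is not the allocation code from this commit, and the names `scratch_alloc`, `scratch_clear`, `scratch_free`, and `is_aligned` are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics; also provides _mm_malloc/_mm_free on GCC and Clang */
#include <stddef.h>
#include <stdint.h>

/* Allocate a buffer of 32-bit words aligned to 32 bytes,
 * which satisfies both SSE (16) and AVX (32) aligned accesses. */
uint32_t *scratch_alloc(size_t words)
{
    return (uint32_t *)_mm_malloc(words * sizeof(uint32_t), 32);
}

/* Zero the buffer with aligned 128-bit stores.
 * Assumption: words is a multiple of 4 and p came from scratch_alloc. */
void scratch_clear(uint32_t *p, size_t words)
{
    const __m128i zero = _mm_setzero_si128();
    size_t i;

    for (i = 0; i < words; i += 4)
        _mm_store_si128((__m128i *)&p[i], zero);
}

/* Memory from _mm_malloc must be released with _mm_free, not free. */
void scratch_free(uint32_t *p)
{
    _mm_free(p);
}

/* Return 1 if p is aligned to align bytes (align must be a power of two). */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

A caller would allocate the buffer once per mining thread, check the result for NULL, and release it with `scratch_free` when the thread exits.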

Added a hashing X3

06c5d4b

Increase hashing speed by running 3 calculations in parallel. Eliminate SIMD latency by smart sequencing. ~25% speed increase observed.
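
The idea of running three calculations in parallel can be sketched as follows. Three independent states are mixed together, and each add, xor, or rotate step is issued for all three states before the next step begins, so neighbouring instructions do not depend on each other's results. This is a hedged illustration with SSE2 intrinsics and a ChaCha column round, not the code from this commit. The names `chacha_columns_x3` and `chacha_columns_ref` are hypothetical. Three states of four rows each occupy 12 of the 16 XMM registers available on x86-64, leaving 4 free for temporaries. A compiler is free to reorder intrinsics, so hand-written assembly gives tighter control over the actual sequence.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate every 32-bit lane of v left by n bits (0 < n < 32). */
#define ROTL_LANES(v, n) \
    _mm_or_si128(_mm_slli_epi32((v), (n)), _mm_srli_epi32((v), 32 - (n)))

/* Issue one step for all three states before moving to the next step. */
#define FOR3(stmt) do { int k; for (k = 0; k < 3; k++) { stmt; } } while (0)

/* ChaCha column round on three independent 16-word states, interleaved. */
void chacha_columns_x3(uint32_t s0[16], uint32_t s1[16], uint32_t s2[16])
{
    uint32_t *st[3];
    __m128i a[3], b[3], c[3], d[3];

    st[0] = s0; st[1] = s1; st[2] = s2;

    FOR3(a[k] = _mm_loadu_si128((const __m128i *)&st[k][0]));
    FOR3(b[k] = _mm_loadu_si128((const __m128i *)&st[k][4]));
    FOR3(c[k] = _mm_loadu_si128((const __m128i *)&st[k][8]));
    FOR3(d[k] = _mm_loadu_si128((const __m128i *)&st[k][12]));

    FOR3(a[k] = _mm_add_epi32(a[k], b[k]));
    FOR3(d[k] = _mm_xor_si128(d[k], a[k]));
    FOR3(d[k] = ROTL_LANES(d[k], 16));
    FOR3(c[k] = _mm_add_epi32(c[k], d[k]));
    FOR3(b[k] = _mm_xor_si128(b[k], c[k]));
    FOR3(b[k] = ROTL_LANES(b[k], 12));
    FOR3(a[k] = _mm_add_epi32(a[k], b[k]));
    FOR3(d[k] = _mm_xor_si128(d[k], a[k]));
    FOR3(d[k] = ROTL_LANES(d[k], 8));
    FOR3(c[k] = _mm_add_epi32(c[k], d[k]));
    FOR3(b[k] = _mm_xor_si128(b[k], c[k]));
    FOR3(b[k] = ROTL_LANES(b[k], 7));

    FOR3(_mm_storeu_si128((__m128i *)&st[k][0], a[k]));
    FOR3(_mm_storeu_si128((__m128i *)&st[k][4], b[k]));
    FOR3(_mm_storeu_si128((__m128i *)&st[k][8], c[k]));
    FOR3(_mm_storeu_si128((__m128i *)&st[k][12], d[k]));
}

/* Scalar reference: the same column round on a single state. */
static uint32_t rotl_scalar(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

void chacha_columns_ref(uint32_t s[16])
{
    int i;
    for (i = 0; i < 4; i++) {
        uint32_t *a = &s[i], *b = &s[4 + i], *c = &s[8 + i], *d = &s[12 + i];
        *a += *b; *d = rotl_scalar(*d ^ *a, 16);
        *c += *d; *b = rotl_scalar(*b ^ *c, 12);
        *a += *b; *d = rotl_scalar(*d ^ *a, 8);
        *c += *d; *b = rotl_scalar(*b ^ *c, 7);
    }
}
```

The interleaved version must produce exactly the same output as running the scalar reference on each state separately. Only the instruction order changes, not the arithmetic.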

Enable extranonce subscription

7532b59


kangaderoo wants to merge 7 commits into ghostlander:master from kangaderoo:master
