Skip to content

Conversation

@kmaasrud
Copy link
Collaborator

@kmaasrud kmaasrud commented Mar 15, 2023

I've started implementing the CowStr type, heavily based on pulldown_cmark's similar type. Most necessary traits should be in place, but I have not yet tried introducing it to the codebase (currently, I am just importing it via a custom name JotdownCowStr.)

hellux added 4 commits March 20, 2023 23:39
better to provide the original url, the event is already tagged as email

also avoids a string allocation
keep the tag for unresolved links, and allow distinguishing between
`[tag][tag with empty url]` and `[tag][non-existent tag]`.

closes #26
Merge branch 'unresolved_links'

closes #27
ci job still goes green when fuzzing fails, otherwise
@kmaasrud kmaasrud force-pushed the cowstr branch 2 times, most recently from efbc56a to 75dcfb2 Compare March 21, 2023 17:01
hellux added 8 commits March 21, 2023 20:57
Try to make rendering of each event independent.

The only case where we need to peek is when a backref link should be
added to the last paragraph within a footnote.

Before, when exiting a paragraph, we would peek and add the link before
emitting the close tag if the next event is a the footnote end.

Now, the paragraph end event skips emitting a paragraph close tag if it
is within a footnote. The next event will then always close the
paragraph, and if it is a footnote end, it will add the backref link
before closing.
no longer useful as no peeking is needed, use simple early exit instead
separate input/output from rendering state
They apply more to the Render trait now than the implementation in the
html module
derive push/write automatically from these
allow rendering iterators with borrowed events

resolves #24
allow comparing between rendering owned, borrowed or cloned events
@kmaasrud kmaasrud force-pushed the cowstr branch 2 times, most recently from f4b9e8d to 4d4d1c1 Compare March 21, 2023 21:39
@kmaasrud kmaasrud marked this pull request as ready for review March 21, 2023 21:39
@kmaasrud kmaasrud requested a review from hellux March 21, 2023 21:39
@hellux
Copy link
Owner

hellux commented Mar 21, 2023

e.g. readme/all benchmarks seem to suggest no significant benefit currently (4d4d1c1 compared to 16491a4):

block
  Instructions:             1212464 (+0.164150%)
  L1 Accesses:              1607567 (+0.196083%)
  L2 Accesses:                 1810 (-0.330396%)
  RAM Accesses:                2181 (+0.553250%)
  Estimated Cycles:         1692952 (+0.209303%)

block_inline
  Instructions:             5222063 (-0.180426%)
  L1 Accesses:              7517519 (-0.143724%)
  L2 Accesses:                 3109 (-11.39926%)
  RAM Accesses:                2549 (+0.750988%)
  Estimated Cycles:         7622279 (-0.159213%)

full
  Instructions:             5719667 (-0.113706%)
  L1 Accesses:              8098320 (-0.031009%)
  L2 Accesses:                 5213 (-21.37255%)
  RAM Accesses:                3023 (+0.800267%)
  Estimated Cycles:         8230190 (-0.106288%)
full/readme             time:   [226.23 µs 226.28 µs 226.33 µs]
                        thrpt:  [53.096 MiB/s 53.108 MiB/s 53.120 MiB/s]
                 change:
                        time:   [+2.3752% +2.4656% +2.5533%] (p = 0.00 < 0.05)
                        thrpt:  [-2.4897% -2.4063% -2.3201%]
                        Performance has regressed.

having a custom type adds some complexity and inconvenience, so it should have a noticeable performance benefit if we want to go through with it.

might be possible to look at benchmarks where there are more short cowstrings, though.

@hellux
Copy link
Owner

hellux commented Mar 21, 2023

should be possible to avoid allocation for concatenation when we have an inline? e.g. in Attributes::insert:

            if key == "class" {
                *prev = format!("{} {}", prev, val).into();
            } else {
                *prev = val;
            }

@kmaasrud
Copy link
Collaborator Author

I think only the CowStr itself is needed for it to be functional, but we would expect no change in performance then (unless e.g Box is used for owned variant). But adding the Inlined variant might not make any difference unless the push_str is added. replace shouldn't have any impact as it is not used internally.

That's why I was thinking that a complete CowStr (with the Inlined variant and String contained within the Owned variant) would be good to benchmark against our existing Cow. After that, I could add another commit that replaces Owned(String) with Owned(Box) and assess the performance of that change separately.

@kmaasrud kmaasrud force-pushed the cowstr branch 3 times, most recently from ae8d721 to 24418f9 Compare April 1, 2023 20:18
@kmaasrud
Copy link
Collaborator Author

kmaasrud commented Apr 2, 2023

@hellux Here is the full/all benchmark comparing the commit just before and after the introduction of the new CowStr. I have not added any new benchmarks with many short strings, so this seems promising, doesn't it?

Commit Throughput Change
14f6c41 inline: store attributes in vec (squash) 67.958 MiB/s -
18363ea feat: add custom CowStr type 73.363 MiB/s +8.0656%

@hellux
Copy link
Owner

hellux commented Apr 3, 2023

@hellux Here is the full/all benchmark comparing the commit just before and after the introduction of the new CowStr. I have not added any new benchmarks with many short strings, so this seems promising, doesn't it?
Commit Throughput Change
14f6c41 inline: store attributes in vec (squash) 67.958 MiB/s -
18363ea feat: add custom CowStr type 73.363 MiB/s +8.0656%

I'm not able to reproduce similar results on my machines:

AMD 5950X, Zen 3:

commit                                                      cycles                 throughput
14f6c4188 inline: store attributes in vec (squash)          6127389 (-4.265530%)   55.252
c5814c912 feat: add custom CowStr type                      6251397 (+2.023831%)   54.922
200091463 cowstr: add docstrings                            6251397 (No change)    54.820
f067cbd17 cowstr: add tests for push, push_str and replace  6251397 (No change)    55.039

Intel i5-8250U, Coffee Lake:

commit                                                    cycles                 throughput
14f6c41 inline: store attributes in vec (squash)          6564735 (+0.965820%)   34.945
c5814c9 feat: add custom CowStr type                      6526309 (-0.585340%)   34.929
2000914 cowstr: add docstrings                            6526309 (No change)    34.891
f067cbd cowstr: add tests for push, push_str and replace  6526309 (No change)    34.891

I wouldn't expect an improvement to the full/all benchmark unless the cowstr size has been reduced, behavior should be similar in most cases. Should only notice a difference if there are many small cowstrs, which I think we need to add a benchmark for.

Would be interesting to see benchmarks when the cowstr size has been reduced, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CowStr

4 participants