The parent comment noted that using jemalloc for Ruby *without* compiling is alr...

FooBarWidget · on July 18, 2021

In my comment, as well as in the FAQ, I explain why even using Jemalloc _without_ compiling Ruby has its own caveats.

The Jemalloc version matters a lot. For reasons that are not yet clear, significant memory savings are only achieved with Jemalloc 3, not with Jemalloc 5. Your distribution only ships one Jemalloc version. So likely you need to compile Jemalloc 3 yourself. Here you are already entering compilation land.

But Jemalloc 3 no longer compiles by default on some modern distributions, such as Debian 10. Fullstaq Ruby fixes this by patching Jemalloc for you.

Furthermore, which Ruby binaries are you using? The ones provided by the Linux distribution are perpetually outdated. Another of Fullstaq Ruby's value propositions is that we supply binaries for the latest Ruby version, quickly. We packaged Ruby 3.0 on the same day it came out.

"Fullstaq Ruby vs LD_PRELOADing Jemalloc yourself": https://github.com/fullstaq-labs/fullstaq-ruby-server-editio...

wgjordan · on July 18, 2021

Thanks, these extra details are helpful and seem like the more significant motivations underlying this distribution.

> For reasons that are not yet clear, significant memory savings are only achieved with Jemalloc 3, not with Jemalloc 5.

So the real advantage of this package is that it bundles a 6+ year-old, unsupported version of Jemalloc, because the more recent versions found in current OS distributions don't yield memory savings in practice, for unknown reasons. This doesn't instill very much confidence.

I would be much more excited by efforts to investigate the jemalloc > 3.x changes so Ruby can work optimally with current releases packaged in modern Linux distributions, rather than doubling down on a workaround that requires bundling an increasingly ancient version of the software.

I should also add: as mentioned by the jemalloc author [1], the addition of the time-based purging feature is likely responsible for memory-usage differences between jemalloc 3.x and 5.x, so you can reduce `dirty_decay_ms` and `muzzy_decay_ms` to get 3.x-like memory usage. I have been using this configuration in production since 2018 for significant memory savings in Ruby using jemalloc 5.x.

[1] https://bugs.ruby-lang.org/issues/14718#note-86

FooBarWidget · on July 18, 2021

I think that is an overly cautious take on things. I'll explain why.

First, "for unknown reasons" deserves more nuance. The vague, high-level reason is clear: Jemalloc 3 behaves differently from Jemalloc 5, having different algorithms and data structures. By "unknown" I don't mean that this is incomprehensible arcane magic, or that things can collapse at any time.

What I mean is that it's not known in what way the algorithms and data structures are different. Consider that before I did my 2019 research on why Ruby memory bloating occurs[1], Ruby apps suffered from memory bloat "for unknown reasons". That didn't mean that before 2019, all Ruby apps were houses of cards waiting to fall over.

It's like saying "I don't understand why this Linux kernel upgrade made things faster" -- the kernel developers know but they have better things to do than to answer your questions. And the fact that knowledge about a new optimization in the Linux kernel is not widespread does not mean that that kernel version is unstable.

Nobody truly understands every single detail about all parts of the stack. Yet I can build reliable, highly available web apps just fine without understanding, for example, how 5G works and why users on 5G can access my app faster than on 4G.

The differences between Jemalloc 3 and 5 are not explicitly documented anywhere, and to find out requires research. I intend to do that some time in the future, but not now. Jemalloc 3 is proven to work, it's proven to be stable. The combination of Ruby + Jemalloc is proven to work well, not only because we've had several years of user feedback now, but also because GitHub has tested this combination for years now, even before Fullstaq Ruby.

The pragmatic thing to do is not to prioritize figuring out exactly how Jemalloc 5 works. It's to continue the packaging work to make Ruby + Jemalloc 3 available to the public. Jemalloc 5 can wait.

[1] https://www.joyfulbikeshedding.com/blog/2019-03-14-what-caus...

wgjordan · on July 18, 2021

Aside: thanks for engaging in detailed discussion here and more generally for all of the Ruby-performance contributions you've made over many years, I truly appreciate them.

First, my lack of confidence in depending on Jemalloc 3 in production is due not only to the 'unknown reasons' underlying such a frozen dependency, but also to the fact that this particular dependency is over six years old and unmaintained. Not only does it lack more recent security/bug fixes and features, but it also makes it more complex to integrate with up-to-date Linux distributions (e.g., your need to maintain custom compilation patches instead of simply depending on the OS's jemalloc package).

> What I mean is that it's not known in what way the algorithms and data structures are different. [...] The differences between Jemalloc 3 and 5 are not explicitly documented anywhere, and to find out requires research.

As I mentioned, the jemalloc developer already highlighted the exact differences back in 2018, and he even provided a MALLOC_CONF environment variable to use that makes memory usage in jemalloc 5 behave like jemalloc 3:

> You could verify this by setting dirty decay and muzzy decay to 0 in the MALLOC_CONF environment variable (i.e. MALLOC_CONF="dirty_decay_ms:0,muzzy_decay_ms:0", unless I've typoed something).

There is also a very readable page on performance tuning in the jemalloc 5 documentation [1].

In your research, have you ever tried running jemalloc 5 with this configuration? I did this back in 2018, tested/verified against my production workload, and have been running Ruby on jemalloc 5 without any issues since.

All that's involved is installing your OS's 'jemalloc' package and setting two environment variables (LD_PRELOAD and MALLOC_CONF). Simple enough and more confidence-inspiring than maintaining a patch against a six-year-old frozen dependency if you ask me.
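To illustrate the two-variable setup described above, here is a minimal sketch in Python of a launcher that sets LD_PRELOAD and MALLOC_CONF before starting a process. The MALLOC_CONF value is the one quoted from the jemalloc developer. Everything else is an assumption for illustration: the library path is where Debian/Ubuntu's jemalloc package typically installs the shared object on x86_64 and will differ on other systems, and the helper names `jemalloc_env` and `exec_with_jemalloc` are hypothetical.

```python
import os

# Assumption: typical install path of the distribution's jemalloc package on
# Debian/Ubuntu x86_64. Adjust for your distribution and architecture.
JEMALLOC_PATH = "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"

# Value quoted in the thread from the jemalloc developer's suggestion.
MALLOC_CONF = "dirty_decay_ms:0,muzzy_decay_ms:0"


def jemalloc_env(base_env, lib_path=JEMALLOC_PATH, malloc_conf=MALLOC_CONF):
    """Return a copy of base_env with LD_PRELOAD and MALLOC_CONF set."""
    env = dict(base_env)  # copy, so the caller's mapping is left untouched
    existing = env.get("LD_PRELOAD", "")
    # LD_PRELOAD takes a colon- or space-separated list; keep prior entries.
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    env["MALLOC_CONF"] = malloc_conf
    return env


def exec_with_jemalloc(argv, lib_path=JEMALLOC_PATH):
    """Replace the current process with argv, preloading jemalloc."""
    if not os.path.exists(lib_path):
        raise FileNotFoundError(f"jemalloc not found at {lib_path}")
    os.execvpe(argv[0], argv, jemalloc_env(os.environ, lib_path=lib_path))
```

A call such as `exec_with_jemalloc(["ruby", "app.rb"])` (with `app.rb` as a placeholder) would then start Ruby under the preloaded allocator. Whether these settings reproduce Jemalloc 3's memory savings for a given workload is exactly what is disputed in this thread, so measure before relying on it.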

> The pragmatic thing to do is not to prioritize figuring out exactly how Jemalloc 5 works. It's to continue the packaging work to make Ruby + Jemalloc 3 available to the public. Jemalloc 5 can wait.

I disagree about the relative priorities: I spent a day tuning Jemalloc 5 for my team's Ruby application back in 2018 [2] and it's been a done issue for us since then.

[1] https://github.com/jemalloc/jemalloc/blob/dev/TUNING.md

[2] https://github.com/code-dot-org/code-dot-org/pull/24676#issu...

FooBarWidget · on July 19, 2021

I did try those settings, but was unable to reproduce the same amount of memory savings as Jemalloc 3. But that was quite a while ago, and I need to revisit this some time in the future.

By the way, Fullstaq Ruby does not only provide Jemalloc-patched versions. We also provide an unpatched version (and also a version with only malloc_trim) — in which case Fullstaq Ruby's main value add becomes DEB/RPM packaging only.