The 🤯 truth about heap memory!

A.k.a. the complexities of computational complexity

Probably only two people read my arena allocation article last week (even my employer, who begged me to write, did not post), but one person did seem to read it (and to the other: hi, mom!). Okay, maybe a few other people skimmed it, but stopped when they realized it was just an arena allocator as a precursor to a garbage collector.

I know one person read it fully, because this person let me know I was full… of garbage. And not because I had stopped before the actual garbage, collection.🥁 No, apparently, the complaint came due to an article last week people actually did read:

🎤"Hey, you’re wrong about how unused pages work. See https://danielchasehooper.com/posts/segment_array/. The point of his article is that pre-allocating pages like you’re doing just wastes memory."

I believe the statement at issue was this: if you ask mmap() for zero-mapped pages, every single page will start out mapped to the same physical page of memory, one that contains only zeros. Only when you write to a page do you start using physical memory (this is “Copy-on-Write” semantics).

When someone so brazenly tells me off on something like this, young me would have quipped back, “Incorrect; don’t believe me? Check for yourself.” (actually, nothing even remotely that nice). But not everyone has enough credits remaining with their AI to get it to write the code this month, and it’s a good question (and I do believe it was a question, even if they didn’t play by Jeopardy rules, and TOLD me defiantly).

So I did what mature me does— totally fail to send a response. But, I guess the weekend came, and I’ve decided to write one. I’ve sat on it overnight, worried that I might alienate my one reader (sorry mom, you don’t count, so I’m just going to address the other person directly).

About that other article

Sure, I read that article too. I liked it, especially because it does a good job surfacing a systems-level truth that most engineers don’t understand. Specifically, it’s absolutely true that memory speeds are significantly slower than CPU speeds these days, and in many cases the CPU work you do per memory access is not worth worrying about until profiling data shows otherwise.

I’m also a big fan of the complexity theory hack he’s using to claim O(1) (constant-time) array accesses. I’ve used the same basic idea many times myself. For instance, for many of the lock-free algorithms I’ve worked on:

  1. Doubling the store each time you need to resize generally helps make it easy to demonstrate you haven’t ruined the computational complexity of the single-threaded algorithm.

  2. Using exponential backoff on contention is often all you need to be able to say that your previously “lock-free” algorithm is a “wait-free” algorithm (a minimal sketch of what I mean follows this list).
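
Here’s a minimal sketch of what I mean in item 2 (made-up names, not lifted from any real codebase): a compare-and-swap retry loop with a capped exponential backoff. The cap is what lets you wave your hands about bounded worst-case progress.

#include <stdint.h>
#include <stdatomic.h>
#include <sched.h>

// Hypothetical shared counter, for illustration only.
static void
backoff_increment(_Atomic uint64_t *counter)
{
    uint64_t expected = atomic_load(counter);
    uint32_t spins    = 1;

    while (!atomic_compare_exchange_weak(counter, &expected, expected + 1)) {
        // Lost the race; back off for exponentially longer, up to a cap.
        for (uint32_t i = 0; i < spins; i++) {
            sched_yield();
        }
        if (spins < 1024) {
            spins <<= 1;
        }
    }
}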

While h4x0ring computational complexity CAN turn into real-world performance gains, it often does not. Make no mistake, the approach is absolutely a hack. I’ll show you why in a minute.

The segment array essentially provides an array-like API, but the backing store is grown as needed, without ever moving anything (by adding new segments). As the array grows, eventually the segments double in size when they get added.

Here’s his example data structure:

typedef struct {
    u32 count;
    int used_segments;
    u8 *segments[26];
} SegmentArrayInternal;

The last field is an array of pointers to memory allocations (the segments), but only the first pointer starts out populated. If you keep inserting, then at some point each new segment is double the size of the previous one.
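
I won’t reproduce his implementation here, but to give a flavor of why the lookup can be called O(1), here’s a hedged sketch of the index math for one common doubling-segment layout (not necessarily his exact one), where segment s holds (8 << s) items:

#include <stdint.h>

// Sketch only: map a flat index to (segment, offset) with a single bit
// scan, assuming segment s holds (8 << s) items. Uses a GCC/Clang builtin.
static inline void
segment_locate(uint64_t index, uint32_t *segment, uint64_t *offset)
{
    uint64_t pos = index + 8;   // pretend there are 8 "phantom" slots up front
    uint32_t bit = 63 - (uint32_t)__builtin_clzll(pos);

    *segment = bit - 3;              // 3 because 8 == 1 << 3
    *offset  = pos - (1ULL << bit);
}

The point is just that “figure out which segment” is one bit-scan instruction, which is what lets you keep a straight face about O(1).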

Dear reader, I don’t think you came to your conclusion on how mmap() works from the fact that segments are not pre-filled in. You may have, but I respect you (you did take the time to read and give me feedback). And, more to the point, you’re obviously smart enough to have considered that he just might not have known.

So after a careful re-read of that other article, I did find a tiny bit of evidence, a single place that’s misleading–an aside about using the data structure as a backing store for a dictionary:

āœļø"You can ensure the segment array’s capacity is always a power-of-two by making the first two segments the same size. It makes the code less nice, but it’s useful when using it as the backing array for a power-of-two-sized hash tableĀ and you don’t want it to waste ~50% of its memory."

That does kind of sound like he believes that if you have pages allocated, and the dictionary only uses up to the power of two, the unused pages are wasting memory.

But I’m positive he knows what he’s talking about, and that sentence is just confusing:

  1. His article explicitly states that segment arrays are a good choice, when they’re backed by an arena.

  2. When he compares options (albeit quickly), he mentions a “virtual memory array”, noting that you pay only for what you use. This makes it clear he absolutely does understand how paging works.

He even very explicitly says that:

āœļø"I’ve found the segment array useful in situations where I’m generating an unknown number of items over time and can’t use a Virtual Memory Array [sic]"

Though, as smart as you are, you might rightfully ask: if a “virtual memory array” is basically a single arena dedicated to one data structure, and you can pre-allocate tons of address space without paying for it, then why would you ever need anything else?
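
To make the comparison concrete, here’s roughly what I mean by a “virtual memory array” (my own sketch, not his code): reserve one giant anonymous mapping up front, bump a count, and let the zero page absorb the cost of everything you never touch. The catch is baked right into the signature: you have to pick max_items up front.

#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>

// Sketch of a "virtual memory array": one big reservation, paid for lazily.
// (On Linux, MAP_ANON wants _DEFAULT_SOURCE, same as the programs below.)
typedef struct {
    uint64_t  count;
    uint64_t  max_items;
    size_t    item_size;
    uint8_t  *base;
} VMemArray;

static int
vmem_init(VMemArray *arr, size_t item_size, uint64_t max_items)
{
    void *p = mmap(NULL,
                   item_size * max_items,
                   PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE,
                   -1,
                   0);

    if (p == MAP_FAILED) {
        return -1;
    }

    arr->count     = 0;
    arr->max_items = max_items;
    arr->item_size = item_size;
    arr->base      = p;

    return 0;
}

static inline void *
vmem_get(const VMemArray *arr, uint64_t i)
{
    return arr->base + (i * arr->item_size);
}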

Well, to see that, let’s first go ahead and make sure I was right in the first place.

If I’m not right, what’s left?

I’m currently using a laptop with 64GB of RAM. Let’s see what I can mmap(), and see how that impacts memory.

What I’m going to do is:

  1. Start by asking the OS to map enough pages, so that they would, if all loaded at once, take up 1/4 of my system’s physical memory.

  2. See how much memory I’m actually using (whether in physical memory or not).

  3. Unmap those pages.

  4. Double the amount of memory I’m going to ask for next time.

  5. On subsequent iterations, write onto some different pages, just so you can see how the OS handles it.

1. First, comma

Before we begin, let’s take a quick ADD-driven diversion… one easy trick to make number formatting more sane in C.

This may be knowledge you already had, but it took me decades of C experience to learn, so I’d like to share it while I’m thinking about it. Consider it an olive branch.

When we print the full number of bytes, the lack of commas we get from *printf() can make things hard to read. There’s a magic incantation that will allow us to put a single quote in our numeric format specifiers, to get commas to print.

While the option is in the man pages, it never seemed to work when I’d tried it over the years. But sometime in the last six months, I stumbled upon the answer (probably in someone’s code who is far less ADD).

Here’s an example to show:

#include <stdio.h>
#include <stdint.h>
#include <locale.h>

int
main(int argc, char *argv[], char *envp[])
{
    uint32_t number = 8675309;

    printf("Before setlocale(): %'d\n", number);
    setlocale(LC_NUMERIC, "");
    printf("After setlocale(): %'d\n", number);

    return 0;
}

But run the above code, and I get:

Before setlocale(): 8675309
After setlocale(): 8,675,309

Why did it take decades for me to figure this out? I guess because I assumed that my platform just didn’t support the modifier, since every time I tried it, it did nothing. If the flag works, why wouldn’t the formatting be the default behavior? It wouldn’t make sense for the modifier to do nothing by default, right?

I guess I should assume neither anglo-centrism, nor rational thought?

As for our numbers, if we’re just printing powers of two, do we really need to print the whole number? Of course not.

Here’s a helper we’re about to use that truncates a byte count down to a human-friendly unit, rounding down (we’re just going to be writing into a stack-allocated buffer):

#include <stdint.h>
#include <string.h>

// Up to 4 digits, 1 space, 1 null, 2 letters.
#define MAX_LEN 8

const char *sizes[] = {
    "b",
    "kb",
    "mb",
    "gb",
    "tb",
    "pb",
};

void
convert(size_t bytes, char buf[MAX_LEN])
{
    uint32_t i = 0;
    char    *p = buf;
    uint32_t n;

    // Stop one short of the end so sizes[i] can never run off the array.
    for (; i < (sizeof(sizes) / sizeof(char *)) - 1; i++) {
        if (bytes < 1024) {
            break;
        }
        bytes /= 1024;
    }

    n = (uint32_t)bytes;

    if (n >= 1000) {
        *p++ = (n / 1000) + '0';
        n %= 1000;
    }
    if (n >= 100 || p != buf) {
        *p++ = (n / 100) + '0';
        n %= 100;
    }
    if (n >= 10 || p != buf) {
        *p++ = (n / 10) + '0';
        n %= 10;
    }

    *p++ = n + '0';
    *p++ = ' ';
    strcpy(p, sizes[i]);
}

Hopefully the rest of the program is self-explanatory:

#if defined(__linux__)
#define _DEFAULT_SOURCE
#endif

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <locale.h>
#include <sys/mman.h>
#include <sys/resource.h>

// START_SIZE is a page count; on my laptop the page size is 16K,
// so this many pages is 16GB, which is 1/4 of my physical memory.
#define START_SIZE (1UL << 20)

int
main(int argc, char *argv[], char *envp[])
{
    size_t  page_size = getpagesize();
    size_t  n         = START_SIZE;
    uint8_t ix        = 0;
    char    buf[MAX_LEN]   = {
        0,
    };
    uint64_t      total_size;
    char         *pages;
    struct rusage usage;

    setlocale(LC_NUMERIC, "");
    getrusage(RUSAGE_SELF, &usage);
    convert(usage.ru_maxrss, buf);
    printf("Page size: %'zu bytes\n", page_size);
    printf("Baseline rss usage: %'ld bytes. (%s)\n", usage.ru_maxrss, buf);

    while (true) {
        total_size = page_size * n;
        convert(total_size, buf);
        printf("Try %u: mmap %'zu pages %'llu bytes (%s / %p): \n",
               ix,
               n,
               (unsigned long long)total_size,
               buf,
               (void *)total_size);
        pages = mmap(NULL,
                     total_size,
                     PROT_READ | PROT_WRITE,
                     MAP_ANON | MAP_PRIVATE,
                     -1,
                     0);
        if (pages != MAP_FAILED) {
            getrusage(RUSAGE_SELF, &usage);
            convert(usage.ru_maxrss, buf);
            printf("Success (%p - %p); rss after mmap: %s\n",
                   pages,
                   pages + total_size,
                   buf);

            if (ix) {
                for (int i = 0; i < ix; i++) {
                    pages[i * page_size] = 'a';
                }
                getrusage(RUSAGE_SELF, &usage);
                convert(usage.ru_maxrss, buf);
                printf("Touched %d pages. New rss: %'ld bytes (%s).\n",
                       ix,
                       usage.ru_maxrss,
                       buf);
            }
            ix++;
            munmap(pages, total_size);
        }
        else {
            printf(
                "Nope, that's past our power-of-two limit.\n"
                "errno = %d (%s).\n",
                errno,
                errno == ENOMEM ? "ENOMEM" : strerror(errno));
            exit(0);
        }
        n <<= 1;
        printf("\n");
    }
}

Time for the moment of truth:

Page size: 16,384 bytes
Baseline rss usage: 1,359,872 bytes. (1 mb)
Try 0: mmap 1,048,576 pages 17,179,869,184 bytes (16 gb / 0x400000000):
Success (0x300000000 - 0x700000000); rss after mmap: 1 mb

Try 1: mmap 2,097,152 pages 34,359,738,368 bytes (32 gb / 0x800000000):
Success (0x7000000000 - 0x7800000000); rss after mmap: 1 mb
Touched 1 pages. New rss: 1,359,872 bytes (1 mb).

Try 2: mmap 4,194,304 pages 68,719,476,736 bytes (64 gb / 0x1000000000):
Success (0x7000000000 - 0x8000000000); rss after mmap: 1 mb
Touched 2 pages. New rss: 1,376,256 bytes (1 mb).

Try 3: mmap 8,388,608 pages 137,438,953,472 bytes (128 gb / 0x2000000000):
Success (0x7000000000 - 0x9000000000); rss after mmap: 1 mb
Touched 3 pages. New rss: 1,392,640 bytes (1 mb).

...

Try 11: mmap 2,147,483,648 pages 35,184,372,088,832 bytes (32 tb / 0x200000000000):
Success (0x7000000000 - 0x207000000000); rss after mmap: 1 mb
Touched 11 pages. New rss: 1,523,712 bytes (1 mb).

Try 12: mmap 4,294,967,296 pages 70,368,744,177,664 bytes (64 tb / 0x400000000000):
Success (0x7000000000 - 0x407000000000); rss after mmap: 1 mb
Touched 12 pages. New rss: 1,540,096 bytes (1 mb).

Try 13: mmap 8,589,934,592 pages 140,737,488,355,328 bytes (128 tb / 0x800000000000):
Nope, that's past our power-of-two limit.
errno = 12 (ENOMEM).
šŸ—’ļøThe above is on my Mac laptop. My linux box by default gives me 2K pages, and limits me to 33M of them, capping me out at 128gb as-configured. It only ever uses 2kb of memory.

When the process starts up, we haven’t explicitly reserved any memory beyond what is being used to load the process, and we’re using a little more than 1Mb of memory, just doing nothing.

By the way, rss here is not for blog reading. It means “Resident Set Size”, which is essentially how much physical memory we’re using.

So, I’m proved right just in the first iteration of the loop… we reserve over a million pages of memory, yet, since they are all zero-mapped and we haven’t written to any of them, we’re not actually using any more memory.

Even when, in subsequent iterations, we start writing to pages, we’re never hitting too many pages, so we never end up getting to even 2Mb of memory being used for the life of our program.

YET! Our program does end with an out-of-memory error. What gives? If you look at the last successful try, how many pages did we map? Notice anything about the number?

In a 64-bit world, address space should never be a scarce resource. But the number of pages you’re actually allowed to map is bounded by far fewer bits than 64; on my platform, it looks like 32 bits’ worth.

So, we have a hard limit of 4,294,967,296 pages we can map, even if we are not using any of them.

So, if you want your arrays to be “arbitrary size”, and in reality you want them to be able to grow to 2^32 items, then even if those items are each only 8 bytes, I’d only be able to have a couple thousand such arrays.
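
If you want to check my arithmetic (using my 16K pages, the 2^32-page cap we just measured, and a hypothetical 2^32-slot, 8-bytes-per-item array):

#include <stdio.h>
#include <stdint.h>

// Back-of-the-envelope math for the claim above.
int
main(void)
{
    uint64_t page_size       = 16384;             // 16K pages, like my laptop
    uint64_t mappable_pages  = 1ULL << 32;        // the observed cap
    uint64_t bytes_per_array = (1ULL << 32) * 8;  // 2^32 items, 8 bytes each
    uint64_t pages_per_array = bytes_per_array / page_size;

    // Prints 2048: that's how many fully pre-reserved arrays fit.
    printf("%llu arrays\n",
           (unsigned long long)(mappable_pages / pages_per_array));

    return 0;
}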

Way back when I was in college, my programming languages book (well, Bruce MacLennan’s book, but I nearly went broke buying a copy) introduced me to the zero, one, infinity principle:

“The only reasonable numbers are zero, one and infinity.”

Hardware obviously has hard limitations here, but this is one of the reasons why I’ve developed a strong allergy to 32-bit platforms. More than once, I’ve had a 32-bit value where I thought I was never going to need anything higher, only to be proven wrong. I now strongly prefer 64-bits by default, because it’s closer to the ideal… infinity.

When it comes to memory, what this really means is, zero-mapping insane numbers of pages is totally fine for the memory manager, but it does NOT generally make sense for individual data structures, especially when you don’t want to mandate hard caps on their sizes.

Therefore, for a growable array, the segment array is a pretty reasonable approach. The only downside is that the power-of-two approach gets more wasteful of your page table space the bigger the array gets. Specifically, if an array just sneaks over a segment threshold, you reserve roughly as many new pages as you’re already using, even if you’re never going to touch them. That’s generally not the end of the world.

But, if you’re at any risk of that, then the segment array is not the right approach; you’re better off giving up on constant time. When I talked about this being a “complexity theory hack”, I meant it. My only critique of that other article was, “why are you worried about O(1) array access here, when you clearly realize CPU is far cheaper than memory?”

Lies, Damn Lies, and Computational Complexity

Computational complexity can easily be invoked to win an argument you should really lose. Because, even when it is being used to give us hard upper and/or lower bounds on work, it won’t always be able to say what’s right for a particular scenario.

Most developers, certainly those who took computer science in college (and didn’t sleep through the “science” part), know quite well that: O(1) 😃> O(log n) 😃> O(n) 😃> O(n log n) 😃> O(n^2), where today, 😃> means “is better than”.

But that “is better than” is a bit wishy-washy. I think the most well-known example is, “if the constant on your O(1) algorithm is high enough, it might be slower in practice than the O(n) algorithm.”

But if you want to go to the extreme end of the complexity scale, when we talk about some problems being NP-Hard, I’ve often seen people take that as synonymous with, “we shouldn’t even address it”. But that’s often a huge mistake. If instances of the problem are small enough, even brute forcing an NP-Hard problem can be good enough. And there can be more clever approaches, as evidenced by the number of solvers out there that can handle many real instances of 3-SAT (which is NP-Hard).

In the case of the segment array, you might say that it goes to great lengths to avoid an O(log n) solution. Actually, I’d say it’s not substantially different from an O(log n) solution: taking an operation that has O(log n) runtime and claiming your implementation is O(1) is something you can generally do with a totally straight face.

Abstract data structures should be generalizable to any future platform. If you think that way, the segment array itself doesn’t strictly adhere to the zero, one, infinity principle.

Specifically, how would we design it to be future proof, so that, no matter how much memory we might be able to address in the future, the thing will still work?

At some point, if we’re bound by a fixed number of segment pointers, we will eventually have to add more. For instance, we can have the last segment pointer point to a list of further segments. That doesn’t add much, especially since arrays have to get huge before you ever reach the extra indirection.
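
Something like this, purely as a sketch of the idea (me speculating, not his design):

#include <stdint.h>

// Sketch: once the fixed table fills up, the final slot stops being a data
// segment and instead points at an overflow table of further segments.
#define DIRECT_SEGMENTS 26

typedef struct SegmentOverflow {
    struct SegmentOverflow *next;   // yet another level, if it's ever needed
    uint8_t                *segments[DIRECT_SEGMENTS];
} SegmentOverflow;

typedef struct {
    uint32_t         count;
    int              used_segments;
    uint8_t         *segments[DIRECT_SEGMENTS];
    SegmentOverflow *overflow;      // NULL until the direct slots run out
} SegmentArrayIndirect;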

The level of indirection knocks us into O(log n) land. If you’re going to generalize the algorithm to handle arbitrary sizes, there’s no way to avoid turning it into an O(log n) algorithm.

But for specific instances in the real world, we may already have a constant-time version. If we have enough segments before we ever have to reach for indirection, then we’ll never use it, which is where the blog post lands. And even if you do use indirection, you can bound the number of levels to a very small number (thanks, math!). There’s a firm upper bound on your hardware, and in practice it’s really close to the case where there’s no indirection at all.

Back to O(1) with no effort at all! In some sense, it’s smoke and mirrors, sure. But there really is a constant upper bound on any machine you’ll ever run on, thanks to how slowly a logarithm grows as memory heads toward infinity…

To be very clear, I’m not saying there’s deception here. What I’m saying is that we often lose sight of what really matters. Computational complexity actually is incredibly useful, but it’s often hard to get perspective on the subtleties of real-world performance, especially when there are so many layers of abstraction between most software and the machine that it’s hard to truly internalize the gap between CPU speed and memory speed (for instance).

And, more importantly, because humans don’t really tend to internalize exponential (or logarithmic) distributions, it’s even hard to internalize how meaningful the difference between complexities may be. Let’s say you’re comparing two algorithms. If A is O(n^2) and B is O(n^3), then hopefully you’d have the intuition that A’s more likely to be better in general, than if A is O(n log n) and B is O(n^2).

But one thing I believe people fail to internalize is, how close O(1) and O(log n) really are to each other. In practice, you can reliably take something that’s O(log n), and call it O(1) with confidence, explicitly because we do not have infinite memory.

Let’s take tree access, for example. We’ll just consider a plain balanced binary tree, where each node has at most two children. A single lookup is pretty trivially log(n) in the number of items.

Let’s say my nodes each take a mere 32 bytes of space. If I use my entire 64Tb of addressable memory on this tree, I could cram in up to 2^41 nodes. That means my lookup has a hard worst case of 41 comparisons, empty or full.
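
A quick sanity check on that bound, using the 64 tb cap from the experiment above:

#include <stdio.h>
#include <stdint.h>

// 64 TB of address space / 32-byte nodes = 2^41 nodes, so a balanced
// binary tree over every last byte is at most 41 levels deep.
int
main(void)
{
    uint64_t addressable = 1ULL << 46;   // 64 TB, per the mmap experiment
    uint64_t node_size   = 32;
    uint64_t max_nodes   = addressable / node_size;
    uint32_t depth       = 0;

    while (max_nodes >>= 1) {
        depth++;
    }

    printf("worst-case lookups: %u\n", depth);   // prints 41
    return 0;
}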

While the overall algorithm is log(n), on my computer it is absolutely O(1)– because we have a concrete upper bound on the work. The whole idea of O(1) is a “constant upper bound”, meaning we can guarantee some constant amount of work, even if the constant in question is relatively high.

Here our “constant” is not high– it is 41 operations. And, allowing more nodes per layer brings that constant down fast.

In the real world, when using log(n) algorithms, there’s a good chance that they’ll perform at least as well as an O(1) algorithm, and could possibly perform better. For instance, a lot is made of the fact that common dictionary operations can all be O(1). But it’s really easy to use an O(1) algorithm to build a dictionary that’s far slower than an O(log n) one, because the O(log n) algorithms have far fewer places where you might accidentally ratchet the constant overhead up extraordinarily.

And the cost of a page fault is very high compared to the cost of, say, a sequential memory access. Comparing the two styles of dictionary, given that recursion depth is never very deep with a log(n) algorithm, you’re very likely to find log(n) 😃> O(1) if:

  • The dictionary access pattern requires lots of page faults (which we can expect on large dictionaries, due to the hash function); and

  • The tree is structured to minimize page faults, which is easier to do the more structured the access patterns are, and the more nodes you keep per level.

That’s right, you don’t even have to waste all those page table entries to get effective constant time! Just extend by fixed amounts, and keep a tree!
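
For instance, here’s a hedged sketch of my own (made-up sizing, not from either article): fatten each tree node to exactly one page, so a single page fault buys you roughly 1,000-way fan-out. With the 2^41-node budget from before, that’s about four levels per lookup.

#include <stdint.h>

// Sketch: size each node to one 16K page. With 8-byte keys and 8-byte
// child pointers, that's 1,023-way branching per page fault.
#define NODE_PAGE_SIZE 16384
#define NODE_SLOTS     ((NODE_PAGE_SIZE - 16) / 16)

typedef struct WideNode {
    uint32_t         num_keys;
    uint32_t         is_leaf;
    uint64_t         pad;                  // keep the arrays 8-byte aligned
    uint64_t         keys[NODE_SLOTS];
    struct WideNode *children[NODE_SLOTS]; // unused in leaves
} WideNode;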

You may be saying to yourself, this is all too much– one logical conclusion would be, you can play the same kinds of games to call O(n^2) algorithms O(1). They’ve got a constant upper bound on my hardware!

You could try, but pretending the constant is reasonable is just clowning.

The reason it’s an acceptable argument for O(log n) and O(1) is not just the logarithmic curve, it’s because the real-world performance impact for worst-case instances is generally so slight (a small constant multiple, sometimes less than 1), it becomes an acceptable argument. Try it anywhere else, it’s like proving 1 == 2. You can do the magic, but nobody will believe it, and many people will know how the trick works, and roll their eyes at you.

File this one away

Since we were kind of on the topic of mmap(), there’s “one easy trick” for conserving memory that lots of people tend to miss.

Sometimes, developers know stuff is possible, but assume it’s incredibly hard. That tends to be true with anything having to do with mmap(), because even most C developers are afraid of its ridiculous complexity.

You’ll notice above, even just the simple act of, “give me some memory mapped pages” is complex– the call requires a whopping 6 parameters. I long ago gave up on fitting a single mmap() call on a single line (I’m an 80-columns fan). And I’m not here to address that problem, as much as I hate it.

But, we can use it to read files into our address space in a way where we can delegate to the OS to manage when to load what parts of a file into memory.

Typically, people reach for a one-shot wrapper for reading files, which will actually READ the file, copying the whole thing directly into the resident page set (even if those pages are likely to be evicted from the first tier of cache soon after).

Why not let the OS manage all of that, and pay only for what you use? For instance, here’s a little command for you (using the convert() function and headers from above) that you can use to convince yourself that we can get a file into memory via mmap() without having to pass over the file even once.

Yes, you can “read” the file without ever having to “read” the file. You can totally skip the whole IO interface, and go straight to unchecked direct memory access, woo!

#if defined(__linux__)
#define _DEFAULT_SOURCE
#endif

#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>

int
main(int argc, char *argv[], char *envp[])
{
    struct stat   info;
    struct rusage usage;
    uint32_t      aligned_sz;
    uint32_t      modulus;
    void         *addr;
    char          buf[10] = {
        0,
    };

    setlocale(LC_NUMERIC, "");
    if (argc < 2) {
        printf("No files specified.\n");
    }
    else {
        for (int i = 1; i < argc; i++) {
            int fd = open(argv[i], O_RDONLY | O_CLOEXEC);

            if (fd < 0) {
                printf("Can't open file: %s\n", argv[i]);
                continue;
            }

            flock(fd, LOCK_EX);
            fstat(fd, &info);

            addr = mmap(NULL,
                        info.st_size,
                        PROT_READ,
                        MAP_FILE | MAP_PRIVATE,
                        fd,
                        0);

            if (addr == MAP_FAILED) {
                printf("Can't mmap() file: %s\n", argv[i]);
            }
            else {
                printf("Mapped file %s @%p (%'llu bytes)\n",
                       argv[i],
                       addr,
                       (unsigned long long)info.st_size);
            }

            // We intentionally leave fds open, so you can
            // see the (non-)impact to memory.
        }
    }

    getrusage(RUSAGE_SELF, &usage);

    convert(usage.ru_maxrss, buf);
    printf("Total memory used: %'ld bytes. (%s)\n",
           usage.ru_maxrss,
           buf);
    exit(0);
}

Some important things to note:

  1. While most people never really bother with file race conditions, here we try to avoid them by requesting an exclusive lock when opening the file (via flock). If we don’t do that, the file can change out from under us, and the size we get when we stat the thing can easily be wrong (which isn’t a big deal in that it’s unlikely to lead to a crasher, but then you’re on your own for figuring out where the file ends).

    It’s true, the effectiveness of file locks isn’t something to be counted on (last I checked, it’s platform specific whether the lock can be broken without killing the process holding it). But even if it doesn’t work, it’s better than being an ostrich, pretending that file-based race conditions aren’t a problem.

    To be fair, the downside to this kind of lock is denial of service: if another process holds a long-term lock on the file, you’d block in flock(), and avoiding that takes a bunch more dancing (to get it right). Still, for the vast majority of things, where it’s reasonable to expect exclusive access to a file unless there’s a security issue, I recommend it.

  2. And while we’re on this, do note that, if the underlying file is unlocked, it’s generally undefined what happens if another process modifies it while you have a memory map. That’s not particularly dissimilar from when you read() and write() directly– the OS may be using a cached page under the hood… or it may not be. 🤷‍♂️

  3. When using mmap() to memory-map a file, it’s totally fine if the size you provide is not page-aligned. Platforms typically zero-fill the rest of the last page anyway.

  4. If you don’t write beyond the actual current end of a file, the file won’t grow, even if you gave yourself wiggle room. That is, it’s always fine to use a huge map… it just counts toward your 2^32 total pages (or whatever your machine allows).

  5. You can close the file descriptor after the mmap() call succeeds, but you might want to wait, because of the next note.

  6. If you want to be able to modify the contents, you can make the map an arbitrary size; just grow the file to match first (ftruncate() will do it, and on most filesystems the extension is a sparse hole, so it’s cheap). As with a basic memory map, we won’t incur memory cost for pages we don’t use, in that they’ll all be mapped to the zero page. The cost is that the file itself gets extended to the requested size. So when you’re done, call ftruncate() on the file descriptor again, to tell the OS the proper size, before you close (which will yield the lock). There’s a minimal sketch of this dance right after this list.

  7. Oh, and don’t forget to munmap() the memory range, for good measure. Not doing so is akin to leaving the file descriptor open… the OS will keep accounting info around, and you should let it jettison that, already, sheesh. It’s already got enough on its plate.
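
Here’s that minimal sketch of the grow-then-shrink dance from note 6 (my own toy, with error handling mostly waved away):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

// Sketch: extend the file so the writable mapping has room, write through
// the map, then truncate back down to the bytes actually used.
static int
write_via_map(const char *path, const char *msg, size_t msg_len)
{
    size_t reserve = 1UL << 30;   // 1gb of wiggle room; untouched pages cost nothing
    int    fd      = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, reserve) < 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, reserve, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(p, msg, msg_len);      // "write" without write()

    munmap(p, reserve);
    ftruncate(fd, msg_len);       // tell the OS the real size
    close(fd);

    return 0;
}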

Final Thoughts

All in all, don’t worry about optimization so much… not just CPU, but memory too. And we didn’t even cover the part where, when kernels page out memory, they generally compress it. For typical program data, the Linux kernel gets 80%+ compression, letting you use even more memory. Trust that some good engineers have your back. Even when you do get to the point where performance is a problem, they still have your back… that’s when you go to the profiler.

Anyway, I thought today I’d be posting part one of the garbage collector. But boy, your question made me self-reflect. Specifically, you’ve helped me realize, there’s too much good stuff to read, so if I want a hope of any more readers, I need to ignore all the advice my mom has given me to get on Twitter (or whatever it’s called these days), and instead, try clickbait headlines!

So, lone reader, I’m forever in your debt. If I ever sell out and go to Substack, free subscription for life (though, I will never move to Substack, so no risk of that)!

🔥Oh, and I will never make you sign up to a newsletter to get all my code, either. That was the only truly disappointing thing about Daniel Hooper’s article, tbqh.

I’ll get that garbage collection article out soon enough. Maybe give it some time for the clickbait to get me to, I dunno, like 3 readers?

— Leigh T. H.

p.s., Mom, the T.H. does not stand for ‘The Handsome’. Stop saying that! I know you’re proud, but it’s embarrassing!