<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">I appreciate this thread. Henrik, I

      know you specifically asked about memory, but processes will

      suffer from any resource starvation, and one kind of resource

      starvation can bleed into another. This is why thrashing is

      particularly bad, right? Thrashing impacts disk I/O as well as

      memory (assuming you're using a swap file on disk). Maybe that's

      not the best example, but I do think it's useful to consider

      multiple different kinds of resources when scheduling workloads

      (running processes).<br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">So apologies in advance, I think I've

      gone off-topic below, looking at multiple forms of resource

      starvation. You could ignore the parts about CPU, GPU, network

      I/O, and disk I/O. Or we could start a new thread.<br>

    </div>

    <div class="moz-cite-prefix"><br>

    </div>

    <div class="moz-cite-prefix">Henrik wrote:<br>

    </div>

    <blockquote type="cite" cite="mid:ZxjtHfHFUykSsz1o@morsing.cc">Of

      course, the ultimate solution would be to measure thrashing but

      that would be tricky and by the time anyone noticed it might be

      too late.</blockquote>

    <p>I agree that by the point of thrashing it's too late. It gets so difficult

      to do any useful manual sysadmin tasks on a Linux server when a

      process <i>approaches</i> the point of thrashing (especially

      if/once the OOM killer is "helping"), right?<br>

    </p>

    <p>Are you working with one or multiple actual Linux servers or

      desktops, or is your original question academic? I'm assuming

      you're talking about one single machine, is that right? Single or

      multi-user? Are you also considering CPU usage?</p>

    <p>I like how this thread is approaching monitoring of specific

      services/applications/workloads/programs/processes (I'm fudging

      and treating all these terms as roughly equivalent). Processes can

      have unique resource usage profiles over time. I don't think

      you'll find one memory metric that can meaningfully inform you

      whether or not the next process you'll create will suffer, because

      it depends on how the process you're trying to run behaves, what

      else is running right then (and what it is doing), and what

      resources you have to work with.<br>

    </p>

    <p>Can you say more about the particular workloads you're trying to

      schedule? Are they bursty, is someone sitting there

      waiting/watching for hopefully not too long, are they I/O heavy,

      can they be nice'd, can they co-exist peacefully... stuff like

      that. And as others have mentioned: sitting in memory is one

      thing, but paging in and out is another.<br>

    </p>

    <p>Zooming out to all resources in general (not just memory), I want

      to get back to using some combination of metrics to make

      decisions. It doesn't have to be complex to work. Personally I've

      found it useful enough to do rough estimations of available CPU

      cores and RAM, and what my workloads require of both.<br>

    </p>

    <p>For example, say you have a server with 4GB RAM and 4 CPU cores.

      You want to run two services for a family of five. Assume usage on

      both is intermittent--you happen to know it's unlikely they'll all

      be streaming video and uploading and downloading huge files

      simultaneously.<br>

    </p>

    <pre>service   | purpose      | CPU cores | RAM
----------|--------------|-----------|----
jellyfin  | stream music | 2         | 2GB
nextcloud | file share   | 2         | 2GB</pre>
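    <p>That back-of-the-envelope budgeting can be written down as a few
      lines of Python. This is only a rough sketch: the
      <font face="monospace">Workload</font> class and the
      <font face="monospace">headroom</font> function are names I made up
      for illustration, and the numbers are the rough guesses from the
      table, not measurements.</p>

<pre>
```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Rough peak estimate for one service (guessed, not measured)."""
    name: str
    cpu_cores: float
    ram_gb: float


def headroom(total_cores, total_ram_gb, workloads):
    """Return (spare cores, spare RAM in GB); negative means oversubscribed."""
    used_cores = sum(w.cpu_cores for w in workloads)
    used_ram = sum(w.ram_gb for w in workloads)
    return total_cores - used_cores, total_ram_gb - used_ram


# The two services from the table, on a 4-core / 4GB server.
services = [
    Workload("jellyfin", cpu_cores=2, ram_gb=2),
    Workload("nextcloud", cpu_cores=2, ram_gb=2),
]
spare_cores, spare_ram = headroom(4, 4, services)
print(f"spare cores: {spare_cores}, spare RAM: {spare_ram} GB")
```
</pre>

    <p>With these estimates the headroom comes out to zero on both
      counts, so nothing is set aside for the OS itself. That only
      works because the estimates are peaks and usage is
      intermittent.</p>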

    <p>This is obviously missing a lot. What about GPUs? What about disk

      and network I/O? Personally I keep a rough feel for these in my

      head for the relatively few services I self-host, but if I were

      being more careful I'd look at those too.<br>

    </p>

    <p>Have you heard of PSI (Pressure Stall Information) -- <a

        class="moz-txt-link-freetext"

        href="https://docs.kernel.org/accounting/psi.html">https://docs.kernel.org/accounting/psi.html</a>

      ? It reports how much of the time tasks were stalled waiting on

      CPU, memory, or I/O. It's another "trailing indicator" (not a

      "leading indicator"), but it may come close to the one or few

      useful longitudinal metrics you're looking for.</p>
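    <p>For what it's worth, here is a minimal sketch of reading PSI from
      Python. It assumes a kernel with PSI enabled, which exposes
      <font face="monospace">/proc/pressure/memory</font> with "some" and
      "full" lines. The function names are my own, and the 10.0 threshold
      is an arbitrary placeholder, not a recommendation.</p>

<pre>
```python
from pathlib import Path


def parse_psi(text):
    """Parse PSI text into {"some": {"avg10": 0.0, ...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line looks like: some avg10=0.00 avg60=0.00 avg300=0.00 total=0
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def memory_pressure(path="/proc/pressure/memory"):
    """Read current memory pressure; return None if PSI is unavailable."""
    try:
        return parse_psi(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    psi = memory_pressure()
    if psi is None:
        print("PSI not available on this kernel")
    elif psi["some"]["avg10"] > 10.0:  # placeholder threshold, tune for your box
        print("memory pressure is high; maybe hold off on starting more work")
    else:
        print("memory pressure looks fine:", psi["some"])
```
</pre>

    <p>The avg10/avg60/avg300 fields are percentages of time stalled
      over the last 10, 60 and 300 seconds, so comparing them gives a
      crude sense of whether pressure is rising or falling.</p>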

  </body>

</html>