[GLLUG] Monitoring memory usage

Mon Nov 11 14:58:42 UTC 2024

On Wed, Oct 23, 2024 at 09:00:39AM -0700, Adam Monsen via GLLUG wrote:
[...]
>
>Are you working with one or multiple actual Linux servers or desktops, 
>or is your original question academic? I'm assuming you're talking 
>about one single machine, is that right? Single or multi-user? Are you 
>also considering CPU usage?

It's a real-world problem of mine, stretching back 25 years across 15 companies. In my current role it covers about 800 servers.

And that's funny about single-/multi-user. Not long ago, on a forum far, far away, someone was told that "no-one has had a multi-user Unix system since the '80s"!

>
>Can you say more about the particular workloads you're trying to 
>schedule? Are they bursty, is someone sitting there waiting/watching 
>for hopefully not too long, are they I/O heavy, can they be nice'd, 
>can they co-exist peacefully... stuff like that. And as others have 
>mentioned: sitting in memory is one thing, but paging in and out is 
>another.

Most systems I work with have latency-sensitive loads during the day, an I/O-heavy backup in the early evening and heavy batch jobs at night.

One perennial annoyance is non-IT users scheduling mad queries from a GUI front-end, where neither the database nor the application programmers have always built in enough protection to catch them.

>Have you heard of PSI (Pressure Stall Information) -- 
>https://docs.kernel.org/accounting/psi.html ? It's another "trailing 
>indicator" (not a "leading indicator") but maybe that approaches 
>something like one or a few useful longitudinal metrics in the manner 
>you're seeking.

I had not; that is very interesting, and I have just shown it to my team. I had a quick look at one of our systems, but it did not have /proc/pressure, so I assume I need to enable something. It looks cgroup-related.
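For anyone following along, here is a minimal sketch (Python, my own assumption, not anything from the PSI docs beyond the file format) of reading system-wide memory pressure. Per the kernel documentation, /proc/pressure/memory holds a "some" line and a "full" line, each with avg10/avg60/avg300 percentages and a cumulative "total" stall time in microseconds. The system-wide files live under /proc/pressure; cgroup v2 additionally exposes per-group memory.pressure files in the same format. If /proc/pressure is missing, my understanding is that the kernel was built without CONFIG_PSI, or with CONFIG_PSI_DEFAULT_DISABLED, in which case booting with psi=1 should turn it on. The function names below are hypothetical, chosen for illustration.

```python
import os
from typing import Dict, Optional

# System-wide memory PSI file; per-cgroup files (cgroup v2) use the same format.
PSI_MEMORY = "/proc/pressure/memory"


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI text into {"some": {...}, "full": {...}}.

    Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
    avg* values are percentages; total is cumulative stall time in microseconds.
    """
    result: Dict[str, Dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values: Dict[str, float] = {}
        for field in fields:
            key, _, value = field.partition("=")
            values[key] = float(value)
        result[kind] = values
    return result


def read_memory_pressure(path: str = PSI_MEMORY) -> Optional[Dict[str, Dict[str, float]]]:
    """Return parsed PSI data, or None if the file does not exist (PSI unavailable)."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return parse_psi(f.read())


if __name__ == "__main__":
    pressure = read_memory_pressure()
    if pressure is None:
        print(f"{PSI_MEMORY} not found: kernel may lack PSI support or have it disabled")
    else:
        some = pressure["some"]["avg10"]
        full = pressure["full"]["avg10"]
        print(f"memory pressure avg10: some={some}% full={full}%")
```

Polling the avg10/avg60 figures from something like this across a fleet would give the sort of longitudinal trailing indicator Adam describes; the "total" counter is handy for computing your own rates between samples.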

Regards,
Henrik Morsing