Finding Memory Usage


I have a CentOS 7 server that is running out of memory and I can’t figure out why.

Running “free -h” gives me this:
              total        used        free      shared  buff/cache   available
Mem:           3.4G        2.4G        123M        5.9M        928M        626M
Swap:          1.9G        294M        1.6G

The problem is that I can’t find 2.4G of usage.  If I look at resident memory usage using “top”, the top 5 processes are using a total of 390M, and the next highest process is using 8M.  For simplicity, if I assume the other 168 processes are all using 8M (which is WAY too high), that still only gives a total of 1.7G.  The tmpfs filesystems are only using 18M, so that shouldn’t be an issue.
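
For reference, a minimal sketch for adding up resident set sizes across every process, assuming procps “ps” and awk as on a stock CentOS 7 install:

    # Sum the RSS column (in kB) for all processes and print the total in MB
    ps -eo rss= | awk '{sum += $1} END {printf "total RSS: %.1f MB\n", sum/1024}'

Note that RSS counts pages shared between processes once per process, so a total like this can overstate what the applications are really using.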

Yesterday, the available memory was down around 300M when I checked it. 
After checking some things and stopping all of the major processes, available memory was still low.  I gave up and rebooted the machine, which brought available memory back up to 2.8G with everything running.

How can I track what is using the memory when the usage doesn’t show up in top?


Bowie

11 thoughts on - Finding Memory Usage

  • On a lark, what kind of filesystems is the system using, and how long had it been up before you rebooted?

  • The filesystems are all XFS.  I don’t know for sure how long it had been up previously; I’d guess at least 2 weeks.  Current uptime is about 25 hours and the system has already started getting into swap.


    Bowie

  • Are your results from “top” similar to:

      ps axu | sort -nr -k +6

    If you don’t see 2.4G of use from applications, maybe the kernel is using a lot of memory.  Check /proc/slabinfo.  You can simplify its content to bytes per object type and a total:

      grep -v ^# /proc/slabinfo | awk 'BEGIN {t=0;} {print $1 " " ($3 * $4); t=t+($3 * $4)} END {print "total " t/(1024 * 1024) " MB";}' | column -t
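
    As a cross-check on kernel-side use, a few lines in /proc/meminfo summarize slab and other allocations that never show up in top; a quick look (field names as on a stock CentOS 7 kernel) could be:

      # Kernel-side memory that is not attributed to any process
      grep -E '^(Slab|SReclaimable|SUnreclaim|Shmem|PageTables|KernelStack):' /proc/meminfo
      # slabtop (procps-ng) shows the same slab data; -o prints once, -s c sorts by cache size
      slabtop -o -s c | head -n 15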

  • The misunderstanding was mostly related to an older version of “free”
    that included buffers/cache in the “used” column.  “used” in this case does not include buffers/cache, so it should be possible to account for the used memory by examining application and kernel memory use.
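
    To see how “used” and buff/cache split out on a given box, the corresponding /proc/meminfo fields can be read directly; a sketch, with the wide flag of “free” hedged on the installed procps version supporting it:

      # MemAvailable is what free reports as available; Buffers and Cached make up most of buff/cache
      grep -E '^(MemTotal|MemFree|MemAvailable|Buffers|Cached|Shmem):' /proc/meminfo
      # -w/--wide separates buffers from cache on recent procps-ng builds
      free -h -w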

  • This was brought to my attention because one program was killed by the kernel to free memory and another program failed because it was unable to allocate enough memory.

    Right, and that website says that you should look at the “available” number in the results from “free”, which is what I was referencing.  They say that a healthy system should have at least 20% of the memory available.  Mine was down to 17% in what I posted in my email and it was at about 8% when I rebooted yesterday.


    Bowie
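
    The OOM killer logs what it did to the kernel ring buffer, so the victims and the memory state at the time can be pulled from the logs after the fact; on CentOS 7 either of these should work:

      # Kernel messages from the ring buffer, with human-readable timestamps
      dmesg -T | grep -i -E 'out of memory|killed process'
      # The same kernel messages via the journal
      journalctl -k | grep -i -E 'out of memory|killed process'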

  • That looks the same.

    The total number from that report is about 706M. 

    My available memory has now jumped up from 640M to 1.5G after one of the processes (which was reportedly using about 100M) finished.

    I’ll have to wait until the problem recurs and see what it looks like then.  For now, I used the numbers from “ps axu” to add up a real total, added the 706M to it, and got within 300M of the memory currently reported as used by free.

    What could account for a process actually using much more memory than is reported by ps or top?


    Bowie

  • Bowie Bailey wrote:

    Um, wait a minute – are you saying the oom-killer was invoked? My reaction to that is to define the system, at that point, to be in an undefined state, because you don’t know which threads were killed.

    mark

  • I’ve had multiple systems (and VMs) with XFS filesystems that had trouble on the 693 series of kernels. Eventually the kernel xfs driver deadlocks and blocks writes, which then pile up in memory waiting for the block to clear. Eventually you run out of RAM and the OOM killer kicks in. The only solutions I had at the time were to revert to booting a 514 series kernel or to convert to EXT4, depending on the needs of the particular server. Everything I’ve converted to EXT4 has been rock stable since, and the very few I had to run a 514 kernel on have been stable, just not ideal. It may be fixed in the newer 8xx series but I haven’t dug into them on those systems yet.

    If it happens again, then look for processes in the D state and see whether logging continues or just cuts off (which marks when the block started).
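
    A sketch for spotting those D-state (uninterruptible sleep) processes and any hung-task warnings, assuming procps ps:

      # List processes whose state starts with D, with the kernel function they are waiting in
      ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
      # The hung-task watchdog logs blocked tasks after 120 seconds by default
      dmesg -T | grep -i 'blocked for more than'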

  • Probably true, but the system has been rebooted since then and the oom-killer has not been activated again.  When I first noticed the problem, I also found that my swap partition had been deactivated, which is why the oom-killer got involved in the first place instead of just having swap usage slow the system to a crawl.

    I think I have identified the program that is causing the problem
    (memory usage went back to normal when the process ended), but I’m still not sure how it ended up using 10x the memory that top reported for it.


    Bowie
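
    Since the swap partition had dropped out once already, it may be worth confirming it is still active whenever memory gets tight; both of these work on CentOS 7:

      # Active swap areas as seen by the kernel
      cat /proc/swaps
      # Same summary via swapon
      swapon -s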

  • Did it move at all after killing that “one process”?

    Are you counting both resident and shared memory?  If the process that you terminated had around 900MB of shared memory, and you aren’t looking at that value, then that’d explain your memory use.

    We’re kinda guessing without seeing any of your command output.
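
    To see where a single process’s memory really sits, including shared mappings that top folds into its SHR column, the per-mapping counters in /proc/<pid>/smaps can be summed; the PID below is only a placeholder for the suspect process:

      pid=1234   # placeholder: substitute the PID of the process in question
      # Total Rss/Pss/shared/private/swap across all of the process's mappings, in kB
      awk '/^(Rss|Pss|Shared_Clean|Shared_Dirty|Private_Dirty|Swap):/ {sum[$1] += $2}
           END {for (k in sum) printf "%-14s %d kB\n", k, sum[k]}' /proc/$pid/smaps

    Pss divides shared pages among the processes that map them, so comparing Rss with Pss shows how much of the reported resident memory is actually shared.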