Disk Read I/O Very High, But No Process Performs I/O Reads

We have run into a very strange problem. The load on the server is very high, and “pidstat” shows that disk read I/O is very high as well.
But the processes running on this server do not perform any disk reads. We also noticed that in “top”, the “SHR” column is zero for most processes. “free -m” shows that buff/cache on this server is lower than on our other, healthy servers. “ps -o majflt,minflt” shows a large number of major page faults. Swap is not enabled on this server. What could be causing this?

CentOS version: CentOS Linux release 7.3.1611
Kernel version: 3.10.0-693.21.1.std7a.el7.0.x86_64

13 thoughts on - Disk Read I/O Very High, But No Process Performs I/O Reads

  • Hi,

    If disk read IO is very high, what does it read and what generates the reads?

    Simon

  • Install a program called iotop. It shows, in real time, which processes are using the most I/O bandwidth.
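If installing iotop is not an option, the per-process counters in /proc/<pid>/io can be read directly. The following is only a rough sketch: the function names `parse_proc_io` and `top_readers` are made up for illustration, and it assumes a Linux /proc filesystem and enough privileges (usually root) to read other users' io files. Note that `read_bytes` is cumulative since the process started, so two samples are needed to get a rate.

```python
import os

def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters

def top_readers(limit=10):
    """Return up to `limit` (read_bytes, pid, comm) tuples, largest readers first."""
    rows = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/io") as f:
                io = parse_proc_io(f.read())
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
        except OSError:
            # The process exited in the meantime, or we lack permission.
            continue
        rows.append((io.get("read_bytes", 0), int(pid), comm))
    return sorted(rows, reverse=True)[:limit]
```

Calling `top_readers()` twice a few seconds apart and comparing the `read_bytes` values gives a rough per-process read rate without any extra packages.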

  • I used pidstat and found that all processes on this server perform read operations, but I don’t know where these reads come from.

    yf chu

  • The processes on this server do not perform any read I/O. We developed the code ourselves, so I don’t know where these reads come from. By the way, pidstat only shows disk I/O, not network I/O, is that right?

    yf chu

  • Yes. I suspect it has something to do with swapping, but swap is turned off on this server. Here is the output of free -m:

                    total        used        free      shared  buff/cache   available
    Mem:       128174       97449       24400        4158        6325       25232
    Swap:           0           0           0

    We have other servers running the same processes. On those servers, buff/cache is larger and “free” is smaller than on the server with the problem.
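One detail in the free -m figures above may be worth a quick calculation. As far as I know, on Linux the “shared” column (tmpfs and shared memory) is counted inside buff/cache, so the cache left over for ordinary files is smaller than the buff/cache column suggests. The sketch below just redoes that arithmetic on the numbers from the post; `parse_free_mem` is a hypothetical helper, and treating buff/cache minus shared as an upper bound on the file cache is an assumption, not something confirmed in this thread.

```python
def parse_free_mem(line):
    """Map the 'Mem:' row of `free -m` (procps-ng column order) to named fields."""
    columns = ["total", "used", "free", "shared", "buff_cache", "available"]
    values = [int(v) for v in line.split()[1:]]
    return dict(zip(columns, values))

# The 'Mem:' row exactly as posted above (values in MiB).
mem = parse_free_mem("Mem: 128174 97449 24400 4158 6325 25232")

# Assumption: shared memory is included in buff/cache, so subtracting it
# gives an upper bound on the cache available for regular files.
file_cache = mem["buff_cache"] - mem["shared"]
file_cache_pct = round(100 * file_cache / mem["total"], 1)

print(file_cache)      # 2167
print(file_cache_pct)  # 1.7
```

If that assumption holds, only about 2 GB of a 128 GB machine is caching file data, which would fit with binaries and mapped files being re-read from disk again and again.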

  • I haven’t been following this thread closely, so may be off target.

    When pages are moved out of the working set they are either “clean” or “dirty”. Clean pages have not been modified since they were originally read into memory, whereas dirty pages have been changed. A dirty page becomes clean once it is written back to disk. Typically (though not always) this is a write to swap. Read-only pages, such as code or data, are always clean, so they can be dropped when required.

    When a hard (major) fault occurs, the pages have to be read back from disk, from somewhere. That somewhere could be swap, but it can also be program images or files. It may be that what you observe is this latter process.

    HTH, Martin
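To see which processes are taking those hard faults, the fault counters in /proc/<pid>/stat can be sampled. This is a minimal sketch, assuming the field layout described in the proc(5) man page (minflt is field 10, majflt is field 12); `parse_faults` and `read_faults` are illustrative names, not part of any tool.

```python
def parse_faults(stat_text):
    """Return (comm, minflt, majflt) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting the whole line on spaces.
    head, _, tail = stat_text.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # `fields` starts at field 3 (state), so field 10 is index 7
    # and field 12 is index 9.
    return comm, int(fields[7]), int(fields[9])

def read_faults(pid):
    """Read the current cumulative fault counters for one PID (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_faults(f.read())
```

The counters are cumulative, so calling `read_faults(pid)` twice and subtracting the majflt values shows how many major faults happened in between. A process whose majflt keeps climbing is reading pages back from disk even if its code never calls read().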

  • Hi,

    You said that you have multiple systems running this same application. But, do they work with the same data on disk or are there big differences?

    As I understand the figures below, your buff/cache seems a bit low if you read a lot of data from disk. If you read a lot of data and the filesystem cache is low, most of those reads will go directly to disk.

    One more thing comes to mind: is your hardware behaving fine? I mean, if your storage has difficulty reading data from disk due to hardware issues, this could also lead to the kind of problems you’re facing.

    Regards, Simon

  • The applications on all those servers are the same, and they work on the same data. I still don’t know why the size of buff/cache differs between servers.

    At 2021-03-12 16:35:09, “Simon Matter” wrote:

  • You might want to check the kernel threads. If you use md arrays, you can see very high load during an md check.

    Adrian
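Whether an md check is running can be read from /proc/mdstat. The sketch below is only an illustration: `md_activity` is a hypothetical helper, and it assumes the usual mdstat layout, where each array starts a line with its name and a running operation shows up on a progress line such as “check = 12.6%”.

```python
import re

def md_activity(mdstat_text):
    """Return {array: (operation, percent)} for arrays with a running sync operation."""
    active = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            # A new array section begins, e.g. "md0 : active raid1 ...".
            current = header.group(1)
            continue
        progress = re.search(
            r"\b(check|resync|recovery|reshape)\s*=\s*([\d.]+)%", line
        )
        if progress and current:
            active[current] = (progress.group(1), float(progress.group(2)))
    return active
```

On a live system the input would come from reading /proc/mdstat, for example `md_activity(open("/proc/mdstat").read())`. An empty result means no check, resync, recovery, or reshape was in progress at that moment.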