Performance Issues/difference Of Two Servers Running Same Task (one Is Quicker)

Hi

I need some advice on what to do next, even if it's just someone telling me to check out another mailing list or a tuning site, or pointing me in a better direction for solving my annoying problem: one server is much faster at certain tasks despite running on “shitty” hardware.

I have tried many things to solve my issue:
– changed mysqld buffer/pool/cache settings, etc.
– changed Apache/PHP server settings
– changed various OS settings (sysctl), e.g. turned off IPv6
but I haven't figured it out.

I have a development server (local) and live servers (data center), used mainly for many different websites and one online training site.

The development and live servers in question run the same software setup:
– CentOS Linux release 7.6.1810
– bind 32:9.9.4-74.el7_6.1
– Apache/2.4.6 (CentOS)
– PHP 7.1.29
– mysqld Ver 5.7.26
– wordpress, woocommerce, wishlistmember, Sensei etc
– all software is at the same update level
– even many of the Linux config files are the same (/etc/hosts, bind, etc.)
– the databases are copies/identical
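
Whether the configs really are the same can be checked directly. A minimal sketch, assuming the `mysql` client works on both servers and the two dumps are copied to one machine: dump the effective variables with `SHOW GLOBAL VARIABLES` on each server, then print only the names whose values differ. The function name `diff_vars` and the dump file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare two "name<TAB>value" dumps and print the names whose values differ.
# On each server a dump could be created with, e.g.:
#   mysql -N -B -e "SHOW GLOBAL VARIABLES" > vars-$(hostname -s).tsv
# (-N skips the column-name header, -B prints tab-separated batch output.)
diff_vars() {
  # $1, $2: tab-separated dumps from the two servers
  local tab a b
  tab=$(printf '\t')
  a=$(mktemp) b=$(mktemp)
  # join needs both inputs sorted on the join field (the variable name)
  sort -t "$tab" -k1,1 "$1" > "$a"
  sort -t "$tab" -k1,1 "$2" > "$b"
  # keep only variables present in both dumps whose values differ
  # (names present on one side only are not shown; join -v 1 / -v 2 would list them)
  join -t "$tab" "$a" "$b" | awk -F '\t' '$2 != $3 { printf "%s: %s -> %s\n", $1, $2, $3 }'
  rm -f "$a" "$b"
}
```

Usage, with hypothetical file names: `diff_vars vars-dev.tsv vars-live.tsv`. An empty result means the effective MySQL settings match; anything else is a concrete difference to investigate.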

Live server: Dell PowerEdge M710, 48 GB RAM, 2x Xeon L5630, LSI RAID1 SSD
Dev server: DIY, GIGABYTE MX31-BS0, 32 GB RAM, 1x Xeon E3-1245, MDADM RAID0 1 TB Seagate spinners

Clearly the development server is way below the Dell's specs hardware-wise, but software-wise they are identical (they get upgraded at the same time).

During normal operation (i.e. displaying websites, online training courses, etc.) the DELL serves the websites faster, even though it sits 1000 km up north in a data center on a different network, while the local server is on the same network as my machine.

Yet the DEV server outshines the DELL when creating a few large custom tables: the local server takes 5 s where the DELL takes 15 s for small tables, and longer for bigger ones.

The task is based on:
– level, member, course and group are all IDs
– members can belong to a group, a level and can access many courses
– the ID restricts what they can access and what they belong to.
– a course for each member can have various stages of completion
– using an API (WishList Member) that performs LOCAL calls when accessed locally, I get who belongs to what and assemble the info I need, then use PHP to build the table.
– DB calls ARE LOCAL!

Now when I try to create a table of members belonging to the same group level, doing the same course with different stages of completion, the DELL takes on average 3 times longer to complete the table (normally about 20 to 30 rows).

I have put microtime() calls before and after certain calls, and the difference is clearly visible:
DEV
Jul 04 04:57:26 UTC _members took 0.0005459785461425 ms
Jul 04 04:57:26 UTC _members took 0.0005321502685546 ms
LIFE
Jul 04 05:00:36 UTC _members took 0.0014369487762451 ms
Jul 04 05:00:36 UTC _members took 0.0013291835784912 ms

If I do this 300+ times, the totals differ considerably.
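
To compare hundreds of runs rather than single lines, the logged durations can be counted, totalled and averaged per server. A minimal sketch, assuming every timing line has the form shown above (`… _members took <number> ms`); the function name `summarize_timings` is hypothetical. Note that PHP's `microtime(true)` returns seconds as a float, so if these values are differences of two `microtime(true)` calls, they are seconds rather than milliseconds; the sketch only sums the raw numbers and does not convert units.

```shell
#!/usr/bin/env bash
# Sketch: count, total and average the "took <number>" values in a log.
# Reads the log from file arguments, or from standard input if none are given.
summarize_timings() {
  awk '
    {
      # treat the field right after the word "took" as the duration
      for (i = 1; i < NF; i++) {
        if ($i == "took") { total += $(i + 1); n++; break }
      }
    }
    END {
      if (n == 0) { print "no timing lines found"; exit 1 }
      printf "%d calls, total %.6f, mean %.9f\n", n, total, total / n
    }
  ' "$@"
}
```

Running it on the DEV and DELL logs (e.g. `summarize_timings dev.log`, file name hypothetical) gives one total per server, which is easier to compare than individual lines.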

So my questions:

– How can it be that the DELL takes so much longer although it has far better hardware?
– How can it be, given that everything (software/OS/plugins) is the same?
– This even happens when the DELL is under low load (i.e. the middle of the night) and only serves a few requests.

Same software, same config, same database, same amount of data in the database yet on better hardware it’s slower?

Any ideas anyone?

  • Two ideas:

    a) the DELL may be faster overall, but if I'm right, its single-core speed is slower than the DEV machine's.

    b) how do the LSI/SSD perform compared to the MDADM/RAID0 on the DEV
    server? I’m not sure the DELL is a clear winner here.

    Regards, Simon

  • As a first step, you have to test subsystems one by one.

    Try this to see how fast the CPU and kernel are (including Meltdown/Spectre slowdowns):

    time dd 2>/dev/null if=/dev/zero of=/dev/null bs=1 count=1000000

    Then try this to see how fast your disks are for DB operations:

    cd /a/directory/on/the/filesystem/you/want/to/test
    time bash -c "for((i=0;i<1000;i++)); do dd 2>/dev/null if=/dev/zero of=test bs=1 count=1 conv=fsync;done"
    rm test
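
    The disk test above can be wrapped so that it cleans up after itself and reports the average time per synchronous write. A minimal sketch, assuming GNU `dd` (for `conv=fsync`) and bash (for the arithmetic `for` loop and the `time` keyword); the function name `fsync_latency` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: time N one-byte writes, each followed by fsync, in a given directory,
# and print the average milliseconds per write.
fsync_latency() {
  local dir=${1:-.} n=${2:-1000} f secs i
  f=$(mktemp "$dir/fsync-test.XXXXXX") || return 1
  local TIMEFORMAT='%R'   # make bash's time keyword print only elapsed seconds
  secs=$( { time for ((i = 0; i < n; i++)); do
              dd if=/dev/zero of="$f" bs=1 count=1 conv=fsync 2>/dev/null
            done; } 2>&1 )
  rm -f "$f"
  awk -v s="$secs" -v n="$n" 'BEGIN { printf "%d writes, %.3f ms per fsync\n", n, s * 1000 / n }'
}
```

    For example, `fsync_latency /a/directory/on/the/filesystem/you/want/to/test 1000` runs the same 1000 writes as the one-liner above.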

    Regards.

  • It looks like the DIY system has a CPU that’s nearly twice as fast as the Dell’s.  The additional CPU in the Dell will run more tasks concurrently, but it won’t make a single process faster.
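
    One way to check the single-thread difference is to time the same CPU-bound, single-process loop on both machines. A minimal sketch, assuming bash (for the `time` keyword and `TIMEFORMAT`); the function name `single_core_bench` and the default iteration count are arbitrary, and an awk loop is only a crude stand-in for the real PHP/MySQL work.

```shell
#!/usr/bin/env bash
# Sketch: time a CPU-bound loop that runs in a single process (so on one core)
# and print the elapsed wall-clock seconds. Lower is faster.
single_core_bench() {
  local n=${1:-20000000}
  local TIMEFORMAT='%R'   # bash's time keyword: print only real (elapsed) seconds
  { time awk -v n="$n" 'BEGIN { for (i = 0; i < n; i++) s += i % 7; print s }' > /dev/null; } 2>&1
}
```

    Running it a few times on each server while they are otherwise quiet shows whether one core of the E3-1245 outpaces one core of the L5630.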

    You might also think that the SSD RAID would make the Dell faster, but that will only be true if the process that you’re testing performs a significant amount of IO.  If your DB operations are happening mostly in memory (that is, if the data is cached), then the faster CPU will be the primary determining factor.

    The other thing that you left out of your description is the amount of data on each server.  If your live server has a lot of data in its DB
    and the dev system has a small dataset suitable for testing, then generally you’d expect that the dev system’s data is more likely to live in cache and avoid disk IO, and processing the smaller set will also take less CPU time.

    https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+E3-1245+%40+3.30GHz&id=1202

    https://www.cpubenchmark.net/cpu.php?cpu=Intel+Xeon+L5630+%40+2.13GHz&id=2086
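
    Whether the DB work is served from memory can be estimated from InnoDB's own counters. A minimal sketch, assuming MySQL 5.7's `Innodb_buffer_pool_read_requests` (logical read requests) and `Innodb_buffer_pool_reads` (reads that could not be satisfied from the buffer pool and went to disk) status variables; the function name `buffer_pool_hit_ratio` is hypothetical. A ratio close to 100% on both machines would support the point that the CPU, not the storage, dominates.

```shell
#!/usr/bin/env bash
# Sketch: compute the InnoDB buffer pool hit ratio from "name<TAB>value" lines, e.g. the output of:
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'"
buffer_pool_hit_ratio() {
  awk -F '\t' '
    $1 == "Innodb_buffer_pool_read_requests" { requests = $2 }
    $1 == "Innodb_buffer_pool_reads"         { disk_reads = $2 }
    END {
      if (requests == 0) { print "no read requests recorded"; exit 1 }
      printf "hit ratio: %.2f%%\n", 100 * (1 - disk_reads / requests)
    }
  ' "$@"
}
```

    Piping the `mysql` command from the comment into `buffer_pool_hit_ratio` on each server gives one comparable number per machine.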

  • As others have said, the DEV server's CPU is a generation newer. For CPU
    details I often reference Intel's “ark” pages:

    https://ark.intel.com/content/www/us/en/ark/products/47927/intel-xeon-processor-l5630-12m-cache-2-13-ghz-5-86-gt-s-intel-qpi.html
    12M Cache, 2.13 GHz, 5.86 GT/s Intel® QPI

    https://ark.intel.com/content/www/us/en/ark/products/52274/intel-xeon-processor-e3-1245-8m-cache-3-30-ghz.html
    8M Cache, 3.30 GHz

    The “generations” I mentioned are:
    Code Name: Products formerly Westmere EP
    <https://ark.intel.com/content/www/us/en/ark/products/codename/54534/westmere-ep.html>
    Code Name: Products formerly Sandy Bridge
    <https://ark.intel.com/content/www/us/en/ark/products/codename/29900/sandy-bridge.html>

    Westmere systems used DDR3 at 800/1066 MHz. Sandy Bridge systems used DDR3 at 1066/1333 MHz. Not a huge difference, but likely another factor contributing to performance.

    I would also look at the power settings in the BIOS and the C-state settings in both the BIOS and the OS, as disabling C-states (often enabled by default to meet green/Energy Star compliance) can make a noticeable performance difference.
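
    On the OS side, the CPU frequency governor and the idle states the kernel may use can be read from sysfs. A minimal sketch, assuming a Linux kernel that exposes the standard `cpufreq` and `cpuidle` sysfs directories (they can be missing on some virtual machines or when no cpufreq driver is loaded); the function name `show_power_state` is hypothetical, and it only reads settings, it changes nothing.

```shell
#!/usr/bin/env bash
# Sketch: print the frequency governor and the idle states (C-states) of CPU 0. Read-only.
show_power_state() {
  local cpu=/sys/devices/system/cpu/cpu0 state
  if [ -r "$cpu/cpufreq/scaling_governor" ]; then
    echo "governor: $(cat "$cpu/cpufreq/scaling_governor")"
  else
    echo "governor: unavailable (no cpufreq driver exposed)"
  fi
  # each stateN directory describes one idle state; disable=1 means the OS will not enter it
  for state in "$cpu"/cpuidle/state*; do
    [ -d "$state" ] || { echo "cpuidle: unavailable"; break; }
    echo "$(cat "$state/name"): disable=$(cat "$state/disable")"
  done
}
```

    Comparing this output on the DELL and the DIY box shows whether they differ in OS-level power management; the BIOS power profile still has to be checked in the BIOS itself.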

    Hope that helps.

  • Thank you for the tips. Here are the results (DELL is faster overall):

    [DIY ~] #>time dd 2>/dev/null if=/dev/zero of=/dev/null bs=1 count=1000000
    real 0m1.931s user 0m1.022s sys 0m0.896s
    [DELL ~] #>time dd 2>/dev/null if=/dev/zero of=/dev/null bs=1 count=1000000
    real 0m1.308s user 0m0.389s sys 0m0.919s

    Dell faster overall

    [DIY /mnt] #>time bash -c "for((i=0;i<1000;i++)); do dd 2>/dev/null if=/dev/zero of=test bs=1 count=1 conv=fsync;done"
    real 1m12.944s user 0m1.604s sys 0m2.595s
    [DELL /mnt] #>time bash -c "for((i=0;i<1000;i++)); do dd 2>/dev/null if=/dev/zero of=test bs=1 count=1 conv=fsync;done"
    real 0m2.270s user 0m0.509s sys 0m1.475s

    I expected the DIY to be slower here; it's running MDADM RAID1 on Seagate spinners compared to the LSI RAID1 SSD.
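
    Dividing each total by its number of iterations turns these results into per-operation costs, which are easier to compare. A minimal sketch; the helper name `per_op` is hypothetical, the inputs are the `real` times reported above (72.944 s is 1m12.944s), and the dd figures assume that test copied 1,000,000 one-byte blocks.

```shell
#!/usr/bin/env bash
# Sketch: convert "total seconds for N operations" into microseconds per operation.
per_op() {
  # $1: label, $2: total seconds (the "real" value from time), $3: number of operations
  awk -v label="$1" -v s="$2" -v n="$3" 'BEGIN { printf "%s: %.1f us per op\n", label, s * 1e6 / n }'
}

# Disk test: 1000 one-byte writes, each followed by fsync
per_op "DIY fsync"  72.944 1000
per_op "DELL fsync"  2.270 1000
# CPU/kernel test: assumes 1,000,000 one-byte dd blocks
per_op "DIY dd"      1.931 1000000
per_op "DELL dd"     1.308 1000000
```

    This works out to roughly 72.9 ms versus 2.3 ms per fsync, and about 1.9 µs versus 1.3 µs per dd block (each block is one read() plus one write() system call), so the DELL is ahead on both raw measurements.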

    The results show the DELL is faster overall, so it's back to the drawing board after I've followed all the other hints in this thread.

    Jobst

  • Yes, but since BOTH have “other” things to do at the same time, the sheer number of CPUs in the DELL should help.

    See my answer about the disk test in another email.

  • I made the buffer pool size on the DELL double that of the DIY
    when I started trying to figure out the speed difference.

    Most of the DBs are small, as they contain websites. The biggest DB is the Online Training DB, which is the same on both machines because I constantly copy the data from the live server to the DIY.

    Very good analysis indeed. Makes total sense.

  • Could you try the same operations on COPIES of the databases, on both machines?
    An original live DB can be slower than a copy because of data structure fragmentation, garbage collection, etc. (on the filesystem, but also in tables).

    Just a thought about another thing to try, since we have established that the production hardware is indeed faster.
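
    Table fragmentation can also be estimated in place from `information_schema.TABLES`, where `DATA_FREE` roughly reports allocated but unused space. A minimal sketch, assuming tab-separated output from the query in the comment; the schema name `training_db` and the function name `fragmentation_report` are hypothetical, and free space relative to data size is only a rough heuristic. Tables with a large free share are candidates for `OPTIMIZE TABLE` (which rebuilds InnoDB tables), ideally tried on the copy first.

```shell
#!/usr/bin/env bash
# Sketch: report the share of free space per table from "name<TAB>data_length<TAB>data_free" lines, e.g.:
#   mysql -N -B -e "SELECT TABLE_NAME, DATA_LENGTH, DATA_FREE
#                   FROM information_schema.TABLES
#                   WHERE TABLE_SCHEMA = 'training_db'"   # hypothetical schema name
fragmentation_report() {
  awk -F '\t' '
    $2 + $3 > 0 {
      # free bytes as a percentage of data plus free bytes
      printf "%s: %.1f%% free (%d of %d bytes)\n", $1, 100 * $3 / ($2 + $3), $3, $2 + $3
    }
  ' "$@"
}
```

    Running the same report against the live DB and its copy shows whether the live tables carry noticeably more unused space.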

    Regards.