Loss Of Ethernet Adaptor


June 6, 2014 James B. CentOS 16 Comments

At ~07:40 (UTC-4:00) this morning our gateway host lost its WAN Ethernet adaptor. Subsequent to recovery, which required a reboot, the following entries were found in /var/log/messages:

Jun 6 07:39:50 gway02 kernel: PING_FLOOD: IN=eth0 OUT= MAC=00:25:90:61:74:c0:00:24:14:2b:f2:80:08:00 SRC=74.205.112.125 DST=216.185.71.33 LEN=64 TOS=0x00 PREC=0x00 TTL=50 ID=30954 PROTO=ICMP TYPE=8 CODE=0 ID=25496 SEQ=0
Jun 6 07:39:53 gway02 kernel: PROBE_BLACKIST: IN=eth0 OUT=eth1 SRC=122.235.101.24 DST=216.185.71.249 LEN=52 TOS=0x08 PREC=0x20 TTL=45 ID=26123 DF PROTO=TCP SPT=54197 DPT=445 WINDOW
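
Netfilter LOG entries such as the two above are a rule-defined prefix followed by KEY=value fields, so they are easy to split up when hunting for the packet that preceded an outage. What follows is only a minimal Python sketch: the function name and the shortened sample line are illustrative assumptions, not part of the original post.

```python
import re

def parse_netfilter_log(line):
    """Split a netfilter LOG line into (prefix, {KEY: value}).

    Assumes the rule sets a --log-prefix, so the token just before
    " IN=" is that prefix (for example "PING_FLOOD:"). Without a
    prefix the token would simply be "kernel".
    """
    head, sep, rest = line.partition(" IN=")
    if not sep:
        raise ValueError("no IN= field found; not a netfilter LOG line?")
    prefix = head.split()[-1].rstrip(":")
    # Bare flags such as DF carry no "=" and are skipped by this pattern.
    fields = dict(re.findall(r"(\w+)=(\S*)", "IN=" + rest))
    return prefix, fields

# Shortened, illustrative sample in the same layout as the entries above.
sample = ("Jun 6 07:39:50 gway02 kernel: PING_FLOOD: IN=eth0 OUT= "
          "SRC=74.205.112.125 DST=216.185.71.33 PROTO=ICMP TYPE=8 CODE=0")
prefix, fields = parse_netfilter_log(sample)
print(prefix, fields["SRC"], fields["PROTO"])
```

An empty value (as in `OUT=`) comes back as an empty string, which here means the packet was addressed to the gateway itself rather than forwarded.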

16 thoughts on - Loss Of Ethernet Adaptor

says:

June 6, 2014 at 8:45 am

James B. Byrne wrote:

Well, let’s start with you being probed/attacked from China: whois 122.235.101.24

inetnum: 122.235.0.0 – 122.235.127.255
netname: CHINANET-ZJ-HZ
country: CN
descr: CHINANET-ZJ Hangzhou node network
descr: Zhejiang Telecom
<...>
role: CHINANET-ZJ Hangzhou
address: No.352 Tiyuchang Road,Hangzhou,Zhejiang.310003
country: CN
phone: +86-571-85157929
fax-no: +86-571-85102776
e-mail: anti_spam@mail.hz.zj.cn
remarks: send spam reports to anti_spam@mail.hz.zj.cn
remarks: and abuse reports to anti_spam@mail.hz.zj.cn

And whois reports the puppy above is not only from Hong Kong, but

remarks: -+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+-+-+-+-+
remarks: This object can only be updated by APNIC hostmasters.
remarks: To update this object, please contact APNIC
remarks: hostmasters and include your organisation’s account
remarks: name in the subject line.
remarks: -+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+-+-+-+-+

which suggests that the IP or range or domain is an ex….

So, next question is, is the card working again? If so, then this is an attack I’ve not heard of, that affects what’s this, layer 0?

mark
Alexander Dalloz says:

June 6, 2014 at 8:58 am

On 06.06.2014 at 14:50, James B. Byrne wrote:

[ … ]

https://isc.sans.edu/forums/diary/Intel+Network+Card+82574L+Packet+of+Death/15109

http://www.itwalker3.com/2013/02/packet-of-death-attack-a-deadly-dos-against-intel-nics/

Worth verifying in your case.

Alexander
Steve Clark says:

June 6, 2014 at 9:02 am

Hi,

We ran into this problem also – the interface would disappear. There is a newer e1000e driver that fixes it, or you could add pcie_aspm=off to your kernel command line.

HTH, Steve
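
For anyone wanting to preview the second suggestion before touching a boot loader file, here is a minimal Python sketch that appends a parameter to every kernel line of a GRUB legacy configuration held in a string. It assumes the GRUB legacy `grub.conf` layout used on CentOS 6; the function name and the sample entry are hypothetical, and nothing is written to disk.

```python
def add_kernel_param(grub_conf_text, param="pcie_aspm=off"):
    """Return the config text with `param` appended to each kernel line.

    Lines that already carry the parameter are left unchanged, so the
    function can be applied repeatedly without duplicating it.
    """
    out = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        # GRUB legacy kernel lines begin with the word "kernel".
        if words and words[0] == "kernel" and param not in words:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical entry; real titles, kernel paths and root devices will differ.
sample = """default=0
title CentOS (example)
\troot (hd0,0)
\tkernel /vmlinuz-example ro root=/dev/sda1
\tinitrd /initramfs-example.img
"""
print(add_kernel_param(sample))
```

Working on a copy of the text and diffing the result against the original keeps the change reviewable before anything on the boot partition is modified.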
James B. says:

June 9, 2014 at 10:34 am

Re: Packet of Death attack: a deadly DoS against Intel NICs

It appears that my problem is caused by something else as the EPROM
fingerprint matches the ‘good’ version (mostly).

ethtool -e eth0
. . .
0x0010:01 01 ff ff 6b 02 d3 10 d9 15 d3 10 ff ff 58 a5
. . .
0x0030:c9 6c 50 31 3e 07 0b 46 84 2d 40 01 00 f0 06 07
. . .

However this matches neither the known ‘bad’ nor the reputed ‘good’ EPROM image:

0x0060:00 01 ff ff ff ff ff ff ff ff ff ff ff ff ff ff

But it seems a lot closer to the ‘bad’:

0
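
Comparing EEPROM dump rows by eye is error-prone, so a byte-by-byte comparison can help. The sketch below is a rough Python illustration that assumes input in the `ethtool -e` layout quoted above. The helper names are made up for this example, and the reference row is a placeholder, not any published ‘good’ or ‘bad’ image.

```python
def parse_eeprom_dump(text):
    """Map each offset in `ethtool -e` style output to its list of hex bytes."""
    rows = {}
    for line in text.splitlines():
        offset, sep, data = line.partition(":")
        # Keep only lines of the form "0xNNNN: b0 b1 ..."; skip everything else.
        if sep and offset.strip().lower().startswith("0x"):
            rows[int(offset.strip(), 16)] = data.split()
    return rows

def diff_row(actual, reference):
    """Return (index, actual_byte, reference_byte) for each differing position."""
    return [(i, a, r) for i, (a, r) in enumerate(zip(actual, reference)) if a != r]

# The two rows quoted in the message above.
dump = """0x0010:01 01 ff ff 6b 02 d3 10 d9 15 d3 10 ff ff 58 a5
. . .
0x0030:c9 6c 50 31 3e 07 0b 46 84 2d 40 01 00 f0 06 07
"""
rows = parse_eeprom_dump(dump)

# Placeholder reference: the 0x0010 row with one byte changed on purpose.
placeholder_reference = list(rows[0x0010])
placeholder_reference[4] = "00"
print(diff_row(rows[0x0010], placeholder_reference))
```

Substituting the reference rows from whichever advisory is being checked turns this into a quick way to see exactly which bytes differ.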
Steve Clark says:

June 9, 2014 at 10:46 am

Hi,

Don’t know if you saw my prior email, but we experienced this exact same problem; see the log excerpts below:
…
Jul 31 17:05:18 wolfpac kernel: pciehp 0000:00:1c.5:pcie04: Card not present on Slot(37)
Jul 31 17:05:18 wolfpac kernel: pciehp 0000:00:1c.5:pcie04: Card present on Slot(37)
Jul 31 17:05:18 wolfpac kernel: device eth5 left promiscuous mode
Jul 31 17:05:19 wolfpac kernel: e1000e 0000:07:00.0: PCI INT A disabled
Jul 31 17:05:20 wolfpac ntpd[2726]: Deleting interface #7 eth5, 192.168.198.95#123, interface stats: received=517, sent=522, dropped=0, active_time=108106 secs
Jul 31 17:05:20 wolfpac ntpd[2726]: Deleting interface #8 eth5, fe80::290:bff:fe2a:acf3#123, interface stats: received=0, sent=0, dropped=0, active_time=108039 secs
…

This would randomly happen on systems that weren’t connected directly to the internet. We experienced this on multiple systems. Since we upgraded to the latest elrepo driver and added pcie_aspm=off to our kernel command line we have never experienced the issue again.
James B. says:

June 9, 2014 at 11:27 am

Thank you. I did get your message and I simply have not had time to test its implementation as it necessarily involves a restart of the test system. I am trying to discover if there is some way of restarting a headless server and using a specific grub entry instead of the default. I want to leave the default unchanged until I can prove that any manual changes I make do not negatively impact a system restart.

If anyone knows if this is possible and if so, how it is done, I would welcome the information.

Regards,
says:

June 9, 2014 at 11:45 am

James B. Byrne wrote:

That’s a no-brainer: change the default= line in grub from 0 to whatever the entry number is. Note that I’m not sure what happens if you add a kernel update in there, whether the post-install scripts will increment the number so as to continue to point to the correct kernel.

mark
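
GRUB legacy numbers its menu entries from 0 in the order their title lines appear, which is the number the default= line refers to. As a rough aid for finding that number, here is a small Python sketch; the function name and the two sample entries are hypothetical, and the GRUB legacy `grub.conf` layout is assumed.

```python
def list_grub_entries(grub_conf_text):
    """Return (index, title) pairs in the order GRUB legacy numbers them."""
    titles = []
    for line in grub_conf_text.splitlines():
        words = line.strip().split(None, 1)
        # Each menu entry starts with a "title" line; comments are ignored.
        if words and words[0] == "title":
            titles.append(words[1] if len(words) > 1 else "")
    return list(enumerate(titles))

# Hypothetical configuration with two entries.
sample = """# example only
default=0
title CentOS (current kernel, example)
\tkernel /vmlinuz-current ro root=/dev/sda1
title CentOS (test entry, example)
\tkernel /vmlinuz-current ro root=/dev/sda1 pcie_aspm=off
"""
for index, title in list_grub_entries(sample):
    print(index, title)
```

Because the index depends purely on position, any entry inserted above the target shifts its number, which is the concern raised above about kernel updates.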
SilverTip257 says:

June 9, 2014 at 11:53 am

If you happen to be fortunate enough and have (ipmi v2) Serial over LAN
configured, you can reboot and change the boot selection.

Not really. James wrote that he does not want to “negatively impact a system restart”. If I was in his shoes, I wouldn’t change the grub default boot item without serial-over-lan access, a KVM switch with network access, or
“remote hands” on site. Otherwise you just changed your default boot item and it could cyclically crash and (possibly) reboot.
Steve Clark says:

June 9, 2014 at 11:58 am

This shows how to do a one-time boot and then boot the original system the next time:

https://www.gnu.org/software/grub/manual/legacy/Booting-once_002donly.html
James B. says:

June 10, 2014 at 1:27 pm

Thank you. Based on my readings of this reference there are two mechanisms available: 1. boot once, 2. fallback.

The critical step seems to be issuing the command ‘grub-set-default n’ where n is a value between 0 and the number of entries in boot.conf less one.

As I read it, the boot-once and fallback documentation recommends fallback as the superior alternative.

James B. says:

October 15, 2014 at 10:41 am

This is a return to an issue I first raised back in June. We had a similar occurrence in September while I was away and so I am revisiting the entire matter.

Steve Clark on 6 Jun 16:02 2014 wrote:

I have run into other reports of similar occurrences and some of these refer to this bug report: https://bugzilla.redhat.com/show_bug.cgi?id=632650

However, that report is closed as being a duplicate of:
https://bugzilla.redhat.com/show_bug.cgi?id=562273

Which is not available for viewing by the great unwashed.

Nonetheless, following the discussion thread in the bug report that I can view it appears that this issue was supposedly resolved sometime in late 2012. From what I can gather the fix was to disable ASPM L1 for this model adaptor in the e1000e driver module.

* Upstream commit d4a4206ebbaf48b55803a7eb34e330530d83a889 – e1000e: Disable ASPM L1 on 82574

However, when I run lspci -vvv on the host that exhibited the problem I see this:

. . .
03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
Subsystem: Super Micro Computer Inc Device 10d3
Physical Slot: 0-2
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL
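
The excerpt above is cut off before the link-control capability lines, so what lspci reported for ASPM on this host is not shown here. For checking many devices at once, the following Python sketch pulls the ASPM state from the `LnkCtl:` lines of `lspci -vv` style output. The function name is made up, and the sample LnkCtl line is illustrative only; it is not taken from this thread.

```python
import re

# A device header starts in column 0 with an address such as "03:00.0",
# optionally preceded by a PCI domain such as "0000:".
HEADER = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
LNKCTL = re.compile(r"LnkCtl:\s*ASPM ([^;]+);")

def aspm_by_device(lspci_text):
    """Map each PCI address to the ASPM state reported on its LnkCtl line."""
    states = {}
    device = None
    for line in lspci_text.splitlines():
        header = HEADER.match(line)
        if header:
            device = header.group(1)
            continue
        link = LNKCTL.search(line)
        if link and device:
            states[device] = link.group(1).strip()
    return states

# Illustrative input: the header matches the excerpt above, but the LnkCtl
# line is an assumed example, not output from the affected host.
sample = (
    "03:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection\n"
    "\t\tLnkCtl:\tASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+\n"
)
print(aspm_by_device(sample))
```

Devices without a PCI Express capability have no LnkCtl line and simply do not appear in the result.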
Akemi Yagi says:

October 16, 2014 at 11:47 am

I’m the one who did the submission. Some of my comments (which I
thought were helpful) have been hidden by Red Hat.

I don’t have access, either.

My suggestion for you is to give ELRepo’s kmod-e1000e a try. It has the latest version from Intel (3.1.0.2) as opposed to the version in the EL kernels (2.3.2-k). There are known cases in which a later version resolved issues.

Akemi
Marcelo Ricardo says:

October 21, 2014 at 1:03 pm

Both BZs above are RHEL 5 specific, with 562273 being a “driver update” one. Did you report this against RHEL 6 too?

Marcelo
James B. says:

October 22, 2014 at 9:41 am

No, I did not.
Akemi Yagi says:

October 24, 2014 at 12:10 am

The e1000e bug report against EL6 is in this CentOS bug tracker and you can find all the details:

http://bugs.CentOS.org/view.php?id=6810

RH bugzilla is here but it is private:

https://bugzilla.redhat.com/show_bug.cgi?id=1038754

Here again, I recommend use of ELRepo’s kmod-e1000e package. It is possible that the driver in the upcoming CentOS 6.6 fixes the problem.

Akemi
