Thanks To Every One


July 16, 2017 Andreas Benzler CentOS 10 Comments

Thanks to everyone who offers their service for free to provide CentOS.

Why? Once I had the BIOS for the T440 laptop and docking station sorted out, everything worked immediately after the first installation, without any failures.

Since I know how hard such work is, I thank everyone in the community for making that possible.

Without a stable base I can not perform my Linux tests.

Since I often do not have the time for rolling releases like ArchLinux, or for performing a Fedora upgrade every half year (or after a year at the latest), CentOS is exactly the right choice. I do not need GUI support as in OpenSuSE; anyone who comes from the shell knows what I mean.

Most cluster nodes at my work are compute machines with an average uptime of 220 days, and they usually run for over a year before they need a reboot. Compare that with my virtual Windows 10, which requires a restart every week.

I have already had more than 30 distros actively in my hands, but none was as easy to get along with. Of course you can always improve something here and there, and yet it starts from a good basis.

Furthermore, I find the handling of rpm easy to understand.

Small supplement to my repo:

– I provide the packages of my work
– They are not signed, because I do not see it as an additional repository for everyone, but rather as an experimental pool of packages.
– Each external repository changes the base system.
– It adds new dependencies to the packages.
– Users also come to depend on it indirectly.
– Repositories have in the past already led to a new split of the base. Best example: Debian -> Ubuntu -> Linux Mint. I don’t like that at all.
– The maintainer is bound, in a certain way, to the care and demands of the community. I definitely lack the time for that.

All in all … thanks to everyone

Andy

10 thoughts on - Thanks To Every One

Valeri Galtsev says:

July 16, 2017 at 10:26 am

Please, teach me: how do you manage to get Linux uptime this long (a year, or even 220 days)? In my observation, at least once every 45 days there is either a kernel security update or a glibc one, and each of them requires a machine reboot. And all Linux distributions are based on both the Linux kernel and glibc.

Thanks in advance for your insight!

Valeri

++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++
Andreas Benzler says:

July 16, 2017 at 11:30 am

Hello Valeri,

Let’s think about what an HPC cluster is for. Second, one should always ask where security is to be applied; then one can come to the following decisions:

– The firewall is placed in front of the cluster.
– After you have found a safe base for this, you freeze it.

– We keep an rsync mirror of CentOS and EPEL on the head node. From there, we can always reinstall a node (tftp / http).
– To relieve the load on the internal network, I create rpm packages that are installed on the nodes.
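
As a rough illustration of the mirror step in the list above, the following Python sketch builds the kind of rsync command line a head node might use. The mirror URL and local path in the usage note are hypothetical placeholders, not the ones used on this cluster; only the rsync flags themselves (`-a`, `-H`, `--delete`, `--dry-run`) are standard.

```python
import subprocess


def build_mirror_command(source, dest, dry_run=True):
    """Build an rsync command that mirrors `source` into `dest`.

    -a keeps permissions and timestamps, -H keeps hard links, and
    --delete removes local files that have vanished upstream.
    """
    cmd = ["rsync", "-aH", "--delete"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change, copy nothing
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", dest.rstrip("/") + "/"]
    return cmd


def sync_mirror(source, dest, dry_run=True):
    """Run the mirror command; raises CalledProcessError if rsync fails."""
    subprocess.run(build_mirror_command(source, dest, dry_run), check=True)
```

Assuming a hypothetical upstream, `sync_mirror("rsync://mirror.example.org/centos/7", "/srv/mirror/centos/7")` would perform a dry run first; passing `dry_run=False` does the real copy. Freezing the base then simply means no longer running the sync.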

All this happened about 3 years ago. CentOS 1511 was established as a stable variant for the environment.

It was one of my many different tasks in my work.

The physicists and mathematicians who run their computations there need long run times.

My decision fell on CentOS because:

– It is free.
– It is maintained until the year 2024, longer than the cluster will live.

The beginning was hard, because I had to learn everything from scratch and I’m no longer the youngest, but my instinct proved right.

Sincerely

Andy
Pete Biggs says:

July 16, 2017 at 12:03 pm

+1

You have to assess your environment and weigh up the benefits of uptime vs security. Sometimes the security that is fixed in a new kernel is inconsequential in your environment; sometimes the external security on your network is such that the attack vector is tiny. You make a judgement based on your needs.

Yes. I too run HPC clusters and I have had uptimes of over 1000 days – clusters that are turned on when they are delivered and turned off when they are obsolete. It is crucial for long running calculations that you have a stable OS – you have never seen wrath like a computational scientist whose 200 day calculation has just failed because you needed to reboot the node it was running on. And stability …

P.
Jonathan Billings says:

July 18, 2017 at 8:01 am

I too was an HPC admin, and I knew people who believed the above, and their clusters were compromised. You’re running a service where the weakest link is the researchers who use your cluster: they’re able to run code on your nodes, so local exploits are possible. They often have poor security practices (sharing passwords, using them for multiple accounts).

Also, if your researchers can’t write code that performs checkpoints, they’re going to be awfully unhappy when a bug in their code makes it segfault 199 days into a 200 day run.
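
For readers who have not written one, here is a minimal sketch of what checkpointing can look like, assuming the job’s state fits in a small JSON file. All function names and the toy workload are hypothetical; a real job would save its own arrays or solver state, often through whatever its framework provides.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` atomically: write a temp file, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())  # make sure the data is on disk before the rename
    os.replace(tmp, path)  # a crash never leaves a half-written checkpoint


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return default


def run(path, total_steps, checkpoint_every=100, stop_after=None):
    """Toy long-running job that resumes from the last checkpoint.

    `stop_after` simulates a reboot or crash after that many steps.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    done = 0
    while state["step"] < total_steps:
        state["acc"] += state["step"]  # stand-in for the real computation
        state["step"] += 1
        done += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
        if stop_after is not None and done >= stop_after:
            return state  # interrupted: work since the last checkpoint is lost
    save_checkpoint(path, state)
    return state
```

With a checkpoint every 100 steps, an interruption costs at most 100 steps of work instead of the whole run, which is what makes a scheduled reboot tolerable.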

Scheduled downtime and rolling cluster upgrades are a necessity of HPC cluster administration. I do wish that the ksplice/kpatch stuff was available in CentOS.
Valeri Galtsev says:

July 18, 2017 at 9:20 am

Thanks, Jonathan! Before your reply I had a bad feeling that I’m the only one in this world who still respects security considerations… The only thing is: I still shy away from ksplice/kpatch, and reboot machines instead of patching the running kernel on the fly.

Valeri

Paul Heinlein says:

July 18, 2017 at 9:45 am

+1
Peter Kjellström says:

July 18, 2017 at 10:13 am

I work at a quite large hpc site and fully agree.

HPC resources possibly need smarter and more active security work than your average server.

With 1000+ users who can compile and run jobs, get their credentials misplaced, etc., we typically move even faster than CentOS updates to fix/half-patch/mitigate security vulnerabilities.

/Peter
Peter Larsen says:

July 20, 2017 at 8:07 am

Sorry, but this statement really irks me. Why do you think a firewall is the ONLY part that needs to provide security? That’s the way I read this statement: that it doesn’t matter anywhere else. In addition, the majority of attacks and compromises come from INSIDE the firewall, i.e. the “wannacry” and similar attacks are all distributed via email, executed on a local workstation, and propagate from there; your external firewall is not even hit before your servers/cluster are scanned.

Another aspect here is all the other stuff outside the kernel. Even if you run “yum update” frequently, if you don’t restart, there are several daemons and features of your system that don’t get patched: the running code is in memory, and changing the files on disk has no effect on it.
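
One way to see this effect on a Linux box is to look for processes that still map files which have since been replaced on disk; the kernel marks those entries “(deleted)” in /proc/<pid>/maps. The Python sketch below is a heuristic under that assumption, with hypothetical function names and an arbitrary list of system path prefixes; it is not how any particular yum tool is implemented.

```python
import os

# Only report files under typical system locations; this filter is an assumption
# meant to skip shared memory segments and temp files that are deleted on purpose.
SYSTEM_PREFIXES = ("/usr/", "/lib", "/bin/", "/sbin/", "/opt/")
DELETED_SUFFIX = " (deleted)"


def deleted_mappings(maps_text):
    """Return the set of system file paths marked '(deleted)' in a maps dump."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname
        pathname = fields[5].strip()
        if pathname.endswith(DELETED_SUFFIX) and pathname.startswith(SYSTEM_PREFIXES):
            paths.add(pathname[: -len(DELETED_SUFFIX)])
    return paths


def stale_processes(proc_root="/proc"):
    """Map each readable PID to the replaced files it still has in memory."""
    stale = {}
    if not os.path.isdir(proc_root):
        return stale  # not a Linux-style /proc, nothing to inspect
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "maps")) as fh:
                hits = deleted_mappings(fh.read())
        except OSError:
            continue  # process exited or we lack permission
        if hits:
            stale[int(entry)] = hits
    return stale
```

Run as root, `stale_processes()` lists the daemons that would need a restart to pick up an updated library; an empty result after an update of something like glibc would be surprising.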

Bottom line is, I would not be proud of triple-digit single-server uptimes. It simply tells me I can find lots of ways in, not that you’re running a rock solid setup.
Valeri Galtsev says:

July 20, 2017 at 10:12 am

I will second that. I personally run servers under the assumption that the bad guys are already inside. That doesn’t negate other measures such as a firewall, brute force attack protection, etc. But I’ve seen bad guys attempting to elevate privileges (unsuccessfully) twice during the last decade and a half or more. Both times I thanked myself for taking appropriate security measures.

I am really unimpressed by how Microsoft’s misconception of a “safe internal network” has spread so widely through the allegedly much more intelligent community that the Linux community is (or should be). There is nothing safe on the network for me if:

1. there is at least one computer on this network which is not installed and maintained by me (assuming all machines I maintain are secured appropriately; include here sysadmins who do the same)

2. there is at least one user besides me (and my fellow sysadmins who are as security-aware as I hopefully am)

In other words: if you are a sysadmin, paranoia is one of the words in your job description. I really find it difficult to get people to take this to heart (except sysadmins who _had_ an incident, had to clean up after it, and had to tell their users that the machine/cluster they administer was hacked, and why).

I hope this helps someone.

Valeri

says:

July 20, 2017 at 10:32 am

Valeri Galtsev wrote:

A cluster for heavy duty computing, of which I run several, is a whole ‘nother ballgame. I think I mentioned, but let me recap: 1) only a few people have access to the systems (/bin/noLogin, otherwise); 2) my users have jobs that can be running one, two, or even three weeks straight. And several users’ jobs can overlap. We cannot update something that might affect the running jobs (like, say, glibc).

Now, some things, like say bind, no problem. But more serious things might break their jobs, and that’s not acceptable. We make arrangements to update a few times a year.

Note that there was an update to, I believe, glibc early in 6.x that *did* break computations – results with the update were different than with the glibc before that, so we have to be cautious.

As most folks here where I work know, my job here is to keep the researchers going, not to run systems to run systems (another group here does seem to feel the latter way….) Oh, and my personal mission statement is xkcd 705.

mark
