Bash Help
I know this is for CentOS stuff, but I’m at a loss on how to build a script that does what I need it to do. It’s probably really logically simple, I’m just not seeing it. Hopefully someone will take pity on me and at least give me a big hint.
I have a file with two columns, ‘email’ and ‘total’, like this:
me@example.com 20
me@example.com 40
you@domain.com 100
you@domain.com 30
I need to get the total number of messages for each email address. This type of code has always been the hardest for me for whatever reason, and honestly, I don’t write many scripts these days. I’m struggling to get pseudocode that works, much less a working script. I know this is off topic, and if it gets modded out, that’s fine. I just can’t wrap my brain around it.
—
Mark Haney Network Engineer at NeoNova
919-460-3330 option 1
mark.haney@neonova.net www.neonova.net
here is a python solution
#!/usr/bin/python
# Python 2 (did not check if it works)
f = open('yourfilename')
D = {}
for line in f:
    email, num = line.split()
    # note: num is still a string here, so + concatenates; use int(num) to add
    if email in D:
        D[email] = D[email] + num
    else:
        D[email] = num
f.close()
for key in D:
    print key, D[key]
That gets me closer, I think. It’s concatenating the number of messages, but it’s a start. Thanks.
—
Mark Haney Network Engineer at NeoNova
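The concatenation happens because line.split() returns strings, so + joins them instead of adding them. A minimal Python 3 sketch of the same idea with the conversion in place; the function name sum_by_email and the inline sample data are illustrative assumptions, not part of the original post:

```python
from collections import defaultdict


def sum_by_email(lines):
    """Return {email: total} for lines of the form 'email count'."""
    totals = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        email, num = fields
        totals[email] += int(num)  # int() makes += add rather than concatenate
    return dict(totals)


# Sample data copied from the question
sample = [
    "me@example.com 20",
    "me@example.com 40",
    "you@domain.com 100",
    "you@domain.com 30",
]

for email, total in sum_by_email(sample).items():
    print(email, total)
```

To read a real file instead, pass the open file object: sum_by_email(open('yourfilename')).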
Not bash but perl:
#####
#!/usr/bin/perl
my %dd;
while (<>) {
    my @f = split;
    $dd{$f[0]}{COUNT} += $f[1];
}
print "\nSums:\n";
for (keys %dd) { print "$_\t $dd{$_}{COUNT}\n"; }
####
It takes the data on stdin, sums it into an associative array and prints out the result
Results:
######
$ ./ppp
me@example.com 20
me@example.com 40
you@domain.com 100
you@domain.com 30
Sums:
you@domain.com 130
me@example.com 60
######
I’m sure some perl monk can come up with a single line command to do the same thing.
P.
I do this kind of thing on a fairly regular basis with a Perl one-liner:
perl -ne '($email, $num) = split; $tot{$email} += $num; END { for $email (keys %tot) { print "$email $tot{$email}\n" } }' < yourfile
-- Bowie
This screams out for associative arrays. (Also called hashes, dictionaries, maps, etc.)
That does limit you to CentOS 7+, or maybe 6+, as I recall. CentOS 5 is definitely out, as that ships Bash 3, which lacks this feature.
#!/bin/bash
declare -A totals

while read -r line
do
    # $'...' quoting is needed for a real tab; "\t " would be backslash, t, space
    IFS=$' \t' read -r -a elems <<< "$line"
    email=${elems[0]}
    subtotal=${elems[1]}
    declare -i n=${totals[$email]}
    n=n+$subtotal
    totals[$email]=$n
done < stats

for k in "${!totals[@]}"
do
    printf "%6d %s\n" "${totals[$k]}" "$k"
done

You’re making things hard on yourself by insisting on Bash, by the way. This solution is better expressed in Perl, Python, Ruby, Lua, JavaScript…probably dozens of languages.
Although “not my question”, thanks, I learned a lot about array processing from your example.
—– Original Message —–
From: “warren”
To: “CentOS”
Sent: Wednesday, October 25, 2017 11:47:12 AM
Subject: Re: [CentOS] [OT] Bash help
Yeah, it’s amazing how many obscure corners of the Bash language must be tapped to solve such a simple problem. I count 7 features in that script that I almost never use, because I’d have just written this one in Perl if not required to write it in Bash by the OP.
I expect that’s why the features are obscure to you, too: once you need to step beyond POSIX 1988 shell levels, most people just switch to some more powerful language, owing to the dark days when even a POSIX shell was sometimes tricky to find, much less a post-POSIX shell. (Can you say /usr/xpg4/bin/sh ? Yyyeahh…)
That situation threw a long shadow over the shell scripting landscape, where relatively few dare to tread, even today.
Yeah, you’re right, I am. An associative array was the first thing I thought of, then realized BASH doesn’t do those. I honestly expected there to be a fairly straightforward way to do it in BASH, but I was sadly mistaken. In my defense, I gave virtually no thought to the logic of what I was trying to do until after I’d committed significant time to a BASH script. (Well, maybe that’s not a defense, but an indictment.)
As I said, I don’t do much scripting anymore as the majority of my time is spent DB tuning and Ansible automation. Not really an excuse, and I appreciate your indulgence(s) in giving me a hand. As embarrassed as I am, I’ll just go sit in the corner the rest of the day.
Thanks again.
—
Mark Haney Network Engineer at NeoNova
Warren Young wrote:
Associative arrays?
Awk! Awk! (No, I am not a seagull…)
sort file | awk '{ array[$1] += $2;} END { for (i in array) { print i "\t" array[i];}'
mark “associative arrays, how do I love thee? Let me tot the arrays…”
Warren Young wrote:
Let me say this: among the many reasons I like *Nix: in any other o/s, it’s “how do I create this report?”, and it takes from 2 days to 2 weeks. In *Nix, it’s “of all the ways I can create this report, how would I *prefer* to do it….”
No kidding, but in that “other OS” the answer to the question “how can I create that report” is usually “You can’t unless you spend money for a third-party application”.
starts getting weird, but this is absolutely elegant. No offense to the other examples, they are all awesome, but I had no idea awk could do this with such little effort. Well, I know what I’m studying up on this weekend.
—
Mark Haney Network Engineer at NeoNova
Mark Haney wrote:
first got into *nix, in ’91. Had a project where We were going to be the center and Tell All Agencies The Format of the data they would give us, and we’d load a d/b…. I wrote the d/b loader in C..and then they all said, “sorry, no budget for that, here’s the format we’ve got it in, ya want it or not?”
Before that project finished, I had 30 awk scripts, ranging in length from 100-200 lines (yes, really), to reformat and validate the data before feeding it to the loader I’d written. The other thing – there may be more succinct ways to write it (my manager, these days, uses regular expressions to the point I have to look up what it’s doing), while more than half my career was as a programmer, and I write code such that if I get hit by a car, or take another job, or get called at 16:30 on a Friday, or 02:00, I want to fix the problem without spending hours trying to remember how clever I’d been last year… so I make it easily readable and comprehensible.
awk is just fun.
mark
Leroy Tennison wrote:
mark “been around the block”
Not enough experience with the mainframe: I meant WinDoze.
hrm.. seems like you were missing a }
sort file | awk '{array[$1] += $2;} END { for (i in array) {print i "\t" array[i];}}'
regards,
Jason
Jason Welsh wrote:
Oops. Well, it’s not vi, it’s webmail, so I couldn’t check… Thanks.
mark
not to be outdone, python can sort them based on the totals
for k in sorted(D, key=d.get, reverse=True):
    print k, D[k]
oops. that’s a capital D.get
for k in sorted(D, key=D.get, reverse=True):
Why the sort? It doesn’t matter in what order the lines are read. Wouldn’t this give you the same?
awk '{ array[$1] += $2;} END { for (i in array) { print i "\t" array[i];}}'
But it does: in Bash 4, only.
If you mean you must still use Bash 3 in places, then yeah, you’ve got a problem… one probably best solved by switching to some other language once the program grows beyond Bash 3’s natural scope.
I was trying to think of which languages I know well which require even more difficult solutions than the Bash 4 one. It’s a pretty short list: assembly, C, and MS-DOS batch files. By “C” I’m including anything of its era and outlook: Pascal, Fortran…
I think even Tcl beats Bash 4 on this score, and it’s notoriously minimal in its feature set.
Here’s a brain-bender: You could probably do it with sqlite3 with fewer lines of code than my Bash 4 offering. :)
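For the curious, the SQL version really is short. This is only a sketch under assumptions: it uses Python's sqlite3 module as a stand-in for the sqlite3 command-line tool mentioned above, and the table name stats, its column names, and the function name sum_with_sqlite are made up for illustration.

```python
import sqlite3


def sum_with_sqlite(lines):
    """Load 'email count' lines into an in-memory table and total them per email."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: INTEGER affinity converts the text counts to numbers
    conn.execute("CREATE TABLE stats (email TEXT, total INTEGER)")
    rows = (line.split() for line in lines if line.strip())
    conn.executemany("INSERT INTO stats VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT email, SUM(total) FROM stats GROUP BY email ORDER BY email"
    ).fetchall()
    conn.close()
    return result


sample = [
    "me@example.com 20",
    "me@example.com 40",
    "you@domain.com 100",
    "you@domain.com 30",
]

for email, total in sum_with_sqlite(sample):
    print(email, total)
```

All of the actual work is in the single GROUP BY query; the rest is loading the data.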
Oh, I don’t know, there must be a way to do it without associative arrays, but you’d only get points for the masochism value in doing without.
Array N holds the names and array T holds the totals. For each line in the file, you iterate through N to find the name and then add the number to the same index in T (or create a new entry in both arrays if you don’t find it). Then you just have to iterate through both arrays and print off the names from N and the totals from T. It’s a pain, but it’s doable.
Sorry, I’m too lazy to write code for this… :)
—
Bowie
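The two-array approach described above (N for names, T for totals) can be sketched as follows. It is written in Python rather than Bash 3 purely to show the logic; the function name sum_with_parallel_lists and the sample data are illustrative assumptions.

```python
def sum_with_parallel_lists(lines):
    """Total counts per name using two parallel lists instead of a hash."""
    names = []   # "array N": one entry per distinct name
    totals = []  # "array T": totals[i] belongs to names[i]
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, count = fields[0], int(fields[1])
        # Linear search through N for the name
        for i, known in enumerate(names):
            if known == name:
                totals[i] += count
                break
        else:
            # Not found: create a new entry in both lists
            names.append(name)
            totals.append(count)
    return names, totals


sample = [
    "me@example.com 20",
    "me@example.com 40",
    "you@domain.com 100",
    "you@domain.com 30",
]

names, totals = sum_with_parallel_lists(sample)
for name, total in zip(names, totals):
    print(name, total)
```

The inner search makes this quadratic in the number of distinct names, which is the "pain" part; a hash does the same lookup in one step.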
Once upon a time, Warren Young said:
Heh, even C on SVR4 and newer (including POSIX from 2001) have pretty straight-forward hash routines: hcreate(), hsearch(), and hdestroy().
—
Chris Adams
A slightly different approach written for ksh but seems to also work with bash 4.
typeset -A arr

while read addr cnt
do
    arr[$addr]=$(( ${arr[$addr]:-0} + cnt ))
done < "$1"

for a in "${!arr[@]}"
do
    printf "%6d %s\n" "${arr[$a]}" "$a"
done

Jon
Tony Mountifield wrote:
You’re right, not really necessary in this case. I was working with a couple of awk scripts here at work, and it was needed in the middle….
mark
I’d always assumed that shell scripting was a kind of sado masochistic medium allowing people who don’t get out much to inflict horrible torture on each other. It certainly causes me great pain every time I try and read a bash script with more than a couple of clauses.
I’m just taking over a bunch of bash CI plumbing that seems to have been written by a committee of Manson family members.
Nonsense. Every POSIX shell has an associative array called “the filesystem.”
(hash=$(mktemp -d); while read addr msgs; do echo $msgs >> "$hash/$addr"; done; cd "$hash"; for x in *; do echo "$x $(paste -s -d+ < $x | bc)"; done;) < msg-counts
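The same filesystem-as-hash trick, sketched in Python for readers who find the one-liner hard to follow: one file per address acts as the key, and the appended lines are the values to sum. The function name sum_with_files is an assumption, and the sketch assumes every address is safe to use as a filename (no "/" in it).

```python
import os
import tempfile


def sum_with_files(lines):
    """Use a temporary directory as the 'hash': one file per address."""
    totals = {}
    with tempfile.TemporaryDirectory() as hash_dir:
        for line in lines:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            addr, msgs = fields
            # Assumption: addr contains no "/" and is a valid filename
            with open(os.path.join(hash_dir, addr), "a") as fh:
                fh.write(msgs + "\n")
        # Each file now holds every count seen for that address
        for name in sorted(os.listdir(hash_dir)):
            with open(os.path.join(hash_dir, name)) as fh:
                totals[name] = sum(int(n) for n in fh)
    return totals


sample = [
    "me@example.com 20",
    "me@example.com 40",
    "you@domain.com 100",
    "you@domain.com 30",
]

for addr, total in sum_with_files(sample).items():
    print(addr, total)
```

Unlike the shell version, this cleans up the temporary directory when it finishes.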
This thread started as “I’m not sure if this is offtopic” and it ended as such a great and fun to read discussion. Thank you all for these great script examples. I really enjoyed reading it.
Ah, *there’s* our masochist. I knew we had at least one around here somewhere. :)
Takes one to know one, I suppose: not long ago, I proposed using the filesystem to implement a [DAG][1] in shell. ;)
[1]: https://en.wikipedia.org/wiki/Directed_acyclic_graph
or just awk …