Bash Help

I know this is for CentOS stuff, but I’m at a loss on how to build a script that does what I need it to do.  It’s probably really logically simple, I’m just not seeing it.  Hopefully someone will take pity on me and at least give me a big hint.

I have a file with two columns, ‘email’ and ‘total’, like this:

me@example.com 20
me@example.com 40
you@domain.com 100
you@domain.com 30

I need to get the total number of messages for each email address.  This type of code has always been the hardest for me for whatever reason, and honestly, I don’t write many scripts these days. I’m struggling to get pseudocode that works, much less a working script. I know this is off topic, and if it gets modded out, that’s fine.  I just can’t wrap my brain around it.


Mark Haney Network Engineer at NeoNova
919-460-3330 option 1
mark.haney@neonova.net www.neonova.net

30 thoughts on - Bash Help

  • here is a python solution

    #!/usr/bin/python
    #python 2 (did not check if it works)
    f = open('yourfilename')
    D = {}
    for line in f:
        email, num = line.split()
        # convert the count to int so that + adds instead of concatenating strings
        if email in D:
            D[email] = D[email] + int(num)
        else:
            D[email] = int(num)
    f.close()
    for key in D:
        print key, D[key]

  • Not bash but perl:

    #####
    #!/usr/bin/perl
    my %dd;
    while (<>) {
        my @f = split;
        $dd{$f[0]}{COUNT} += $f[1];
    }
    print "\nSums:\n";
    for (keys %dd) { print "$_\t $dd{$_}{COUNT}\n"; };
    ####

    It takes the data on stdin, sums it into an associative array and prints out the result

    Results:
    ######
    $ ./ppp
    me@example.com 20
    me@example.com 40
    you@domain.com 100
    you@domain.com 30

    Sums:
    you@domain.com 130
    me@example.com 60
    ######

    I’m sure some perl monk can come up with a single line command to do the same thing.

    P.

  • I do this kind of thing on a fairly regular basis with a Perl one-liner:

    perl -ne '($email, $num) = split; $tot{$email} += $num; END { for $email
    (keys %tot) { print "$email $tot{$email}\n" } }' < yourfile

    Bowie

  • This screams out for associative arrays. (Also called hashes, dictionaries, maps, etc.)

    That does limit you to CentOS 7+, or maybe 6+, as I recall. CentOS 5 is definitely out, as that ships Bash 3, which lacks this feature.

    #!/bin/bash
    declare -A totals

    while read line
    do
        # $'\t ' is a real tab plus a space; a plain "\t " would also split on the letter t
        IFS=$'\t ' read -r -a elems <<< "$line"
        email=${elems[0]}
        subtotal=${elems[1]}
        declare -i n=${totals[$email]}
        n=n+$subtotal
        totals[$email]=$n
    done < stats

    for k in "${!totals[@]}"
    do
        printf "%6d %s\n" "${totals[$k]}" "$k"
    done

    You’re making things hard on yourself by insisting on Bash, by the way. This solution is better expressed in Perl, Python, Ruby, Lua, JavaScript…probably dozens of languages.

  • Although “not my question”, thanks, I learned a lot about array processing from your example.

  • Yeah, it’s amazing how many obscure corners of the Bash language must be tapped to solve such a simple problem. I count 7 features in that script that I almost never use, because I’d have just written this one in Perl if not required to write it in Bash by the OP.

    I expect that’s why the features are obscure to you, too: once you need to step beyond POSIX 1988 shell levels, most people just switch to some more powerful language, owing to the dark days when even a POSIX shell was sometimes tricky to find, much less a post-POSIX shell. (Can you say /usr/xpg4/bin/sh ? Yyyeahh…)

    That situation threw a long shadow over the shell scripting landscape, where relatively few dare to tread, even today.

  • Yeah, you’re right, I am. An associative array was the first thing I thought of, then realized BASH doesn’t do those.  I honestly expected there to be a fairly straightforward way to do it in BASH, but I was sadly mistaken.  In my defense, I gave virtually no thought to the logic of what I was trying to do until after I’d committed significant time to a BASH script.  (Well, maybe that’s not a defense, but an indictment.)

    As I said, I don’t do much scripting anymore as the majority of my time is spent DB tuning and Ansible automation.  Not really an excuse, and I appreciate your indulgence(s) in giving me a hand.  As embarrassed as I am, I’ll just go sit in the corner the rest of the day.

    Thanks again.


    Mark Haney Network Engineer at NeoNova
    919-460-3330 option 1
    mark.haney@neonova.net http://www.neonova.net

  • Warren Young wrote:

    Associative arrays?

    Awk! Awk! (No, I am not a seagull…)

    sort file | awk '{ array[$1] += $2;} END { for (i in array) { print i "\t" array[i];}'

    mark “associative arrays, how do I love thee? Let me tot the arrays…”

  • Warren Young wrote:

    Let me say this: among the many reasons I like *Nix: in any other o/s, it’s “how do I create this report?”, and it takes from 2 days to 2 weeks. In *Nix, it’s “of all the ways I can create this report, how would I *prefer* to do it….”

  • No kidding, but in that “other OS” the answer to the question “how can I create that report” is usually “You can’t unless you spend money for a third-party application”.

  • starts getting weird, but this is absolutely elegant. No offense to the other examples, they are all awesome, but I had no idea awk could do this with such little effort.  Well, I know what I’m studying up on this weekend.


    Mark Haney Network Engineer at NeoNova
    919-460-3330 option 1
    mark.haney@neonova.net http://www.neonova.net

  • Mark Haney wrote:
    first got into *nix, in ’91. Had a project where We were going to be the center and Tell All Agencies The Format of the data they would give us, and we’d load a d/b…. I wrote the d/b loader in C..and then they all said, “sorry, no budget for that, here’s the format we’ve got it in, ya want it or not?”

    Before that project finished, I had 30 awk scripts, ranging in length from 100-200 lines (yes, really), to reformat and validate the data before feeding it to the loader I’d written.

    The other thing – there may be more succinct ways to write it (my manager, these days, uses regular expressions to the point I have to look up what it’s doing), but more than half my career was as a programmer, and I write code such that if I get hit by a car, or take another job, or get called at 16:30 on a Friday, or 02:00, I can fix the problem without spending hours trying to remember how clever I’d been last year… so I make it easily readable and comprehensible.

    awk is just fun.

    mark

  • Not enough experience with the mainframe: I meant WinDoze.

  • hrm.. seems like you were missing a }

    sort file | awk '{array[$1] += $2;} END { for (i in array) {print i "\t" array[i];}}'

    regards,

    Jason

  • not to be outdone, python can sort them based on the totals

    for k in sorted(D, key=D.get, reverse=True):
        print k, D[k]

  • Why the sort? It doesn’t matter in what order the lines are read. Wouldn’t this give you the same?

    awk '{ array[$1] += $2;} END { for (i in array) { print i "\t" array[i];}}' file

  • But it does: in Bash 4, only.

    If you mean you must still use Bash 3 in places, then yeah, you’ve got a problem… one probably best solved by switching to some other language once the program grows beyond Bash 3’s natural scope.

    I was trying to think of which languages I know well which require even more difficult solutions than the Bash 4 one. It’s a pretty short list: assembly, C, and MS-DOS batch files. By “C” I’m including anything of its era and outlook: Pascal, Fortran…

    I think even Tcl beats Bash 4 on this score, and it’s notoriously minimal in its feature set.

    Here’s a brain-bender: You could probably do it with sqlite3 with fewer lines of code than my Bash 4 offering. :)

    Oh, I don’t know, there must be a way to do it without associative arrays, but you’d only get points for the masochism value in doing without.
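One way it could be done without associative arrays, as a rough sketch rather than anything posted in the thread: sort the input so that identical addresses end up on adjacent lines, then keep a running total and print it whenever the address changes. The function name `sum_by_address` and the inline sample data are made up for illustration, and the sketch assumes every line is "address count" separated by whitespace, with plain decimal counts.

```shell
#!/bin/bash
# Sketch: per-address totals without associative arrays (should run on Bash 3).
# Assumption: each input line is "address count", whitespace-separated.
sum_by_address() {
    # LC_ALL=C gives byte-wise ordering, so equal addresses are always adjacent.
    LC_ALL=C sort | {
        prev="" total=0
        while read -r addr cnt; do
            if [ -z "$addr" ]; then continue; fi      # skip blank lines
            if [ "$addr" != "$prev" ]; then
                # Address changed: emit the finished group, start a new one.
                if [ -n "$prev" ]; then printf '%s %d\n' "$prev" "$total"; fi
                prev=$addr
                total=0
            fi
            total=$((total + cnt))
        done
        # Emit the last group, if there was any input at all.
        if [ -n "$prev" ]; then printf '%s %d\n' "$prev" "$total"; fi
    }
}

# Hypothetical sample data in the same shape as the original question.
sum_by_address <<'EOF'
me@example.com 20
you@domain.com 100
me@example.com 40
you@domain.com 30
EOF
```

Here the sort does real work, unlike in the awk versions: the loop only compares each address with the previous one, so unsorted input would produce several partial totals per address.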

  • Array N holds the names and array T holds the totals.  For each line in the file, you iterate through N to find the name and then add the number to the same index in T (or create a new entry in both arrays if you don’t find it).  Then you just have to iterate through both arrays and print off the names from N and the totals from T.  It’s a pain, but it’s doable.

    Sorry, I’m too lazy to write code for this…  :)


    Bowie
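The two-array idea described above might look something like this in Bash with only indexed arrays. It is a sketch of that description, not code from the thread: the array names N and T come from the post, while the function name `sum_parallel` and the sample data are invented for the example, and the input is assumed to be "name number" per line.

```shell
#!/bin/bash
# Sketch: parallel indexed arrays, no associative arrays needed.
# N[i] holds a name and T[i] holds the running total for that same name.
sum_parallel() {
    local -a N=() T=()
    local name num i found
    while read -r name num; do
        if [ -z "$name" ]; then continue; fi     # skip blank lines
        found=0
        # Linear search through N for this name.
        for i in "${!N[@]}"; do
            if [ "${N[$i]}" = "$name" ]; then
                T[$i]=$(( ${T[$i]} + num ))
                found=1
                break
            fi
        done
        # Not seen before: append a new entry to both arrays at the same index.
        if [ "$found" -eq 0 ]; then
            i=${#N[@]}
            N[$i]=$name
            T[$i]=$num
        fi
    done
    # Walk both arrays in step and print name plus total.
    for i in "${!N[@]}"; do
        printf '%s %d\n' "${N[$i]}" "${T[$i]}"
    done
}

# Hypothetical sample data in the same shape as the original question.
sum_parallel <<'EOF'
me@example.com 20
me@example.com 40
you@domain.com 100
you@domain.com 30
EOF
```

The lookup is a linear scan, so the cost grows with the number of distinct addresses times the number of lines. That is fine for a small report and is part of why the post calls it a pain. Output comes out in first-seen order.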

  • Once upon a time, Warren Young said:

    Heh, even C on SVR4 and newer (including POSIX from 2001) have pretty straight-forward hash routines: hcreate(), hsearch(), and hdestroy().

    Chris Adams

  • A slightly different approach written for ksh but seems to also work with bash 4.

    typeset -A arr

    while read addr cnt
    do
        arr[$addr]=$(( ${arr[$addr]:-0} + cnt ))
    done < ${1}

    for a in ${!arr[*]}
    do
        printf "%6d %s\n" ${arr[$a]} $a
    done

    Jon

  • Tony Mountifield wrote:
    You’re right, not really necessary in this case. I was working with a couple of awk scripts here at work, and it was needed in the middle….

    mark

  • I’d always assumed that shell scripting was a kind of sado masochistic medium allowing people who don’t get out much to inflict horrible torture on each other. It certainly causes me great pain every time I try and read a bash script with more than a couple of clauses.

    I’m just taking over a bunch of bash CI plumbing that seems to have been written by a committee of Manson family members.

  • Nonsense. Every POSIX shell has an associative array called “the filesystem.”

    (hash=$(mktemp -d); while read addr msgs; do echo $msgs >> "$hash/$addr"; done; cd "$hash"; for x in *; do echo "$x $(paste -s -d+ < $x | bc)"; done;) < msg-counts

  • This thread started as “I’m not sure if this is offtopic” and it ended as such a great and fun to read discussion. Thank you all for these great script examples. I really enjoyed reading it.