Rearrange rows of a csv based on a column value

Rearrange rows of a csv based on a column value

Problem Description:

I am trying to rearrange the rows in a csv based on the key given in 5th column of my data.
My data looks like this (test.csv):

Col A,Col B,Col C,Col D,Col E
A,Data 1,Category 1,Name 1,C
B,Data 2,Category 2,Name 2,C
C,Data 3,Category 3,Name 3,C
D,Data 4,Category 4,Name 4,C
E,Data 5,Category 5,Name 5,C
F,Data 6,Category 6,Name 6,C

I am trying to rearrange it so that the row containing the key value in first column is at top (in this case the key value is C)

Desired output :

Col A,Col B,Col C,Col D,Col E
C,Data 3,Category 3,Name 3,C
A,Data 1,Category 1,Name 1,C
B,Data 2,Category 2,Name 2,C
D,Data 4,Category 4,Name 4,C
E,Data 5,Category 5,Name 5,C
F,Data 6,Category 6,Name 6,C

I have written the below code and also getting the desired result doing so i am generating two temporary files , just wondering if there is a better solution :

sed 1d test.csv > input.csv 
key=`awk -F"," -v 'OFS=,' '{ print $5}' input.csv | uniq`
awk -F"," -v 'OFS=,' '{if($1 == "'$key'") print}' input.csv > temp.csv
cat temp.csv input.csv > temp2.csv

awk '!seen[$0]++' temp2.csv > output.csv

sed -i '1iCol A,Col B,Col C,Col D,Col E' output.csv

Please help !

Solution – 1

Using GNU sed

$ sed -Ez 's/^([^n]*n)(.*n)(([[:alpha:]])[^n]*4n)/132/woutput.csv' input_file
$ cat output.csv
Col A,Col B,Col C,Col D,Col E
C,Data 3,Category 3,Name 3,C
A,Data 1,Category 1,Name 1,C
B,Data 2,Category 2,Name 2,C
D,Data 4,Category 4,Name 4,C
E,Data 5,Category 5,Name 5,C
F,Data 6,Category 6,Name 6,C

Solution – 2

You could do something like this:

awk -F ',' '
    !seen[$0]++ {
        if ( $1 == $5 || NR == 1 )
            print
        else
            arr[++n] = $0
    }
    END { for (i = 1; i <= n; i++) print arr[i] }
' input.csv > output.csv

That’ll bufferize the non-matching lines and output them at the end.

output

Col A,Col B,Col C,Col D,Col E
C,Data 3,Category 3,Name 3,C
A,Data 1,Category 1,Name 1,C
B,Data 2,Category 2,Name 2,C
D,Data 4,Category 4,Name 4,C
E,Data 5,Category 5,Name 5,C
F,Data 6,Category 6,Name 6,C

Solution – 3

$ cat tst.awk
BEGIN { FS="," }
(NR == 1) || f {
    print
    next
}
$1 == $5 {
    print $0 buf
    f = 1
}
{ buf = buf ORS $0 }

$ awk -f tst.awk test.csv
Col A,Col B,Col C,Col D,Col E
C,Data 3,Category 3,Name 3,C
A,Data 1,Category 1,Name 1,C
B,Data 2,Category 2,Name 2,C
D,Data 4,Category 4,Name 4,C
E,Data 5,Category 5,Name 5,C
F,Data 6,Category 6,Name 6,C

The above will only save the lines up to the matching line in memory.

Rate this post
We use cookies in order to give you the best possible experience on our website. By continuing to use this site, you agree to our use of cookies.
Accept
Reject