Rearrange rows of a csv based on a column value
Problem Description:
I am trying to rearrange the rows in a csv based on the key given in 5th column of my data.
My data looks like this (test.csv):
Col A,Col B,Col C,Col D,Col E
A,Data 1,Category 1,Name 1,C
B,Data 2,Category 2,Name 2,C
C,Data 3,Category 3,Name 3,C
D,Data 4,Category 4,Name 4,C
E,Data 5,Category 5,Name 5,C
F,Data 6,Category 6,Name 6,C
I am trying to rearrange it so that the row containing the key value in first column is at top (in this case the key value is C)
Desired output :
Col A,Col B,Col C,Col D,Col E
C,Data 3,Category 3,Name 3,C
A,Data 1,Category 1,Name 1,C
B,Data 2,Category 2,Name 2,C
D,Data 4,Category 4,Name 4,C
E,Data 5,Category 5,Name 5,C
F,Data 6,Category 6,Name 6,C
I have written the below code and also getting the desired result doing so i am generating two temporary files , just wondering if there is a better solution :
sed 1d test.csv > input.csv
key=`awk -F"," -v 'OFS=,' '{ print $5}' input.csv | uniq`
awk -F"," -v 'OFS=,' '{if($1 == "'$key'") print}' input.csv > temp.csv
cat temp.csv input.csv > temp2.csv
awk '!seen[$0]++' temp2.csv > output.csv
sed -i '1iCol A,Col B,Col C,Col D,Col E' output.csv
Please help !
Solution – 1
Using GNU sed
$ sed -Ez 's/^([^n]*n)(.*n)(([[:alpha:]])[^n]*4n)/132/woutput.csv' input_file
$ cat output.csv
Col A,Col B,Col C,Col D,Col E
C,Data 3,Category 3,Name 3,C
A,Data 1,Category 1,Name 1,C
B,Data 2,Category 2,Name 2,C
D,Data 4,Category 4,Name 4,C
E,Data 5,Category 5,Name 5,C
F,Data 6,Category 6,Name 6,C
Solution – 2
You could do something like this:
awk -F ',' '
!seen[$0]++ {
if ( $1 == $5 || NR == 1 )
print
else
arr[++n] = $0
}
END { for (i = 1; i <= n; i++) print arr[i] }
' input.csv > output.csv
That’ll bufferize the non-matching lines and output them at the end.
output
Col A,Col B,Col C,Col D,Col E
C,Data 3,Category 3,Name 3,C
A,Data 1,Category 1,Name 1,C
B,Data 2,Category 2,Name 2,C
D,Data 4,Category 4,Name 4,C
E,Data 5,Category 5,Name 5,C
F,Data 6,Category 6,Name 6,C
Solution – 3
$ cat tst.awk
BEGIN { FS="," }
(NR == 1) || f {
print
next
}
$1 == $5 {
print $0 buf
f = 1
}
{ buf = buf ORS $0 }
$ awk -f tst.awk test.csv
Col A,Col B,Col C,Col D,Col E
C,Data 3,Category 3,Name 3,C
A,Data 1,Category 1,Name 1,C
B,Data 2,Category 2,Name 2,C
D,Data 4,Category 4,Name 4,C
E,Data 5,Category 5,Name 5,C
F,Data 6,Category 6,Name 6,C
The above will only save the lines up to the matching line in memory.