[GLLUG] Digests of CSV files

Fred Youhanaie fly at anydata.co.uk
Sun Jan 7 10:16:09 UTC 2018

In addition to the other good replies, I think you may find the Unix 
commands cut and uniq useful here. This will work as long as the only 
commas in the files are the field delimiters.

For example, if the column order is "gender,prison,trade", then the 
following will give you the breakdown by trade:

	cut -d, -f3 file1.csv file2.csv ... | sort | uniq -c

The following will give you the breakdown by gender and trade:

	cut -d, -f1,3 file1.csv file2.csv ... | sort | uniq -c
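And since each line is already gender,prison,trade, the full three-way
breakdown (e.g. male weavers in Norwich Castle) needs no cut at all:
counting whole lines does it. A minimal sketch, assuming that column
order and no header rows:

```shell
# Count identical gender,prison,trade lines across all files;
# uniq -c prefixes each distinct combination with its count.
sort file1.csv file2.csv | uniq -c
```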


Other complementary commands for manipulating such files are grep, 
paste, join, sed, and awk.
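For instance, awk can do the count in a single pass, with no sort
needed. A sketch equivalent to the cut | sort | uniq -c pipeline above,
assuming the same column order:

```shell
# Tally column 3 (trade) in an associative array, then print
# each trade with its count (output order is unspecified).
awk -F, '{count[$3]++} END {for (t in count) print count[t], t}' file1.csv file2.csv
```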



P.S. If I needed to do this beyond a 10 minute exercise, I would 
probably create an SQLite database and use SQL :)
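Something along these lines, say; the database and table names are just
placeholders, and it again assumes the column order above with no header
rows (adjust the CREATE TABLE and .import lines otherwise):

```shell
# Load all the CSV files into one table, then let GROUP BY
# produce every gender/prison/trade combination with its count.
sqlite3 debtors.db <<'EOF'
CREATE TABLE debtors (gender TEXT, prison TEXT, trade TEXT);
.mode csv
.import file1.csv debtors
.import file2.csv debtors
SELECT trade, gender, prison, COUNT(*)
FROM debtors
GROUP BY trade, gender, prison;
EOF
```

From there, questions like "how many male weavers?" are a one-line
SELECT with a WHERE clause.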

On 05/01/18 18:11, John Levin via GLLUG wrote:
> Dear list,
> I'm having a bad google day, and am not sure what terms to search on, so 
> I hope the list will point me in the right direction.
> I have a number of csv files (of c18th imprisoned debtors). There are 
> three important columns: gender, prison, trade. What I want is a program 
> or script that will simply digest each column and relate them to each 
> other, producing something along the lines of:
> There are 200 weavers.
> There are 190 male weavers.
> There are 20 weavers in Norwich Castle.
> There are 18 male weavers in Norwich Castle.
> etc.
> This strikes me as a very obvious need, but aside from fantastically 
> complex apps like SPSS (which in any case, doesn't seem to have a simple 
> way of doing this) I have not found anything that satisfies it.
> Very happy to try writing some bash script to do this, but am not sure 
> where to start. CSVkit or suchlike?
> Thanks in advance,
> John
