[GLLUG] Digests of CSV files

James Courtier-Dutton james.dutton at gmail.com
Fri Jan 5 19:52:04 UTC 2018


Hi,

I would probably use Apache Pig for that; it is about three lines of Pig Latin:
1) load ...
2) group ...
3) count ...
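
For anyone without Pig installed, the same load, group, count steps can be sketched in plain Python using only the standard library. This is a minimal sketch, not Pig itself. It assumes each CSV has a header row with columns named gender, prison and trade, and load_rows and count_by are hypothetical helper names. The sample rows are made up.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def load_rows(files: Iterable[TextIO]) -> list[dict[str, str]]:
    """Step 1 (load): read every CSV file into one list of row dicts.

    Assumes each file starts with a header row naming its columns.
    """
    rows: list[dict[str, str]] = []
    for f in files:
        rows.extend(csv.DictReader(f))
    return rows


def count_by(rows: list[dict[str, str]], column: str) -> Counter:
    """Steps 2 and 3 (group, count): tally rows by the value in one column."""
    return Counter(row[column].strip() for row in rows)


if __name__ == "__main__":
    # Two tiny in-memory "files" standing in for the real CSVs (invented data).
    sample_a = io.StringIO("gender,prison,trade\nmale,Norwich Castle,weaver\nfemale,Fleet,weaver\n")
    sample_b = io.StringIO("gender,prison,trade\nmale,Norwich Castle,weaver\nmale,Fleet,baker\n")
    rows = load_rows([sample_a, sample_b])
    for trade, n in count_by(rows, "trade").most_common():
        print(f"{trade}: {n}")
```

With real files, pass open file handles (opened with newline="") instead of the StringIO samples.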


Another option could be a Jupyter notebook.
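
In a notebook, the full cross-tabulation asked for in the question quoted below (every trade, every gender/trade pair, every prison/trade pair, and so on) fits in one cell. This is a minimal sketch using only the standard library, assuming the same gender, prison and trade header names; crosstab_counts and describe are hypothetical names, and pandas.crosstab or DataFrame.groupby would be the more usual notebook tools.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Assumed header names; change these to match the real files.
COLUMNS = ("gender", "prison", "trade")


def crosstab_counts(rows, columns=COLUMNS):
    """Count rows for every value combination over every non-empty subset of columns.

    Keys are tuples of (column, value) pairs in COLUMNS order, e.g.
    (("gender", "male"), ("trade", "weaver")) -> number of male weavers.
    """
    counts = Counter()
    for row in rows:
        for k in range(1, len(columns) + 1):
            for subset in combinations(columns, k):
                counts[tuple((col, row[col].strip()) for col in subset)] += 1
    return counts


def describe(key, n):
    """Turn one count into a sentence such as 'There are 18 male weavers in Norwich Castle.'

    Pluralisation is a naive '+s', so irregular plurals will come out wrong.
    """
    values = dict(key)
    noun = values.get("trade", "debtor") + ("" if n == 1 else "s")
    phrase = f"{values['gender']} {noun}" if "gender" in values else noun
    if "prison" in values:
        phrase += f" in {values['prison']}"
    verb = "is" if n == 1 else "are"
    return f"There {verb} {n} {phrase}."


if __name__ == "__main__":
    # Invented sample data standing in for one of the real CSV files.
    sample = io.StringIO(
        "gender,prison,trade\n"
        "male,Norwich Castle,weaver\n"
        "male,Norwich Castle,weaver\n"
        "female,Fleet,weaver\n"
    )
    counts = crosstab_counts(csv.DictReader(sample))
    # Print single-column totals first, then pairs, then triples.
    for key, n in sorted(counts.items(), key=lambda item: (len(item[0]), item[0])):
        print(describe(key, n))
```

To cover several files at once, chain their readers (for example with itertools.chain) before calling crosstab_counts.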


On 5 Jan 2018 18:13, "John Levin via GLLUG" <gllug at mailman.lug.org.uk> wrote:

> Dear list,
>
> I'm having a bad Google day, and am not sure what terms to search on, so I
> hope the list will point me in the right direction.
>
> I have a number of CSV files (of c18th imprisoned debtors). There are
> three important columns: gender, prison, trade. What I want is a program or
> script that will simply digest each column and relate them to each other,
> producing something along the lines of:
> There are 200 weavers.
> There are 190 male weavers.
> There are 20 weavers in Norwich Castle.
> There are 18 male weavers in Norwich Castle.
> etc.
>
> This strikes me as a very obvious need, but aside from fantastically
> complex apps like SPSS (which in any case, doesn't seem to have a simple
> way of doing this) I have not found anything that satisfies it.
>
> Very happy to try writing some bash script to do this, but am not sure
> where to start. csvkit or suchlike?
>
> Thanks in advance,
>
> John
>
> --
> John Levin
> http://www.anterotesis.com
> http://twitter.com/anterotesis
>
> --
> GLLUG mailing list
> GLLUG at mailman.lug.org.uk
> https://mailman.lug.org.uk/mailman/listinfo/gllug