[Glastonbury] html wc

Andrew M.A. Cater amacater at galactic.demon.co.uk
Sat Jul 2 12:55:34 BST 2005


On Sat, Jul 02, 2005 at 12:42:59PM +0100, Ian Dickinson wrote:
> Hi Al,
> > How about dumping the html through lynx (the text mode browser) and
> > counting the results.
> That's a good suggestion!
> 
> > alan at wopr:~ $  lynx --dump --nolist http://popey.com/blog/ | wc
> >     138     520    3813
> > 
> > Yup, that appears to work. 
> Concur.
> 
> > That help? Word indeed.. tsk! ;)
> Yes, thanks.  That handles the html nicely.
> 
> Any other suggestions out there? What I'm actually trying to do is wc
> a docbook document. So I can process the docbook to html, then use
> Al's method to wc that. That gives me a working solution, but I'm
> always interested to know if there are other tricks I'm missing.
> 
> Thanks,
> Ian
> 
The O'Reilly regular expressions book has lots of this sort of thing: I'm
fairly sure the author has an advanced word count/word match in there - but
you a.) need to have the book and b.) be prepared to type in the Perl.
For anyone doing any sort of pattern matching this book is a must IMHO.
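
If you'd rather skip the html step altogether, something along these lines
might do as a rough sketch. It is not from the book, and it's Python rather
than Perl. It assumes the docbook file is well-formed XML with no undefined
entities (the standard library parser will reject those), and the function
name docbook_wc and the sample document are made up for illustration. It
pulls out just the text between the tags and counts it roughly the way wc
does:

```python
import re
import xml.etree.ElementTree as ET


def docbook_wc(xml_text):
    """Return (lines, words, chars) for the text content of an XML document.

    Hypothetical helper: tags and attributes are ignored, only the text
    between them is counted. Chars are Unicode characters, not bytes, so
    the last figure can differ from wc's byte count on non-ASCII input.
    """
    root = ET.fromstring(xml_text)
    # itertext() walks the tree and yields every text fragment in order.
    text = "".join(root.itertext())
    # Like wc, treat any run of non-whitespace characters as one word.
    words = re.findall(r"\S+", text)
    return text.count("\n"), len(words), len(text)


# Made-up sample document, purely for illustration.
sample = """<article>
<title>Hello world</title>
<para>Counting <emphasis>words</emphasis> in DocBook.</para>
</article>"""

if __name__ == "__main__":
    print(docbook_wc(sample))
```

The numbers won't match the lynx route exactly, since lynx lays the page
out and adds its own whitespace, but the word count should be in the same
ballpark.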

Andy


