[Gllug] Re: www.spews.org - spamming blacklist
Nix
nix at esperi.demon.co.uk
Sun Jun 8 23:18:46 UTC 2003
On 03 Jun 2003, Mike Brodbelt spake:
> SpamAssassin demonstrates quite well that netblock lookups are an
> unnecessary blunt instrument. Content based filtering and Bayesian
> analysis can remove spam more effectively
Actually it demonstrates the exact opposite. Using the statistics from
the last GA run in March (for SA 2.54, IIRC):
No net, no Bayes:
,----[ rules/STATISTICS.txt ]
| # SUMMARY for threshold 5.0:
| # Correctly non-spam: 130678 56.21% (99.92% of non-spam corpus)
| # Correctly spam: 90057 38.74% (88.55% of spam corpus)
| # False positives: 100 0.04% (0.08% of nonspam, 4551 weighted)
| # False negatives: 11640 5.01% (11.45% of spam, 39435 weighted)
| # Average score for spam: 16.4 nonspam: -1.3
| # Average for false-pos: 5.9 false-neg: 3.4
| # TOTAL: 232475 100.00%
`----
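Each summary sorts the hand-classified messages by whether their total score reached 5.0. Here is a simplified sketch of that bookkeeping. It assumes a message is flagged when the summed scores of the rules that fired are at or above the threshold. The rule names and scores are hypothetical, and real SpamAssassin scoring has far more going on (rule sets, Bayes, network tests).

```python
from collections import Counter

# Simplified model of SpamAssassin-style scoring: sum the scores of the rules
# that fired and flag the message when the total reaches the threshold.
THRESHOLD = 5.0

def flagged(rule_scores: dict[str, float], threshold: float = THRESHOLD) -> bool:
    """True if the summed rule scores are at or above the threshold."""
    return sum(rule_scores.values()) >= threshold

def outcome(is_spam: bool, was_flagged: bool) -> str:
    """Map a hand-labelled message and its verdict to a summary row."""
    if was_flagged:
        return "correctly spam" if is_spam else "false positive"
    return "false negative" if is_spam else "correctly non-spam"

def summarize(corpus: list[tuple[bool, dict[str, float]]]) -> Counter:
    """Tally outcomes over (is_spam, fired_rule_scores) pairs."""
    return Counter(outcome(is_spam, flagged(scores)) for is_spam, scores in corpus)

# Hypothetical rule names and scores, for illustration only.
corpus = [
    (True, {"HYPOTHETICAL_RULE_A": 3.0, "HYPOTHETICAL_RULE_B": 2.5}),
    (True, {"HYPOTHETICAL_RULE_A": 3.0}),
    (False, {"HYPOTHETICAL_RULE_C": -1.0}),
    (False, {"HYPOTHETICAL_RULE_A": 3.0, "HYPOTHETICAL_RULE_B": 2.5}),
]
print(summarize(corpus))
```

The fourth message shows how a false positive arises. A non-spam message trips the same rules as spam and crosses the threshold.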
Net, no Bayes:
,----[ rules/STATISTICS-set2.txt ]
| # SUMMARY for threshold 5.0:
| # Correctly non-spam: 173760 72.63% (99.91% of non-spam corpus)
| # Correctly spam: 61628 25.76% (94.37% of spam corpus)
| # False positives: 160 0.07% (0.09% of nonspam, 3545 weighted)
| # False negatives: 3680 1.54% (5.63% of spam, 11930 weighted)
| # Average score for spam: 19.0 nonspam: -4.5
| # Average for false-pos: 5.9 false-neg: 3.2
| # TOTAL: 239228 100.00%
`----
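56/38% versus 72/25% of the whole corpus. The two runs used different message sets (compare the TOTAL lines), so the per-corpus figures are the fairer comparison. Spam caught rises from 88.55% to 94.37%, while false positives barely move (0.08% versus 0.09% of nonspam).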
No net, Bayes:
,----[ rules/STATISTICS-set1.txt ]
| # SUMMARY for threshold 5.0:
| # Correctly non-spam: 80750 46.56% (99.93% of non-spam corpus)
| # Correctly spam: 87718 50.58% (94.72% of spam corpus)
| # False positives: 60 0.03% (0.07% of nonspam, 3959 weighted)
| # False negatives: 4892 2.82% (5.28% of spam, 15334 weighted)
| # Average score for spam: 19.7 nonspam: -1.5
| # Average for false-pos: 5.8 false-neg: 3.1
| # TOTAL: 173420 100.00%
`----
Net and Bayes:
,----[ rules/STATISTICS-set3.txt ]
| # SUMMARY for threshold 5.0:
| # Correctly non-spam: 75073 56.87% (99.94% of non-spam corpus)
| # Correctly spam: 54359 41.18% (95.56% of spam corpus)
| # False positives: 46 0.03% (0.06% of nonspam, 2077 weighted)
| # False negatives: 2524 1.91% (4.44% of spam, 8665 weighted)
| # Average score for spam: 20.2 nonspam: -4.8
| # Average for false-pos: 6.0 false-neg: 3.4
| # TOTAL: 132002 100.00%
`----
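46/50% versus 56/41% of the whole corpus. Per corpus, spam caught rises from 94.72% to 95.56%, and false positives fall from 0.07% to 0.06% of nonspam.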
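The per-corpus rates can be recomputed from the four summaries above. Spam caught is correct spam over all spam, and nonspam flagged is false positives over all nonspam. The counts below are copied from the tables. The "weighted" columns are left out, since their weighting isn't described here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Summary:
    """Raw counts from one STATISTICS*.txt summary block."""
    correct_ham: int
    correct_spam: int
    false_pos: int
    false_neg: int

    def spam_caught(self) -> float:
        """Fraction of the spam corpus scored at or above the threshold."""
        return self.correct_spam / (self.correct_spam + self.false_neg)

    def ham_flagged(self) -> float:
        """Fraction of the non-spam corpus wrongly scored as spam."""
        return self.false_pos / (self.correct_ham + self.false_pos)

RUNS = {
    "no net, no Bayes": Summary(130678, 90057, 100, 11640),
    "net, no Bayes": Summary(173760, 61628, 160, 3680),
    "no net, Bayes": Summary(80750, 87718, 60, 4892),
    "net and Bayes": Summary(75073, 54359, 46, 2524),
}

for name, s in RUNS.items():
    print(f"{name:18} spam caught {s.spam_caught():.2%}  "
          f"nonspam flagged {s.ham_flagged():.2%}")
```

Normalising per corpus keeps the different spam/nonspam mix in each run from skewing the comparison.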
Network tests *are* worth it; the trick is to find blacklists with a
good FP ratio. SPEWS is the opposite: duncf's tests indicated that 98%
of mail caught by SPEWS was nonspam.
The SBL, for instance, is a good list: 98.9% spam as of my last
mass-check-with-network-tests. That's worth using. SPEWS is not --- at
least, not if your goal is identifying spam. (But then, the SPEWS people
admit to other goals...)
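For reference, a DNS blacklist lookup of the kind discussed here usually works like this. Reverse the IPv4 octets, append the list's zone, and see whether the name resolves. The sketch below follows that common convention. The zone name is only an example, so check each list's own documentation. It also simplifies by treating any A record as "listed". Some lists return special codes (for errors or policy refusals) that a real client should inspect.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name for an IPv4 DNSBL lookup: reversed octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip: str, zone: str) -> bool:
    """True if the zone answers with any A record (simplification: codes ignored)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN (not listed) and resolver failures both end up here.
        return False
    return True

print(dnsbl_query_name("192.0.2.99", "sbl.spamhaus.org"))
```

Only `dnsbl_query_name` is deterministic. `is_listed` needs live DNS, and its answer depends on the list and the resolver in use.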
--
`It is an unfortunate coincidence that the date locarchive.h was
written (in hex) matches Ritchie's birthday (in octal).'
-- Roland McGrath on the libc-alpha list
--
Gllug mailing list - Gllug at linux.co.uk
http://list.ftech.net/mailman/listinfo/gllug