[sclug] Probability problem
Will Dickson
wrd at glaurung.demon.co.uk
Sat Oct 25 09:05:39 UTC 2003
Roland Turner wrote:
> A reasonable first approximation appears to be Ax.Ay/(n-1), but
> how close is this? Over what range? Can an exact form be written
> down?
>
I did some basic monte-carlo simulation and the answer that comes back
is that the metric is right on the money: where Ax = Ay = sqrt(N) then
the expected number of shared associates is one. (Cf. my earlier post
on the difference between probabilities and expectation values.) The
results I got in practice were:
1,000 0.928000
10,000 0.989000
100,000 0.960000
1,000,000 1.025000
10,000,000 1.014000
First column is N, second column is the mean number of shared associates
taken over 1000 runs.
Source code attached.
Will.
-------------- next part --------------
/* Trivial monte-carlo simulation of a (variant on a) question in P2P networking
posed by Roland Turner. It attempts to answer this question:
Given two populations Ax and Ay drawn at random (with replacement) from a
larger population N, what is the expected number of entities which
are members of both populations, and does it agree with the predicted value
Ax.Ay/n? The case where Ax=Ay=sqrt(N) is particularly considered, for various
sizes of N.
*/
/* In the interests of brevity I have deliberately neglected the usual .h/.c
partitioning, and dumped as much as possible into this one file. If this
code is considered more widely useful, it should be trivial enough to
fix it up.
However, ran2.c is Numerical Recipes library code, looked scary, and so
I've left it isolated in its own source file.
This code originally written in MSVC++ 6 (yeah, yeah, I know, but my win32
host is a lot more powerful than my linux host, so there :-). You might need
to tweak it a little - my C is a bit rusty and tends to come out looking like
Java :-/
NB. This assumes that int is 32 bits.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <assert.h>
/* Random number generator. Since this is a real simulation (albeit a simplistic
one) it's necessary to use a known-good generator. The one in the std lib
may not be adequate, and often isn't. Return range is 0 > x > 1.0 (NB. excluding
both extrema.) Credit: Numerical Recipes in C.
*/
long ranseed = -42L;
float ran2(long *seed);
/* Perform a simulation based on a population size of nPop, with nCyc iterations.
Return the average number of shared associates, divided by the expected value.
*/
float simulate(int nPop, int nCyc);
/* Perform a single cycle. Return the number of hits.
*/
int sim0(int nPop, char *wad);
int main(int argc, char* argv[])
{
int n;
float q;
FILE *fp;
/* 16-bit ints do not make it */
assert(sizeof(int) >= 4);
/* Initialise generator. Currently ranseed is always set to the same value, so
results from multiple runs are the same every time. To make runs different,
set ranseed to a different value before this ran2() call. NB. ranseed must
be negative here (this indicates that the generator is to be initialised).
Don't touch ranseed afterwards.
*/
ran2(&ranseed);
/* perform simulation over several population sizes, print the results and also
write them to a file. NB. You'll need to change the filename and / or arrange
to pass it in properly. NB. the running time of simulate() is O(n); each iteration
takes 10x as long as the previous one. You will have time for a tea-break while
the last iteration runs. NB. The memory requirement is n bytes as well. This might
be significant, esp. if you decide to extend the simulation to 100,000,000;
currently it stops at 10,000,000.
*/
fp = fopen("w:\\temp\\raz.txt", "wt");
for(n = 1000; n < 100000000; n *= 10) {
q = simulate(n, 1000);
printf("%d: %f\n", n, q);
fprintf(fp, "%d %f\n", n, q);
}
fclose(fp);
}
float simulate(int nPop, int nCyc)
{
char *wad;
int i;
int nHit;
float mean;
float expect;
wad = malloc(nPop);
nHit = 0;
for(i=0; i<nCyc; i++) nHit += sim0(nPop, wad);
mean = (float)nHit / (float)nCyc;
expect = 1.0F; /* since size of both sub-pops is sqrt(nPop) */
free(wad);
return mean / expect;
}
int sim0(int nPop, char *wad)
{
int nX;
int i;
int idx;
int q;
memset(wad, 0, nPop);
nX = (int)sqrt(nPop);
for(i=0; i<nX; i++) {
idx = (int)(nPop*ran2(&ranseed));
wad[idx] |= 1;
}
for(i=0; i<nX; i++) {
idx = (int)(nPop*ran2(&ranseed));
wad[idx] |= 2;
}
q = 0;
for(i=0; i<nPop; i++) if (wad[i] == 3) q++;
return q;
}
/* raz.c */
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ran2.c
Type: image/x-xbitmap
Size: 1194 bytes
Desc: not available
Url : http://lists.tmdg.co.uk/pipermail/sclug/attachments/20030515/eaaf01b7/ran2.xbm
More information about the Sclug
mailing list