[sclug] Probability problem

Will Dickson wrd at glaurung.demon.co.uk
Sat Oct 25 09:05:39 UTC 2003


Roland Turner wrote:

> A reasonable first approximation appears to be Ax.Ay/(n-1), but
> how close is this? Over what range? Can an exact form be written
> down?
> 

I did some basic monte-carlo simulation and the answer that comes back 
is that the metric is right on the money: where Ax = Ay = sqrt(N) then
the expected number of shared associates is one. (Cf. my earlier post
on the difference between probabilities and expectation values.) The 
results I got in practice were:

1,000 0.928000
10,000 0.989000
100,000 0.960000
1,000,000 1.025000
10,000,000 1.014000

First column is N, second column is the mean number of shared associates
taken over 1000 runs.

Source code attached.

Will.


-------------- next part --------------
/* Trivial monte-carlo simulation of a (variant on a) question in P2P networking 
	posed by Roland Turner. It attempts to answer this question:
	Given two populations Ax and Ay drawn at random (with replacement) from a 
	larger population N, what is the expected number of entities which
	are members of both populations, and does it agree with the predicted value
	Ax.Ay/n? The case where Ax=Ay=sqrt(N) is particularly considered, for various
	sizes of N.
*/

/* In the interests of brevity I have deliberately neglected the usual .h/.c 
	partitioning, and dumped as much as possible into this one file. If this 
	code is considered more widely useful, it should be trivial enough to
	fix it up.

	However, ran2.c is Numerical Recipes library code, looked scary, and so
	I've left it isolated in its own source file.

	This code originally written in MSVC++ 6 (yeah, yeah, I know, but my win32
	host is a lot more powerful than my linux host, so there :-). You might need
	to tweak it a little - my C is a bit rusty and tends to come out looking like
	Java :-/

	NB. This assumes that int is 32 bits.
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <assert.h>


/* Random number generator. Since this is a real simulation (albeit a simplistic
	one) it's necessary to use a known-good generator. The one in the std lib
	may not be adequate, and often isn't. Return range is 0 > x > 1.0 (NB. excluding
	both extrema.) Credit: Numerical Recipes in C.
*/
long ranseed = -42L;
float ran2(long *seed);


/* Perform a simulation based on a population size of nPop, with nCyc iterations.
	Return the average number of shared associates, divided by the expected value.
*/
float simulate(int nPop, int nCyc);


/* Perform a single cycle. Return the number of hits.
*/
int sim0(int nPop, char *wad);




int main(int argc, char* argv[])
{
int n;
float q;
FILE *fp;

	/* 16-bit ints do not make it */
	assert(sizeof(int) >= 4);
	
	/* Initialise generator. Currently ranseed is always set to the same value, so
		results from multiple runs are the same every time. To make runs different,
		set ranseed to a different value before this ran2() call. NB. ranseed must
		be negative here (this indicates that the generator is to be initialised). 
		Don't touch ranseed afterwards.
	*/
	ran2(&ranseed); 

	/* perform simulation over several population sizes, print the results and also
		write them to a file. NB. You'll need to change the filename and / or arrange
		to pass it in properly. NB. the running time of simulate() is O(n); each iteration
		takes 10x as long as the previous one. You will have time for a tea-break while
		the last iteration runs. NB. The memory requirement is n bytes as well. This might
		be significant, esp. if you decide to extend the simulation to 100,000,000; 
		currently it stops at 10,000,000.
	*/
	fp = fopen("w:\\temp\\raz.txt", "wt");
	for(n = 1000; n < 100000000; n *= 10) {
		q = simulate(n, 1000);
		printf("%d: %f\n", n, q);
		fprintf(fp, "%d %f\n", n, q);
	}
	fclose(fp);
}


float simulate(int nPop, int nCyc)
{
char *wad;
int i;
int nHit;
float mean;
float expect;

	wad = malloc(nPop);
	nHit = 0;
	for(i=0; i<nCyc; i++) nHit += sim0(nPop, wad);
	mean = (float)nHit / (float)nCyc;
	expect = 1.0F; /* since size of both sub-pops is sqrt(nPop) */
	free(wad);
	return mean / expect;
}


int sim0(int nPop, char *wad)
{
int nX;
int i;
int idx;
int q;

	memset(wad, 0, nPop);
	nX = (int)sqrt(nPop);
	for(i=0; i<nX; i++) {
		idx = (int)(nPop*ran2(&ranseed));
		wad[idx] |= 1;
	}
	for(i=0; i<nX; i++) {
		idx = (int)(nPop*ran2(&ranseed));
		wad[idx] |= 2;
	}
	q = 0;
	for(i=0; i<nPop; i++) if (wad[i] == 3) q++;
	return q;
}

/* raz.c */
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ran2.c
Type: image/x-xbitmap
Size: 1194 bytes
Desc: not available
Url : http://lists.tmdg.co.uk/pipermail/sclug/attachments/20030515/eaaf01b7/ran2.xbm


More information about the Sclug mailing list