So I've long been obsessed with exponential distributions. They're so easy to calculate in code: -A*ln(rand()), where rand() is uniform between 0 and 1 and A is the average you want. So if you had some process in a for loop where 1 time in 7 you did something, but for efficiency's sake you just wanted to calculate how long to wait before doing the task again, that formula (with A=7) would give it to you, at least approximately. Just wrap your random number in a log.
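For anyone who wants to try it, here is a minimal Python sketch of that formula. The name exp_sample and the fixed seed are made up for illustration, and it takes the log of 1 - random() rather than random() itself, because Python's random() can return exactly 0.

```python
import math
import random

def exp_sample(mean, rng=random):
    """Draw one exponentially distributed value with the given mean."""
    # random() returns a value in [0.0, 1.0); using 1 - u keeps the
    # argument of log() in (0.0, 1.0], so log(0) can never happen.
    u = 1.0 - rng.random()
    return -mean * math.log(u)

# Waiting times for something that happens about 1 time in 7.
rng = random.Random(42)
waits = [exp_sample(7.0, rng) for _ in range(100_000)]
average_wait = sum(waits) / len(waits)
print(f"average wait: {average_wait:.2f}")  # should land near 7
```

The waits are continuous, so in a real loop you would round them to whole iterations, which is part of why this is only an approximation of the 1-in-7 coin flip.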
What's really cool about it though is that it gives you a random number anywhere from 0 to infinity but still with a set average. The other thing it can do: let's say you are simulating load on a server. You have 10,000 users who each log in about once a day. You could go through your list of users and ask the question: did user 000102 contact the server this second, did 001380 contact the server this second, or even do it per microsecond. Or you can calculate the time between requests in O(1).
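A rough sketch of the server example, assuming the users act independently so that their pooled requests can be treated as one stream with an exponential gap between arrivals. The names (simulate_day, MEAN_GAP) and the seed are invented here.

```python
import math
import random

USERS = 10_000
SECONDS_PER_DAY = 86_400
# Each user shows up about once a day, so requests arrive at roughly
# USERS / SECONDS_PER_DAY per second and the mean gap is the inverse.
MEAN_GAP = SECONDS_PER_DAY / USERS  # 8.64 seconds

def simulate_day(rng):
    """Return the request times (in seconds) for one simulated day."""
    times = []
    t = 0.0
    while True:
        # One O(1) draw replaces asking every user "did you log in now?"
        t += -MEAN_GAP * math.log(1.0 - rng.random())
        if t >= SECONDS_PER_DAY:
            return times
        times.append(t)

rng = random.Random(1)
requests = simulate_day(rng)
print(len(requests))  # somewhere around 10,000
```

If you need to know which user made each request, one option under the same assumption is to pick a user uniformly at random per request, e.g. rng.randrange(USERS).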
So I'm obsessed with using exponential distributions in my life. Maybe more than is healthy. I recently started a workout habit that gave me a new problem to think about. I've been a hiker for a while and it all started to get too easy. I've seen just about every trail, and the distance I can cover in a day is too much, or else I run out of trail. So I've introduced weight, AKA ruck hikes. (So my particular problem with it is solved but the abstract question is still interesting.)
So let's say for variability we want to go a random distance with a random weight each day. That way we get some easy days, some hard days, some distance days, and some really weighted days.
So the obvious way to do that would be distance = -(some average)*ln(rand()), weight = -(some average)*ln(rand()). Great, fun. But this produces a problem. When we multiply these distributions we get something pretty unstable. It's bad enough having some theoretical possibility of having to go infinity miles in a day (it's just infinitely unlikely), but when we multiply the two we get something pretty ugly, not just in theory but in practice. In fact the way the math works out, no matter how low I set my averages there is a very high probability of being asked to do the impossible within a week. The moral of the story: don't multiply exponential distributions.
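To get a feel for how much uglier the product is, here is a small simulation sketch. It assumes distance and weight are independent and both scaled to average 1, so the product also averages 1 and the thresholds read as "k times an average day". The helper names and the seed are made up.

```python
import math
import random

def exp_sample(mean, rng):
    # 1 - random() lies in (0, 1], so log() never sees zero.
    return -mean * math.log(1.0 - rng.random())

def tail_fraction(samples, threshold):
    """Fraction of samples strictly above the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)

rng = random.Random(7)
N = 200_000
# A single exponential versus the product of two, both with mean 1.
single = [exp_sample(1.0, rng) for _ in range(N)]
product = [exp_sample(1.0, rng) * exp_sample(1.0, rng) for _ in range(N)]

for k in (3, 5, 10):
    print(k, tail_fraction(single, k), tail_fraction(product, k))
```

The product column should fall off much more slowly than the single column, which is the instability in question: monster days show up far more often relative to the average day.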
So here's where we want to focus on the abstract. How do you find two distributions that, when multiplied, produce some target distribution? I may have come off as a math nerd but I'm not. I'm only obsessed specifically with exponential distributions and I lack the tools to work this problem out. (I have figured out how to stabilize my workout, but I'm always interested in random-numbers-from-other-random-numbers problems: https://www.youtube.com/watch?v=xHh0ui5mi_E)
I've worked this one as far as I can go and now that my practical need is satisfied I pass it on to whoever wants to take it on.
[–] NotAnOctopus 1 point (+1|-0) ago
As already stated, your problem is that you are treating dependent variables as independent. Dist = -A*ln(rand(x)), Wt = -B*ln(rand(t)), but Y is a function of both x and t. You need to come up with an algorithm that factors one given the other. Something like: Y(W,D) is given by Wt = -C*ln(rand(f(Dist))), filling in the necessary variables and equations for D.
[–] bikergang_accountant [S] ago
I'm not quite sure what people are getting at with this independent vs dependent problem with my question.
So I'm calculating distance and weight independently, and Dist*Wt is just an approximation of difficulty. Obviously as an approximation it's pretty lousy, and the real difficulty would be a manifold of some kind that would take in a lot of human factors and could never really be that precise.
But the point is: what if we did want a distribution where multiplying two independent draws from it came out the same as -A*ln(rand(0,1))?
Maybe it's the coder in me, but I noticed your rand function has an argument. I meant for it to be assumed that rand() was a uniform distribution from 0 to 1.
But maybe what you're getting at about them not being independent is that we have three numbers we're interested in (Dist, Weight, Difficulty), all of which we want to be random, but we only have two degrees of freedom.
So your solution is to have the distance affect the actual random number fed into the natural log? Interesting. You wouldn't be able to affect the range, because then weight wouldn't be an exponential distribution if there are numbers it can't reach, but distance would be, so the solution wouldn't be symmetric.
The thought I had at one point is to decide the difficulty at random and then have a second random draw decide how that difficulty gets divvied up between distance and weight.
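One way that idea could look in code, purely as a sketch: the split rule here (a uniform share s used as an exponent) is an assumption picked because it keeps the product exact, not something worked out above, and random_workout is a made-up name. Note that distance and weight on their own are no longer exponential under this rule; only their product is.

```python
import math
import random

def random_workout(mean_difficulty, rng):
    """Pick a difficulty first, then split it into distance * weight."""
    difficulty = -mean_difficulty * math.log(1.0 - rng.random())
    # Hypothetical split rule: a uniform share s sends difficulty**s to
    # distance and the remainder to weight, so the product is preserved.
    s = rng.random()
    distance = difficulty ** s
    weight = difficulty ** (1.0 - s)
    return distance, weight, difficulty

rng = random.Random(3)
for _ in range(5):
    d, w, target = random_workout(20.0, rng)
    print(f"distance={d:.2f} weight={w:.2f} product={d * w:.2f} target={target:.2f}")
```

The units come out odd with an exponent split, so in practice you would probably rescale distance and weight by some fixed "typical" values afterwards.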
What I also found, but didn't share because I didn't want to spoil it, is that A*sqrt(-ln(rand())) solves the stability issue when multiplying it with an independent sibling, but it does it too well. It's not the exact same distribution as -A*ln(rand()). It's actually more forgiving. Not that that's a problem for hiking.
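A quick way to check the "more forgiving" part by simulation, as a sketch: multiply two independent sqrt draws, rescale the product to average 1, and compare its tail with a plain exponential that also averages 1. The helper names and seed are invented. (As a side note, A*sqrt(-ln(rand())) is the shape usually called a Rayleigh distribution.)

```python
import math
import random

def sqrt_sample(scale, rng):
    """scale * sqrt(-ln(u)) with u uniform in (0, 1]."""
    return scale * math.sqrt(-math.log(1.0 - rng.random()))

def tail_fraction(samples, threshold):
    """Fraction of samples strictly above the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)

rng = random.Random(11)
N = 200_000
single = [-math.log(1.0 - rng.random()) for _ in range(N)]
product = [sqrt_sample(1.0, rng) * sqrt_sample(1.0, rng) for _ in range(N)]

# Rescale the product to mean 1 so both are in "average day" units.
mean_product = sum(product) / N
scaled = [p / mean_product for p in product]

for k in (3, 5):
    print(k, tail_fraction(single, k), tail_fraction(scaled, k))
```

If the product really were the same as -A*ln(rand()), the two columns would agree; the scaled product should instead come out with noticeably fewer extreme days.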
What I suspect, if I look into it more, is that it may be the same distribution as (ln(rand())+ln(rand()))/-2, which is also more forgiving. If so, that raises another interesting question. If D(d,n) is sum(i=[1,n], d(rand()))/n, it is easy to find D(d,2), D(d,3) where d=D(ln,2), but what about D(d,1/2)? If someone wanted to find D(ln,1/2), that's independently interesting.
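Here is a sketch for poking at both questions numerically. It compares variance divided by mean squared, which does not change when a distribution is rescaled, so two distributions that differ on it cannot be the same shape. Two assumptions to flag loudly: neg_log is -ln so the draws come out positive, and the reading of D(ln,1/2) used here is "the thing that gives back the plain exponential when you average two independent copies", which is a gamma distribution with shape 1/2. That reading is a guess at what the question means, not something stated above.

```python
import math
import random

def neg_log(u):
    # Unit-mean exponential draw from a uniform u in (0, 1].
    return -math.log(u)

def D(d, n, rng):
    """Average of n independent draws of d(rand())."""
    return sum(d(1.0 - rng.random()) for _ in range(n)) / n

def squared_cv(samples):
    """Variance divided by mean squared; unchanged by rescaling."""
    m = sum(samples) / len(samples)
    var = sum((s - m) ** 2 for s in samples) / len(samples)
    return var / (m * m)

rng = random.Random(5)
N = 200_000
averaged = [D(neg_log, 2, rng) for _ in range(N)]
rooted = [math.sqrt(neg_log(1.0 - rng.random())) for _ in range(N)]
# Assumed reading of D(ln, 1/2): gamma with shape 1/2, scale 2 (mean 1).
half = [rng.gammavariate(0.5, 2.0) for _ in range(N)]
rebuilt = [(a + b) / 2 for a, b in zip(half[::2], half[1::2])]

print(squared_cv(averaged))  # theory: 1/2
print(squared_cv(rooted))    # theory: 4/pi - 1, about 0.27
print(squared_cv(rebuilt))   # theory: 1, same as a plain exponential
```

Since the first two values differ, the sqrt form and the average of two logs are both more forgiving but are not the same distribution, even after rescaling. The third line is only a consistency check on the assumed reading, not a proof of it.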