Math Help - The ball bag problem

1. The ball bag problem

Hi guys, I've been working on a mathematical model all day and I've lost the ability to think. I've hit a wall. So, I've taken a break and reformulated the problem into a puzzle. Can anyone shed some light? I'm going to bed and I look forward to your replies in the morning.

The ball bag problem
A room contains an empty bag and a box that contains an infinite number of balls. The balls within the box are coloured and there are an infinite number of colours used, possibly repeatedly. You and another person enter the room. You stand facing the wall so you can’t see the bag, the box or the other person. You also have a piece of paper and a pen. The other person then removes an arbitrary number of balls from the box and places them into the bag. The number of balls removed and the colour of the balls is completely up to the other person. When the other person places a ball into the bag, they call out the colour of the ball and you note it down on the paper. However, you know that the other person lies about the colour of the ball 30% of the time. Once the other person has finished placing balls into the bag, they close it, give it a shake and then select a ball at random from it. What colour do you expect the ball to be and how sure are you?

2. In reply to HMaoTw's post above:
Since the other person selects the balls however they want, they might as well just imagine each ball, write its colour on a slip of paper and put that in the bag. (The colour written down is the true colour; the colour called out is true 70% of the time, since the caller lies 30% of the time.)

I would start by considering small cases. First of all, there's no use guessing a colour that was not called, since there are infinitely many of those. So suppose there's one ball. Then you guess the colour that was called and have a 70% chance of being right.

Now suppose two balls are selected and two different colours were called. You guess one of them. Then you have a 35% chance of being right, because the ball whose colour was called has to be selected (50%) and the caller had to be telling the truth (70%). All other probabilities are effectively 0 because of the infinite possibilities. Now suppose the caller called the same colour twice in a row. Now you have a 70% chance. And so on.

So just guess the colour that was called most often (if there is a tie, choose one arbitrarily). If it was called k out of n times, then you have a (k/n) * 0.7 probability of being right.
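
The "guess the most-called colour" rule can be sketched in a few lines of Python. This is only an illustration: the names `best_guess`, `calls` and `p_truth` are made up here, and `p_truth` is the probability that a call is truthful (the example uses 0.7, on the reading that a caller who lies 30% of the time tells the truth 70% of the time). It also assumes, as above, that a lie essentially never matches a colour that is really in the bag.

```python
from collections import Counter


def best_guess(calls, p_truth):
    """Return (colour, probability of being right) for the most-called colour.

    calls   -- list of colours as called out, one entry per ball
    p_truth -- assumed probability that any single call is truthful
    """
    if not calls:
        raise ValueError("no colours were called")
    counts = Counter(calls)
    # Ties are broken arbitrarily (first colour seen among the most common).
    colour, k = counts.most_common(1)[0]
    n = len(calls)
    # The drawn ball must be one of the k balls called `colour` (k/n),
    # and that call must have been truthful (p_truth).
    return colour, (k / n) * p_truth


example_calls = ["red", "blue", "red", "green"]
guess, prob = best_guess(example_calls, p_truth=0.7)
print(guess, prob)
```

Passing a different `p_truth` changes only the final factor, so the choice of colour does not depend on how reliable the caller is.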

3. Hi, thanks for the prompt reply. You are of course completely correct. However, between 5am and now I've realised that my posted problem is an oversimplification. The full problem would contain an arbitrary number of 'other people', each with their own 'truth percentage'. Any ideas on how to generalise the problem to many 'other people'?

Thanks again.

4. In reply to HMaoTw's post above:
If I'm understanding you correctly, the situation is: Say we have r people,

$A_1, A_2, \dots, A_r$

with corresponding truth-telling probabilities

$t_1, t_2, \dots, t_r$

A total of q distinct colours are called

$c_1, c_2, \dots, c_q$

For each colour $\displaystyle c_i$, define a function $f(c_i, k)$, where $1 \le k \le r$, giving the number of times that colour was called out by person $\displaystyle A_k$.

And a total of n balls are picked

$b_1, b_2, \dots, b_n$

What we are interested in is a function of $\displaystyle c_i$, call it g, defined by

$\displaystyle g(c_i) = \frac{1}{n}\sum_{k=1}^{r}f(c_i,k)\cdot t_k$

Choose the colour $c_i$ that maximises g, where $g(c_i)$ is the probability of your being right if you guess $c_i$.
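
As a rough sketch of how g could be evaluated, the Python below sums $t_k/n$ over every call of a given colour, which is the same as $\frac{1}{n}\sum_k f(c_i,k)\cdot t_k$. The names `score_colours`, `calls` and `truth`, and the example numbers, are invented for illustration; each ball is assumed to be called exactly once, by exactly one person.

```python
from collections import defaultdict


def score_colours(calls, truth):
    """Return a dict mapping each called colour c to g(c).

    calls -- list of (person, colour) pairs, one per ball placed in the bag
    truth -- dict mapping person -> truth-telling probability t_k
    """
    n = len(calls)
    if n == 0:
        raise ValueError("no balls were placed in the bag")
    g = defaultdict(float)
    for person, colour in calls:
        # Each call of `colour` by person k contributes t_k / n to g(colour).
        g[colour] += truth[person] / n
    return dict(g)


truth = {"A1": 0.9, "A2": 0.6}
calls = [("A1", "red"), ("A2", "blue"), ("A2", "blue"), ("A1", "green")]
scores = score_colours(calls, truth)
best = max(scores, key=scores.get)
print(best, scores[best])
```

Note that the colour called most often is not automatically the best guess any more: a colour called once by a very reliable person can outscore one called twice by an unreliable person.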

5. Hi, again thanks for the reply. I've been working on the problem today and, after reading your reply, I see I still haven't quite phrased the problem correctly. Sorry. (You have indeed given a correct solution to the problem I wrote here.)

My problem is better phrased as follows. Instead of multiple balls, imagine there is a single ball in the bag. All the other people look into the bag and then call the colour out, but they may still be lying with a given probability. Then, once they've finished, you have to work out the probability of each of the colours called out and the probability that the correct colour was not called out at all.

Thanks.

6. In reply to HMaoTw's post above:
The target moves as soon as the arrow is fired!

I'll try a different approach this time; what do you think the answer should be, and why? Or where did you get stuck? Again, my approach would be to start with small cases and work up.

7. Imagine that there is an unseen mathematical problem. I don’t know the exact problem but I do know that it is an addition of two real numbers. Therefore, I know that there are infinitely many possible answers but only one correct one. Then an arbitrary number of people look at the problem and give me what they believe to be the correct solution. Based on their previous performance on comparable problems, I know how skilled they are, i.e. how likely the answer they gave is to be correct.

I have two sets: the first contains all the solutions that have not been suggested (infinitely many) and the second contains all the solutions that have been suggested. So, at the start, before any solutions have been suggested, I have 100% belief that the solution is contained within the un-suggested set and 0% belief that it is contained within the suggested set.

Then the first person suggests the solution ‘2’ and I trust them 85%. Therefore, I have an 85% belief that ‘2’ is the correct solution and, moreover, I now believe that the suggested set has an 85% chance of containing the solution and the un-suggested set has a 15% chance of containing it.

I’ve got that bit down. I’m not sure how to update my beliefs once the next person comes along.

8. In reply to HMaoTw's post above:
I find some complications due to the infinities in the problem; we cannot assign a meaningful probability to the event that two people's wrong answers coincide. Here's what I started writing:

~~~~~~~~~~~~~~~~~~~~~

Okay, so moving to the next case, we have two people, $A_1, A_2$, who can be relied on to get correct answers with probability $P(C_1)$ and $P(C_2)$ where $C_i$ is the event that person $A_i$ gets the correct answer.

Two cases: (1) they give the same answer, (2) they give different answers

(1) they give the same answer

You are interested in two probabilities: (a) that the correct answer is the one called out, (b) that the correct answer was not called out.

~~~~~~~~~~~~~~~~~~~~~

Now normally we'd use these probabilities:

P(coincide AND both correct)
P(coincide AND both wrong)
P(both wrong)
P(both right)

to determine these probabilities:

P(both correct | coincide)
P(both wrong | coincide).

But we can't do that here because using a uniform probability distribution for wrong answers over the reals (I believe this probability distribution exists), we get P(coincide AND both wrong) = 0. So using

$\displaystyle P(C|A) = \frac{P(A \cap C)}{P(A)}$

will give P(both correct | coincide) = 1 and P(both wrong | coincide) = 0, which doesn't reflect reality: two people can land on the same wrong answer with probability greater than 0, because a problem may have several "common pitfalls" and the like.
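
To make the dependence on that coincidence probability explicit, here is a sketch under two assumptions that are not stated above: the two people answer independently, and q denotes the probability that their answers coincide given that both are wrong. Two answers can only coincide if both are right, or if both are wrong and happen to match, so

$\displaystyle P(\text{both correct} \mid \text{coincide}) = \frac{P(C_1)P(C_2)}{P(C_1)P(C_2) + (1-P(C_1))(1-P(C_2))\,q}$

Setting q = 0 collapses this to 1, which is the degenerate result described above; any q > 0 gives a value strictly below 1 (provided neither person is always right).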

So maybe the question needs to be restructured? E.g., if it were a multiple choice question with 5 choices, with the probability of each person getting the right answer given, and with a wrong answer equally likely to be any of the other 4, then the problem would give more meaningful results.
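
A minimal sketch of that multiple-choice version in Python, assuming a uniform prior over the choices, independent answerers, and wrong answers spread evenly over the remaining choices. The function name `posterior` and the example accuracies are illustrative only.

```python
def posterior(num_choices, answers):
    """Posterior probability of each choice being correct.

    num_choices -- number of options on the question (e.g. 5)
    answers     -- list of (chosen_index, accuracy) pairs, one per person
    Assumes a uniform prior, independent people, and that a wrong answer
    is equally likely to be any of the other num_choices - 1 options.
    """
    weights = [1.0] * num_choices  # uniform prior (unnormalised)
    for chosen, accuracy in answers:
        for c in range(num_choices):
            if c == chosen:
                # If c is the truth, this person was right.
                weights[c] *= accuracy
            else:
                # If c is the truth, this person picked one specific wrong option.
                weights[c] *= (1.0 - accuracy) / (num_choices - 1)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("the answers are impossible under the given accuracies")
    return [w / total for w in weights]


# Two people both pick option 1, with accuracies 0.85 and 0.6.
post = posterior(5, [(1, 0.85), (1, 0.6)])
print(post)
```

Unlike the uniform-over-the-reals setup, agreement here raises the probability of the shared answer without pushing it all the way to 1.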

9. Thanks again for the reply. I've already got everything working for closed-solution problems, e.g. multiple choice; it's just a simple application of Bayesian inference. What I need to do is generalise that. It's quite a hard problem. Taking a step back: every time we are given a solution, either a new one or a duplicate, we become more sure that the correct solution is in the 'suggested' set (as long as we trust the people more than 50%). So we can see that the entropy is decreasing. I've never really had to look at entropy. Do you feel something like that could be of use here?
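
For the closed (multiple choice) case, the entropy of a belief distribution is easy to compute, which gives a way to check the "entropy is decreasing" intuition. The sketch below uses Shannon entropy in bits; the two example distributions are assumptions made for illustration (5 choices, a uniform prior, and one answer trusted 85% with the remaining 15% spread evenly over the other four choices).

```python
import math


def entropy_bits(dist):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Terms with p == 0 contribute nothing (limit of p * log p as p -> 0).
    return -sum(p * math.log2(p) for p in dist if p > 0)


prior = [0.2] * 5                    # no answers yet: uniform over 5 choices
after_one = [0.85] + [0.0375] * 4    # one person, trusted 85%, picks choice 0

print(entropy_bits(prior))
print(entropy_bits(after_one))
```

In this example the first answer lowers the entropy, but that is not guaranteed at every step: a second trusted person who names a different choice can raise it again.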

10. In reply to HMaoTw's post above:
I don't know enough about entropy to be able to answer that, sorry.

Also, I have limited experience with what follows, although I believe it to be correct.

I would go so far as to say that using a uniform distribution over the reals produces apparent contradictions. Say for example we have four people A, B, C, D, where A and B give the same answer and C and D agree on a different answer.

So restricting our view to A and B, we see that the probability they got the right answer is 1 (by reasoning in post #8). Yet the probability that C and D got the right answer is also 1. Clearly there is only one right answer and we would normally consider this a contradiction. I believe the contradiction is resolved by the distinction between surely and almost surely. I don't know much about measure theory but this seems to be exactly the type of situation where what's written in the Wikipedia article for "almost surely" applies.

Sorry if I wasn't able to provide an adequate answer. But I think as long as you use a uniform probability distribution over the reals, you will get exactly these results: whenever an answer appears more than once, it is correct with probability 1, with the overall result that your results are useless.

11. <somewhat ranty>

Actually I'm not sure what to think about a uniform probability distribution over the reals. My thinking was that we have a uniform probability distribution over the interval [a,b], and that there's a one-to-one correspondence between this interval and the reals (actually, I would use a one-to-one correspondence between [a,b] and (a,b), then between (a,b) and the reals). For example, consider the x-axis and the point (x,y) = (0,1) in the xy-plane, and draw line segments from the point to the x-axis; each segment relates an angle to a real number. If we use angles in the range (-pi/2, pi/2), we can easily find a one-to-one correspondence between this interval and any (a,b). But this means that values near 0 are much more likely than values far away from 0 (relatively speaking), so I'm not sure in what sense this would be considered "uniform."

(And if we use the point (0,2) instead, then we get a different distribution, etc.)

</somewhat ranty>
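
The angle construction can be checked numerically. In the sketch below (all names are illustrative), the angle theta is measured from the vertical through the point (0, h) and drawn uniformly from (-pi/2, pi/2), so the segment meets the x-axis at x = h * tan(theta). The resulting distribution on the reals is the Cauchy distribution with scale h, not anything uniform.

```python
import math
import random


def prob_within(c, h=1.0):
    """P(|X| <= c) for X = h * tan(theta), theta uniform on (-pi/2, pi/2)."""
    # |h * tan(theta)| <= c  <=>  |theta| <= atan(c / h), out of a range of width pi.
    return (2.0 / math.pi) * math.atan(c / h)


def sample(h=1.0, rng=random):
    """Draw one real number via the angle construction from the point (0, h)."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    return h * math.tan(theta)


rng = random.Random(0)
draws = [sample(1.0, rng) for _ in range(20000)]
fraction_near_zero = sum(abs(x) <= 1.0 for x in draws) / len(draws)

print(prob_within(1.0, h=1.0))   # share of mass in [-1, 1] using the point (0, 1)
print(prob_within(1.0, h=2.0))   # same interval, using the point (0, 2)
print(fraction_near_zero)        # empirical check for h = 1
```

Half of the probability lands in [-1, 1] when the point is (0, 1), and moving the point to (0, 2) changes that share, which matches the observation that each choice of point gives a different distribution.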

Perhaps someone more experienced with these things would be needed to clarify. I probably don't have time to research in depth to clear up all the questions raised in any timely fashion.