# The trap of the self-referential question

Trent Dougherty put this photo on Facebook. Have a look at it and then try to answer the question.

What you may find is that as soon as you starting coming up with answers, they move. What are the chances of randomly picking the correct answer to this question? Well, there are four options, so the odds are one in four, or twenty-five percent, right? But wait a minute, two of the options say twenty-five percent. And if that’s the right answer… then that would mean half the options are correct, so now your chances of picking the right one are fifty percent. No, wait, that can’t be right, because fifty percent appears once among the options, and your chances of randomly picking it are twenty-five percent, which would mean that fifty percent is wrong. Well, are twenty-five percent and fifty percent both somehow right? No, that can’t be right, because that would make three of the four options correct and your odds would be seventy-five percent. And seventy-five percent isn’t an option.

At this point you might start thinking “wait, how did I get so tangled up? How come each time I choose an answer it looks like the facts change? This must be a trick!”

You’d be right – although not everyone agrees with my version of what makes it a trick. It’s a trick because the question isn’t a real question. It only looks like a question because it ends in a question mark and includes the words “this question.”

To help see why this isn’t a question, I want you to consider a different question, as follows: “What is the answer to this question?” It’s a collection of words that calls itself a question. It’s phrased as though it were a question, and it even ends in a question mark. So it’s a real question, right? Nope. If we add something to it, then we could make it into a real question. Like so: “I just asked John what his father’s name is. What is the answer to this question?” Now we’ve got a real question in the second sentence, because it’s referring to an actual question, namely the question described in the previous sentence: The question over John’s father’s name. But if you remove that, there is no longer anything answering to “this question,” and the question ends up just referring to itself, lacking substance and becoming meaningless.

The question on the blackboard shares this feature. “If you choose an answer to this question…” can be met with: “What question?” For just the same reason as the empty question in my example, this refers only to itself and ceases to be a genuine question. That is the trick. Only once you fall for the idea that this is an actual question do you start looking for an answer and realise that you’re on a fool’s errand.

As I said, not everyone shares my assessment of what’s fishy with the question on the blackboard. One professional philosopher suggested to me that really we should interpret the blackboard question more charitably. We should construe it, he said, to mean “What is the chance of choosing the right answer to a multiple choice question at random?” If we do that, I was told, the question is meaningful, since it refers, not to itself but to other questions that have substance in their own right, and yet we still end up with the confusing outcome. Hence, the problem is with the answers that are on offer, and not the question after all.

I don’t think so, for a couple of reasons. Firstly, I don’t think that’s what the question means for the simple reason that that’s not what it says – I know, call me a literalist fundamentalist on this! The question says: “If you choose an answer to this question at random, what is the chance you will be correct?” If the term “this question” conveys anything, surely it conveys the notion that what is in view is the question now being asked, and not just any possible multi-choice question. So that’s the first reason I don’t think the question can be salvaged in this way.

The second reason I don’t think the question can be salvaged in this way is that even the new version of the question is this: “What is the chance of choosing the right answer to a multiple choice question at random?” In order for there to be any chance at all of rightly answering a multiple choice question (or any other question), it must be a meaningful question. In fact I have assumed that this is an implied caveat of the question. I will also assume that the new question assumes that there is one and only one correct answer to the possible questions that it is asking us about. So that we don’t change the scenario in ways that have a material impact on the answers given on the blackboard (which are given as percentages, presupposing that there are four options), we should also specify that there are four options. So the question has been changed to this: “What is the chance of, at random, choosing the right answer to a meaningful multiple choice question where there is one right answer and four options?”

But if this is the question, none of the percentages given on the blackboard are even remotely relevant – which is my second reason for not accepting the rather generous re-interpretation of the question. The answer will depend entirely on the options that are provided for that other question being referred to here (questions like: “How many elks are there in the world?” “What is the largest planet in our solar system?” “Who holds the world record for women’s shotput?” etc). If the correct answer is given only once in that other scenario, the correct answer to this question is 25%. If the correct answer appears twice in the options in that other scenario, the correct answer to the blackboard question would be 50%. But the problem is, this question in its generously interpreted form doesn’t tell us how many times the right answer is offered up in that other scenario. The reinterpreted question might be meaningful, but the answer would be “that depends!” Since the (apparent) question on the blackboard assumes that we have enough information to answer it, the reinterpretation of this question is wrong.

We should simply read the blackboard question at face value. But at face value, this isn’t even a real question, any more than the question “What is the answer to this question?” on its own is a real question. And that is how this pseudo-question traps people into not being able to answer it.

If you have an alternative explanation, let’s hear it!

Glenn Peoples

### Similar Posts:

If you liked this post, feel free to help support this project.

• Jason September 8, 2012, 11:23 pm

Hmm, well you could say that it is the product of a high context society where the question draws on the pool of common knowledge that we low context readers are unfamiliar with…

Failing that, it’s a non-question. 🙂

• John Quin September 8, 2012, 11:53 pm

At the risk of cementing my status as a Glenn Peoples sycophant let me say that I side with Glenn on this one.
I was feeling a little encouraged when I was formulating the same conclusions about the issue just before I read Glenn’s analysis.
Hooray for me. LOL
Oxford here I come.

• Glenn September 9, 2012, 12:03 am

LOL John, but I never got to Oxford 😛

• Nate September 9, 2012, 3:50 am

• Glenn September 9, 2012, 10:38 am

Nate – Whoops! There are four options, not five. Quite right – fixed.

• Donavan September 9, 2012, 4:18 pm

There is an answer to this and it is as Jason implied, context.
I have worked on the TV show CSI for 12 years so all you have to do is use your CSI powers.
The context of the question is in the picture, a chock board, and the person who wrote the question must be
a teacher that used actual chalk, the angle of the picture is from the students perspective, the setting is probably an older classroom because of the use of dust causing chalk on a graphite chalk board. So that means this is one of several questions that are commonly written out by the teacher as opposed to handing out a photocopy scantron or something like that. The meaning of the term “this question” is in reference to all the other questions that the teacher has written out so there is meaning. The answer is 25% because the question inferred the correct answer A through D not a number.

• {Tim} September 9, 2012, 10:03 pm

One could take the view that it IS a valid question, but the answer is “zero”…

• Glenn September 9, 2012, 10:20 pm

Tim, one could also be wrong!

• Ciaron September 10, 2012, 10:07 am

The answer is simple. As every mutiple choice question MUST have 1 correct response and 3 incorrect responses, one must conclude that the answer D)25% is an error and must mean 52%. The marker must accept this mistake and mark both answers A&D as correct. 🙂

• Jonathan Bryan September 11, 2012, 10:34 pm

I think I agree with you, Glenn. I’m wondering, though, what you think it is exactly that makes self-referential questions problematic. It seems like some self-referential questions can have substance and are meaningful questions. For example: “How many words are in this question?” seems like a real question. So perhaps the real problem is not so much self-reference but lack of substance? Maybe the self-referential aspect is simply what gives the appearance of meaning, but is not the real problem.

• Sandra September 11, 2012, 11:03 pm

Jonathan, I think the redeeming feature of “How many words are in this question” is that refers to its own structure but not its content. Contrast that with “what is the answer to this question,” which refers to its content – which it just doesn’t have. But it does have structure.

• Glenn September 12, 2012, 7:50 am

Jonathan, I’m with Sandra on this. A question like “How many words are in this question” seems like a real question because we know how to respond to it – it asks about its own structure/syntax, and we have information about that. But questions that refer only to themselves and which purport to ask about their content as questions, these are empty.

• James September 13, 2012, 12:11 am

Hi Glenn,

I like this question! But I think the problem is with the choices. If the answers were 10%, 15%, 20%, 30%, you would likely just conclude that there was no correct answer – ie the poser of the question accidentally or deliberately left off the correct answer. If the answers were just 25%, 25%, 25%, 25%, then you’d think they were just trying to be annoying (for similar, but simpler, reasons to the current one). But if the answers were 10%, 15%, 20%, 25%, or even 10%, 15%, 50%, 50%, you would rightly conclude it was 25% or 50% (respectively). Just because a question has 4 given alternatives doesn’t mean one of them must be the correct answer.

Back to the original question… Could the answer be 25%? No, because, as you said, you have a 50% chance of picking one of the two 25%’s, which would imply the correct answer was 50% – a contradiction. Could it be 50%? Similarly, this would imply the correct answer should be 25%, another contradiction. There is no answer with a 60% chance of being picked, so that option is easily dispensed with. So it seems to me that this is simply a question without a correct answer among the options. But the genious of it is that it requires one to chase the answers around for a while before discovering this.

Cheers,
James.

• Wm Tanksley September 13, 2012, 3:54 am

I answered similarly by saying that it was a self-referential paradox — but I like James’ answer far more; that the problem isn’t entirely in the question, but in the provided answers. (There is a problem in the question, and it’s exactly as you describe, Glen; the question isn’t merely self-referential, but also insufficiently described. A good test question should work both as open-answer AND multiple choice. Therefore, this is not a good test question.)

After seeing this post, I have to say that the problem is NOT in the self-reference; rather, it’s that the question is insufficiently specified. The self-reference merely makes it easier to hide the problem.

And by the way, eliminating self-reference doesn’t eliminate paradoxes, unless you also eliminate the ability to do arithmetic. Russel and Godel proved that.

• Glenn September 13, 2012, 12:31 pm

Wm, it’s true that there are plenty of paradoxes that don’t involve self reference. But it also seems to me that what robs this question of specificity and makes it vacuous is that its content is nothing other than self reference.

James, it sounds like you’re saying that the issue here is just that the right answer isn’t supplied, and there’s really no issue with the self reference. OK, so keeping this as a self-referential question (ie don’t generously reinterpret it), can you say what the correct answer is, or if there could even be one?

• James September 13, 2012, 3:09 pm

Some self-referential questions have perfectly good answers. For example, the question “How many words does this question have?” seems to have the answer 7.

Most MC questions are written like this. They are simply questions with an actual answer, and this answer is made to be one of the MC options, the others filled in as wrong answers. If you instead gave no correct MC options, then the question itself might be fine, but the *MC question* has a problem.

But then some questions seem to have no answer. A classic is “How long is a piece of string?”. Your example of “What is the answer to this question?” seems to fit that bill too. Maybe it doesn’t have an answer, or maybe it has several (all possible?) answers. I think it is the vagueness rather than the self-reference that makes the problem. How would these go in a MC setting, I wonder? What if there was only one MC option supplied? (Does the student only get marks for not answering it?)

The original question here is even more interesting though. In the absence of any MC options, it also seems to have no answer (or maybe several answers – see below, also). Yet when MC options are attached, a correct answer seems to emerge. For example, if there were 5 options, 10%, 20%, 30%, 40%, 50%, then the correct answer would seem to be 20%, but if there were 3 options, 10%, 20%, 30%, then there would seem to be no correct answer. So maybe it would be more appropriate to ask it as “If you choose an answer to this *MC* question at random…”. And then there could either be or not be a correct answer, depending on the MC options (and there could not be a correct answer either because the appropriate 1/N was not supplied, or because of some more detailed contradictory scenario like in the original one).

But I suspect that there are also problems with the original question if it asked in the absence of MC options. It is well known that it is problematic to speak of “choosing a random number” (even positive integer). The problem is: what probability measure are you using? You could choose a random whole number from 1,…,6 by tossing a die. But how could you choose a random whole number from 1,2,3,4,…? They can’t all be assigned the same probability or else the sum of probabilities would be infinite. The individual probabilities have to be more-or-less decreasing towards zero, in which case it seems that “random” is a slightly misleading word – at least it’s not uniformly random. Back to the original question…. If an “answer to a question” is simply a meaningful string of words, then there are infinitely many possible answers. So is it truly possible to choose one at random?

Finally, I think there would be even more problems if the 60% option was changed to 0% 😉

• Glenn September 13, 2012, 4:44 pm

“For example, if there were 5 options, 10%, 20%, 30%, 40%, 50%, then the correct answer would seem to be 20%”

James, this seems (to me at least) to reinterpret the question in the way I described, to refer to “any given multiple choice question where there are five options, one of which is correct.” But of course, if we interpret it this way it is no longer self referential.

So this doesn’t really do the trick, I wouldn’t think. In fact I think it illustrates the point: In order to get rid of what is wrong with this question, we do indeed have to make it not self-referential.

“A classic is “How long is a piece of string?”. Your example of “What is the answer to this question?” seems to fit that bill too.”

Those seem to have equally serious problems because they lack the relevant content. Now of course, a question need not be self-referential in order to lack the right sort of content (the piece of string example illustrates this nicely), but it seems clear enough that if a question makes reference only to its own content and to nothing else, it will lack the right sort of content. This is why, as Sandra pointed out, a question that refers to its structure might be OK (e.g. “How many words are in this sentence”), but a question that refers only to its own content (e.g. “What is the answer to THIS question?” or “What are the chances of picking the answer to THIS question from a list of five answers”) is always going to be a problem.

• Wm Tanksley September 14, 2012, 9:46 am

Glenn, I agree with you that the problem is the vacuity of the question; but I disagree that the problem is self-reference. I think it’s actually a lack of specificity in reference that makes it vague enough to imply that the multiple-choice answer set is part of the data needed to answer the question. In other words, the problem is that this question refers outside of itself.

“but a question that refers only to its own content (e.g. “What is the answer to THIS question?” or “What are the chances of picking the answer to THIS question from a list of five answers”) is always going to be a problem.”

The first of those questions is simply vacuous in the sense you introduced of not actually being a question, and would remain so if its reference were changed to “the question” or “the ultimate question” (thereby making it obviously not a self-reference). The second one is also vacuous in the same sense, but also depends on its (unspecified) answers — it’s already built as not completely self-referential.

Here is a similarly built self-referential question which I just came up with intended to be a valid multiple-choice question: “What is the probability that picking the closest-to-correct choice from a set of four randomly generated integer percentage values (0-100, uniform distribution, no duplicates) will result in the closest integer to the percentage answer to this question?”

(The question’s pretty easy to answer; I didn’t spend too much time drawing it up. It might be fun to come up with a more subtle version of the same idea.)

Anyhow, would you consider this to be “a question that refers only to its own content”, to use your criterion? I don’t see why not… Yet I think that because it’s not an empty question it’s uniquely answerable, and because it doesn’t depend on its answers at all the answers don’t fluctuate.

Hmm, actually, that’s interesting — the fun paradoxes are the ones where the question implies that the chosen answer actually changes the question’s data set.

• Glenn September 14, 2012, 4:49 pm

Wm:

The first of those questions is simply vacuous in the sense you introduced of not actually being a question, and would remain so if its reference were changed to “the question” or “the ultimate question” (thereby making it obviously not a self-reference

True, it would remain empty. But recall that I’m not saying that a question that refers only to its own content is the only kind of empty question. Instead, I’m saying that every question that does only refer to its own content will belong to the set of empty questions.

Here is a similarly built self-referential question which I just came up with intended to be a valid multiple-choice question: “What is the probability that picking the closest-to-correct choice from a set of four randomly generated integer percentage values (0-100, uniform distribution, no duplicates) will result in the closest integer to the percentage answer to this question?”

(The question’s pretty easy to answer; I didn’t spend too much time drawing it up. It might be fun to come up with a more subtle version of the same idea.)

Anyhow, would you consider this to be “a question that refers only to its own content”, to use your criterion? I don’t see why not… Yet I think that because it’s not an empty question it’s uniquely answerable, and because it doesn’t depend on its answers at all the answers don’t fluctuate.

No – this is not a question that refers only to its own content. The question refers to “a set of four randomly generated integer percentage values.” In fact, the question doesn’t refer to itself at all.

• Kristian September 17, 2012, 10:47 am

Doesn’t Wm’s question refer to it self by including this bit?: ” will result in the closest integer to the percentage answer to this question?”. In particular the “this question” part?

Regarding the multiple choice question in the blog post I saw this attempt at an answer online, what do you think Glenn?

“First assume its a multiple choice question if so:
Since there are two 25% then 25% cannot be the right answer. So, we are left with two choices and a probability of 1/2 of getting the right answer. So based on this scenario the answer would be B 50%.”

Is there anything wrong with removing options like that?

Would re-formulating the question like this change anything?:

“If you choose an answer to this question at random, from the options given below, what is the chance you will be correct?

A)1%
B)2%
C)3%
D)25%”

It seems to me this version has a perfectly valid answer yet the only change besides changing the options is making the reference to the options explicit.

But if I removed that reference(which everyone interestingly seems to take for granted as background context in the original version) surely it would still have that very same valid answer yet it would be every bit as self-referential as the original question. This indicates to me that the problem with the original is something other than its self-referential nature.

• Glenn September 17, 2012, 12:34 pm

My bad, I managed to miss that. Yes, it does refer to itself (clearly I formed a judgement based on the earlier part and skipped the latter part). And now that I see this, I see that Wm’s question has no answer. Thanks for pulling me up, Kristian.

I’ll look at your reformulation later today.

• Wm Tanksley September 18, 2012, 12:03 pm

Glenn, I actually found your answer to be fairly enlightening — I just assumed the last sentence was a minor error, not important in the grand scheme.

But now I’m confused. How did Kristian’s comment show you that my question has no answer? Despite all the problems it does have, it seems to me that it has an answer, specifically “4%”. Kristian’s comment did correctly and clearly state the main point I was trying to make, that IMPLICITLY including the answer-set in the question is the hidden source of the problem, while making the answer-set explicitly part of the question fixes the breakage by making any error in the answer set clearly an error in the problem. My random-distribution statement was an attempt to explicitly specify what the answer set would look like without actually including an answer set (but I believe I failed in my goal, since you can’t use my description to generate a complete answer set).

• Glenn September 18, 2012, 6:26 pm

Wm – Kristian’s comment alerted me to the fact that your question did refer to itself, and when I revisited it with this in mind, I noted that it is indeed vacuous and has no answer.

“What is the probability that picking the closest-to-correct choice from a set of four randomly generated integer percentage values (0-100, uniform distribution, no duplicates) will result in the closest integer to the percentage answer to this question?”

So basically, there’s a set of four numbers from 0 – 100, and one of them is closest to the answer to a blank question (since the question asks about the relationships between another set of data and its own data – which is missing.

If this were something like a piece of javascript (or other programming language), the answer would be an error, because you can’t process null.

4% is not a correct answer to this question. Nothing is.

• Wm Tanksley September 19, 2012, 4:09 am

I agree that my question refers to itself; in fact, there’s a second hidden self-reference in the phrase “closest-to-correct”. But this question is not intended to be “about” itself even though it’s self-referential; it’s intended to be about a sample of size 4 randomly drawn without replacement from 0-100. It is a self-referential question that is not supposed to be empty.

I have to wonder, though, whether it’s really phrased correctly; it might be that your lack of answer is because it truly IS a bad question, accidentally made empty by my mistaken phrasing. It’s certainly grammatically and semantically awkward; even I can see that. It’s beyond reasonable doubt that the reason it’s awkward is because I attempted to shoehorn self-reference into a perfectly normal question.

So the next reasonable question I’d want to ask is whether that awkwardness is inherent — that a “normal” question simply cannot be made self-referential without also becoming empty.

I think your answer to that question is “no” — a self-referential question is ALWAYS empty. I think the answer might be “yes” — some self-referential questions are not empty. But I admit that my once-proud “example” has failed to show that, and like Fermat I’ve run out of room to show a better proof.

-Wm

• Wm Tanksley September 19, 2012, 11:42 am

Kristian’s comment is, I think, very informative. I decided to start with his question, replace the answers with the original answers, and then see how I could rephrase it.

Interestingly, the question I come up with is also self-referential, but in a mathematical way; and it’s 100% answerable.

My first move was to remove the “answer set” so that it was a simple open-form question, not a fake multiple-choice question. Thusly: “If you choose an answer to this question at random, from the list [25% 25%, 50%, 60%], what is the chance you will be correct?”

Next I remove the reference to “this question” and “the chance you will be correct”, instead using the mathematical term “frequency” to ask about the recurrences in the data.

“How frequently do the numbers in the list [25% 25%, 50%, 60%] describe their own frequency in the set?”

The answer is “0%”. The number 25% appears with a frequency of 50%, the number 50% appears 25% of the time, and the number 60% appears 25% of the time. None of them are equal to their own frequency.

Now, to go back to Kristian’s data set: “How frequently do the numbers in the list [1% 2%, 3%, 25%] describe their own frequency in the set?” The answer is now 25% — since 1/4 of the numbers are equal to their own frequency in the set.

Now we can see a way to get a strange answer: “How frequently do the numbers in the list [50% 50%, 3%, 25%] describe their own frequency in the list?” The answer is now 75%, since 3/4 of the numbers equal their own frequency.

We can now phrase a different question that IS a multiple-choice question: “Which one of the numbers in the list [1% 2%, 3%, 25%] describes its own frequency in the list?”

What do you think? The question is now clearly self-referential in its data, and also is not self-referential in its structure. It’s no longer a paradox, I don’t think; but other sentences can be built on the same template which remain paradoxical: the Spanish Barber or the Godel sentence, for example.

-Wm

• James September 20, 2012, 5:16 pm

This all reminds me of “the least integer requiring more than ten words to describe”.

• Glenn September 20, 2012, 5:41 pm

“How frequently do the numbers in the list [25% 25%, 50%, 60%] describe their own frequency in the set?”

Wm, this is fine – and importantly different from the last question. And now it’s a question that isn’t self referential. It refers to a list of percentages and asks about facts concerning that list. So it’s not self referential.

“Which one of the numbers in the list [1% 2%, 3%, 25%] describes its own frequency in the list?”

You say that this question is self referential, but it’s not. The question is not about itself – it is about a set of data, and the question asks about features of members of that set.

Side note, independent of the above: It’s possible you’re seeing this as a self referential question because the members of the data set being referred to are being analysed in terms of how they coincide (or don’t) with features of the set as a whole. But the question is nonetheless about this set, and not about itself. What is more, the data is being treated in reference to the structure of that set. It just happens that when you’re dealing with a set of numbers, content and structure frequently coincide. (So for example, in the set {1, 2, 3, 4}, the last member of the set coincidentally tells us something about the structure of the set: it has four members.)

• Wm Tanksley September 21, 2012, 4:35 am

Actually, I agree with you — my change moved the self-reference from the _question_ to the _data_. I think I said that… And I think it’s important that the question has ALWAYS been about the data. The problem with the old question was that it was unclear, not that it was empty. As we knew, simply changing the data made the old question have an answer. I’ve merely rephrased the question to be less ambiguous, and as a result errors in the data become clearly the fault of the data.

Now, I’m drawing a distinction between “the question” and “the data”, but you’re talking about an “empty question”. What is an “empty question” empty _of_? I think the only possible answer is “data”. A question has to refer to some data in order to make an answer possible.

You’ve demonstrated some apparent questions that contain NO data and are thus unanswerable. You’ve also demonstrated some questions that contain a self-reference instead of data (and thus contain no data). You contend that the self-reference itself makes the questions unanswerable; I contend that having no data makes the questions unanswerable, and being self-referential merely makes the questions unclear.

Interestingly, the question that started this all contains data and a self-reference; but I believe the problem is twofold: the phrasing of the question is vague and full of assumptions, and the data does not correspond to the question. Here’s the original data framed by my repaired question:

“Which one of the numbers in the list [25% 50%, 60%, 25%] describes its own frequency in the list?”

This question, like the original, cannot be answered — but the reason it can’t be answered is immediately obvious, because the question is clear. Nonetheless, the reason it can’t be answered is the SAME as the reason why the first one cannot be answered. I believe that any possible set of data will produce the same answer (or nonanswer) for the original and this question — therefore the two questions are semantically identical.

If I’m wrong (and I may be), it’s because of one of our assumptions about the original — for example, that it was intended to be multiple-choice, or that the answer-choices are part of the question (I think they HAVE to be).

-Wm

• Andy November 21, 2013, 7:51 pm

Kobayashi Maru

• Glenn November 21, 2013, 8:08 pm

Andy, I almost deleted that as spam! Then I Googled it. 🙂

• Grant October 27, 2014, 6:45 pm

The key to this question is that the method of selecting the correct answer is specified – random. No systematic choice of answer is allowed. Therefore by following the specified methodology, and given the inanity of the question, whatever answer you choose is supposedly correct. However, that presents a problem. Every answer A, B, C or D, is correct provided it is selected at random. Therefore the corresponding answers should all be 100%.

• Geoff February 19, 2016, 9:02 am

HOLD THE PHONE….

Donavan worked on CSI… which is like the best TV show ever. love your work.

Also, the answer is simple, every answer is correct, therefore there is no answer. Its like saying “How-High is a chinaman”. Its a question designed to confound philosophers and provide entertaining reading for those of us who are not (because we already know :P)