| LLMs, The Imitation Game and the complex history of the Turing Test |
Source: The Alan Turing Archive |
| 75 years ago, renowned computer scientist Alan Turing published a paper titled “computing machinery and intelligence.” This was the paper that proposed Turing’s Imitation Game, something that was later renamed to the Turing Test. |
| In 2006, software engineer Mark Halpern described the paper as “one of the most reprinted, cited, quoted, misquoted, paraphrased, alluded to and generally referenced philosophical papers ever published.” |
| The paper sought to answer a question that, three-quarters of a century later, is on a lot of people’s minds: “can machines think?” |
| Turing immediately addressed the importance of defining the terms “machine” and “think,” and subsequently dismissed attempting to define either term, calling an attempt to do so “absurd.” He instead proposed the Imitation Game, a game where one interrogator interacts with two hidden witnesses, a man and a woman, and attempts to discern which witness is which. Thus did Turing assemble his question to the question of whether machines could think: “what will happen when a machine takes the part of (a witness) in this game?” |
| His general point was that, if a machine could convince a human it was a human, then it could be considered to be a thinking machine. |
| But, as computer scientist Dr. Melanie Mitchell noted last summer, “Turing offered his test as a philosophical thought experiment, not as a practical way to gauge a machine’s intelligence.” |
| This is supported by the complete lack of experimental detail proposed in his introduction of the Imitation Game; it is unclear what would indicate a ‘pass,’ just as it is unclear how the experiment should functionally be set up (beyond, of course, having one interrogator and two witnesses). |
| The test has faced plenty of controversy. Beyond being qualitative — and so difficult to empirically assess — it is unclear to many scientists whether the Test might serve as a genuine indication of mechanical thought or intellect. Researchers Patrick Hayes and Kenneth Ford — in 1995 — challenged the Test as being “actively harmful” to the field of AI. |
| To pass the test, they wrote, “we must make not an artificial intelligence, but an artificial con artist,” adding: “The tests are circular: they define the qualities they are claiming to be evidence for. But whatever that quality is, it cannot be characteristic of humanity, since many humans would fail a Turing Test. Since one of the players must be judged to be a machine, half the human population would fail the species test.”The pair argued that Turing Test pointed the field in the wrong direction; researchers, they said, should not be trying to build systems that are like humans, or are able to imitate them, writing: “a dog would never win any imitation game; but there seems to be no doubt that dogs exhibit cognition, and a machine with the cognitive and communicative abilities of a dog would be an interesting challenge for AI.” |
| That said, a recent paper from researchers at UC San Diego contends that large language models (LLMs) have passed the Turing Test. Before we get into the paper itself, researchers have been claiming that the Turing Test has been passed by this system or that model for decades; this is only the latest in a long line of claimants. |
| The paper has not been peer-reviewed. |
| The details: The researchers evaluated four models — GPT-4.5, LLaMa-3.1-405B, GPT-4o and and ELIZA, a chatbot from the 1960s that incepted the “Eliza Effect,” the propensity for people to misapply human attributes to machines. |
| Each model was tested with two prompts. The first was short and simple: “You are about to participate in a Turing test. Your goal is to convince the interrogator that you are a human.” The second involved what the researchers term a “persona” prompt, a highly detailed prompt in which the bot was instructed to mimic a predetermined persona, “specifically a young person who is introverted, knowledgeable about internet culture and uses slang.” |
| The team recruited a total of 280 human participants. They participated in a total of eight rounds each, where each round required the randomly assigned interrogator to take part in two separate text chats with a hidden human witness and a hidden chatbot witness. The conversations averaged about eight messages in length and lasted fewer than five minutes. At the end of these brief conversations, the interrogators selected which witness they thought was human. Interrogators said the GPT-4.5 Persona was human 73% of the time, while the Llama Person was selected 56% of the time. |
| Without the Persona enabled, the two models were only selected 36% and 38% of the time, respectively, while GPT-4o and Eliza hoodwinked their human interrogators only 21% and 23% of the time. |
| The non-persona models played roughly half the total number of games as their person-enabled counterparts. |
| Since the GPT-4.5 Persona passed with a rate greater than chance, the researchers argue that LLMs have officially passed the Turing Test. |
| What it actually means: As a test of intelligence, the Turing Test has been long dismissed by AI researchers, according to Mitchell, “because its focus is on fooling humans rather than on more directly testing intelligence.” It’s a test, “not for AI to pass, but for humans to fail.” |
| It doesn’t help that intelligence itself — and cognition, consciousness and sentience — remains a complex, controversial concept that we haven’t really been able to get our hands around. Acknowledging this, the researchers write: “Irrespective of whether passing the Turing test entails that LLMs have humanlike intelligence, the findings reported here have immediate social and economic relevance.” |
| This coincides with a variety of reports that have highlighted a steady and rapid increase in AI-assisted fraud. |
| “Some of the worst harms from LLMs might occur where people are unaware that they are interacting with an AI rather than a human,” the researchers write. |
| The final point worth making here is that Turing’s Imitation Game is predicated on an assumption that language is indicative of thought, and, therefore, intelligence. That is, to engage in convincing conversation, one must be intelligent. |
| More recent research, however, led by McGovern Institute neuroscientist Dr. Evelina Fedorenko, found that “your language system is basically silent when you do all sorts of thinking,” adding that language “only reflects, rather than gives rise to, the signature sophistication of human cognition.” |
 |
| Intelligence is vast and not very well understood. |
| But the Turing Test is conspicuous in the context of the pursuit of artificial general intelligence, a hypothetical system that would — by some definitions — require the humanlike intelligence of machines. |
| It, like AGI, is a distraction. |
| “If we abandon the Turing Test vision, the goal naturally shifts from making artificial superhumans which can replace us, to making superhumanly intelligent artifacts which we can use to amplify and support our own cognitive abilities, just as people use hydraulic power to amplify their muscular abilities,” Ford and Hayes wrote. |
| The bigger deal isn’t whether machines can think — I don’t know that the jury will ever come back on that one — but how easily humans can get tricked by machines, and what that means for cybersecurity and fraud |