What It Was Like To Judge That History-Making Turing Test

Earlier this week, a computer program passed the Turing Test for the first time in a controversial win for chatbots everywhere. Actor Robert Llewellyn was one of the judges for the test, and he gave Gizmodo an inside look at the strange experience of thinking a computer program was a real live boy.

Llewellyn is best known for playing the mechanoid Kryten in the British sitcom Red Dwarf, and writes frequently about science, sci-fi and technology at his personal site. He’s not a trained computer scientist, though, which makes his read of the chatbot posing as a 13-year-old Ukrainian boy named Eugene closer to what any of ours might have been.

Speaking to Gizmodo via email while filming in Barcelona, Llewellyn told us how he got to be a judge for this round of the Turing Test in the first place:

I have had a lot of contact with Professor Kevin Warwick from Reading University, who organised the test. I interviewed him at length when I was writing a tech comedy novel called ‘Brother Nature’, which featured a woman working in advanced robotics who uses her renegade brother as a guinea pig, with disastrous results.

He was very kind and patient with me and I was fascinated by the work he has been doing. He was the first human being to have a chip inserted into his arm which was directly connected to his central nervous system. Barking mad, dangerous but fascinating. He survived and his involvement in the recent Turing Test is the reason I was invited. It may also have had something to do with the fact that I’ve portrayed a mechanoid in Red Dwarf for the past 25 years.

Were you one of the people who thought the chatbot was a computer, or did you think it was human?

This question has confused me greatly. It implies that the text on the two screens I was dealing with were both being generated by humans, not quite what I thought I was experiencing. If, alternatively, you mean was I thrown by the responses and chose the computer responses as the human ones 6 times out of 10, then yes, I am that fool. Or maybe not quite so foolish, it was very difficult to tell.

What made you think it was real or fake?

As I have already said, there were two ‘chat boxes’ on the screen, the judges asked questions and these received responses from either a human being hidden in another part of the building, or a computer. Our job was merely to try and work out which was which. The 1st test was easy, the answers from the computer were obvious, dull, long and dry. After that it became harder and harder to tell, the human responses were often a bit dull, some of the humans answering were quite young, high school students of 17 and 18, some adults, not all native English speakers. The software running on the computers was very sophisticated, understood slang and colloquialisms and made typos, answered in a jokey and informal manner and ran rings around me.

Finally, can you give us a broad sense of what the test was like?

It was quite hard work, 10 sessions of 5 minutes over an hour period, a lot of typing on a keyboard I wasn’t used to and often 2 answers coming back at the same time. I was only one of 5 judges, the rest of them high flying academics and a member of the House of Lords. Basically I was the thick one, they were all clever but the computer fooled them too.