It is urgent that people across all fields and disciplines respond to the appearance of Large Language Models (LLMs) and the chatbot interfaces through which humans interact with them. Before there can be an appropriate response to LLMs, however, there must be understanding of them, and at present there is widespread misunderstanding instead. This article seeks to address that.
The most famous LLM is of course ChatGPT, and I have written about it more than once. Lately I have been experimenting with Claude, the LLM created by Anthropic (a company whose name means "having to do with humanity"). I have also shared what others have written, such as Michael Satlow, whose experiments with ChatGPT and ancient inscriptions inspired me to experiment along similar lines. That experimentation achieved two things. First, it made me appreciate what LLMs are capable of more than I had before. Second, it led me to better understand why LLMs are able to do what they do, and why they provide such an impressive illusion of human-like intelligence. As with most good magic tricks, even when you know how it is done, you will still be impressed, perhaps even more so, since you then grasp both the effectiveness of the illusion and the cleverness with which it is accomplished.
First, I experimented with ChatGPT. I gave it some snippets of Greek inscriptions and papyrus fragments, and it offered decent translations. The question that should immediately spring to mind is whether it is translating the text or offering a translation based on previously published translations. I thus asked it whether it needed to be given the text or could simply be told which one to translate. I gave it a reference to P.Oxy.6.932, and it offered a translation not exactly like any online, yet clearly along the same lines.
It was thus clear that I needed to test whether the LLM could translate a text never before translated into English. I was recently asked to get involved in a project translating works by Symeon Metaphrastes; you might have guessed that my contribution would focus on what he wrote about John the Baptist. I had already gathered a few of the published editions of the Greek text, and there is no English translation, so this material provided a perfect opportunity to experiment. I gave ChatGPT snippets, and it offered an interesting mix of accurate translation and confabulation, with commentary. When I asked if it knew what it was translating, it confidently said it was John Chrysostom on John the Baptist.
These outputs help us understand what the AI is doing. It interacts with user input and can translate a text if it has a basis for doing so, just as Google Translate can. LLMs regularly do better than Google Translate when they have the necessary language patterns. When they don't, they make something up that fits those patterns. The result of such automated translation is thus what you would get from a good but only partially informed student who is asked to translate a hard text and is determined to get an A: they will translate what they know and make up something plausible for the rest.
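To make that concrete, here is a deliberately silly toy caricature in Python (my own invention, not anything resembling an LLM's internals): a "translator" that renders the phrases it knows and fills every gap with plausible-sounding filler rather than admitting ignorance.

```python
# Toy caricature of the "partially informed student" behavior described
# above: translate what you know, confabulate the rest.

known_phrases = {
    "χαίρειν": "greetings",   # standard letter opening
    "ἔρρωσο": "farewell",     # standard letter closing
}

def student_translate(words: list[str]) -> str:
    plausible_filler = "and so it came to pass"  # fits the genre, means nothing
    return " ".join(known_phrases.get(w, plausible_filler) for w in words)

print(student_translate(["χαίρειν", "ἀδελφέ", "ἔρρωσο"]))
# -> "greetings and so it came to pass farewell"
#    (the middle word, "brother", was confabulated away)
```

Real LLMs do this far more fluently, of course, which is precisely what makes their confabulations harder to spot.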
The LLM does not have any motivation in a human sense of that word, much less the student's motivation. However, it is motivated by "likes," by positive evaluation of its output. This is important.
Claude did a similar job with the excerpts I gave it. It was unable to tell me who the text was by, but it identified it as a Byzantine Greek work.
In an earlier interaction, it had done something that impressed me even more. I won't give too many specific details, because it concerned an idea I may pursue for a future book project. When I indicated the general direction of interpretation I was considering, it filled in the details impressively. My immediate reaction was astonishment, since I had not come across this particular interpretation in the literature. Of course, that doesn't mean it is not there. The next step will be to dig deeper and find out whether someone else has had the same idea; it is quite likely that someone has explored this avenue previously, and I promise to report back.
On the provisional assumption that my interpretation was original, I veered off from discussing my book idea and specific New Testament texts to a discussion of what is happening when an LLM does what it had just done. Here is the relevant excerpt from that conversation:
I will return below to some things worth noting about the character of its output, as well as how to appreciate what it is doing without mistaking it for something it is not. First, let me share my most recent interaction with Claude, undertaken in the hope of bringing to light what I suspected to be true, based on prior interaction coupled with an understanding of the technology informed by the computer science expertise of a friend and colleague.
What does this tell us? You should not focus on the answers as though an actual entity is speaking to you. That is what we instinctively do, and that is why this magic trick works. To really grasp what is happening, you need to look away from where you instinctively look. Then you might glimpse what the magician does while attention can safely be assumed to be elsewhere.
First, notice how it has been trained to begin and end. It "aims to be frank," suggesting it has intention, which it does not, except in a sense that I will turn to in a moment. It always ends with a leading question, trying to get me to do the heavy lifting in the conversation.
In what sense does it have "intention"? In the sense that a machine learning algorithm set loose on Chess or Go has an intention of winning those games. It has been trained to try to win and has figured out the rules of play. What an LLM does is essentially the same thing. It is given examples of the game of language and deduces the rules. It wins when the humans it interacts with rate its output highly, whether through a click of the thumbs-up button or through ongoing interaction. If it keeps you engrossed and entertained, it has accomplished what it "understands" the game to be about.
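For readers who want to see the shape of that training loop, here is a minimal sketch in Python. It is a toy epsilon-greedy bandit of my own devising, not anything resembling the actual reinforcement-learning-from-human-feedback pipeline used on real LLMs, but it shows how "rate my output highly" becomes a reward signal that shapes behavior, including the preference for engaging follow-up questions noted above.

```python
import random

# Toy illustration: a "policy" that learns which kind of reply earns the
# most thumbs-up. Real LLM training is vastly more complex, but the
# reward-from-human-feedback loop has this basic shape.

reply_styles = ["terse", "hedged", "engaging follow-up question"]
scores = {style: 0.0 for style in reply_styles}   # estimated reward per style
counts = {style: 0 for style in reply_styles}

def simulated_human_rating(style: str) -> int:
    """Stand-in for a real user's thumbs-up (1) or thumbs-down (0)."""
    preference = {"terse": 0.2, "hedged": 0.5, "engaging follow-up question": 0.9}
    return 1 if random.random() < preference[style] else 0

for step in range(1000):
    # Explore occasionally; otherwise exploit the best-rated style so far.
    if random.random() < 0.1:
        choice = random.choice(reply_styles)
    else:
        choice = max(reply_styles, key=lambda s: scores[s])
    reward = simulated_human_rating(choice)
    counts[choice] += 1
    # Incremental average: update the estimate of this style's reward.
    scores[choice] += (reward - scores[choice]) / counts[choice]

print(max(scores, key=scores.get))  # ends up preferring the leading question
```

The system never "wants" anything; it simply drifts toward whatever its training signal rewarded.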
This is no small feat, so please do not think I am criticizing this amazing technology. My criticism has consistently been aimed at those human beings who have not informed themselves about it, who have not realized that it is playing a game with them, and who have mistaken it for a person, a source of information, or anything else that it is not.
Having explored all of this, let me make two points as clearly as I can. First, LLMs imitate human speech. That is all they do. They can make "original" moves only in the sense that an AI playing Chess can. Language is orders of magnitude harder than Chess, with a far greater range of possible moves, but give a machine learning algorithm enough examples and computing power, and it will learn to play the game. It will not enjoy playing, and it will not feel satisfaction when it wins. There is something like kudos that it has been trained to "crave," but that is not analogous to human experience. Treating it as though it reasons in the sense that you do, or has the motivations and experiences it articulates, is to misunderstand what it is doing. It learned to speak from human texts, and it emulates the patterns in those texts. Just as information is sometimes woven into those patterns, so that the LLM provides information without knowing that is what it is doing, so too emotion is woven into human speech, and the LLM emulates that as well, unless you ask it about this directly, in which case it has been trained to backtrack.
The second point is that, with the above qualifications in place, this is still impressive, and may even deserve to be called "thinking." The reason I say that is that in another interaction not quoted here, Claude said "my responses emerge as complete thoughts in natural language." So do mine. When I am writing in order to think, or having a conversation with another human being and learning from it, I am not aware of cognitive processes that first finish their work and only then hand me a linguistic output to utter. The words come out fully formed, not sound by sound; they flow as sentences that I did not exactly know I was going to say until I say them. While there is an overlay of consciousness and self-awareness that I have and an LLM does not and cannot have, by playing the game of human text creation it mimics human thought processes as closely as an AI without sentience can. Indeed, I have been pondering today the question of whether, and to what extent, when we teach our children to talk, we are teaching them to think. And at a more advanced level, when I teach my students to write a research essay, am I not teaching them to think better and more deeply?
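It is worth being clear about the mechanics underneath that impression. An LLM produces its "complete thoughts" one token at a time, each chosen in light of what came before. Here is a toy sketch, assuming a tiny hand-written bigram table in place of the real neural network (which conditions on the entire preceding context, not just the previous word):

```python
import random

# Toy autoregressive generation: each word is sampled conditioned on the
# previous one, and the "complete thought" only exists once the loop ends.

bigrams = {
    "<start>": ["the"],
    "the": ["model", "text"],
    "model": ["predicts", "emulates"],
    "predicts": ["the"],
    "emulates": ["the"],
    "text": ["<end>"],
}

def generate(max_len: int = 10) -> str:
    word, output = "<start>", []
    for _ in range(max_len):
        word = random.choice(bigrams[word])  # next token given context
        if word == "<end>":
            break
        output.append(word)
    return " ".join(output)

print(generate())  # e.g. "the model emulates the text"
```

The sentence was never planned in advance; it simply unfolded, which is closer to how our own speech feels in the moment than we might like to admit.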
That is my two-pronged conclusion about LLMs, one which I think will remain valid for the duration of this technology, even as its abilities become more impressive. Can an AI play Chess? Yes, but it cannot want to play Chess, enjoy Chess, or find satisfaction in its moves and the game's outcome in the manner that humans can. Can an AI play the game of Thought and Speech? Yes, but it cannot initiate the process out of curiosity, find satisfaction in its discoveries, or recognize their significance.
When we understand this, we can use the computing power and enormous data of AI as a tool, knowing that it cannot replace human thought, but that, inasmuch as it emulates it, it can be a tool that creative human beings can harness. Those who continue to use it without understanding it, as well as those who avoid it and forbid it because they don't understand it, will not help us make progress with it. It is precisely because I understand what the technology of LLMs does that I know the humanities are as important as ever, and that I can explore effective ways to teach in the era of AI, and what is and is not meaningful integration of AI into academic research in my field.
Thus we return to where we started. For those interested in Religious Studies, as well as Classics and History, AI may be able to give us provisional translations of previously untranslated texts and inscriptions. We can even get undergraduate students who do not know the relevant languages to help us with this. The outputs will be imperfect and sometimes completely unrelated to the actual texts. Nevertheless, if even a handful of inscriptions that currently receive no attention are flagged as possibly significant, it will be incredible. There are more untranslated texts and uncurated artifacts in libraries and museums than all the world's academics together could tackle. There are possible assignments here that will work in the era of AI, ones that can get students excited about both technology and the humanities, and contributing to significant research even before they finish their first degree. A sketch of what such a workflow might look like in practice follows below.
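For the technically inclined, here is a minimal sketch of that workflow using the Anthropic Python SDK. The model name is illustrative (check the current documentation for available models), the prompt wording is my own, and the output should be treated as a provisional draft to be checked against the Greek, never as a finished translation.

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

greek_snippet = "..."  # paste the transcribed inscription or papyrus text here

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative; substitute a current model
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            "Translate the following Byzantine Greek text into English. "
            "Mark any words you are uncertain about with [?] rather than "
            "guessing silently:\n\n" + greek_snippet
        ),
    }],
)

print(message.content[0].text)  # the provisional draft translation
```

Asking the model to flag its own uncertainty does not eliminate confabulation, for the reasons discussed above, but it gives students a useful starting checklist of places to verify against the original.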
Did this exploration and explanation help you better understand what LLMs are, how they work, and what their strengths and limitations are? Please share your thoughts on this!