In today's column, I examine a newly published approach to improving generative AI and large language models (LLMs) that relies on the longstanding adage for humans that it is best to think before you act. Notably, not only are humans urged to think before they act, but they are also encouraged to continually improve their thinking processes so that they think a bit better each time they think before they act.
You might say we aim to optimize the thought process.
The premise in an AI context is that if you have generative AI do sufficient pre-processing to reason out a potential response, there is a heightened chance that the generated answer will be better. This technique of pre-processing to garner better answers is something that I've covered extensively and is widely known as chain-of-thought or CoT reasoning for AI, see the link here and the link here.
An intriguing novel twist is depicted in a new AI research paper that entails having generative AI do, in essence, supplemental data training on internally devised chains of thought, aiming to improve the strength of the CoT capability. Envision this as a method of keeping track of the logic used to produce answers, and then collectively using those instances to try to improve the logic production overall. A human might do likewise by reviewing their reasoning over and over, aiming to gradually bolster their reasoning capacity.
Let's talk about it.
This analysis of an innovative proposition is part of my ongoing Forbes.com column coverage on the latest in AI including identifying and explaining various impactful AI complexities (see the link here).
Do you remember being in school and your teacher insisting that you show your work when solving test problems?
I'm sure you do, perhaps anxiously so.
A notable reason to show your work was so that the teacher could see the logic that you used to arrive at your answer. If you got the answer wrong, the teacher might at least give you partial credit based on whether your work demonstrated that you partially knew how to solve the problem at hand. Of course, this also helped in catching cheaters who weren't actually solving problems and instead were sneakily copying from their seated neighbors. I'll put aside that cheating consideration and focus solely on the hoped-for learning outcomes.
A bonus reason for showing your work is that it might aid you in learning how to get better at employing logic and thinking through a problem. The belief is that the more you write down the steps you've undertaken, the better you will get at coming up with the right steps. Generally, you can improve your overarching problem-solving prowess by repeatedly inspecting your work and refining your use of logical reasoning.
Keep this in mind as we shift into leveraging the notion for purposes of advancing AI.
When using generative AI, you can get the AI to showcase its work by telling the AI to do stepwise processing and identify how an answer is being derived. This is customarily referred to as chain-of-thought processing or CoT. In a sense, the logical steps for reasoning about a problem can be specified as a chain or series of thoughts that are taking place.
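For readers who want to see the idea concretely, here is a minimal Python sketch of a chain-of-thought style prompt. The ask_llm function is purely a hypothetical stand-in for whatever generative AI model or API you happen to use; it is not a real library call.

```python
# Minimal sketch of chain-of-thought style prompting.
# ask_llm() is a hypothetical placeholder, not a real API; swap in your model of choice.

def ask_llm(prompt: str) -> str:
    """Placeholder: send the prompt to your preferred generative AI and return its reply."""
    return "(model reply would appear here)"

question = (
    "I need to get from San Francisco to New York City. "
    "What travel option should I pick?"
)

cot_prompt = (
    "Answer the question below. Before giving the final answer, show your work "
    "as a numbered series of reasoning steps and list any assumptions you make.\n\n"
    f"Question: {question}"
)

print(ask_llm(cot_prompt))  # Expect numbered reasoning steps followed by a final recommendation.
```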
I am leery of the now-common catchphrase "chain-of-thought" in the AI field because it includes the word "thought" as though generative AI is actually "thinking." Those kinds of fundamental words are best reserved for human mental endeavors. Parlaying them into the AI realm is, lamentably, an insidious form of anthropomorphizing AI. It gives the impression that AI has thoughts and thinks on par with humans. That's not the case, and it is sad and misleading that these phrases are being used in an AI context (see my detailed discussion at the link here).
Anyway, it has become the norm to use them, and I reluctantly go along, but I ask that you keep separate how these words apply to AI versus how they apply to human cognition, thanks.
Let's look at an example that illustrates the idea of showing work while AI is solving a problem. I want generative AI to help me with an upcoming trip, so I logged in and asked about potential travel plans. To get the presumed chain-of-thought, I mentioned in my prompt that I want to see the logic employed.
Here we go.
The answer by the generative AI was that I should take the train to get from San Francisco to New York City. Well, that might be fun to do if I had plenty of time and relished train travel, but the answer doesn't seem very good if I'm under time pressure or have other requirements for the journey.
I'm glad that I asked to see the logic or chain-of-thought. You can inspect the logic and see some key assumptions made by the AI. Rather questionable, I say.
One way around the somewhat off-target or semi-flawed answer would be for me to tell the AI that the logic portrayed is not very solid. I could then give the AI the logic that I want it to use. Presumably, I would end up with a better answer.
Let's instead lean into the classic bit of wisdom that it is often better to teach someone how to fish than to do the fishing for them. I will tell the AI to review its answer and assess the logic used.
I tried this.
Aha, nicely, the AI identified weaknesses in the logic that had been used. I will prod the AI into redoing the travel planning and ask for better logic based on having discovered that the prior logic was weak.
Here it is.
I would say that the new answer is better since it brings up the importance of several factors including time, cost, and convenience.
The logic is much better too.
We have done something of grand significance. A better answer was derived by the AI, and by all appearances this was due to bolstering the underlying logic that was used. I didn't have to lift a finger to redo the logic. Instead, I merely prodded the AI into revisiting and redoing the logic.
I'm reassured and excited that the answer to my travel question was definitely improved.
The thing is, I want the AI to always employ better logic, not just for the one question about traveling from San Francisco to New York City.
Here's what I will tell the AI to do.
You can see that I opted to focus on just travel-related problems.
Expand that scope and imagine that we want generative AI to inspect the logic or chain-of-thought being used and always try to improve upon it, across all kinds of problems. To get this to happen on a longstanding basis, we could exercise the AI with lots and lots of problems and get the AI to review the logic again and again. The aim would be to have the AI persistently get better at devising underlying logic.
A related facet is whether the AI will be able to sufficiently judge or assess the logic that it is using. There is a chance that the AI won't admit to having weak logic or might not be able to detect when the logic is poor. We could craft a separate component that will somewhat independently assess or judge the logic. Those assessments could be fed into the AI for purposes of then guiding which logic is better or worse than other logic.
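To make that loop tangible, here is a rough Python sketch of a generate-judge-revise cycle of the sort just described. The ask_llm stand-in, the 1-to-10 rating scale, and the number of revision rounds are all assumptions for illustration, not a prescribed design.

```python
# Rough sketch of a review-and-revise loop with a separate "judge" pass.
# ask_llm() is a hypothetical placeholder for your generative AI interface.

def ask_llm(prompt: str) -> str:
    return "(model reply would appear here)"  # Placeholder: call your model here.

def solve_with_logic(problem: str) -> str:
    # First pass: ask for an answer along with the reasoning steps.
    return ask_llm(f"Solve this problem and show your reasoning steps:\n{problem}")

def judge_logic(problem: str, attempt: str) -> str:
    # Separate judging prompt (it could even be a separate model) that critiques the reasoning.
    return ask_llm(
        "Critique the reasoning in the attempt below. Point out weak assumptions "
        "and rate the logic from 1 (poor) to 10 (strong).\n"
        f"Problem: {problem}\nAttempt: {attempt}"
    )

def improve(problem: str, rounds: int = 2) -> str:
    # Generate, judge, and revise a few times, aiming for stronger logic each round.
    attempt = solve_with_logic(problem)
    for _ in range(rounds):
        critique = judge_logic(problem, attempt)
        attempt = ask_llm(
            "Redo your answer, fixing the weaknesses named in the critique, "
            "and show the improved reasoning steps.\n"
            f"Problem: {problem}\nPrior attempt: {attempt}\nCritique: {critique}"
        )
    return attempt
```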
There is an additional and quite interesting angle that bears a reflective moment.
Generative AI is data trained by scanning the Internet and examining lots and lots of data. During this data training, mathematical and computational pattern matching is performed on human writing. When you use generative AI, that pattern matching computationally mimics how humans write. Voila, you get the amazing semblance of fluency that occurs while using generative AI and large language models.
I'll ask you a provocative question of a somewhat mind-bending nature.
Does all that data scanned from across the Internet tend to contain the logic underlying whatever is stated, or does the logic not necessarily accompany the content that is found?
Give that a moment of reflection.
I would wager that much, if not most, of what you might find online would almost certainly not be accompanied by the logic or logical basis for whatever is being stated. Unless you perchance come across an online textbook of mathematical proofs, you aren't likely to see the logic employed. Furthermore, as an aside, even if people do show their logic, we might be suspicious as to whether the logic they show is coherent or complete.
The gist is this.
There is a low likelihood of being able to data train generative AI at the get-go on the logic of humans because the data source of the Internet tends to omit the logic that might have been employed. As such, you might have to find another way to get the logic, other than hoping it will simply be sitting out there on the Internet and tied to whatever problems or answers are here or there.
You can proceed, in a sense, to create synthetic logic, meaning after-the-fact logic that presumably underlies how something was solved or figured out. The chain-of-thought that you get generative AI to showcase could be construed as just that, namely synthetic logic. It isn't necessarily the logic per se that a human used or patterned on; instead, it is derived logic that comes after the fact.
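As a concrete illustration of what producing synthetic logic could look like, here is a small Python sketch. It simply asks a model to write after-the-fact reasoning for existing question-and-answer pairs; the ask_llm function and the sample data are hypothetical placeholders.

```python
# Hedged sketch of generating "synthetic logic" for web-derived question-answer pairs
# that lack any accompanying reasoning. ask_llm() is a hypothetical stand-in.

def ask_llm(prompt: str) -> str:
    return "(model reply would appear here)"  # Placeholder: call your model here.

web_pairs = [
    {"question": "What is the fastest way from San Francisco to New York City?",
     "answer": "A nonstop flight, roughly five to six hours in the air."},
    # ... many more pairs harvested from ordinary web content ...
]

synthetic_dataset = []
for pair in web_pairs:
    derived_logic = ask_llm(
        "Given this question and its answer, write the step-by-step reasoning "
        "that would plausibly lead to the answer.\n"
        f"Question: {pair['question']}\nAnswer: {pair['answer']}"
    )
    synthetic_dataset.append({**pair, "logic": derived_logic})

# synthetic_dataset can now serve as supplemental training material on reasoning.
```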
A recent AI research study by researchers at Meta, the University of California Berkeley, and New York University came up with a novel methodology that they refer to as Thought Preference Optimization or TPO to do something along the lines of what I have been noting.
The study is entitled "Thinking LLMs: General Instruction Following With Thought Generation" by Tianhao Wu, Janice Lan, Weizhe Yuan, Jiantao Jiao, Jason Weston, and Sainbayar Sukhbaatar, arXiv, October 14, 2024, and made these key points (excerpts):
This is an insightful study that seeks to explore and implement many of the facets that I mentioned here.
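For those who want a feel for the mechanics, here is a loose conceptual sketch, in Python, of a thought-preference loop along the general lines the researchers describe: sample several hidden thoughts plus responses, have a judge score only the visible responses, and nudge the model toward the higher-scored ones via a preference-optimization step. All the helper names and details here are my own placeholder assumptions; consult the paper for the actual training recipe.

```python
# Loose conceptual sketch of a thought-preference training loop (TPO-like in spirit).
# Every helper below is a hypothetical placeholder, not the paper's actual code.

def generate_thought_and_response(prompt: str) -> tuple[str, str]:
    # The model would emit a hidden "thought" section plus a user-facing response.
    return ("(hidden reasoning)", "(candidate response)")

def judge_score(prompt: str, response: str) -> float:
    # A separate judge model would rate only the visible response, not the hidden thought.
    return 0.0

def preference_update(prompt: str, preferred: tuple[str, str], rejected: tuple[str, str]) -> None:
    # A preference-optimization step (for example, DPO-style) would nudge the model
    # toward producing thoughts and responses like the preferred pair.
    pass

def thought_preference_iteration(prompts: list[str], samples_per_prompt: int = 4) -> None:
    for prompt in prompts:
        candidates = [generate_thought_and_response(prompt) for _ in range(samples_per_prompt)]
        scored = sorted(candidates, key=lambda cand: judge_score(prompt, cand[1]))
        worst, best = scored[0], scored[-1]
        preference_update(prompt, preferred=best, rejected=worst)
```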
Their initial results suggest increased performance on selected benchmarks. The beauty too is that the heightened performance seems to occur across numerous domains. I mention this because there are CoT-related studies that have focused on specific domains, such as finance, medicine, the law, and other niches, which is great, but having an approach that appears to provide across-the-board improvements is equally vital, if not more so.
As they say, a rising tide lifts all boats.
I'll be eager to see if other AI researchers are able to replicate their results, plus make use of additional benchmarks to see the gamut of what these improvements might provide. Beyond trying this on Meta's Llama, it would be significant to use other generative AI models such as ChatGPT, GPT-4o, o1, Claude, Gemini, and so on.
Lots of work yet to do, and lots of exciting opportunities awaiting.
A few final thoughts for now.
Warren Buffett famously said this about thinking: "There is nothing like writing to force you to think and get your thoughts straight."
Returning to my point about showing your work during your schooldays, you must admit that writing down your logic was a means of forcing you to get your mind straight. Maybe it was painful and maybe you got dinged at times for making mistakes, but I dare say you are better off for it.
One twist is whether we truly think in the explicitly noted logic-based terms that we write down. Do you really think based on A leads to B, and B leads to C? Or is that a made-up rationalization that we are taught to abide by? Perhaps our brains work in some totally different way. Society is stridently forcing us to pretend that we think in a logical way, even though maybe we don't, or we use some other logic entirely.
The reason that matters is that we seem to be applying the same forcefulness to generative AI. Yes, we are forcing AI to abide by the logic that we believe humans are rationally supposed to use. What if that's not what will ultimately get us to full AI, otherwise referred to as artificial general intelligence (AGI)?
Makes you think.
The last word goes to Warren Buffett: "I insist on a lot of time being spent, almost every day, to just sit and think."
Yes, indeed, let's make sure we give plenty of thinking time toward AI and advances in AI. Go find a quiet place to think about it. Your thoughts might make all the difference in the world.