Apple Says Generative AI Isn't Good At Math



Conclusions from a new Apple study might make consumers rethink using ChatGPT -- and other Generative AI tools -- to get financial advice. And it should temper the plans of bank and credit union executives to use artificial intelligence (AI) to offer financial advice and guidance to consumers.

A survey from the Motley Fool revealed some surprising -- and, frankly, hard-to-believe -- statistics about Americans' use of the Generative AI tool ChatGPT for financial advice.

According to the study, the most important factors determining consumers' use of ChatGPT to find financial products are: 1) the performance and accuracy of the recommendations; 2) the ability to understand the logic behind the recommendations; and 3) the ability to verify the information the recommendations are based on.

But are the performance, accuracy and -- very importantly -- logic behind ChatGPT's recommendations sound? Apple's report casts some doubt.

Generative AI tools can do lots of amazing things, but, as a new report from researchers at Apple demonstrates, large language models (LLMs) have some troubling limitations with "mathematical reasoning." The Apple researchers concluded:

"Current LLMs are not capable of genuine logical reasoning; instead, they attempt to replicate the reasoning steps observed in their training data. When we add a single clause that appears relevant to the question, we observe significant performance drops across all models. Importantly, we demonstrate that LLMs struggle even when provided with multiple examples of the same question or examples containing similar irrelevant information. This suggests deeper issues in their reasoning processes that cannot be easily mitigated through few-shot learning or fine-tuning."

A recent TechCrunch article documented some of the seemingly simple mathematical calculations that LLMs get wrong. As the publication wrote, "Claude can't solve basic word problems, Gemini fails to understand quadratic equations, and Llama struggles with straightforward addition."

Why can't LLMs do basic math? The problem, according to TechCrunch, is tokenization:

"The process of dividing data up into chunks (e.g., breaking the word "fantastic" into the syllables "fan," "tas," and "tic"), tokenization helps AI densely encode information. But because tokenizers -- the AI models that do the tokenizing -- don't really know what numbers are, they frequently end up destroying the relationships between digits. For example, a tokenizer might treat the number "380" as one token but represent "381" as a pair of digits ("38" and "1")."
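The digit-splitting problem described above can be illustrated with a toy greedy longest-match tokenizer. The vocabulary below is hypothetical, chosen only to mirror the "380" vs. "381" example from the quote; real tokenizers (such as the byte-pair encodings used by LLMs) learn their vocabularies from data, but the effect on adjacent numbers is similar.

```python
# Toy greedy longest-match tokenizer. The vocabulary is hypothetical,
# picked to reproduce the article's "380" vs. "381" example.
VOCAB = {"380", "38", "1", "3", "8", "0"}

def tokenize(text: str) -> list[str]:
    """At each position, match the longest vocabulary entry available."""
    tokens = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):  # try the longest match first
            if text[i:end] in VOCAB:
                tokens.append(text[i:end])
                i = end
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return tokens

print(tokenize("380"))  # ['380']      -- one token
print(tokenize("381"))  # ['38', '1']  -- two tokens, digits split apart
```

Because "380" and "381" differ by one, a human sees two adjacent numbers; the tokenizer sees one token versus an unrelated pair of tokens, which is why arithmetic over such representations is fragile.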

Annoyingly, a lot of people use the term "machine learning" when referring to regression analysis or some other form of statistical analysis. According to the University of California at Berkeley, machine learning has three components: 1) a decision process; 2) an error function; and 3) a model optimization process.

Regression analysis and most other forms of statistical analyses lack a model optimization process.
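The distinction can be sketched on a toy one-dimensional fit. The three components above map onto the loop below: predictions are the decision process, mean squared error is the error function, and the gradient-descent loop is the model optimization process. Ordinary least-squares regression would compute the same slope in a single closed-form step, with no iterative optimization. The data and learning rate are illustrative choices, not from the article.

```python
# Toy data, roughly y = 2x (values are illustrative).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.2, 7.9]

w = 0.0  # model parameter (slope)
for _ in range(200):  # model optimization process
    # decision process: predictions from the current model
    preds = [w * x for x in xs]
    # error function: gradient of mean squared error with respect to w
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    w -= 0.01 * grad  # update the model to reduce the error

print(round(w, 2))  # learned slope, close to 2
```

Dropping the loop and solving for the slope directly gives you regression analysis; it is the explicit, repeated model-improvement step that makes this machine learning in Berkeley's sense.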

Here's the real-world problem: While "investment" results are generally trackable, "spending" results are not. For the vast majority of people, however, how they spend is a bigger determinant of their financial performance than investing is.

The other challenge here is that we don't simply spend to optimize our financial performance. We spend to optimize our emotional performance. How is a machine learning model going to track that?

Providing financial advice and guidance is not a simple, straightforward task -- the set of instructions needed to do it requires many "clauses." In other words, the goals and objectives behind financial advice and guidance are complex, and it's precisely these complex questions and instructions that Generative AI tools are not good at (according to Apple).

Bottom line: Banks and credit unions shouldn't rely on AI to provide financial advice and guidance -- right now. Maybe someday, but not now, and not for another 5, maybe 10, years. If vendors claim they're using machine learning, ask them about their model optimization process. If they claim to have a large language model, ask them how it overcomes math computation limitations.
