Reflection 70B saga continues as training data provider releases post-mortem report - RocketNews


On September 5, 2024, Matt Shumer, co-founder and CEO of the startup Hyperwrite AI (also known as OthersideAI), took to the social network X with bombshell news: he had fine-tuned a version of Meta's open-source Llama 3.1 70B into an even more performant large language model (LLM) called Reflection 70B. Based on alleged third-party benchmarking results he published, the model was so capable that it was "the world's top open-source model," according to his post.

I'm excited to announce Reflection 70B, the world's top open-source model. Trained using Reflection-Tuning, a technique developed to enable LLMs to fix their own mistakes. 405B coming next week - we expect it to be the best model in the world. Built w/ @GlaiveAI. Read on ⬇️: pic.twitter.com/kZPW1plJuo -- Matt Shumer (@mattshumer_) September 5, 2024

However, shortly after its release, third-party evaluators in the AI research and hosting community struggled to reproduce the claimed results, leading to accusations of fraud.

Researchers cited discrepancies between the announced benchmark results and their independent tests, sparking a wave of criticism on social platforms such as Reddit and X.

In response to these concerns, Shumer pledged to review the issues alongside Sahil Chaudhary, founder of Glaive, the AI startup whose synthetic data Shumer claimed he had trained Reflection 70B on -- and in which, he later revealed, he had made what he called a small investment.

Now, nearly a month later, Chaudhary last night released a post-mortem report on his Glaive AI blog about the Reflection 70B model and published resources for the open-source AI community to test the model and his training process on their own. He says that while he was unable to reproduce all of the original benchmarks, he "found a bug in the initial code" that caused several results to appear higher than what his recent tests of Reflection 70B produce. Other benchmark results, however, now come out higher than before -- adding to the mystery.

On September 5th, @mattshumer_ announced Reflection 70B, a model fine-tuned on top of Llama 3.1 70B, showing SoTA benchmark numbers, which was trained by me on Glaive generated data. Today, I'm sharing model artifacts to reproduce the initial claims and a post-mortem to address... -- Sahil Chau ...
