
How DeepSeek stacks up against popular AI models, in three charts



[Charts: AI model performance comparisons]

DeepSeek, until recently a little-known Chinese artificial intelligence company, has made itself the talk of the tech industry after it rolled out a series of large language models that outshone many of the world’s top AI developers.

DeepSeek released its buzziest large language model, R1, on Jan. 20. The AI assistant hit No. 1 on the Apple App Store in recent days, bumping OpenAI’s long-dominant ChatGPT down to No. 2.

Its sudden dominance, and its ability to outperform top U.S. models across a variety of benchmarks, have sent Silicon Valley into a frenzy, especially as the Chinese company touts that its model was developed at a fraction of the cost.

The shock within U.S. tech circles has ignited a reckoning in the industry, showing that perhaps AI developers don’t need exorbitant amounts of money and resources in order to improve their models. Instead, researchers are realizing, it may be possible to make these processes efficient, both in terms of cost and energy consumption, without compromising ability.

R1 came on the heels of its previous model, V3, which launched in late December. But on Monday, DeepSeek released yet another high-performing AI model, Janus-Pro-7B, a multimodal model that can process and generate images as well as text.

Here are some of the features that make DeepSeek’s large language models stand out.

Size

Despite being developed by a smaller team with drastically less funding than the top American tech giants, DeepSeek is punching above its weight with a large, powerful model that delivers comparable performance while consuming far fewer computing resources.

That’s because the AI assistant relies on a “mixture-of-experts” system to divide its large model into numerous small submodels, or “experts,” with each one specializing in handling a specific type of task or data. In contrast to the traditional approach, which uses every part of the model for every input, each submodel is activated only when its particular knowledge is relevant.

So even though V3 has a total of 671 billion parameters, or settings inside the AI model that it adjusts as it learns, it activates only 37 billion of them at a time, according to a technical report its developers published.
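A minimal sketch of that routing idea, in plain NumPy with toy sizes (the dimensions, expert count and router here are invented for illustration and are orders of magnitude smaller than DeepSeek’s):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, invented for illustration. Per its technical report, V3 routes
# each token among 256 experts and activates 8 of them (plus a shared expert).
N_EXPERTS, TOP_K = 8, 2
D_MODEL, D_HIDDEN = 16, 32

# Each "expert" is a small feed-forward network with its own weights.
experts = [(rng.standard_normal((D_MODEL, D_HIDDEN)) * 0.1,
            rng.standard_normal((D_HIDDEN, D_MODEL)) * 0.1)
           for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.1  # scores a token against experts

def moe_layer(x):
    """Run a token through its top-k experts only; the rest stay idle."""
    scores = x @ router
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over experts
    top = np.argsort(weights)[-TOP_K:]       # indices of the k most relevant experts
    out = np.zeros_like(x)
    for i in top:
        w_in, w_out = experts[i]
        out += weights[i] * (np.maximum(x @ w_in, 0.0) @ w_out)  # expert FFN (ReLU)
    return out

token = rng.standard_normal(D_MODEL)
print(moe_layer(token).shape)  # (16,): same output shape, but only 2 of 8 experts ran
```

Every parameter still exists in memory, but only the selected experts’ weights do work for a given token, which is how the active-parameter count stays a small fraction of the total.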

The company also developed a unique load-balancing strategy to ensure that no single expert is overloaded or underused, using dynamic adjustments rather than a traditional penalty-based approach that can degrade performance.

All of this enables DeepSeek to employ a robust team of “experts” and to keep adding more, without slowing down the whole model.
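DeepSeek’s report describes this as an auxiliary-loss-free method that attaches an adjustable bias to each expert’s routing score. A minimal sketch of that idea, with an invented update rate and batch bookkeeping:

```python
import numpy as np

N_EXPERTS, TOP_K, BIAS_STEP = 8, 2, 0.01  # BIAS_STEP is a made-up update rate
bias = np.zeros(N_EXPERTS)                # one adjustable bias per expert

def select_experts(scores):
    # The bias influences only which experts get *selected*, not how their
    # outputs are weighted, so routing quality is largely preserved.
    return np.argsort(scores + bias)[-TOP_K:]

def rebalance(counts):
    """After a batch, nudge overloaded experts down and underloaded ones up."""
    target = counts.mean()
    bias[counts > target] -= BIAS_STEP
    bias[counts < target] += BIAS_STEP

# Toy usage: route 1,000 random tokens and periodically rebalance.
rng = np.random.default_rng(1)
counts = np.zeros(N_EXPERTS)
for step in range(1, 1001):
    chosen = select_experts(rng.standard_normal(N_EXPERTS))
    counts[chosen] += 1
    if step % 100 == 0:
        rebalance(counts)
        counts[:] = 0

print(bias)  # biases drift so that per-expert load evens out over time
```

The appeal of this design is that it avoids adding a balancing penalty to the training loss itself, which is the traditional approach the article contrasts it with.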

It also uses a technique called inference-time compute scaling, which allows the model to adjust its computational effort up or down depending on the task at hand, rather than always running at full power. A straightforward question, for example, might only require a few metaphorical gears to turn, whereas asking for a more complex analysis might make use of the full model.
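In R1-style models this behavior is learned rather than hand-coded, but a purely illustrative sketch of the idea, spending more iterations only while the answer is still changing, might look like this (the stopping rule, the `model_step` callback and the step budget are all invented):

```python
import math

def answer_with_adaptive_compute(question, model_step, max_steps=64, tol=1e-3):
    """Purely illustrative: keep refining an answer until it stabilizes.

    `model_step` stands in for one round of the model's reasoning; easy
    questions converge in a few steps, hard ones use more of the budget.
    """
    state, prev_score = None, float("-inf")
    steps = 0
    for steps in range(1, max_steps + 1):
        state, score = model_step(question, state)  # one unit of "thinking"
        if abs(score - prev_score) < tol:           # no longer improving: stop early
            break
        prev_score = score
    return state, steps

# Dummy model_step whose confidence converges quickly, so few steps are used.
dummy = lambda q, s: ((s or 0) + 1, 1.0 - math.exp(-((s or 0) + 1)))
answer, used = answer_with_adaptive_compute("What is 2 + 2?", dummy)
print(answer, used)  # stops after a handful of steps, well under the budget
```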

Together, these techniques make it easier to use such a large model in a much more efficient way than before.

Training cost

DeepSeek’s design also makes its models cheaper and faster to train than those of its competitors.

Even as leading tech companies in the United States continue to spend billions of dollars a year on AI, DeepSeek claims that V3 — which served as a foundation for the development of R1 — took less than $6 million and only two months to build. And due to U.S. export restrictions that limited access to the best AI computing chips, namely Nvidia’s H100s, DeepSeek was forced to build its models with Nvidia’s less-powerful H800s.

One of the company’s biggest breakthroughs is its development of a “mixed precision” framework, which uses a combination of full-precision 32-bit floating point numbers (FP32) and low-precision 8-bit numbers (FP8). The latter uses up less memory and is faster to process, but can also be less accurate.

Rather than relying only on one or the other, DeepSeek saves memory, time and money by using FP8 for most calculations, and switching to FP32 for a few key operations in which accuracy is paramount. 
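NumPy has no FP8 type, so the sketch below stands in for it with scaled 8-bit integers: the bulk matrix multiply runs at low precision while accumulation and the final rescale stay at higher precision (the scaling scheme and function names are illustrative, not DeepSeek’s actual kernel):

```python
import numpy as np

def quantize_8bit(x):
    """Simulate low precision: map floats onto 256 integer levels."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    return np.round(x / scale).astype(np.int8), scale

def mixed_precision_matmul(a, b):
    """Do the expensive multiply in simulated 8-bit, accumulate in 32-bit."""
    qa, sa = quantize_8bit(a)
    qb, sb = quantize_8bit(b)
    # 32-bit accumulation avoids overflow; the final rescale restores range.
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc.astype(np.float32) * (sa * sb)

rng = np.random.default_rng(2)
a, b = rng.standard_normal((64, 64)), rng.standard_normal((64, 64))
exact = (a @ b).astype(np.float32)       # full-precision reference
approx = mixed_precision_matmul(a, b)
print(np.abs(exact - approx).max())      # small but nonzero quantization error
```

The printed error is the accuracy cost the article mentions; the gamble is that for most of a model’s arithmetic, that cost is acceptable in exchange for halving or quartering memory traffic.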

Some in the field have noted that DeepSeek’s limited resources are perhaps what forced it to innovate, charting a path that suggests AI developers could be doing more with less.

Performance

Despite its relatively modest means, DeepSeek’s scores on benchmarks keep pace with the latest cutting-edge models from top AI developers in the United States.

R1 is nearly neck and neck with OpenAI’s o1 model in the Artificial Analysis Quality Index, an independent AI analysis ranking. R1 already beats a range of other models, including Google’s Gemini 2.0 Flash, Anthropic’s Claude 3.5 Sonnet, Meta’s Llama 3.3-70B and OpenAI’s GPT-4o.

One of its core features is its ability to explain its thinking through chain-of-thought reasoning, which is intended to break complex tasks into smaller steps. This method enables the model to backtrack and revise earlier steps — mimicking human thinking — while allowing users to also follow its rationale.
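R1 exposes this reasoning as plain text emitted before the final answer, wrapped in special think tags. A hypothetical example of what parsing such output could look like (the sample output and the tag handling are assumptions for the sketch):

```python
# Hypothetical illustration of chain-of-thought style output. R1-style models
# emit intermediate reasoning before the answer; the exact wording is invented.
raw_output = """<think>
The train covers 120 km in 2 hours, so speed = 120 / 2 = 60 km/h.
At 60 km/h, 180 km takes 180 / 60 = 3 hours.
</think>
It would take 3 hours."""

reasoning, _, answer = raw_output.partition("</think>")
reasoning = reasoning.removeprefix("<think>").strip()
print("Reasoning steps:", reasoning)
print("Final answer:", answer.strip())
```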

V3, which preceded R1, also performed on par with Claude 3.5 Sonnet upon its release last month, and it outscored GPT-4o, Llama 3.3-70B and Alibaba’s Qwen2.5-72B, previously China’s leading AI model.

Meanwhile, DeepSeek claims its newest model, Janus-Pro-7B, surpassed OpenAI’s DALL-E 3 and Stability AI’s Stable Diffusion 3 Medium in multiple benchmarks.



