
【Jerome Deeds Archives】

Source: Wisdom Convergence Information Network  Editor: Games  Time: 2025-06-27 09:38:09

DeepSeek has released a new paper, with co-founder Liang Wenfeng credited as a contributor, detailing how its latest large language model, DeepSeek-V3, achieves efficient training and inference using only 2,048 H800 GPUs, significantly fewer than the tens of thousands typically required. The team attributes this efficiency to four key innovations: memory optimization through multi-head latent attention (MLA), computational savings via a Mixture-of-Experts (MoE) design with FP8 precision, communication improvements using a multi-plane network topology, and faster inference through multi-token prediction (MTP). With MLA, KV cache memory usage is cut to just 70KB per token, as little as one-seventh that of competing models. The MoE architecture activates only 37 billion of the model’s 671 billion parameters per forward pass, reducing training costs by 90% compared to dense models. FP8 training further halves compute and memory usage, with minimal accuracy tradeoff. Beyond the model, the paper also outlines five future directions for AI hardware design, advocating for tighter integration between software and hardware to address memory, compute, and networking bottlenecks. [36Kr, in Chinese]
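To put the reported figures in perspective, the short Python sketch below works through the arithmetic behind two of them: the share of parameters active per forward pass (37 billion of 671 billion) and the KV cache footprint at 70KB per token. The function names, the 4,096-token context length, and the reading of 1KB as 1,000 bytes are illustrative assumptions, not details taken from the paper.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the article.
# Assumptions (not from the source): 1 KB = 1,000 bytes, and the
# 4,096-token context length used in the example is arbitrary.

ACTIVE_PARAMS = 37e9         # parameters activated per forward pass (from the article)
TOTAL_PARAMS = 671e9         # total model parameters (from the article)
KV_BYTES_PER_TOKEN = 70_000  # about 70KB per token with MLA (from the article)


def active_fraction(active_params: float, total_params: float) -> float:
    """Return the fraction of parameters used in one forward pass."""
    return active_params / total_params


def kv_cache_bytes(num_tokens: int, bytes_per_token: int = KV_BYTES_PER_TOKEN) -> int:
    """Return the KV cache size in bytes for a sequence of num_tokens tokens."""
    return num_tokens * bytes_per_token


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"Active parameter share: {frac:.1%}")

    context_len = 4096  # hypothetical context length
    cache_mb = kv_cache_bytes(context_len) / 1e6
    print(f"KV cache for {context_len} tokens: {cache_mb:.2f} MB")
```

On these numbers, roughly one in eighteen parameters is active in a given forward pass, and the KV cache grows linearly with sequence length at the stated per-token rate.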





Copyright © 2025 Powered by 【Jerome Deeds Archives】,Wisdom Convergence Information Network
