I think one of the things you’re going to see over the next few months is our leading AI companies taking steps to try and prevent distillation.
We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models, and will share information as we know more.
We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the U.S. government to protect the most capable models being built here.
We know [China]-based companies – and others – are constantly trying to distill the models of leading US AI companies.
We engage in counter-measures to protect our IP, including a careful process for which frontier capabilities to include in released models, and believe as we go forward that it is critically important that we are working closely with the US government to best protect the most capable models from efforts by adversaries and competitors to take US technology.
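For context, "distillation" here means training a smaller student model to imitate a larger teacher model – in the API setting, simply by using the teacher's generated outputs as training data. Below is a minimal sketch of the classic logit-matching variant of the technique, using toy stand-in models; it illustrates the general recipe, not DeepSeek's or OpenAI's actual pipelines.

```python
# Minimal knowledge-distillation sketch (toy models, illustrative only).
import torch
import torch.nn.functional as F

torch.manual_seed(0)
teacher = torch.nn.Linear(16, 4)   # stand-in for a large "teacher" model
student = torch.nn.Linear(16, 4)   # smaller "student" learns to imitate it
opt = torch.optim.Adam(student.parameters(), lr=1e-2)
T = 2.0                            # temperature softens the teacher's distribution

for step in range(200):
    x = torch.randn(32, 16)        # unlabeled inputs; the teacher supplies the targets
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=-1)
    log_probs = F.log_softmax(student(x) / T, dim=-1)
    # KL divergence pulls the student's distribution toward the teacher's;
    # the T*T factor is the standard gradient-scale correction.
    loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The alleged API-based variant is even simpler: no access to logits is needed, only the teacher's text outputs, which is what makes it hard to prevent.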
DeepSeek R1 is one of the most amazing and impressive breakthroughs I've ever seen - and as open source, a profound gift to the world.
There's no such thing as low cost, because the security and privacy costs are extremely high - let alone the perverted prism through which many answers will be presented.
The model itself gives away a few details of how it works, but the costs of the main changes they claim – as I understand them – don’t ‘show up’ in the model itself so much.
The breakthrough is incredible – almost ‘too good to be true’. The breakdown of costs is unclear.
It’s plausible to me that they can train a model for $6m.
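DeepSeek's own V3 technical report decomposes that headline number: about 2.788M H800 GPU-hours for the final training run, priced at an assumed rental rate of $2 per GPU-hour (and explicitly excluding prior research and ablation runs). The back-of-envelope check:

```python
# Back-of-envelope check of DeepSeek-V3's reported training cost.
gpu_hours = 2.788e6        # H800 GPU-hours stated in the V3 technical report
price_per_gpu_hour = 2.00  # USD rental rate the report assumes

total = gpu_hours * price_per_gpu_hour
print(f"${total / 1e6:.2f}M")  # ~ $5.58M, the source of the "$6m" headline
```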
It’s very much an open question whether DeepSeek’s claims can be taken at face value. The AI community will be digging into them and we’ll find out.
DeepSeek made R1 by taking a base model – in this case V3 – and applying some clever methods to teach that base model to think more carefully.
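DeepSeek's R1 report describes those methods as large-scale reinforcement learning driven by simple rule-based rewards: sampled completions are scored for getting verifiable answers right and for keeping their reasoning inside a designated format. A toy sketch of such a reward function follows; the tag names and weights are illustrative, not DeepSeek's exact recipe.

```python
# Toy sketch of R1-style rule-based rewards: correctness plus a format check.
import re

def reward(completion: str, reference_answer: str) -> float:
    score = 0.0
    # Format reward: did the model wrap its reasoning in <think>...</think>
    # before giving a final answer?
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        score += 0.2
    # Accuracy reward: for verifiable tasks (math, code), compare the final
    # answer, taken after the reasoning block, against a known reference.
    answer = completion.split("</think>")[-1].strip()
    if answer == reference_answer.strip():
        score += 1.0
    return score

print(reward("<think>2 + 2 is 4</think> 4", "4"))  # -> 1.2
```

An RL algorithm (GRPO, in the paper) then updates the base model toward completions that score highly, which is how the "thinking" behavior emerges without hand-labeled reasoning data.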
If they spent more time working on the code and reproducing the DeepSeek idea themselves, it would be better than just talking about the paper.
GPT-4 finished training in late 2022. There have been a lot of algorithmic and hardware improvements since 2022, driving down the cost of training a GPT-4-class model. A similar situation happened with GPT-2: at the time it was a serious undertaking to train, but now you can train it for $20 in 90 minutes.
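The $20 figure traces to Andrej Karpathy's llm.c reproduction of the 124M-parameter GPT-2 on an 8×H100 node; the arithmetic is just GPUs × hours × rental rate, where the rate below is an assumption (market prices vary):

```python
# The quoted GPT-2 reproduction cost is just GPUs x hours x rental price.
gpus = 8                  # one 8xH100 node, the llm.c reproduction setup
hours = 1.5               # "90 minutes"
price_per_gpu_hour = 1.7  # assumed USD rental rate per H100-hour

print(f"${gpus * hours * price_per_gpu_hour:.0f}")  # ~ $20
```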
These massive-scale models are a very recent phenomenon, so efficiencies are bound to be found.
The constraints on China's access to chips forced the DeepSeek team to train more efficient models that could still be competitive without huge compute training costs.
DeepSeek V3's training costs, while competitive, fall within historical efficiency trends.
AI models have consistently become cheaper to train over time - this isn't new.
It's overturned the long-held assumptions many had about the computing power and data processing required to innovate.
Given the limitations of purely defensive measures, the U.S. may also ramp up domestic AI investment, strengthen alliances, and refine policies to ensure it maintains leadership without unintentionally driving more nations toward China's AI ecosystem.