In early 2025, DeepSeek’s announcement of their latest AI model abruptly seized the attention of the entire tech world. Their seemingly far-fetched claims of performance comparable to industry giants at a fraction of the cost, combined with their unprecedented transparency in releasing their full model architecture and training process, stunned AI giants and everyday LLM users alike. In an era where AI development costs routinely run into the billions, their assertions were naturally met with intense scrutiny.
As the Marketing Coordinator of Xyonix and someone with a non-technical background, I’m in the unique position of being surrounded by expert data scientists five days a week. I found myself both intrigued and slightly skeptical about these claims, having heard everything from overwhelmingly optimistic takes to cyber-cynical ones. To separate hype from reality, I sat down with Carsten Tusk, one of our Principal Data Scientists and a co-founder, whose decades of experience in AI development provided a refreshingly grounded perspective on DeepSeek’s achievements and their implications for the future of AI.
Breaking the Bank (Or Not)
Given that AI projects typically demand billions of dollars, DeepSeek’s announcement caught me off guard. They stated that training their newest model required just $6 million. To put this into perspective, that’s less than what was paid for a piece of modern art consisting of a banana duct-taped to a wall (1). When I first read these numbers in their research paper, they seemed far too good to be true.
As I discussed this with Carsten, his explanation revealed why this research has captured so much attention. “I think it's because they published the numbers of what it took to train the model,” Carsten explains, “they are like significantly lower than what was put into models by let’s say OpenAI, Microsoft, or Google. And so the whole hype behind AI and NVIDIA's rise to fame is that we need billions for chips and data centers…and here comes this tiny company saying, ‘We did this for 6 million, and it's on par with the performance of your large-scale models.’”
However, the “$6 million miracle” requires some important context. While the comparatively modest training cost makes headlines, Carsten points out a crucial and often overlooked detail: DeepSeek’s GPU cluster itself most likely cost over $150 million. “The training cost is $6M in electricity, but buying the thing that you can run it on is another $150M,” he notes.
In other words, the headline-grabbing power bill represents just a tiny fraction of what it actually takes to build and maintain the high-performance computing infrastructure necessary for training large language models. Research from SemiAnalysis, an independent firm tracking AI infrastructure investments, paints a more complete picture: DeepSeek's hardware investments alone exceed $500 million over their history, with total infrastructure costs projected to reach ~$1.3 billion when accounting for data centers and GPU expansion plans (5).
While we celebrate a $6 million training breakthrough, it still exists within an ecosystem of billion-dollar investments. Yes, DeepSeek's optimizations in training algorithms and reinforcement learning approaches are impressive and important. But they're like finding a way to save money on fuel for a rocket – significant, but ultimately just one cost factor in a much larger, much more expensive operation.
Democratizing AI Development with Open-Weighted Models
When DeepSeek announced they were making their model weights public, I initially didn't grasp the significance. Like many, I've heard the term “open source” thrown around in AI discussions before. But as Carsten explained, this was different – and potentially revolutionary:
“That's a really big deal,” he emphasized, “if you have the weights open, then anybody can run that model. You can leverage the $6M that DeepSeek put into the training of that model. You grab the weights, you run it for yourself.”
To understand why this matters, let’s go back to the age-old adage: “Give a man a fish, and you feed him for a day; teach him to fish, and you feed him for a lifetime.” Open source means giving the man a fish (the code, in this instance), but open weights mean teaching him to fish: you’re imparting the deeper, experiential knowledge of the model’s learning, all of the patterns and connections it has made. And DeepSeek went even further; they published detailed documentation of their entire training process. It’s like that master fisherman not only teaching you, but also revealing every detail of HOW they developed their angling expertise.
This level of transparency stands in stark contrast to the approach taken by companies like OpenAI, where technical reports can often read more like marketing materials than scientific documentation, with crucial details conspicuously absent. It’s a difference that resonates deeply with me as someone who works in marketing – there’s marketing that illuminates, and marketing that obscures.
Such comprehensive disclosure carries profound implications for researchers. Scientists and developers can now not only USE DeepSeek’s model but also 1) understand exactly how it was created, 2) verify its training methodology, and 3) potentially improve upon it. It’s an approach that harkens back to the fundamental principles of scientific progress – that knowledge should be shared, verified, and built upon by the community.
But What’s the Reality of Running These Models?
DeepSeek’s emphasis on the democratization of AI technology sparked my curiosity about its practical applications. And when Carsten sent me an article prior to our chat detailing someone who had managed to run the entire model on a custom-built $6,000 PC, it seemed too good to be true. Turns out, it was (4).
“You have to take that with a grain of salt,” Carsten explained when we had a chance to chat about it. “Yes, it runs on that hardware. No, it's not very fast. It generates six to eight tokens per second.”
At roughly 5 words per second, this setup falls well short of modern hosted LLMs, which produce around 40 tokens (~31 words) per second. As Carsten points out, “You can play with it personally...You can’t use that to, let’s say, run a business or have more than one user on it.”
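For readers who, like me, needed the token math spelled out: the sketch below uses the common rule of thumb of roughly 0.75 English words per token. The exact ratio varies by tokenizer and text, so treat this as back-of-the-envelope arithmetic, not a benchmark.

```python
# Rough throughput math behind the token/word comparison above.
# Assumes ~0.75 English words per token, a common rule of thumb;
# real ratios depend on the tokenizer and the text.
WORDS_PER_TOKEN = 0.75

def words_per_second(tokens_per_second: float) -> float:
    """Convert token throughput to approximate word throughput."""
    return tokens_per_second * WORDS_PER_TOKEN

# The $6,000 home build: 6-8 tokens/sec is roughly 4.5-6 words/sec.
home_build_low = words_per_second(6)
home_build_high = words_per_second(8)

# A typical hosted model at ~40 tokens/sec lands near 30 words/sec.
hosted = words_per_second(40)

print(home_build_low, home_build_high, hosted)  # 4.5 6.0 30.0
```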
This reality check doesn't diminish their achievement. Rather, it highlights another crucial aspect of AI democratization. While running these models may technically be feasible on consumer hardware, practical deployment still requires significant computational resources. But the ability to experiment with these models, even at reduced speeds, opens up new possibilities for researchers and developers who might not have access to industrial-scale computing resources. They can test ideas, modify the model for specific applications, and could contribute to the broader understanding of AI capabilities and limitations.
DeepSeek’s Experimental Take on Reinforcement Learning
Our conversation turns to one of DeepSeek’s more intriguing innovations: their experimental use of pure reinforcement learning without supervised fine-tuning. As someone without a deep understanding of AI terminology, I asked Carsten to break down what this really means.
Reinforcement learning, he explained, is like teaching through trial and error with a reward system. “You give it an example, you rate the results and come back and you give them a score, a reward,” Carsten described. “You reward the good results more than the bad results and it learns…based on that, what is good and what is bad.”
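To make that trial-and-error loop concrete, here is a toy, purely illustrative sketch: a bandit-style update, nothing resembling DeepSeek’s actual training code. The “model” is just a preference between two canned answers, and the reward nudges it toward whichever answer scores well.

```python
import random

# Toy illustration of learning from rewards: the "model" is a probability
# of choosing each of two answers, and each reward update reinforces the
# choice in proportion to its score. Hypothetical sketch only.
random.seed(0)

answers = ["correct answer", "wrong answer"]
preference = [0.5, 0.5]  # initial probability of picking each answer
LEARNING_RATE = 0.05

def reward(answer: str) -> float:
    """The grader: good results earn a higher score than bad ones."""
    return 1.0 if answer == "correct answer" else 0.0

for _ in range(200):
    # Try an answer (trial), score it (error/reward), then update.
    idx = 0 if random.random() < preference[0] else 1
    r = reward(answers[idx])
    preference[idx] += LEARNING_RATE * (r - preference[idx])
    preference[1 - idx] = 1.0 - preference[idx]

# After many trials, the model strongly prefers the rewarded answer.
print(preference)
```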
He elaborated that this differs from the traditional approach of supervised fine-tuning. As Carsten put it, “Supervised data is if you have a human generate examples. For example, the human says, here's the question, here's the answer. Somebody supervised that and said, ‘this is a training sample.’” The challenge with supervised learning, I found out, often comes down to cost and scale.
“Generating supervised data is expensive,” Carsten informed me. “Let's say you have human annotators...that is really labor intensive. So generating lots and lots and lots of supervised data is very expensive.”
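For a sense of what that labor buys, here is a hypothetical supervised sample in the common prompt/response shape (not DeepSeek’s actual data format), plus illustrative math on why the cost adds up; the $0.50-per-sample annotation rate is an assumption made up purely for illustration.

```python
# A hypothetical supervised fine-tuning sample: a human wrote or vetted
# both fields, which is exactly what makes this data expensive at scale.
supervised_sample = {
    "prompt": "Summarize the water cycle in one sentence.",
    "response": "Water evaporates, condenses into clouds, and falls as precipitation.",
}

# Back-of-the-envelope scaling cost, assuming an illustrative
# (made-up) rate of $0.50 of annotator time per sample.
COST_PER_SAMPLE_USD = 0.50
num_samples = 1_000_000
total_cost = num_samples * COST_PER_SAMPLE_USD
print(f"${total_cost:,.0f}")  # one million samples -> $500,000
```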
DeepSeek's experiment with pure reinforcement learning aimed to bypass this limitation. However, their journey proved a bit more complex than they’d initially hoped. “That was the first experiment with DeepSeek v3.0,” Carsten noted. “They used pure reinforcement learning to train it just to prove a point and to see if this can be done... but the model wasn't quite on par with the real model.”
In their final version, DeepSeek ultimately returned to a hybrid approach. “What they did in their final version is they actually did use supervised fine-tuning,” Carsten explained. “They have this cold start data that they bootstrap the model with. And also the whole model doesn't come from nowhere. They used their DeepSeek base model that was trained on regular 14 trillion tokens to bootstrap it.”
This approach reveals what I believe is one of the more fascinating aspects of modern AI development: new models rarely start as blank slates. Instead, they build upon existing foundations, a process known as bootstrapping. DeepSeek also used what Carsten called “cold start data” – an initial dataset that helps the model begin learning with some basic understanding already in place. This initial knowledge base gives the model a foundation to build upon, making the learning process more efficient and effective.
The real breakthrough, then, wasn't in eliminating supervised learning entirely, but in demonstrating the potential of reinforcement learning in specific contexts. “What they've shown us is really the power of reinforcement learning, if you have the right applications and data…because you can artificially generate data,” Carsten reflected. This approach particularly shines in domains with clear right or wrong answers – like math, where you can automatically verify results.
“There's one problem with [pure reinforcement learning],” Carsten cautioned, highlighting an important catch. “It only works in cases where you can really make a judgment whether or not the response of the model is actually good because you need that reward function, and you need to be able to say ‘this is a good response’ and ‘this is a bad response.’”
This limitation makes reinforcement learning less suitable for tasks involving subjective judgments or nuanced responses.
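That caveat is easiest to see in code. Below is a hypothetical reward function for a verifiable domain: arithmetic answers can be checked programmatically, so reward signals come essentially for free, while no equivalent check exists for, say, judging the quality of an essay. This is a sketch of the general idea, not DeepSeek’s actual reward implementation.

```python
# A reward function for a verifiable domain: arithmetic results can be
# checked automatically, so no human grader is needed.
# Hypothetical sketch; not DeepSeek's actual reward code.

def math_reward(problem: str, model_answer: str) -> float:
    """Reward 1.0 if the answer matches the ground truth, else 0.0."""
    expected = eval(problem)  # acceptable here: we generate `problem` ourselves
    try:
        return 1.0 if float(model_answer.strip()) == float(expected) else 0.0
    except ValueError:
        return 0.0  # unparseable output counts as a bad response

# Training problems can be generated artificially, as Carsten notes;
# no human annotators required.
problems = [f"{a} * {b}" for a in range(2, 5) for b in range(2, 5)]

print(math_reward("3 * 4", "12"))      # correct answer earns full reward
print(math_reward("3 * 4", "13"))      # wrong answer earns nothing
print(math_reward("3 * 4", "twelve"))  # unverifiable output also scores 0
```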
AGI: Going Beyond Mere “Reasoning” & Pattern Recognition
Carsten and I also discussed an interview with DeepSeek’s founder, Liang Wenfeng, in which he stated that achieving artificial general intelligence (AGI) was their ultimate goal (3). I’ve personally always found the concept of AGI fascinating, but ultimately quite unsettling. What exactly would it mean for a machine to possess general intelligence comparable to a human’s?
Carsten’s initial response was measured but decisive. “These models are good. They’re impressive, but they’re not what I would call AGI. Far from it.” He points to fundamental limitations in current transformer-based architectures. “Researchers have already shown that there are mathematically proven limits on what these transformer models with attention can actually solve. And there’s a class of problems they can’t solve. And while that is the case, you can scale it as far as you want; that theoretical limit that has been proven will not be overcome by that. We definitely need a different architecture if we want to really solve these issues.”
However, when I asked whether a future with artificial general intelligence is something we even want, Carsten threw an inquiry back at me, asking, “Do you think AGI is even possible?”
The question made me hesitate and reflect. For a neophyte like me, AGI would mean artificial intelligence with some semblance of consciousness or sentience: the ability to truly think and reason independently, beyond just responding to queries. But even as I mentally form this definition, I find myself questioning it. How do you measure consciousness? How do you put a label on sentience?
This prompts Carsten to steer our conversation into deeper philosophical waters. “It boils down to the question: is there something more than the biochemical, physical aspect of human beings, their consciousness, or not?” Carsten admits, “I don’t have an answer to that, to be honest. That is a very philosophical question.”
This philosophical uncertainty stands in stark contrast to the current reality of AI technology. As Carsten explains, today’s most advanced models, including DeepSeek’s, are essentially sophisticated pattern recognition systems. “Let’s face it,” he says, “it’s still a pattern recognition thing. We show them all the possible patterns in the world and they will assign a likelihood…There’s no real reasoning there, there’s no real thinking there. It’s just recombobulating what was fed into it.”
Even on the more practical side of AGI development, the challenges are immense. While we can study and attempt to replicate aspects of human neural networks, we're still far from understanding the full complexity of human intelligence. As Carsten notes, even if we focus purely on the biomechanical aspects, "We can figure out why our brains work the way they are. We can construct artificial neural networks that are similar to that... but even that is a fanfiction pipe dream right now."
When I admit that I'm personally skeptical about the possibility of true AGI, particularly the consciousness aspect, Carsten's response reflects the humility that perhaps more of us in the AI field should embrace. The gap between our current pattern-predicting algorithms and genuine intelligence remains vast, and the philosophical questions about consciousness and intelligence continue to challenge our assumptions about what AGI would really look like.
Looking Forward
When I ask Carsten about the broader implications of DeepSeek's work, his perspective challenges the prevailing narrative.
“Honestly, I don’t really share that hype,” he says. “Here’s a research group in China that came up with a good, efficient way of training a model… They’re not starting from scratch, right? They’re literally building on the success of all the models that OpenAI and everybody else have developed.”
Yet he readily acknowledges the value of DeepSeek’s contribution. “They’ve shown us that if you have these preconditions, if you have a space where reinforcement learning can be applied, you can have very, very good success with smaller models.”
As our discussion concludes, it’s clear that while DeepSeek’s achievements are significant, they represent an incremental step rather than a revolutionary leap. The path to more advanced AI systems remains complex, requiring not just clever optimization of existing techniques but potentially fundamental breakthroughs in network architecture.
For now, as Carsten reminds us, these models remain sophisticated pattern recognition systems rather than truly intelligent entities. And in an era of inflammatory AI headlines, sometimes the most valuable insights come from those who can appreciate the progress while maintaining a clear-eyed view of how far we still have to go.
Check out the full video here:
Curious About the Realities of AI Innovation?
At Xyonix, our expert AI team cuts through the hype to deliver real, practical insights on cutting-edge advancements like DeepSeek’s latest model. With 20+ years of experience, we help businesses navigate AI’s evolving landscape—turning research breakthroughs into actionable strategies.
Let’s separate fact from fiction together. Schedule a free consultation today and discover how Xyonix can help you integrate AI innovations into real-world solutions:
Discover how the Xyonix Pathfinder process can help you identify opportunities, streamline operations, and deliver personalized experiences that leave a lasting impact.
Check out some related articles:
Sources:
1. NBC News. (2025). Viral duct-taped banana sells for $6 million at auction. NBC News. https://www.nbcnews.com/news/us-news/viral-duct-taped-banana-sells-6-million-auction-rcna180564
2. Castelvecchi, D. (2025, January 31). Chatbot software begins to face fundamental limitations. Quanta Magazine. https://www.quantamagazine.org/chatbot-software-begins-to-face-fundamental-limitations-20250131/
3. The China Academy. (n.d.). Interview with DeepSeek founder: “We’re done following. It’s time to lead.” The China Academy. https://thechinaacademy.org/interview-with-deepseek-founder-were-done-following-its-time-to-lead/
4. PC Gamer. (2025). Today I learned I can run my very own DeepSeek R1 chatbot on just $6,000 of PC hardware and no megabucks Nvidia GPUs required. PC Gamer. https://www.pcgamer.com/hardware/graphics-cards/today-i-learned-i-can-run-my-very-own-deepseek-r1-chatbot-on-just-usd6-000-of-pc-hardware-and-no-megabucks-nvidia-gpus-required/
5. SemiAnalysis. (2025, January 31). DeepSeek debates. SemiAnalysis. https://semianalysis.com/2025/01/31/deepseek-debates/