
DeepSeek: Searching for Answers in the Depths of the US-China AI War

Sharon Gai

 

Last week, DeepSeek tore through the US stock market with an announcement that shook Wall Street. As we grapple with the instability of our economy after Trump's tariff announcement, this disruption adds fuel to the fire. Since I've covered so many Chinese apps in the past (many of which are featured in my book), I thought it was only appropriate to investigate this company that has sent shockwaves around the globe.


What you need to know:


  • Wall Street investors are questioning whether US firms truly need “that” much capital to train AI models when DeepSeek’s team purportedly did it for ~$6 million USD

  • This caused a huge dip in the stock market and slashed $465 billion from Nvidia's valuation 

  • The AI community is enamored with DeepSeek's results and how it surpasses many of ChatGPT's features  

  • Just when we thought the future of AI rested solely in the hands of tech giants, this little-known startup emerged out of nowhere, sending the message that any small player with enough ingenuity can overtake a large one

  • Will it be banned in the US? Very likely. TikTok may be gradually disappearing, and DeepSeek might meet the same fate.

 


A founder from humble beginnings


Liang Wenfeng has gained the spotlight in recent weeks as the founder of DeepSeek. Born in 1985 in Zhanjiang, Guangdong Province, China, to parents who were primary school teachers, Liang formed a team with his classmates during the 2007-2008 global financial crisis to explore quantitative trading using machine learning techniques. He founded the quantitative hedge fund High-Flyer before starting DeepSeek.


In a speech titled, “A Software Engineer’s View on the Future Financial Industry in China,” Liang emphasized that the core distinction between quantitative and traditional investment strategies lies in the decision-making process: quantitative funds rely on algorithmic methods, whereas traditional funds depend on human judgment.


Establishment of DeepSeek


In May 2023, Liang announced High-Flyer's expansion into the field of artificial general intelligence with the launch of DeepSeek. The company focuses on developing large language models and other AI technologies. Notably, High-Flyer had previously acquired 10,000 Nvidia A100 GPUs, providing DeepSeek with substantial computational resources. However, facing U.S. semiconductor export restrictions, DeepSeek designed its models to run on older GPUs and domestic chips like Huawei's Ascend, reducing its reliance on Nvidia's now export-restricted hardware. Huawei, a major Chinese telecommunications company, has products that are largely banned in the US.


On January 20, 2025, DeepSeek released DeepSeek-R1, a 671-billion-parameter open-source reasoning AI model. Reportedly developed at a cost of about $5.6 million using 2,048 Nvidia H800 GPUs, the model demonstrated a resource-efficient approach that contrasts with the billion-dollar budgets of Western competitors. By January 27, DeepSeek's app had surpassed ChatGPT to become the top free app on the U.S. iOS App Store, a significant milestone in the global AI landscape.

 



DeepSeek's big innovation


Unlike many American rivals whose projects demand billions of dollars and massive computational infrastructure, DeepSeek claims to have engineered its models with groundbreaking techniques such as the Mixture-of-Experts (MoE) architecture and Multi-Head Latent Attention (MLA).


By activating only a fraction of its 671-billion-parameter model for any given token—in its flagship DeepSeek-V3, only 37 billion parameters are active—the company dramatically slashes compute costs. According to a technical paper released in late 2024, the final training run for DeepSeek-V3 cost roughly $5.6 million in cloud GPU hours, a figure that stands in stark contrast to the exorbitant expenses incurred by US labs developing similar capabilities.
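
To see how the headline figure hangs together, here is a rough back-of-envelope check. The wall-clock duration and the roughly $2-per-GPU-hour rental rate below are approximations based on the assumptions reported around the technical paper, not audited costs, so treat the numbers as illustrative.

```python
gpus = 2_048                      # H800 cards reportedly used for the final training run
days = 57                         # roughly two months of wall-clock training (approximate)
gpu_hours = gpus * 24 * days      # about 2.8 million GPU-hours
cost = gpu_hours * 2.00           # assuming ~$2 per H800 GPU-hour of cloud rental
print(f"{gpu_hours:,.0f} GPU-hours  ->  ${cost:,.0f}")   # roughly $5.6 million
```

The point is not the exact dollar figure but the order of magnitude: a few thousand GPUs for a couple of months, rather than the tens of thousands of GPUs running for far longer that the largest US training runs are reported to use.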


The secret sauce appears to lie in the clever use of MoE models. By dividing the network into “experts” that specialize in different sub-tasks (such as math, coding, or logic), and by employing a dynamic routing mechanism that directs each input only to the experts that matter most, DeepSeek achieves comparable—or even superior—performance to models like OpenAI’s o1 at a fraction of the cost. In a climate where US export controls have made advanced Nvidia chips harder to acquire, DeepSeek’s resourcefulness has turned a geopolitical bottleneck into an engineering advantage.
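
For readers who want to see what “routing each token to only a few experts” looks like in practice, here is a minimal, illustrative sketch of top-k expert routing in PyTorch. It is a toy layer written for this post, not DeepSeek's actual code; the layer sizes, the number of experts, and the simple linear router are all assumptions chosen for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy Mixture-of-Experts layer: each token runs through only its top-k experts."""
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)      # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                # x: (num_tokens, d_model)
        scores = self.router(x)                          # (num_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)       # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)             # normalize the surviving scores
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                 # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)
print(TinyMoE()(tokens).shape)   # torch.Size([16, 64]); only 2 of the 8 experts fire per token
```

The real model layers many refinements on top of this (including the Multi-Head Latent Attention mentioned above), but the cost intuition is the same: the total parameter count can be enormous while per-token compute stays proportional to the handful of experts that actually fire.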


Hangzhou, a city in China you’ve probably never heard of


DeepSeek was founded in Hangzhou. You might not know much about this city; you might never have even heard of it before. But I guarantee you'll hear more about it in the coming years.




I actually used to live there. It's where I became acquainted with much of China's technology and learned how tech companies there operate differently from Silicon Valley. If you're interested in learning more about how they work, send me an email.  


(Side note: I'm thinking of facilitating a business tour of Hangzhou and other cities in China if there is enough interest. I know many people have been learning Chinese and are becoming more interested in visiting the country; if that's you, let me know too.)

Hangzhou is also where Alibaba started. That’s another reason you’ll probably hear more about this place – this is also where the Qwen model is being developed.

 

Qwen: Alibaba’s Answer to the AI Arms Race


Not to be outdone, Alibaba has been busy evolving its own family of language models under the Qwen brand. Launched in 2023 as Tongyi Qianwen, Qwen was initially embedded into Taobao, China's answer to Amazon, as an AI ecommerce personal shopper that a user could ask, for example, what to buy as a gift for someone.




Qwen has since grown into an external product similar to ChatGPT. The latest flagship, Qwen 2.5-Max, is trained on an enormous dataset of over 18 trillion tokens, utilizes advanced MoE techniques, and is engineered to excel in benchmarks across reasoning, coding, and language tasks. According to Alibaba's announcement, Qwen 2.5-Max outperforms not only leading U.S. models like OpenAI's GPT-4o and Meta's Llama-3.1 but also rivals DeepSeek-V3 on several key metrics.


Qwen’s architecture benefits from Alibaba’s substantial R&D resources and its integration into a comprehensive cloud ecosystem. With offerings spanning from Qwen-VL (vision-language) to specialized Qwen-Math and Qwen-Coder models, Alibaba is positioning Qwen as a full-spectrum platform for both enterprise and consumer applications. The commercial backing and extensive datasets provided by Alibaba enable Qwen to push performance boundaries, particularly in processing long-context inputs and supporting multi-modal tasks.

 

Open Source


The advent of DeepSeek has also shaken the tech community over whether AI models should become more open source. Yann LeCun, VP & Chief AI Scientist at Meta, applauded DeepSeek's commitment to creating a model that is open source.

 


What exactly does an open-source AI model mean? And why might openness be better for a technology that will affect all of humanity?


1. Accessibility and Transparency:

  • Open-Source Models: The source code, model architectures, and training methodologies are publicly available. This transparency allows developers and researchers to inspect, modify, and enhance the models, fostering innovation and collaboration.

  • Closed-Source Models: The internal workings, including source code and training processes, are proprietary and not disclosed to the public. Users can interact with the model through provided interfaces but cannot access or modify its underlying structure.


2. Customization and Control:

  • Open-Source Models: Users have the freedom to fine-tune and adapt the models to specific needs, enabling the development of tailored applications across various domains (see the short code sketch after this list).

  • Closed-Source Models: Customization is limited to the options provided by the model's creators. Users must operate within predefined parameters, which may not cater to all specific requirements.


3. Cost and Accessibility:

  • Open-Source Models: Typically free to use, these models lower the barrier to entry for individuals and organizations, promoting widespread experimentation and application.

  • Closed-Source Models: Often require subscription fees or usage-based payments, which can be prohibitive for some users and limit accessibility.


4. Community Engagement and Support:

  • Open-Source Models: Benefit from a collaborative community that contributes to improvements, shares knowledge, and provides support, leading to rapid advancements and diverse applications.

  • Closed-Source Models: Development and support are confined to the organization that owns the model, potentially limiting the diversity of input and slowing innovation.


5. Security and Ethical Considerations:

  • Open-Source Models: Transparency allows for thorough examination, enabling the identification and mitigation of biases, vulnerabilities, or ethical concerns.

  • Closed-Source Models: Lack of transparency can obscure biases or ethical issues, making it challenging for external parties to assess and address potential problems.
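
To make the accessibility and customization points concrete: because DeepSeek publishes its weights, anyone can pull a checkpoint and run or adapt it locally with standard tooling. The snippet below is a minimal, illustrative sketch using the Hugging Face transformers library; the model ID shown is one of DeepSeek's smaller open distilled checkpoints and is used here purely as an example, and the prompt is made up.

```python
# pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example open-weights checkpoint; any model published on Hugging Face loads the same way.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

prompt = "Explain mixture-of-experts in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With a closed model, none of this is possible: you get an API endpoint, the weights stay on someone else's servers, and the pricing and usage rules are whatever the vendor decides.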


Yep, OpenAI is not so open. In response, Sam Altman has said he is reconsidering OpenAI's own open-source strategy. This is why open competition is healthy and so important. In so many of my keynotes, I start with the slide: competition fuels innovation. It's the only way we can push one company to do better than the next.


The US–China AI War: A Shifting Balance


The race for AI supremacy has long been framed as a contest between the deep-pocketed research labs of Silicon Valley and the nimble, innovative startups emerging from China. It started with TikTok, and I covered the migration to Rednote here. DeepSeek is definitely next. If we take a moment to think about the amount of data that can be captured through a seemingly harmless infinite cat-video app, and compare it to a direct answer engine that Americans then use at work and in their lives, DeepSeek looks like a much bigger threat.


In fact, Senator Hawley has introduced a bill that could mean jail time for anyone who downloads DeepSeek. “Every dollar and gig of data that flows into Chinese AI are dollars and data that will ultimately be used against the United States,” Senator Hawley said in a statement. “America cannot afford to empower our greatest adversary at the expense of our own strength. Ensuring American economic superiority means cutting China off from American ingenuity and halting the subsidization of CCP innovation.”


China’s Strategy: The Second Mover’s Advantage


To win this war, China might be using its own unique tactic.


We have all heard of the First Mover's Advantage: if you are first to market, it is easier to establish an affinity for your brand with customers. But sometimes it is the second mover that wins. China follows a strategy I call the Second Mover's Advantage: take what has already been developed and released, and optimize on top of it. With AI, China has basically done it again.


From the manufacturing of goods to algorithms, no matter if it’s hardware or software, it seems that China is remarkably good at taking something that might have emerged from the West, optimizing it, and making it cheaper and more accessible to the masses. Is it better ultimately? That one’s debatable.


Consider the examples:


  • Deepseek versus ChatGPT

  • TikTok versus Instagram

  • Temu/Shein versus Amazon

  • BYD versus Tesla

  • Xiaomi versus Apple

  • Unitree versus Optimus 


What we see here isn’t simply imitation—it’s a deliberate process of adaptation and enhancement. China’s strategy often starts with borrowing a concept that has proven successful, then re-engineering it to better fit market needs, lower production costs, or integrate with advanced local technologies. This model of iterative improvement allows for rapid innovation and scaling, turning established ideas into even more refined products and services.


This phenomenon also highlights an important shift in global innovation: the traditional lines between imitation and invention are becoming increasingly blurred. What might begin as a derivative product can evolve into something truly groundbreaking when it leverages local expertise, efficient manufacturing processes, and a keen understanding of market dynamics.


Ultimately, this evolution is redefining how we view innovation. Instead of a simple transfer of technology from one region to another, we are witnessing a collaborative global ecosystem where ideas are continuously refined and improved upon. As these dynamics play out, the world may soon see a new era of technological progress driven by the fusion of international insights and localized ingenuity.


What does this mean for the rest of us?


DeepSeek's model is already available on AWS and Microsoft Azure. (That was a quick integration.) So if Hawley's bill passes, does that mean companies that use this model in an enterprise setting would also be jailed? This is yet to be determined…


Because DeepSeek’s model is open source, companies are not subject to expensive licensing fees that often accompany proprietary models. Instead, they can adapt and deploy the model within their own applications without paying premium costs for usage rights. This, combined with its cloud integration, means they pay only for the actual compute used—typically on a pay-as‑you-go basis offered by Azure or AWS.
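
In practice, calling a hosted DeepSeek model looks much like calling any other LLM API. Here is a minimal sketch using the OpenAI-compatible chat-completions interface that most hosts expose; the endpoint URL, API key, model name, and prompt are placeholders you would swap for whichever deployment (Azure, AWS, or DeepSeek's own API) you actually use.

```python
# pip install openai  (the client can talk to any OpenAI-compatible endpoint)
from openai import OpenAI

# Placeholder endpoint and credentials; point these at your own hosted deployment.
client = OpenAI(base_url="https://YOUR-ENDPOINT/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="deepseek-reasoner",   # the model/deployment name depends on the host
    messages=[{"role": "user", "content": "Summarize this contract clause in plain English: ..."}],
)
print(response.choices[0].message.content)
```

The billing model is the practical difference: instead of a per-seat subscription or a licensing fee, you pay the cloud provider for the tokens or compute you actually consume.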


This should mean cheaper tools for you as a consumer. It also means that OpenAI (and all the other US models) are under pressure to deliver more value to the customer in a shorter amount of time. At the end of the day, users just want something that is cheap, fast, and good. TikTok won over a bunch of Instagram users, and Instagram ended up copying many of TikTok's features. Will the same thing happen to America's suite of AI apps?


Which app is better?


Well, that depends on whether you're a paid user. In my view, as of February 4, 2025, ChatGPT is still superior with its latest o3-mini-high model. The thing is, you have limited queries as a Plus user, and most people aren't shelling out $200 a month for the Pro tier. DeepSeek, on the other hand, is completely free. There is no paywall. Hence it will appeal to the masses. Remember Temu? Ah, that was how they nabbed users too. They got us with their $0.50 paper towels and $0.10 sofa cushions. It's the same tactic at play here. 


Finally, we probably have not looked far enough into China's AI landscape. When this market moves, it moves suddenly. Is DeepSeek even the only player? I think we're far from uncovering it all. As we familiarize ourselves with the handful of players in the US, I would say there are just as many, if not more, bubbling up in China.
