- TW8816-LB3-GR
- TW8816-LB3-GRS
- TW8834-TA2-CR
- TW8836-LB2-CE
- TW9910-LB2-GR
- TW9910-NB2-GR
- R7F701404EAFB
- R7F7015664AFP-C
- R7F7010323AFP
- R5F10PLJLFB#X5
- DF2676VFC33V
- HD64F2636F20V
- HD64F2638F20V
- HD64F2638WF20V
- DF2633RTE28V
- HD64F2612FA20V
- NCV7321D12R2G
- NCV7342D13R2G
- NCV7351D13R2G
- SPC5674FF3MVR3
- FS32K148UJT0VLQT
- SPC5748GK1MKU6
- FS32K144UAT0VLLT
- FS32K146HAT0MLLT
- CAT24C02TDI-GT3A
- CAT24C256YI-GT3
- S25FL064LABMFB010
- CAT24C64WI-GT3
- CAT24C256WI-GT3
- CAT25160VI-GT3
- CAT24C16YI-GT3
- CAT24C04WI-GT3
- M95256-DRMN3TP/K
- MTFC16GAPALBH-IT
- MTFC8GAKAJCN-1M WT
- MTFC8GAKAJCN-4M IT
- MT52L256M32D1PF-107 WT:B
- MT52L512M32D2PF-107 WT:B
- MT25QL256ABA8E12-0AAT
- MT25QL256ABA8ESF-0AAT
- MT29F128G08AJAAAWP-ITZ:A
- MT29F16G08ABABAWP-IT:B
- MT29F16G08ABACAWP-ITZ:C
- MT29F16G08AJADAWP-IT:D
- MT29F1G08ABAEAWP-IT:E
- MT29F2G01ABAGDWB-IT:G
- MT29F2G08ABAEAWP:E
- MT29F2G08ABAEAWP-IT:E
- 2SJ598-ZK-E1-AZ
- RJK0330DPB-01#J0
- UPA1918TE(0)-T1-AT
- NZ9F4V3ST5G
- PCF8578T/1,118
- STGYA120M65DF2AG
- BU931P
- BU941ZPFI
- BU931T
- ESDAXLC6-1BT2
- STD15P6F6AG
- TESEO-VIC3DA
- STGB20NB41LZT4
- BSS123NH6327XTSA1
- BSS131H6327XTSA1
- BSS126H6327XTSA2
- BSP315PH6327XTSA1
- IPD100N04S402ATMA1
- IPB80N06S2L07ATMA3
- BSP170PH6327XTSA1
- BSP613PH6327XTSA1
- BSS223PWH6327XTSA1
- BSS816NWH6327XTSA1
- AUIRF7341QTR
- MPXV5100GC6U
- MMA6900KQ
- MPXHZ6400AC6T1
- MPX4115AP
- MPX5050DP
- MPXAZ6115AP
- MPXHZ6115A6U
- MPXV5050DP
- MPXV5050GP
- MPXV7002DP
- MPXV7025DP
- MPX5050D
- LMT86QDCKRQ1
- TMP451AIDQFR
- TMP112AQDRLRQ1
- TMP411AQDGKRQ1
- TMP411DQDGKRQ1
- SN74LV4052APWR
- LM5007SD/NOPB
- SNJ54LS08J
- TPS3702CX33DDCR
- TMP6131QDECRQ1
- TMP6131QDYARQ1
- TMP6131QDYATQ1
Donut Lab Launches the World's First Mass-Producible All-Solid-State Battery?
At the 2026 CES, amidst a wave of cutting-edge technology showcases, one particular announcement sent shockwaves through the industry: the arrival of all-solid-state batteries. This news originates from Donut Lab, a Finnish technology company that unveiled what they claim to be the "world's first mass-producible all-solid-state battery" during this year’s exhibition.
For years, global battery manufacturers and automakers have poured immense resources and hopes into solid-state technology. Is it possible that the future has finally arrived so suddenly? Let's take a closer look at the data Donut Lab has released.

Key Performance Specifications
According to the official Donut Lab website, their all-solid-state battery is nearing mass production with several striking specifications:
Energy Density: 400 Wh/kg
Charging Speed: 5-minute ultra-fast charge to 100%
Cycle Life: 100,000 cycles
Cost: Lower than current lithium-ion batteries
1. Energy Density: 400 Wh/kg
While 400 Wh/kg is impressive, it is considered relatively "restrained" for an all-solid-state battery. Generally, the energy density of these batteries is expected to be more than double that of current lithium-ion batteries, with expectations often exceeding 500 Wh/kg. For context, Mercedes-Benz test vehicles equipped with non-mass-produced solid-state batteries have already reached 450 Wh/kg. Currently, even some "semi-solid" batteries that still contain liquid electrolytes can approach the 400 Wh/kg mark.
2. The 5-Minute Full Charge
The claim of a "5-minute fast charge to full capacity" is particularly intriguing. While solid-state batteries are inherently faster to charge than traditional lithium batteries, reaching 100% in five minutes is a bold assertion. In traditional batteries, charging typically slows down significantly (trickle charging) once the State of Charge (SOC) reaches 80% or 90%. Donut Lab’s claim implies the battery maintains high power input all the way to 100%.
However, there is a discrepancy on their website: the provided charging power curve actually shows the battery reaching approximately 80% SOC at the 300-second (5-minute) mark, rather than the 100% stated in the text.
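To see why the 80% figure is the expected shape, here is a minimal Python sketch of a conventional constant-current/constant-voltage (CC-CV) charging profile. The charge rate and taper time constant are illustrative assumptions, not Donut Lab data; the point is only that the final 20% is normally where charging slows sharply.
```python
# Illustrative CC-CV charging model. The parameters are assumptions chosen
# for demonstration, not values taken from Donut Lab's published curve.
import math

def soc_at(t_s, cc_rate=0.16 / 60.0, cv_start=0.8, tau_s=240.0):
    """SOC (0..1) at time t_s: linear constant-current phase up to
    `cv_start`, then an exponential constant-voltage taper."""
    t_cv = cv_start / cc_rate              # time at which the CV phase begins
    if t_s <= t_cv:
        return cc_rate * t_s               # CC phase: SOC rises linearly
    # CV phase: remaining capacity fills exponentially with time constant tau_s
    return 1.0 - (1.0 - cv_start) * math.exp(-(t_s - t_cv) / tau_s)

for minutes in (5, 10, 15):
    print(f"{minutes:>2} min -> SOC ~ {soc_at(minutes * 60):.0%}")
# 5 min -> ~80%, 10 min -> ~94%, 15 min -> ~98% under these assumptions
```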
3. Unprecedented Longevity: 100,000 Cycles
High cycle life is a known characteristic of solid-state technology, but 100,000 cycles is an extraordinary figure. To put this in perspective:
● A Lithium Iron Phosphate (LFP) battery, like BYD’s first-generation Blade Battery, has a cycle life of over 3,000 cycles.
● At 300 km per cycle, 3,000 cycles cover 900,000 km, enough to last 18 years even when driving 50,000 km annually (see the quick check below).
● At 100,000 cycles, the battery would theoretically outlast not just the car, but potentially several generations of owners.
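The arithmetic above is easy to verify; the snippet below simply reproduces it and extends it to the 100,000-cycle claim.
```python
# Back-of-the-envelope lifetime estimate from the figures quoted above.
def lifetime(cycles, km_per_cycle=300, km_per_year=50_000):
    total_km = cycles * km_per_cycle
    return total_km, total_km / km_per_year

for cycles in (3_000, 100_000):
    km, years = lifetime(cycles)
    print(f"{cycles:>7,} cycles -> {km:>10,} km, ~{years:,.0f} years")
# 3,000 cycles -> 900,000 km (~18 years); 100,000 cycles -> 30,000,000 km (~600 years)
```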

Global Solid-State Battery Technical Comparison (Projected 2026 Data)
Let's compare these specifications with the current leading solid-state battery projects from companies like CATL, Toyota, and QuantumScape:
| Metric | Donut Lab (Finland) | CATL (China) | Toyota (Japan) | QuantumScape (USA) |
|---|---|---|---|---|
| Energy Density | 400 Wh/kg | 500 Wh/kg (Condensed) | 450-500 Wh/kg | ~400-500 Wh/kg |
| Charging Time | 5 min (to 100%) | 10-15 min (to 80%) | 10 min (0-80%) | 15 min (10-80%) |
| Cycle Life | 100,000 cycles | 2,000 - 6,000 cycles | ~2,000 cycles | ~1,000 - 2,000 cycles |
| Mass Production | Q1 2026 | 2026 (Semi) / 2027 (All-solid) | 2027-2028 (Lexus models) | Late 2025 / 2026 |
| Tech Route | Undisclosed (Cobalt/Nickel-free) | Sulfide/Oxide/Condensed | Sulfide Electrolyte | Anodeless Design |
In-Depth Comparative Analysis
1. Energy Density: Donut Lab is Relatively "Conservative"
At 400 Wh/kg, Donut Lab’s energy density is actually slightly lower than CATL’s announced "Condensed Battery" (500 Wh/kg). Industry giants like Toyota and Samsung are also targeting the 500 Wh/kg threshold in their laboratories. This suggests that Donut Lab's core competitiveness may not lie in how much energy it can store per kilogram, but rather in its speed of practical application and commercialization.
2. Charging Speed: Challenging the Laws of Physics
Toyota’s plan for a 10-minute fast charge by 2027 is already considered industry-leading. Donut Lab’s claim of "100% charge in 5 minutes" is technically audacious.
Industry Standard: Even current 4C or 5C fast-charging technologies usually only apply to the 10%-80% range.
The Donut Lab Edge: They claim the battery does not enter "trickle charge" mode at the end of the cycle. If true, this would eliminate "range anxiety" entirely, though it would place immense stress on thermal management systems and local power grid capacity.
3. Cycle Life: The Most Controversial Data Point
This is where the gap is most staggering. Top-tier power batteries today typically offer around 3,000 cycles (enough for roughly 1 million kilometers).
100,000 Cycles: This figure is two orders of magnitude higher than the current industry standard. In known electrochemical systems, such longevity is usually only seen in "Supercapacitors," not high-energy-density chemical batteries.
Market Impact: If accurate, these batteries would last for centuries. While this might be "overkill" for a passenger car, it would have revolutionary implications for energy storage, industrial robotics, and high-frequency logistics.
4. Timeline: Startup vs. Traditional Giants
The Giants: CATL Chairman Robin Zeng has stated that all-solid-state batteries are currently at a "4" out of 9 on the maturity scale, with massive production not expected until closer to 2030.
Donut Lab: Claims delivery in Q1 2026 with their electric motorcycles. This timeline is significantly more aggressive than Toyota’s 2027-2028 target or CATL’s 2027 small-batch pilot.
Final Assessment
Donut Lab appears to be a "dark horse" in the industry, attempting to bypass the semi-solid-state phase entirely through a startup's agility and a bold technical route.
If the data is authentic: It will rewrite the global energy landscape, reshaping not just the automotive industry, but also grid storage and aviation.
The Skeptical View: Given the lack of transparency regarding their "Cycle Life" testing and the discrepancies in their "Charging Curve," mainstream experts remain cautiously optimistic at best, and highly skeptical at worst.

Safety and Environmental Sustainability
Beyond the primary metrics, this battery reportedly offers several other "perfect" performance features:
Safety: The battery is completely non-flammable.
Thermal Resilience: It operates normally between -30°C and 100°C, maintaining over 99% of its charge even at these temperature extremes.
Green Materials: Production utilizes 100% eco-friendly materials (though specific details remain undisclosed) and eliminates the need for rare metals like cobalt and nickel. This reduces costs and bypasses supply chain bottlenecks and geopolitical risks, allowing for production anywhere in the world.
Form Factor: The battery is highly adaptable and can be customized into any size or geometric shape, described as being as flexible as "clay."

Final Observations
Donut Lab plans to first deploy these batteries in its own electric motorcycles, with mass production and delivery scheduled for the first quarter of 2026. The company is ambitious, with interests spanning electric chassis, drive motors, industrial robotics, and software platforms.
However, some skepticism is warranted. Donut Lab is a startup that only became independent from its parent company in late 2024, meaning it has been operating on its own for just over a year. Even including the parent company's history, the total development time is only about seven or eight years.
In the wider industry, the consensus for the earliest mass production of all-solid-state batteries is generally no earlier than 2027, and even that timeline faces a high risk of delays. As Robin Zeng, Chairman of CATL, noted late last year: "On a scale of 1 to 9 for technical and manufacturing maturity, the industry's highest level is currently only around 4." Significant hurdles remain regarding materials, production technology, and the costs associated with establishing entirely new manufacturing systems.
Whether Donut Lab has truly achieved a breakthrough or is overpromising remains to be seen as their Q1 deadline approaches.
Jan 19, 2026
NVIDIA Alpamayo: In-depth Analysis of Inference-centered AI Architecture for Autonomous Driving
At CES 2026, NVIDIA CEO Jensen Huang announced the launch of Alpamayo, defining it as the first autonomous driving AI capable of "thinking and reasoning." This marks a historic leap for autonomous driving technology from "perception-driven" to "reasoning-driven."
For automotive chip and autonomous driving developers, Alpamayo is more than just a new model – it represents a complete, open-source "Physical AI" ecosystem.

Core Technical Architecture: From Perception to "Vision-Language-Action" (VLA)
Traditional autonomous driving systems often decouple perception and planning, while Alpamayo adopts an innovative Vision-Language-Action (VLA) model architecture.
1. 10 Billion-Parameter "Chain of Thought" Reasoning
Alpamayo 1 features 10B (10 billion) parameters, consisting of two core components:
● Backbone: The 8.2B-parameter Cosmos-Reason model.
● Action Expert: A 2.3B-parameter Diffusion-driven trajectory decoder.
This architecture enables Chain of Thought reasoning. Instead of mechanically outputting acceleration or steering commands, the system generates "reasoning traces" like humans. For example, when approaching an intersection, it might think: "I see a stop sign ahead and pedestrians on the left; I should slow down and stop to wait."
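NVIDIA has not published Alpamayo's output schema, so the following is a purely hypothetical Python sketch of what a VLA-style decision object pairing a reasoning trace with a trajectory might look like; the class name and fields are invented for illustration.
```python
# Hypothetical sketch of a VLA-style output. Alpamayo's real interfaces are
# not documented here; every name and field below is illustrative only.
from dataclasses import dataclass

@dataclass
class VLAOutput:
    reasoning_trace: str                    # natural-language chain of thought
    trajectory: list[tuple[float, float]]   # (x, y) waypoints in the ego frame, metres
    confidence: float                       # model's self-assessed confidence

decision = VLAOutput(
    reasoning_trace=("I see a stop sign ahead and pedestrians on the left; "
                     "I should slow down and stop to wait."),
    trajectory=[(0.0, 0.0), (0.0, 4.0), (0.0, 6.5)],  # decelerating to a halt
    confidence=0.93,
)
print(decision.reasoning_trace)
```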

2. End-to-End Training and the Role of "Teacher Model"
Alpamayo is trained end-to-end, directly forming a closed loop from camera input to actuator output. NVIDIA explicitly positions Alpamayo 1 as a "teacher large model":
● Vehicle Deployment: Developers can use model distillation to extract its reasoning capabilities into more streamlined runtime models for real-time operation on in-vehicle chips (a minimal sketch of the generic technique follows this list).
● Toolchain Support: It can also serve as the foundation for automatic annotation systems or reasoning evaluators, significantly improving data processing efficiency.
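NVIDIA's distillation recipe for Alpamayo is not public, but the generic technique the bullet refers to is standard: train a smaller student to match the teacher's temperature-softened output distribution. A minimal sketch, assuming a plain Hinton-style KL objective:
```python
# Minimal knowledge-distillation objective (temperature-scaled KL divergence).
# Generic technique only -- not NVIDIA's actual training recipe.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, T)            # soft targets from the teacher
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p + 1e-9) - np.log(q + 1e-9))) * T * T)

teacher = np.array([[4.0, 1.0, 0.5]])         # e.g. "brake" strongly preferred
student = np.array([[2.5, 1.2, 0.8]])
print(f"distillation loss: {distill_loss(teacher, student):.4f}")
```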
Three Pillars of the Ecosystem
NVIDIA's release includes not just model weights but a full-stack developer platform:
| Pillar | Content | Technical Value |
|---|---|---|
| Open-Source Model (Alpamayo 1) | Open weights and inference scripts | Supports developers in fine-tuning according to regional safety standards. |
| Simulation Tool (AlpaSim) | Fully open-source end-to-end framework | Provides high-fidelity sensor modeling and supports closed-loop testing of rare edge cases. |
| Physical AI Open Dataset | 1,727 hours of real driving data | Covers 25 countries, 100 TB of sensor data, including complex long-tail scenarios. |
Why Is This a Turning Point for the Automotive Chip Industry?
1. Ultimate Solution to the "Long-Tail Effect"
The biggest challenge in autonomous driving lies in handling rare extreme scenarios (e.g., faulty traffic lights or abnormal roadblocks). Alpamayo's reasoning ability allows it to make safe decisions based on physical common sense in unprecedented new scenarios, rather than relying solely on training experience.
2. Transparency and Interpretability
Traditional "black-box" models make it difficult to explain accident causes. Alpamayo's reasoning traces demonstrate the logic behind every decision – critical for passing regulatory approval and building user trust.
3. Mass Production Deployment: Mercedes-Benz CLA Takes the Lead
Alpamayo is no longer confined to laboratories. Jensen Huang confirmed that the first Mercedes-Benz CLA models equipped with the system will hit U.S. roads in the first quarter of 2026, followed by launches in Europe and Asia in the second and third quarters respectively.

NVIDIA Alpamayo VS Tesla FSD
In the "chip war" of autonomous driving, NVIDIA Alpamayo and Tesla FSD (especially the upcoming v14 version) represent two distinct chip design philosophies and computing power allocation strategies. We conduct an in-depth comparison across three technical dimensions:
1. Hardware Specs & Chip Architecture
| Dimension | NVIDIA Alpamayo (Powered by DRIVE Thor) | Tesla FSD (Powered by AI4 / HW4.0) |
|---|---|---|
| Single-Chip Computing Power | 1,000 INT8 TOPS / 2,000 FP4 TFLOPS | Estimated 300-500 TOPS (AI4) |
| Process/Architecture | Blackwell Architecture (4nm/3nm class) | Tesla Custom SoC (Samsung 7nm/5nm) |
| VRAM | 24GB+ (minimum requirement for model inference) | 16GB - 32GB (shared memory) |
| Computing Redundancy | DRIVE Hyperion platform supports dual Thor redundancy | Dual-chip backup with more aggressive computing power allocation |
● NVIDIA Thor: Adopts the latest Blackwell Architecture, optimized for FP4 precision specifically for Transformers and Physical AI. This delivers significantly higher throughput than previous generations when running 10B-parameter models like Alpamayo.
● Tesla AI4: While single-chip computing power is slightly lower, its vertical integration efficiency is exceptional. Elon Musk has said that by developing its own chips, Tesla avoids NVIDIA's roughly 70% gross margin, achieving higher cost-effectiveness.
2. Differences in Parameter Count & Memory Pressure
● Alpamayo (10B parameters): A typical "large model," its 10 billion parameters impose extremely high requirements on in-vehicle VRAM. NVIDIA officially states that even its inference scripts require a minimum of 24GB of VRAM to load. This means vehicles equipped with Alpamayo must be configured with high-bandwidth, large-capacity LPDDR5X memory (a rough footprint estimate follows this list).
● Tesla FSD (approximately 10-15B parameters): Tesla's latest end-to-end model also falls within this parameter range. However, Tesla's advantage lies in model distillation – while cloud-trained models are large, on-vehicle FSD models undergo extreme pruning and quantization to adapt to the limited SRAM and memory bandwidth of HW3.0/4.0.
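A rough back-of-the-envelope check, using only the parameter count quoted above, shows why 24GB is a sensible floor:
```python
# Rough parameter-memory estimate for a 10B-parameter model.
PARAMS = 10e9
for precision, bytes_per_param in (("FP16/BF16", 2), ("FP8", 1)):
    gb = PARAMS * bytes_per_param / 1024**3
    print(f"{precision}: ~{gb:.0f} GB for the weights alone")
# FP16 weights alone are ~19 GB; activations and KV-cache push the working
# set higher still, consistent with the 24 GB minimum quoted above.
```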
3. Inference Strategy: "Chain of Thought" vs. "Pure Neural Execution"
This is the most significant difference in how the two consume computing power:
1. Alpamayo's "Expensive Reasoning"
Alpamayo employs Chain-of-Thought (CoT) reasoning. This means the chip must compute not only driving actions but also "reasoning traces" (textual thinking logic). This Vision-Language-Action (VLA) model generates a large number of intermediate tokens during inference, imposing higher concurrency requirements on the chip's single-thread performance and Tensor Cores (a rough cost estimate follows below).
2. Tesla's "Intuitive Driving"
Tesla FSD v12+ is a "pure neural execution" system, more akin to human "muscle memory." While it is evolving toward reasoning (as of v14.3), its design goal is ultimate low latency and real-time performance, with computing power allocated more toward real-time video stream analysis than toward long-chain logical explanation.
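To get a feel for what Alpamayo's reasoning trace costs relative to Tesla's direct-execution approach, here is a rough estimate using the common approximation of ~2 FLOPs per parameter per generated token; the trace lengths are assumptions for illustration only.
```python
# Order-of-magnitude cost of emitting a reasoning trace during inference.
# Assumes ~2 FLOPs per parameter per decoded token; trace lengths are guesses.
PARAMS = 10e9                       # Alpamayo 1 parameter count
FLOPS_PER_TOKEN = 2 * PARAMS        # rough decoding cost per generated token

for trace_tokens in (50, 200):
    tflops = trace_tokens * FLOPS_PER_TOKEN / 1e12
    print(f"{trace_tokens:>3} trace tokens -> ~{tflops:.0f} TFLOPs extra per decision")
# A pure-action system that skips the trace avoids this overhead entirely.
```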

Summary: How Should Developers Choose?
● Choose NVIDIA Thor + Alpamayo: Ideal for automakers pursuing L4-level autonomy and operating in highly regulated environments. Alpamayo's "reasoning paths" provide the best technical means for regulatory audits and accident explanation. Its computing power headroom (1000 TOPS) reserves space for future model upgrades.
● Choose Tesla's Model (Custom/Highly Optimized SoC): Suitable for automakers prioritizing extreme cost control and large-scale mass production. It proves that with massive data and efficient compilers, top-tier assisted driving experiences can still be achieved with 300-500 TOPS of computing power.
Conclusion
The launch of NVIDIA Alpamayo marks the transition of autonomous driving from the "Perception Era" to the "Cognition Era." For automakers and chip designers, future competition will shift from mere computing power stacking to efficiently supporting the on-vehicle deployment and real-time response of such large-scale reasoning models.
FAQs About NVIDIA Alpamayo
1. What is NVIDIA Alpamayo?
It is a comprehensive autonomous driving ecosystem featuring a reasoning-based Vision-Language-Action (VLA) model, open-source simulation tool (AlpaSim), and large-scale Physical AI dataset. Designed for "thinking and reasoning," it shifts autonomous driving from perception-driven to reasoning-driven.
2. What core components does Alpamayo include?
● Alpamayo 1: An open 10B-parameter reasoning VLA model available on Hugging Face.
● AlpaSim: An open-source end-to-end simulation framework for closed-loop testing.
● Physical AI Dataset: 1,727 hours of multi-sensor driving data from 25 countries.
3. How is Alpamayo different from traditional autonomous driving systems?
It adopts Chain-of-Thought reasoning to handle rare "long-tail" scenarios (e.g., faulty traffic lights) with physical common sense. Unlike traditional decoupled perception-planning systems, it generates transparent reasoning traces to explain decision logic.
4. Can developers access and customize Alpamayo?
Yes. It is open-source, allowing developers to fine-tune the base model with local data, adapt to regional driving rules, and distill its capabilities into streamlined in-vehicle runtime models.
5. When will Alpamayo-powered vehicles be available?
The first Mercedes-Benz CLA models with Alpamayo launch in the U.S. in Q1 2026, followed by Europe in Q2 and Asia in H2 2026, starting with supervised hands-free driving modes.
6. What hardware is required to run Alpamayo?
Alpamayo 1 requires a minimum of 24GB VRAM for inference. It is optimized for NVIDIA DRIVE Thor chips (1,000 INT8 TOPS) based on the Blackwell Architecture, ensuring efficient execution of large reasoning models.
Jan 08, 2026
NVIDIA BlueField-3 DPU Architecture and Roadmap
Modern hyperscale cloud technologies are driving data centers toward new architectural paradigms. A new class of processors—specifically designed for data center infrastructure software—is being adopted to offload and accelerate the massive computational workloads generated by virtualization, networking, storage, security, and other cloud-native AI services. This class of products is represented by the BlueField DPU family.

As illustrated in NVIDIA’s BlueField DPU product roadmap, the lineup includes the already available second-generation BlueField-2, the upcoming BlueField-3 DPU delivering 400 Gb/s throughput, and the future BlueField-4 DPU, which will integrate NVIDIA GPU capabilities and scale up to 800 Gb/s.

BlueField-3 is the first 400 Gb/s DPU purpose-built for AI and accelerated computing. It enables enterprises to achieve industry-leading performance and data center-grade security across applications of any scale. A single BlueField-3 DPU can deliver data center services equivalent to what would otherwise require up to 300 CPU cores, thereby freeing valuable CPU resources to run mission-critical business applications. Optimized for multi-tenant and cloud-native environments, BlueField-3 provides software-defined, hardware-accelerated networking, storage, security, and management services at the data center level.
The introduction of BlueField-3 addresses long-standing industry challenges related to end-to-end data security. BlueField-3 fully inherits the advanced capabilities of BlueField-2 and significantly enhances and extends them in terms of performance and scalability.
BlueField Architecture Overview
At its core, the BlueField architecture tightly integrates a network interface subsystem with a programmable data path, dedicated hardware accelerator subsystems for functions such as encryption and compression, and an ARM-based processor subsystem for control and management. In BlueField-3, the Data Path Accelerator (DPA) includes 16 processing cores capable of handling up to 256 concurrent threads.

The key technical features of BlueField-3 are described below across networking, security, and storage workloads.
1. Networking Workloads
For networking workloads, BlueField-3 further enhances technologies such as RDMA, connection tracking, and ASAP². It also delivers improved time-synchronization accuracy, enabling precise clock synchronization between data centers and edge environments. Key technologies are analyzed below.
RDMA Technology
RDMA (Remote Direct Memory Access) enables direct data exchange between memory spaces, providing excellent scalability, higher performance, and significant CPU offloading. The main advantages of RDMA include:
1. Zero-copy: Applications can perform data transfers directly without traversing the network software stack. Data is sent directly to application buffers or received directly from them, eliminating intermediate copies to network layers.
2. Kernel bypass: Applications can transfer data entirely in user space, avoiding costly context switches between kernel and user modes.
3. No CPU involvement: Applications can access the memory of a remote host without consuming any CPU resources on that host, enabling remote read and write operations transparently.
4. Message-based transactions: Data is processed as discrete messages rather than streams, removing the need for applications to segment streams into individual messages or transactions. Messages up to 2 GB in size are supported.
5. Native scatter/gather support: RDMA natively supports scatter/gather operations, allowing data to be read from multiple memory buffers and transmitted as a single message, or received as one message and written into multiple buffers.
GPU-Direct RDMA (GDR)
GPU-Direct RDMA (GDR) enables the GPU of one system to directly access the GPU memory of another system. Prior to GDR, data had to be copied from GPU memory to system memory before RDMA transmission, and then copied again from system memory to GPU memory on the destination system.
GDR significantly reduces data copy operations during GPU communication and further lowers latency. Mellanox network adapters already support GPUDirect RDMA for both InfiniBand and RoCE transports. Following NVIDIA’s acquisition of Mellanox, all NVIDIA network adapters now fully support GPU-Direct RDMA technology.
2. Security Workloads
In terms of security, BlueField-3 delivers full line-rate, on-the-fly encryption and decryption at 400 Gb/s across the IP, transport, and MAC layers. When performing deep packet inspection (DPI) with RegEx acceleration, throughput can reach up to 50 Gb/s. Key security features are described below.
IPSec Acceleration
BlueField-3 supports the IPSec protocol, providing encryption and decryption at the IP layer while maintaining network line-rate performance. IPSec (Internet Protocol Security) is a suite of open security standards developed by the IETF (Internet Engineering Task Force). Rather than a single protocol, IPSec comprises a set of protocols and services designed to secure IP communications, supporting both IPv4 and IPv6 networks.
IPSec primarily includes the AH (Authentication Header) and ESP (Encapsulating Security Payload) security protocols, the IKE (Internet Key Exchange) key management protocol, and various authentication and encryption algorithms. By combining encryption and authentication, IPSec provides comprehensive security services for IP packets.
BlueField-3 can achieve IPSec encryption and decryption speeds of up to 400 Gb/s. In comparison, CPU-based IPSec implementations paired with 100 Gb/s or 200 Gb/s networks typically deliver only 20–40 Gb/s and consume substantial CPU resources. Offloading IPSec to BlueField-3 releases this CPU capacity for application workloads.
TLS Acceleration
BlueField-3 also supports TLS offload at the TCP layer, securing data in transit. TLS is the encryption protocol that underpins HTTPS, mitigating three major risks of plaintext communication:
● Eavesdropping: third parties can intercept communication content
● Tampering: communication data can be modified in transit
● Impersonation: malicious actors can masquerade as legitimate parties
Accordingly, TLS is designed to ensure that:
● All transmitted data is encrypted, preventing eavesdropping
● Integrity checks detect any data modification immediately
● Digital certificates prevent identity spoofing
TLS relies on public-key cryptography: the client validates the server's certificate and uses public-key operations to establish a shared session key, after which bulk data is encrypted symmetrically. BlueField-3 can achieve TLS encryption and decryption speeds of up to 400 Gb/s, once again offloading significant computational overhead from the CPU.
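The handshake and record encryption that BlueField-3 offloads follow the standard TLS flow; for reference, a minimal client-side TLS connection using only Python's standard library looks like this (the host is just an example):
```python
# Standard TLS handshake using Python's stdlib. This is the protocol flow
# BlueField-3 accelerates in hardware; "example.com" is a placeholder host.
import socket
import ssl

host = "example.com"
context = ssl.create_default_context()            # certificate validation enabled

with socket.create_connection((host, 443), timeout=5) as sock:
    with context.wrap_socket(sock, server_hostname=host) as tls:
        print(tls.version())                      # e.g. 'TLSv1.3'
        print(tls.getpeercert()["subject"])       # server identity from its certificate
```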
3. Storage Workloads
In storage workloads, BlueField-3 enables capabilities that were previously difficult or impossible to achieve. It can emulate block storage, file storage, object storage, and NVMe storage, while offloading encryption and decryption operations—such as AES-XTS—during data persistence. Even cryptographic signing operations can be offloaded to the DPU.
Its Elastic Block Storage (EBS) performance can reach up to 18 million IOPS for read and write operations, while virtualization I/O acceleration can achieve up to 80 million packets per second (Mpps).
BlueField SNAP Technology
BlueField SNAP is a software-defined network-accelerated processing technology that allows users to access remote NVMe storage connected to servers as if it were local storage. It combines the efficiency and manageability of remote storage with the simplicity of local storage access.
The NVIDIA BlueField SNAP solution eliminates dependency on local storage and addresses growing cloud demands for storage disaggregation and composable storage architectures. SNAP integrates seamlessly into nearly any server environment, regardless of operating system or hypervisor, enabling faster adoption of NVMe over Fabrics (NVMe-oF) across data centers.
Delivered as part of the BlueField PCIe DPU SmartNIC portfolio, BlueField SNAP virtualizes physical storage so that network-attached flash storage behaves like local NVMe storage. Today, all major operating systems and hypervisors support local NVMe SSDs.

By leveraging existing NVMe interfaces and combining them with the performance, manageability, and software transparency of local SSDs, BlueField SNAP delivers composability and flexibility for network flash storage. When combined with BlueField’s powerful multi-core ARM processors, virtual switching, and RDMA offload engines, SNAP supports a broad range of accelerated storage, software-defined networking, and application solutions. Together with ARM processing, SNAP can also accelerate distributed file systems, compression, deduplication, big data analytics, AI workloads, load balancing, and security applications.
4. Development Ecosystem
For the development ecosystem, NVIDIA provides the DOCA (Data Center on a Chip Architecture) software development kit, designed to enable and support the BlueField partner ecosystem. Through DOCA, developers can implement software-defined networking, storage, and security, and directly access BlueField’s hardware acceleration engines.
The NVIDIA DOCA SDK offers a complete and open development platform for building software-defined and hardware-accelerated networking, storage, security, and management applications on BlueField DPUs. DOCA includes runtime environments for creating, compiling, and optimizing applications; orchestration tools for configuring, upgrading, and monitoring thousands of DPUs across a data center; and an expanding set of libraries, APIs, and applications such as deep packet inspection and load balancing.
DOCA is a framework composed of libraries, memory management, and services built on a mature driver stack. Some libraries are derived from open-source projects, while others are proprietary to NVIDIA. Similar to how CUDA abstracts GPU programming, DOCA abstracts DPU programming at a higher level. NVIDIA delivers a complete solution by combining developer-focused DOCA SDKs with DOCA management software for out-of-the-box deployment.
For example, ASAP² is a hardware-accelerated network data-path technology delivered in binary form. It enables network device emulation through virtio and exposes low-level APIs for configuring the flow-tracking and RegEx accelerators. Security drivers provide in-kernel TLS offload, while SNAP drivers enable NVMe virtualization for storage workloads.
DOCA maintains backward compatibility across generations. NVIDIA’s vision is to establish the DPU as the third pillar of heterogeneous computing—complementing CPUs and GPUs—and DOCA is essential to realizing this vision across a wide range of applications.
The Role and Value of the DPU
DPUs extend the capabilities of SmartNICs by inheriting features such as CPU offload, programmability, task acceleration, and traffic management, while enabling unified programmable acceleration across both the control plane and data plane.
Traditionally, data center operations—including both compute workloads and infrastructure tasks—have relied heavily on CPUs. As data processing demands continue to grow, CPU performance has reached practical limits, and the slowing of Moore’s Law has become increasingly evident. GPUs emerged to address compute bottlenecks, but the data center bottleneck has now shifted toward infrastructure tasks such as data storage, data validation, and network security.
The DPU addresses this need by accelerating general-purpose infrastructure workloads. In a DPU-centric architecture, DPUs form a powerful infrastructure layer, while CPUs and GPUs focus on application compute. Key characteristics of a DPU include:
1. Industry-standard, high-performance, software-programmable multi-core CPUs, typically based on widely adopted ARM architectures and tightly integrated with other SoC components.
2. High-performance networking interfaces capable of parsing, processing, and efficiently delivering data to GPUs and CPUs at line rate.
3. Rich, flexible, programmable acceleration engines that offload and accelerate AI and machine learning, security, telecommunications, storage, and virtualization workloads.
| Dimension | SmartNIC | DPU (Data Processing Unit) |
|---|---|---|
| Positioning | Improves server performance in cloud and private data centers by offloading networking and other workloads from the server CPU. | A data center-level computing processor that can exist as the smallest node in a data center. |
| Main Features | Frees up CPU overhead and is programmable; features task acceleration and traffic management. | Includes dual-plane offloading and acceleration for both data and control planes; covers all SmartNIC functions; features standard, high-performance, software-programmable multi-core CPUs and a rich set of flexible programmable acceleration engines. |
| Ecosystem | The ecosystem is complex with non-unified standards; development difficulty is high, and project portability is poor. | Possesses a standard ecosystem; some have dedicated software development platforms providing high-level standard development interfaces (such as NVIDIA's DOCA SDK), leading to low entry and development difficulty. |
| Application Scenarios | Accelerates specialized services such as storage, security, and data compression. | Data centers and cloud computing; network security; high-performance computing (HPC) and AI; communications and edge computing; data storage; streaming media, etc. |
| Value | Processes specialized services with relatively single functionality within the data center; passive and dependent on other devices. | Can function as a standalone, independent data center unit with rich, expandable functions; set to become a standard data center component and one of the three core pillars (CPU, GPU, DPU); active, capable of serving as a computing node, NIC, or acceleration engine, and can exist independently. |
The core mission of the DPU is data pre-processing and post-processing. This includes networking tasks (such as all-to-all and point-to-point communication acceleration, IPSec, TCP connection tracking, and RDMA), storage tasks (distributed storage, encryption and decryption at rest, compression, redundancy algorithms), virtualization acceleration (OVS and hypervisor offload, separation of control and data planes), and hardware-based security (such as Root of Trust).
From a cloud computing perspective, the DPU effectively offloads the entire IaaS service stack into hardware acceleration.
SmartNICs typically fall into FPGA-based and ARM-based categories. FPGA-based SmartNICs struggle with control-plane processing, while ARM-based SmartNICs can become overloaded when handling diverse workloads. By providing dual-plane acceleration for both data and control planes, DPUs overcome these limitations.
Moreover, unlike traditional SmartNICs, DPUs can function as the smallest autonomous node in a data center, integrating compute, networking, acceleration engines, and security. As a result, DPUs are expected to become a standard component of future data centers and one of the three core pillars alongside CPUs and GPUs.
NVIDIA BlueField-3 DPU FAQs
1. What is NVIDIA BlueField-3 DPU used for?
The NVIDIA BlueField-3 DPU is used to offload, accelerate, and secure data center infrastructure workloads such as networking, storage, security, and virtualization. By handling these tasks in hardware, BlueField-3 frees CPU resources for application processing and improves overall data center performance and isolation.
2. How is BlueField-3 different from BlueField-2?
BlueField-3 significantly increases performance and scalability compared to BlueField-2. It supports up to 400 Gb/s throughput, offers enhanced Data Path Accelerators (DPA), improved RDMA and security offload, and delivers data center services equivalent to hundreds of CPU cores. BlueField-2 is limited to lower bandwidth and earlier acceleration capabilities.
3. What makes BlueField-3 suitable for AI and accelerated computing?
BlueField-3 is designed to support AI and accelerated workloads by providing high-bandwidth networking (400 Gb/s), RDMA and GPU-Direct RDMA support, and hardware-accelerated security. These features reduce latency, minimize data movement overhead, and ensure that GPUs and CPUs are dedicated to AI computation rather than infrastructure tasks.
4. Does BlueField-3 support hardware security acceleration?
Yes. BlueField-3 provides full line-rate hardware acceleration for security protocols such as IPSec and TLS at up to 400 Gb/s. It also supports deep packet inspection (DPI), RegEx acceleration, and root-of-trust capabilities, enabling strong isolation and zero-trust security models in multi-tenant cloud environments.
5. How does BlueField-3 improve storage performance?
BlueField-3 accelerates storage by offloading NVMe, NVMe-oF, block, file, and object storage operations to the DPU. With BlueField SNAP, remote NVMe storage can be accessed as if it were local, while encryption, compression, and virtualization tasks are handled in hardware, resulting in higher IOPS and lower CPU overhead.
6. What is NVIDIA DOCA, and how does it relate to BlueField-3?
NVIDIA DOCA is a software development framework that allows developers to build and deploy networking, storage, security, and management applications on BlueField DPUs. DOCA provides APIs, libraries, and tools to directly access BlueField-3’s hardware acceleration engines, simplifying DPU programming and enabling portable, future-proof infrastructure applications.
Dec 30, 2025
Google TPU Chip Ironwood Technology Explained
In November, Google officially commercialised its seventh-generation TPU chip, Ironwood, marking one of the most significant updates in its AI accelerator roadmap to date. Compared with the sixth-generation TPU (Trillium), Ironwood delivers a fourfold improvement in both model training performance and inference throughput. This leap does not merely represent incremental silicon progress; it directly targets the rapidly growing global demand for large-scale generative AI and enterprise-level AI deployments.
By addressing the core bottleneck in AI inference—namely, the high cost and large energy footprint of deploying trillion-parameter models—Ironwood enables enterprises to run advanced AI workloads more efficiently and affordably. Google has stated that leading AI developer Anthropic is preparing to deploy one million new TPUs to support ongoing development and operation of its Claude model family, illustrating the scale at which modern AI systems now operate.
In the following sections, we will examine Google’s latest TPU in detail.
Introduction to TPUs
What is a TPU and what does it do?
A Tensor Processing Unit (TPU) is a custom-designed application-specific integrated circuit (ASIC) developed by Google for accelerating machine-learning workloads. Google introduced the first-generation TPU internally in 2015, and the company publicly revealed the technology during the 2016 Google I/O conference. Since then, TPUs have become a foundational element of Google’s AI infrastructure.
The TPU differs fundamentally from general-purpose computing chips because all elements of its architecture—logic units, data paths, memory hierarchy, and interconnect—are designed specifically for tensor operations, such as matrix multiplication and convolution. These operations form the mathematical backbone of neural networks, particularly deep learning models used in language processing, vision, speech recognition, and recommendation systems.
By focusing exclusively on these operations, TPUs eliminate unnecessary hardware complexity and achieve extremely high parallelism, enabling substantial improvements in computational efficiency relative to CPUs and GPUs.

What is an ASIC chip?
An ASIC (Application-Specific Integrated Circuit) is a chip tailored to perform a particular task or serve a specific application domain. Unlike CPUs and GPUs—which are designed to handle broad categories of operations—ASICs are engineered with a single purpose in mind.
This design philosophy brings several pronounced advantages:
1. Higher Performance
Because ASICs incorporate hardware structures optimised for a target task, they can execute these operations far more efficiently. For example, AI-focused ASICs like TPUs implement large systolic arrays and streamlined control logic, reducing the number of cycles needed to perform each operation. Pipelining and parallel data flow further minimise latency.
2. Superior Energy Efficiency
General-purpose processors typically waste energy executing functions not directly required for AI tasks, such as branch prediction and complex control flows. ASICs, by contrast, remove unnecessary logic gates and minimise switching activity. This results in significantly lower power consumption and allows higher sustained utilisation of computational units.
3. High Integration and Smaller System Footprint
ASICs can consolidate diverse functional blocks—compute engines, memory controllers, interconnect components—onto a single die. This reduces system size, simplifies board design, and enhances reliability. In mass production, ASICs also benefit from economies of scale, making them cost-effective in high-volume or hyperscale deployments.
The Evolution of Google’s TPU
Google’s TPU programme has evolved rapidly over the past decade:
● 2015 – TPU v1: Introduced as an internal inference accelerator.
● 2016 – TPU v1 publicly unveiled: Demonstrated at Google I/O; used in AlphaGo, showing its ability to support sophisticated reinforcement-learning systems.
● 2018 – TPU v2: Added distributed shared memory and moved towards large-scale training workloads.
● 2020 – TPU v3: Implemented liquid cooling, enabling higher power envelopes and improved thermal stability.
● 2022 – TPU v4: Adopted a 3D torus interconnect topology, dramatically improving multi-chip scaling.
● 2023 – TPU v5: Delivered further improvements in cost-per-compute and training efficiency.
● 2024 – TPU v6 Trillium: Added an MLP core specifically optimised for Transformer-based language models and expanded support for large model training.
● 2025 – TPU v7 Ironwood: A major architectural step forward.
A single Ironwood Superpod integrates 9,216 TPU chips, each equipped with 192 GB of HBM3e memory providing 7.4 TB/s bandwidth, and a peak computational performance of 4,614 TFLOPs (FP8). Collectively, such a system forms one of the world’s most capable AI supercomputers.

How TPUs Differ from CPUs and GPUs
Architectural Differences
The core architectural distinctions between CPUs, GPUs and TPUs can be summarised as follows:
CPU (Central Processing Unit)
CPUs prioritise flexibility and are built with complex control units and deep cache hierarchies. This enables them to handle branching, interrupts and diverse workloads. However, their relatively small number of cores limits their parallel computing capability.
GPU (Graphics Processing Unit)
GPUs contain thousands of small compute cores designed for highly parallel workloads, originally graphics rendering but now widely used for general-purpose matrix operations. However, GPUs still retain general-purpose components that introduce overhead for AI workloads.
TPU (Tensor Processing Unit)
A TPU strips away much general-purpose hardware and instead uses a systolic array, a highly parallel grid of arithmetic units specifically tuned for tensor operations. Data moves rhythmically through the array, allowing tens of thousands of simultaneous multiply-accumulate operations with minimal control overhead.
Application Scenarios
● CPUs are ideal for flexible, small-scale inference, model prototyping and tasks requiring high control complexity.
● GPUs excel at training medium-sized models, executing custom kernels, and supporting a wide range of workloads.
● TPUs dominate in ultra-large-scale training, long-duration workloads, trillion-parameter embedding lookups, and massive parallel inference tasks.
Comparison of CPU, GPU, FPGA, and ASIC (NPU/TPU)
| Dimension | CPU | GPU | FPGA | ASIC (NPU / TPU) |
|---|---|---|---|---|
| Full Name | Central Processing Unit | Graphics Processing Unit | Field-Programmable Gate Array | Application-Specific Integrated Circuit (Neural Processing Unit / Tensor Processing Unit) |
| Primary Purpose | General-purpose computing, OS tasks, logic-heavy operations | Parallel computation, graphics rendering, AI training & inference | Customisable hardware logic, prototyping specialised pipelines | Highly specialised AI computation (tensor/matrix operations) |
| Architecture Type | Few powerful cores, deep cache hierarchy | Thousands of simple cores for massive parallelism | Reconfigurable logic blocks + routing matrix | Fixed-function compute arrays (e.g., systolic arrays) optimised for AI |
| Programming Flexibility | Very high | High | Very high (hardware-level customisation) | Low (purpose-built for specific workloads) |
| Performance on AI Workloads | Low | High | Moderate to High (depends on custom design) | Very High (industry-leading efficiency for LLMs) |
| Latency Characteristics | Low latency, good for control-heavy tasks | Moderate latency | Very low latency when optimised | Very low latency for supported AI operations |
| Energy Efficiency | Low to moderate | Moderate | High (when optimised) | Very high (2–4× GPU in many cases) |
| Hardware Customisation | None | Limited | Full hardware customisation | None after manufacturing (fully fixed) |
| Scalability in Data Centres | Limited | High (multi-GPU clusters) | Moderate (depends on design complexity) | Very high (thousands of NPU/TPU chips in pods) |
| Use Cases | OS, applications, logic processing, sequential tasks | Deep learning training, graphics rendering, HPC | Prototyping, edge AI, specialised pipelines, real-time control | LLMs, large-scale AI training & inference, recommendation engines |
| Typical Power Consumption | 45–125 W | 250–700+ W | Highly variable (1–50 W edge / 100+ W data centre) | 10–200 W (TPU v7 = ~157 W) |
| Ease of Development | Easiest | Easy due to CUDA/ROCm | Difficult (hardware design skills needed) | Moderate; requires framework support (XLA/NNAPI) |
| Cost | Low | High | Variable | High initial cost, low cost-per-compute for large deployments |
| Best Strength | Versatility | Parallel throughput | Custom logic & low latency | Maximum AI efficiency & scale |
| Main Limitation | Poor parallel performance | High power use & less efficient scaling | Complex development & longer design cycles | Limited flexibility & tied to manufacturer ecosystem |
TPU Hardware Architecture
The TPU architecture is built around three interdependent subsystems:
1. Compute Subsystem
The systolic array consists of thousands of arithmetic logic units laid out in a two-dimensional grid. Each ALU performs multiply-accumulate (MAC) operations while data flows through the array in a pipelined manner. This design allows near-maximum utilisation of compute resources, surpassing typical GPU utilisation rates for large matrix multiplications.
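As a toy illustration of this dataflow, the Python sketch below simulates an output-stationary systolic array: each cell performs one multiply-accumulate per step as skewed operands stream past it. The array sizes are arbitrary; a real TPU matrix unit does this in hardware at vastly larger scale.
```python
# Toy simulation of an output-stationary systolic array computing C = A @ B.
import numpy as np

def systolic_matmul(A, B):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    # Inputs are skewed so that element A[i, s] and B[s, j] arrive at
    # cell (i, j) on the same step; each cell then does one MAC per step.
    for t in range(n + m + k - 2):            # total wavefront steps
        for i in range(n):
            for j in range(m):
                s = t - i - j                 # operand index reaching cell (i, j) now
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)   # matches a normal matmul
```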
2. Memory Subsystem
TPUs incorporate multiple layers of memory:
● High-bandwidth HBM3e delivering multiple terabytes per second (7.4 TB/s per chip in Ironwood)
● High-speed SRAM caches
● Local register files for extremely low-latency access
This hierarchical approach ensures the compute units are consistently supplied with data, minimising bottlenecks.
3. Interconnect Subsystem
The TPU interconnect enables multiple chips to work in synchrony, forming TPU Pods that scale to thousands of devices. High-speed links and topology-aware routing ensure efficient cross-chip communication.
Google also integrates programmable controllers and data-processing modules that handle scheduling, prefetching, and format conversion, all contributing to performance gains.

HBM in TPU Architectures
High Bandwidth Memory (HBM) is crucial for sustaining the throughput required by modern neural networks. Large models demand enormous amounts of data movement, and HBM3e reduces memory stall time by delivering multi-terabyte-per-second bandwidth. In Ironwood, the 192 GB memory capacity per chip means larger model partitions can be stored locally, reducing the need for inter-chip communication.
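Combining the Ironwood figures quoted above gives a roofline-style balance point, i.e. how many FLOPs a kernel must perform per byte fetched from HBM before it becomes compute-bound rather than memory-bound:
```python
# Roofline-style balance point derived from the Ironwood figures above.
peak_flops = 4614e12      # 4,614 TFLOPs (FP8) per chip
hbm_bw = 7.4e12           # 7.4 TB/s HBM3e bandwidth per chip
print(f"balance point: ~{peak_flops / hbm_bw:.0f} FLOPs per HBM byte")
# Kernels above this arithmetic intensity (large matrix multiplies) are
# compute-bound; those below it (small or skinny ops) are memory-bound.
```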

Core Technical Advantages of TPUs
1. Energy Efficiency
TPUs allocate the majority of transistors to compute units rather than control logic, enabling them to deliver 2 to 4 times higher performance per watt than contemporary GPUs. This is essential for large-scale AI clusters where energy usage is a major operational and environmental concern.
2. Compute Density
With 4,614 TFLOPs of FP8 compute capability, Ironwood surpasses even Nvidia’s latest Blackwell GB200 GPU on raw inference performance. The smaller physical footprint also enables higher rack density, lowering the total cost of ownership for hyperscale deployments.
3. Cost Effectiveness
TPUs reduce redundant hardware costs and leverage Google’s XLA compiler to optimise models automatically. According to Google Cloud, training large language models on TPUs can be 40–60% cheaper than performing the same tasks on GPUs.
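In practice, developers usually reach XLA through a framework such as JAX; in the generic (not Ironwood-specific) example below, a function is traced once and compiled by XLA into a fused kernel for whatever backend is available.
```python
# Minimal XLA compilation example via JAX. Generic illustration only.
import jax
import jax.numpy as jnp

@jax.jit                      # traced once, then compiled by XLA into a fused kernel
def mlp_layer(x, w, b):
    return jax.nn.relu(x @ w + b)

x = jnp.ones((8, 512))
w = jnp.ones((512, 256))
b = jnp.zeros((256,))
y = mlp_layer(x, w, b)        # first call compiles; later calls reuse the binary
print(y.shape)                # (8, 256)
```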
Typical TPU Application Scenarios
TPUs support a wide range of practical AI tasks:
1. Natural Language Processing
Google’s PaLM and Gemini models, among the world’s largest and most capable language models, are trained on TPU Pods. The TPU architecture is particularly effective for attention mechanisms and wide-layer MLPs.
2. Computer Vision
Image classification, object detection, and video understanding workloads benefit from the TPU’s high matrix-multiplication throughput.
3. Recommendation Systems
Services such as Google Search and YouTube rely on TPUs to process enormous embedding tables, enabling personalised content recommendations for billions of users.
4. Edge AI
The Coral Edge TPU supports low-latency inference in industrial inspection, smart retail, and IoT devices, where real-time responses are essential.
Google TPU vs. Nvidia GPU
Architecture and Specifications
Google Ironwood (TPU v7):
● ASIC with systolic array
● FP8 performance: 4,614 TFLOPs
● HBM3e: 192 GB
● Power consumption: 157 W
● Scales up to 9,216 chips per Superpod
Nvidia Blackwell B200 (2024):
● General-purpose GPU
● FP8 performance: 4,500 TFLOPs
● 8-GPU platform memory: 1,440 GB
● Power consumption: 700 W
Nvidia H200 (2025):
● Hopper-derived architecture
● FP8 performance: ~2,560 TFLOPs
● Memory: 141 GB
● Power: 450 W

Performance and Energy Efficiency Comparison
Ironwood slightly exceeds the B200 in FP8 inference and significantly outperforms the H200. TPU’s architecture leads to stronger energy efficiency and better sustained utilisation for large workloads.
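Dividing the quoted FP8 figures by the quoted power draws makes the efficiency gap explicit; note these are peak-spec ratios, not measured workload numbers.
```python
# Performance-per-watt from the spec figures listed above (FP8 TFLOPs / W).
chips = {
    "Ironwood (TPU v7)": (4614, 157),
    "Nvidia B200":       (4500, 700),
    "Nvidia H200":       (2560, 450),
}
for name, (tflops, watts) in chips.items():
    print(f"{name:>18}: {tflops / watts:5.1f} TFLOPs/W")
# Ironwood ~29.4, B200 ~6.4, H200 ~5.7 TFLOPs/W at peak spec
```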
Strengths and Limitations
TPU Strengths:
● Exceptional inference throughput
● Industry-leading energy efficiency
● Excellent scaling capabilities
● Tight integration with Google’s software stack
TPU Limitations:
● Restricted to Google Cloud’s ecosystem
● Less flexible for general-purpose workloads
● Higher barrier to entry for custom operator development
GPU Strengths:
● Universal deployment flexibility
● Mature and robust CUDA ecosystem
● Strong support for diverse model types
GPU Limitations:
● Lower energy efficiency
● Scaling inefficiencies in ultra-large clusters
TPU Market Landscape
IDC reports:
● 2024 global GPU market: ~USD 70 billion
● 2024 global ASIC market: ~USD 14.8 billion
● 2030 projections:
○ GPU market > USD 300 billion
○ ASIC market > USD 80 billion
Shipment Forecasts
● 2024 shipments:
○ GPUs: 8.76 million
○ ASICs: 2.83 million
● 2030 forecasts:
○ GPUs: ~30 million
○ ASICs: ~14 million
This corresponds to CAGR:
● GPUs: ~23%
● ASICs: ~30%
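The quoted growth rates follow directly from the 2024 shipment figures and 2030 forecasts; a quick check:
```python
# Sanity-checking the CAGR figures from the shipment numbers above.
def cagr(start, end, years):
    return (end / start) ** (1 / years) - 1

print(f"GPUs:  {cagr(8.76e6, 30e6, 6):.0%}")    # ~23%
print(f"ASICs: {cagr(2.83e6, 14e6, 6):.0%}")    # ~31%, i.e. the ~30% quoted above
```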
Google TPU leads the ASIC sector with over 70% market share in 2024, generating USD 6–9 billion revenue.
The competitive landscape is intensifying:
● Amazon Trainium: Over 200% shipment growth in 2024
● Meta MTIA v2: Focused on inference, with a training-oriented ASIC expected in 2026
● OpenAI ASIC initiative: Targeting 3 nm/A16-class chips with mass production projected for 2026
This increasingly diverse ecosystem indicates that AI-specific silicon is becoming central to the next generation of global compute infrastructure.

FAQs About Google TPU Chips
1. What is the main difference between a TPU and a GPU?
A TPU is a custom-built ASIC designed specifically for tensor operations used in machine learning, particularly large-scale training and inference. It uses a systolic array architecture to maximise matrix multiplication efficiency. A GPU, by contrast, is a general-purpose parallel processor suited for a wide range of workloads, including graphics rendering, scientific computing and AI. TPUs offer superior energy efficiency and better scaling for very large models, while GPUs provide greater flexibility and broader ecosystem support.
2. Why are TPUs particularly effective for large language models (LLMs)?
LLMs rely heavily on large matrix multiplications, high-dimensional embeddings and Transformer layers—all of which map extremely well to the systolic arrays and high-bandwidth memory design of TPUs. TPUs maintain higher utilisation during long-running training cycles and reduce communication overhead across thousands of chips, making them ideal for trillion-parameter models.
3. Can TPUs be used outside of Google Cloud?
At present, Google TPUs are only accessible through Google Cloud’s managed infrastructure. Unlike GPUs, which can be purchased and deployed on-premise or integrated into custom servers, TPUs are not available for independent hardware purchase. This design ensures tight optimisation between Google’s hardware, software stack and data-centre network fabric.
4. How does Google’s Ironwood TPU compare to Nvidia’s Blackwell GPUs?
Ironwood delivers slightly higher FP8 inference performance than the Nvidia B200 and significantly outperforms the H200. It also consumes far less power—157 W compared with around 700 W for a B200—resulting in better performance-per-watt and improved data-centre efficiency. However, GPUs retain advantages in versatility, custom operator development and ecosystem maturity.
5. What workloads benefit most from TPU acceleration?
TPUs excel at large-scale AI workloads that rely on high-throughput tensor operations, such as:
● training and inference of LLMs
● computer vision models with heavy convolutional layers
● massive embedding table lookups used in recommendation systems
● long-duration or hyperscale distributed training tasks
They are less suited to workloads requiring extensive branching logic or highly specialised custom kernels.
6. Are TPUs more cost-effective than GPUs for AI training?
For large language models and other matrix-heavy workloads, TPUs generally offer 40–60% lower overall training cost compared with GPUs. Their higher energy efficiency, reduced hardware overhead and XLA compiler optimisations contribute to lower total cost of ownership. However, for smaller models or workloads requiring bespoke GPU kernels, GPUs may still be more economical.
Nov 27, 2025
