Nemotron 3 Nano 30B A3B

Nemotron 3 Nano 30B A3B is a sparse hybrid Mamba-Transformer mixture-of-experts (MoE) model with 30B total parameters, of which only 3B are active per token. It supports a 262.1K-token context window, and because per-token compute scales with the active parameters, its throughput is closer to that of a 3B dense model than a 30B one.
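
As a rough illustration of why the active parameter count drives throughput, the sketch below compares per-token compute for this model against a hypothetical dense 30B model. It relies on the common rule of thumb of about 2 FLOPs per active parameter per token, which is an assumption here and not a figure from NVIDIA; it ignores attention, Mamba state updates, memory bandwidth, and expert-routing overhead. The constant and function names are illustrative only.

```typescript
// Parameter counts from the model description (rounded).
const TOTAL_PARAMS = 30e9
const ACTIVE_PARAMS = 3e9

// Rule-of-thumb assumption: ~2 FLOPs per active parameter per token.
function forwardFlopsPerToken(activeParams: number): number {
  return 2 * activeParams
}

// Fraction of the model's weights used for any single token.
const activeFraction = ACTIVE_PARAMS / TOTAL_PARAMS

// Compute ratio versus a dense model that uses all 30B parameters per token.
const denseToSparseRatio =
  forwardFlopsPerToken(TOTAL_PARAMS) / forwardFlopsPerToken(ACTIVE_PARAMS)

console.log(`active fraction: ${(activeFraction * 100).toFixed(0)}%`)
console.log(`dense/sparse compute ratio: ${denseToSparseRatio.toFixed(0)}x`)
```

Under this simplification, each token touches about a tenth of the weights. Real-world throughput also depends on hardware, batching, and the provider's serving stack.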

Reasoning

index.ts

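import { streamText } from 'ai'

// Start a streaming text generation request for this model.
const result = streamText({
  model: 'nvidia/nemotron-3-nano-30b-a3b',
  prompt: 'Why is the sky blue?'
})

// Print the response incrementally as chunks arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}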


More models by NVIDIA

Model	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Release Date
		2.3s	225tps	$0.37/M	$1.08/M	$0.12/M	—	06/04/2026
	256K	0.2s	124tps	$0.15/M	$0.65/M	—	$0.06/M	03/18/2026
	131K	0.2s	154tps	$0.06/M	$0.23/M	—	—	08/18/2025
	131K	0.2s		$0.20/M	$0.60/M	—	—	12/01/2024
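
The per-token prices listed above translate into request cost with simple arithmetic: tokens divided by one million, multiplied by the listed rate. A minimal sketch follows, using the $0.06/M input and $0.23/M output rates from one of the rows above. The `Pricing` type, the `estimateCostUsd` function, and the token counts are hypothetical and chosen only for illustration; cache pricing is left out.

```typescript
// Prices in US dollars per one million tokens, as listed in the table.
interface Pricing {
  inputPerMillion: number
  outputPerMillion: number
}

// Estimate the cost of a single request from its token counts.
function estimateCostUsd(
  pricing: Pricing,
  inputTokens: number,
  outputTokens: number
): number {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMillion +
    (outputTokens / 1_000_000) * pricing.outputPerMillion
  )
}

// Rates taken from the $0.06/M input, $0.23/M output row.
const pricing: Pricing = { inputPerMillion: 0.06, outputPerMillion: 0.23 }

// Hypothetical request: 100,000 input tokens and 2,000 output tokens.
const cost = estimateCostUsd(pricing, 100_000, 2_000)
console.log(`estimated cost: $${cost.toFixed(4)}`)
```

For these example numbers the estimate comes to roughly $0.0065, dominated by the input tokens.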
