A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
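The "activated per token" distinction is the defining property of MoE layers: a router selects a small subset of experts for each token, so only a fraction of the total parameters participate in any one forward pass. A minimal sketch of top-k expert routing, with toy sizes (the expert count, top-k, and hidden size here are hypothetical, not this model's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts = 8   # hypothetical expert count (toy)
top_k = 2       # experts activated per token
d = 4           # hidden size (toy)

# One router matrix plus one weight matrix per expert.
router_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_layer(x):
    # Router scores for this token, then pick the top-k experts.
    logits = x @ router_w
    chosen = np.argsort(logits)[-top_k:]
    # Softmax over the selected experts' scores only.
    w = np.exp(logits[chosen] - logits[chosen].max())
    w /= w.sum()
    # Weighted sum of the chosen experts' outputs; the remaining
    # experts' parameters stay inactive for this token.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, chosen))

x = rng.normal(size=d)
y = moe_layer(x)
```

Scaled up, the same idea yields the 671B-total / 37B-active split: all experts contribute to total parameter count, but each token only pays the compute cost of its routed subset.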

671b · 946.4K pulls · updated 2 months ago

5 Tags
5da0e2d4a9e0 • 404GB • 2 months ago
7770bf5a5ed8 • 1.3TB • 2 months ago
96061c74c1a5 • 713GB • 2 months ago