Falcon3 is the latest family of efficient language models from TII (Technology Innovation Institute), with sizes up to 10B parameters. It focuses on improving science, math, and code capabilities while keeping training efficient.

Key Features

Four sizes: 1B, 3B, 7B, 10B
Depth up-scaling used to build the 10B model from the 7B model
Knowledge distillation for smaller models (1B, 3B)
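
The two techniques listed above can be illustrated with a toy sketch. This is not TII's training code. It assumes depth up-scaling means repeating a contiguous slice of layers to get a deeper stack (which is then trained further), and that distillation minimizes a KL divergence between temperature-softened teacher and student outputs. The function names, the slice indices, and the temperature are hypothetical and chosen only for illustration.

```python
import math


def depth_upscale(layers, start, end):
    """Return a deeper stack by repeating layers[start:end] right after itself.

    A real model would copy the layer weights and then continue training;
    here the layers are plain labels so the structure is easy to see.
    """
    if not 0 <= start < end <= len(layers):
        raise ValueError("start/end must select a non-empty slice of layers")
    return layers[:end] + layers[start:end] + layers[end:]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical 6-layer base model; layers 2..4 are repeated to deepen it.
base = [f"block_{i}" for i in range(6)]
deeper = depth_upscale(base, start=2, end=5)
print(len(base), "->", len(deeper))

# The loss is zero when the student matches the teacher and positive otherwise.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))
print(distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0]))
```

Which layers were actually duplicated for the 10B model, and which loss and teacher were used for the 1B and 3B models, are not stated here, so treat both functions as illustrations of the general idea only.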

Performance Highlights

falcon3:1b outperforms smollm2:1.7b, matches gemma2:2b
falcon3:10b achieves state-of-the-art results in the under-13B category
Extended context length up to 32K tokens (8K for 1B model)
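
The context limits above matter when choosing a model size for a given prompt. The following is a minimal sketch of that check. It assumes "K" means 1024 tokens, that the 3B and 7B tags follow the same naming pattern as falcon3:1b and falcon3:10b, and that every size other than 1B supports the full 32K. The helper names and the output reserve are hypothetical.

```python
# Context limits taken from the list above; reading "K" as 1024 tokens is an assumption.
# The 3b and 7b tags are inferred from the naming pattern and may not match the real tags.
CONTEXT_TOKENS = {
    "falcon3:1b": 8 * 1024,
    "falcon3:3b": 32 * 1024,
    "falcon3:7b": 32 * 1024,
    "falcon3:10b": 32 * 1024,
}


def fits_context(tag, prompt_tokens, reserve_for_output=512):
    """True if the prompt plus a reserved output budget fits the model's window."""
    return prompt_tokens + reserve_for_output <= CONTEXT_TOKENS[tag]


def smallest_model_for(prompt_tokens, reserve_for_output=512):
    """Return the smallest tag whose window fits the request, or None."""
    for tag in ("falcon3:1b", "falcon3:3b", "falcon3:7b", "falcon3:10b"):
        if fits_context(tag, prompt_tokens, reserve_for_output):
            return tag
    return None


print(smallest_model_for(4000))
print(smallest_model_for(8000))
print(smallest_model_for(40000))
```

Token counts depend on the tokenizer, so the prompt length passed in should come from the model's own tokenizer rather than a word count.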

References

Hugging Face