qwq:32b-preview-fp16

65.2% on GPQA, showcasing its graduate-level scientific reasoning capabilities
50.0% on AIME, highlighting its strong mathematical problem-solving skills
90.6% on MATH-500, demonstrating exceptional mathematical comprehension across diverse topics
50.0% on LiveCodeBench, validating its robust programming abilities in real-world scenarios.

These results underscore QwQ’s significant advancement in analytical and problem-solving capabilities, particularly in technical domains requiring deep reasoning.

As a preview release, it demonstrates promising analytical abilities while having several important limitations:

Language Mixing and Code-Switching: The model may mix languages or switch between them unexpectedly, affecting response clarity.
Recursive Reasoning Loops: The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.
Safety and Ethical Considerations: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.
Performance and Benchmark Limitations: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.

QwQ is a 32B parameter experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities.

![image.png](/assets/mchiang0610/mikey3.1/e6d2ac3a-0d55-4e8b-9f53-ee5e269ed521)

![image.png](/assets/mchiang0610/mikey3.1/b56aaf87-c5bf-4249-be99-28930845e48e)

QwQ demonstrates remarkable performance across these benchmarks:

- **65.2% on GPQA**, showcasing its graduate-level scientific reasoning capabilities 
- **50.0% on AIME**, highlighting its strong mathematical problem-solving skills
- **90.6% on MATH-500**, demonstrating exceptional mathematical comprehension across diverse topics
- **50.0% on LiveCodeBench**, validating its robust programming abilities in real-world scenarios.

These results underscore QwQ’s significant advancement in analytical and problem-solving capabilities, particularly in technical domains requiring deep reasoning.

As a preview release, it demonstrates promising analytical abilities while having several important limitations:

1. **Language Mixing and Code-Switching:** The model may mix languages or switch between them unexpectedly, affecting response clarity.

2. **Recursive Reasoning Loops:** The model may enter circular reasoning patterns, leading to lengthy responses without a conclusive answer.

3. **Safety and Ethical Considerations:** The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.

4. **Performance and Benchmark Limitations:** The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.

Paste, drop or click to upload images (.png, .jpeg, .jpg, .svg, .gif)