Links.
- Note: beyond standard LLMs. Link: Understanding GRPO and New Insights from Reasoning Model Papers
- Note: beyond standard LLMs. Link: Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers
- Note: beyond standard LLMs. Link: And How They Stack Up Against Qwen3
- Note: thorough article on DeepSeek architecture. Link: Understanding How DeepSeek's Flagship Open-Weight Models Evolved