CompleteTinyModelRaven Top May 2026
We tested the CompleteTinyModelRaven Top against two popular tiny models: TinyLlama-1.1B and Phi-1.5. The results were striking.
| Metric | TinyLlama (1.1B) | Phi-1.5 (1.3B) | Raven Top (187M) |
| :--- | :--- | :--- | :--- |
| HellaSwag (0-shot) | 59.2 | 60.1 | 58.4 |
| PIQA (0-shot) | 73.5 | 74.0 | 72.1 |
| Inference RAM | 2.2 GB | 2.5 GB | 210 MB |
| First Token Latency (CPU) | 1.2s | 1.4s | 0.09s |
| Tokens per second | 12 | 11 | 45 |
Note: The Raven Top is slightly less accurate than models roughly six to seven times its size. Going by the table, it uses about 10 to 12 times less RAM, returns its first token 13 to 16 times sooner, and generates about 4 times as many tokens per second. For many edge tasks, the trade-off is worth it.
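To make the trade-off concrete, here is a small sketch that recomputes the ratios straight from the table. The dictionary keys and the `ratios` helper are illustrative names, not part of any Raven Top tooling, and the RAM figures assume 1 GB = 1000 MB.

```python
# Figures copied from the benchmark table; RAM converted assuming 1 GB = 1000 MB.
raven = {"params_m": 187, "ram_mb": 210, "first_token_s": 0.09, "tokens_per_s": 45}
baselines = {
    "TinyLlama": {"params_m": 1100, "ram_mb": 2200, "first_token_s": 1.2, "tokens_per_s": 12},
    "Phi-1.5": {"params_m": 1300, "ram_mb": 2500, "first_token_s": 1.4, "tokens_per_s": 11},
}


def ratios(baseline, small):
    """Return how many times larger/slower the baseline is than the small model."""
    return {
        "size": baseline["params_m"] / small["params_m"],
        "ram": baseline["ram_mb"] / small["ram_mb"],
        "latency": baseline["first_token_s"] / small["first_token_s"],
        # Throughput is inverted: higher is better for the small model.
        "throughput": small["tokens_per_s"] / baseline["tokens_per_s"],
    }


for name, stats in baselines.items():
    r = ratios(stats, raven)
    print(
        f"{name}: {r['size']:.1f}x params, {r['ram']:.1f}x RAM, "
        f"{r['latency']:.1f}x first-token latency, {r['throughput']:.1f}x fewer tokens/s"
    )
```

The ratios differ by metric, so it is worth checking which one matters for your deployment: memory and first-token latency show the largest gaps, while sustained throughput shows the smallest.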
Teachers using low-end Chromebooks can deploy this model to generate quiz questions or writing prompts. The "Complete" nature means no fiddling with Python environments beyond a simple pip install.
Solution: The Raven Top requires manual cache clearing. Call `model.raven_cache.clear()` between long inference calls to prevent memory fragmentation.
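One way to make sure the cache is cleared after every call is to wrap inference in a small helper. The sketch below is only an illustration: `StubModel`, `StubCache`, the `generate` method, and `run_with_cache_clearing` are hypothetical stand-ins so the example runs on its own. The only name taken from the text above is `model.raven_cache.clear()`.

```python
class StubCache:
    """Hypothetical stand-in for the model's raven_cache attribute."""

    def __init__(self):
        self.entries = []
        self.clear_calls = 0

    def clear(self):
        self.entries.clear()
        self.clear_calls += 1


class StubModel:
    """Hypothetical model; generate() is an assumed method name."""

    def __init__(self):
        self.raven_cache = StubCache()

    def generate(self, prompt):
        # Pretend each call leaves something behind in the cache.
        self.raven_cache.entries.append(prompt)
        return f"response to: {prompt}"


def run_with_cache_clearing(model, prompts):
    """Run each prompt, clearing the cache afterwards even if a call fails."""
    outputs = []
    for prompt in prompts:
        try:
            outputs.append(model.generate(prompt))
        finally:
            model.raven_cache.clear()
    return outputs


model = StubModel()
results = run_with_cache_clearing(model, ["Write a quiz question.", "Suggest a writing prompt."])
print(results)
```

Putting the `clear()` call in a `finally` block means a failed generation does not leave stale cache entries behind for the next call.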