CompleteTinyModelRaven Top, May 2026

We tested the CompleteTinyModelRaven Top against two popular tiny models: TinyLlama-1.1B and Phi-1.5. The results were striking.

| Metric | TinyLlama (1.1B) | Phi-1.5 (1.3B) | Raven Top (187M) |
| :--- | :--- | :--- | :--- |
| HellaSwag (0-shot) | 59.2 | 60.1 | 58.4 |
| PIQA (0-shot) | 73.5 | 74.0 | 72.1 |
| Inference RAM | 2.2 GB | 2.5 GB | 210 MB |
| First Token Latency (CPU) | 1.2 s | 1.4 s | 0.09 s |
| Tokens per second | 12 | 11 | 45 |

Note: The Raven Top scores about 1 to 2 points lower than models roughly six to seven times its size, but it needs about a tenth of the RAM, returns its first token 13 to 16 times sooner, and generates about four times as many tokens per second. For 90% of edge tasks, the trade-off is worth it.
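The size and speed ratios behind this note can be checked directly from the table. The snippet below is a minimal sketch that only redoes the arithmetic; the dictionary keys and the `ratios` helper are names chosen for illustration, and the RAM figures assume 1 GB = 1000 MB.

```python
# Figures copied from the comparison table; RAM converted assuming 1 GB = 1000 MB.
baselines = {
    "TinyLlama": {"params_m": 1100, "ram_mb": 2200, "first_token_s": 1.2, "tok_per_s": 12},
    "Phi-1.5": {"params_m": 1300, "ram_mb": 2500, "first_token_s": 1.4, "tok_per_s": 11},
}
raven = {"params_m": 187, "ram_mb": 210, "first_token_s": 0.09, "tok_per_s": 45}


def ratios(baseline, small):
    """Return how many times larger/slower the baseline is than the small model."""
    return {
        "size": baseline["params_m"] / small["params_m"],
        "ram": baseline["ram_mb"] / small["ram_mb"],
        "first_token": baseline["first_token_s"] / small["first_token_s"],
        # Throughput is inverted: higher tokens/s is better for the small model.
        "throughput": small["tok_per_s"] / baseline["tok_per_s"],
    }


for name, stats in baselines.items():
    r = ratios(stats, raven)
    print(
        f"{name}: {r['size']:.1f}x parameters, {r['ram']:.1f}x RAM, "
        f"{r['first_token']:.1f}x first-token latency, "
        f"Raven Top throughput {r['throughput']:.1f}x higher"
    )
```

Splitting the comparison into separate ratios makes it clear that the gain differs by metric: the latency and memory advantages are much larger than the throughput advantage.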

Teachers using low-end Chromebooks can deploy this model to generate quiz questions or writing prompts. The "Complete" nature means no fiddling with Python environments beyond a simple pip install.

Solution: The Raven Top requires manual cache clearing. Between long inference calls, run the following to prevent memory fragmentation:

model.raven_cache.clear()
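One way to make sure the cache is cleared after every long call is to wrap inference in a small helper. The sketch below is an illustration only: `StubRavenModel`, its `generate` method, and `run_batch` are hypothetical stand-ins so the example runs on its own, and the only name taken from the article is `model.raven_cache.clear()`.

```python
class _StubCache:
    """Stand-in for the model's cache; records how often it is cleared."""

    def __init__(self):
        self.entries = []
        self.clear_count = 0

    def clear(self):
        self.entries.clear()
        self.clear_count += 1


class StubRavenModel:
    """Hypothetical placeholder, NOT the real Raven Top class."""

    def __init__(self):
        self.raven_cache = _StubCache()

    def generate(self, prompt):
        # Assumed method name; simulates a call that leaves data in the cache.
        self.raven_cache.entries.append(prompt)
        return f"response to: {prompt}"


def run_batch(model, prompts):
    """Run each prompt and clear the cache after every call, even on errors."""
    outputs = []
    for prompt in prompts:
        try:
            outputs.append(model.generate(prompt))
        finally:
            model.raven_cache.clear()
    return outputs


model = StubRavenModel()
results = run_batch(model, ["Write a quiz question.", "Suggest a writing prompt."])
print(len(results), model.raven_cache.clear_count)
```

Putting the clear call in a `finally` block means a failed inference call does not leave stale cache entries behind for the next one.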
