
woadwarrior01 36 days ago | on: Granite 4.1: IBM's 8B Model Matching 32B MoE

The most salient thing about these models is that they're non-reasoning models. This makes them very token-efficient and particularly well suited to local inference, where decoding is usually slower than on datacenter GPUs.
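
To put rough numbers on the token-efficiency point, here is a back-of-the-envelope sketch. Every figure in it (answer length, reasoning-trace length, local decode speed) is a hypothetical value picked for illustration, not a measurement of Granite or any other model, and `decode_seconds` is just a helper name made up for this example.

```python
def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time spent generating output tokens, ignoring prompt processing."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical numbers, chosen only for illustration.
ANSWER_TOKENS = 300      # length of the final answer
THINKING_TOKENS = 2000   # extra tokens a reasoning model might emit first
LOCAL_TPS = 20.0         # assumed decode speed on local hardware

direct = decode_seconds(ANSWER_TOKENS, LOCAL_TPS)
with_reasoning = decode_seconds(ANSWER_TOKENS + THINKING_TOKENS, LOCAL_TPS)

print(f"direct answer: {direct:.0f} s")
print(f"with reasoning trace: {with_reasoning:.0f} s")
```

Under these assumed numbers the wait is dominated by the reasoning trace rather than the answer itself, and the gap grows as decode speed drops.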

Link to HF collection: https://huggingface.co/collections/ibm-granite/granite-41-la...

lostmsu 35 days ago

Probably worse than Gemma 4 or Qwen 3.6 with thinking off.