1. Clone their forked repo: `git clone https://github.com/PrismML-Eng/llama.cpp.git`
2. Then (assuming you already have xcode build tools installed):
cd llama.cpp cmake -B build -DGGML_METAL=ON cmake --build build --config Release -j$(sysctl -n hw.logicalcpu)
./build/bin/llama-server -m ~/Downloads/Bonsai-8B.gguf --port 80 --host 0.0.0.0 --ctx-size 0 --parallel 4 --flash-attn on --no-perf --log-colors on --api-key some_api_key_string
And this is when Im serving zero prompts.. just loaded the model (using llama-server).
1. Clone their forked repo: `git clone https://github.com/PrismML-Eng/llama.cpp.git`
2. Then (assuming you already have xcode build tools installed):
3. Finally, run it with (you can adjust arguments): Model was first downloaded from: https://huggingface.co/prism-ml/Bonsai-8B-gguf/tree/main