Llama 3 on Web UI
When doing inference with Llama 3 Instruct on Text Generation Web UI, you can get pretty decent inference speeds out of the box on an M1 Ultra Mac, even with a…
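For programmatic access alongside the browser interface, here is a minimal sketch of querying a local Text Generation Web UI instance, assuming the server was started with its OpenAI-compatible API enabled (the `--api` flag) and is listening on the default port 5000; the URL, port, and prompt are placeholders to adapt to your setup:

```python
import requests

# Sketch: send a chat request to a local Text Generation Web UI instance
# through its OpenAI-compatible endpoint. Assumes the server was launched
# with --api and listens on the default port 5000.
URL = "http://localhost:5000/v1/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "Summarize the Llama 3 release in two sentences."}
    ],
    "max_tokens": 200,
    "temperature": 0.7,
}

response = requests.post(URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```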
I have two Linux servers for inference, each with dual Nvidia 3090s, and they’ve worked very well for running large language models. 48GB of VRAM will load models up to 70B at…
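As a rough sanity check on why 48GB can hold a 70B model, here is a back-of-the-envelope estimate, assuming 4-bit quantized weights; the exact footprint also depends on the quantization format, context length, and KV cache, so treat these as approximations:

```python
# Back-of-the-envelope VRAM estimate for quantized model weights.
# Rough figures only; real usage also depends on the KV cache,
# context length, and the specific quantization format.
def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB for a given parameter count."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3

for bits in (16, 8, 4):
    print(f"70B at {bits}-bit: ~{approx_weight_gb(70, bits):.0f} GB")
# 70B at 16-bit: ~130 GB  (far too big for 48 GB)
# 70B at 8-bit: ~65 GB    (still too big)
# 70B at 4-bit: ~33 GB    (fits in 2x 3090 with room for the KV cache)
```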