ollama (Download and run large language models locally)

Ollama is an application that lets you run large language models
offline. A list of available models can be found at
ollama.com/library.

Optional dependencies such as CUDA or ROCm are detected
automatically at build time, if present:

  CUDA=ON: build with CUDA support (default: CUDA=OFF)
  ROCM=ON: build with ROCm support (default: ROCM=OFF)

Building the ollama server and client requires network access and
the development/google-go-lang package.
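
For example, assuming the usual convention of passing such options
to the build script as environment variables (the script name
ollama.SlackBuild below is an assumption, not confirmed by this
README), a CUDA-enabled build might look like:

  CUDA=ON ./ollama.SlackBuild

Once installed, the server can be started and a model pulled and
run with the standard ollama commands (llama3 is just one example
model from ollama.com/library):

  ollama serve &
  ollama run llama3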