# cxos/vendor/llama-cpp/ — pinned llama.cpp / ggml inference engine

CxLLM-Arch's Core inference backend embeds [llama.cpp](https://github.com/ggerganov/llama.cpp)
through this vendor shim. We do **not** commit the multi-hundred-megabyte
source tree; this directory holds:

* **`PINNED.json`** — exact upstream tag, tarball URL, and SHA-256
  CxLLM trusts. Bumping is a single-commit operation: update both
  `version` and `sha256` together, ideally with a co-located CI run
  that proves reproducibility.
* **`fetch.sh`** — downloads the tarball, verifies SHA-256, and extracts
  to `src/llama.cpp-<ver>/` (gitignored). Refuses to run when
  `PINNED.json` still has the placeholder all-zeros SHA.
* **`build.sh`** — invokes `cmake … --install` into
  `dist/cxllm-arch/llama-cpp/`, with backend toggles via
  `--backend {cpu,vulkan,cuda,hip,opencl}` (repeatable flag).

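The pin-then-verify flow above can be sketched in a few lines of POSIX shell. This is an illustration, not the real `fetch.sh`: `verify_tarball` and its messages are hypothetical; only the pinned-SHA-256 check and the all-zeros refusal come from the description above.

```sh
# Hedged sketch of fetch.sh's verification step; the real script's
# structure and output may differ.
verify_tarball() {
  pinned_sha="$1"   # the sha256 field from PINNED.json
  tarball="$2"      # path to the downloaded tarball

  # Refuse to run against the placeholder all-zeros sha.
  case "$pinned_sha" in
    *[!0]*) ;;  # contains a non-zero character: a real pin
    *) echo "PINNED.json still holds the placeholder sha; refusing" >&2
       return 1 ;;
  esac

  # Verify the download against the pinned digest before extracting.
  actual_sha="$(sha256sum "$tarball" | awk '{print $1}')"
  if [ "$actual_sha" != "$pinned_sha" ]; then
    echo "sha256 mismatch: got $actual_sha, want $pinned_sha" >&2
    return 1
  fi
  echo "sha256 verified: $tarball"
}
```

With a gate like this in place, bumping the pin reduces to editing `version` and `sha256` together and letting CI re-run the fetch.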
Run via the top-level Makefile:

```sh
make cxos-vendor-llama                               # fetch + verify
make cxos-vendor-llama-build                         # CPU only
make cxos-vendor-llama-build BACKENDS="vulkan cuda"
```

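Inside `build.sh`, each `--backend` value has to translate into an upstream CMake toggle. A minimal sketch of that mapping, assuming upstream llama.cpp's `GGML_*` cache variables — both the variable names and the `backend_flag` helper are assumptions about the script, not its actual contents:

```sh
# Hypothetical helper turning one --backend value into a cmake -D option.
backend_flag() {
  case "$1" in
    cpu)    echo "" ;;                 # CPU path needs no extra toggle
    vulkan) echo "-DGGML_VULKAN=ON" ;;
    cuda)   echo "-DGGML_CUDA=ON" ;;
    hip)    echo "-DGGML_HIP=ON" ;;
    opencl) echo "-DGGML_OPENCL=ON" ;;
    *)      echo "unknown backend: $1" >&2; return 1 ;;
  esac
}
```

Because `--backend` is repeatable, a wrapper like this would accumulate one `-D` option per flag before the single `cmake … --install` invocation.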
CxLLM-Arch's `Core/CMakeLists.txt` consumes the install prefix produced
here when `CXLLM_USE_LLAMA_CPP=ON` (the default for production builds).

Trust model: upstream tarballs are not GPG-signed, so we anchor trust on
the SHA-256 recorded in `PINNED.json`. Bumps are reviewed and reproduced
in CI before merging.