# cxos/vendor/llama-cpp/ — pinned llama.cpp / ggml inference engine
CxLLM-Arch's Core inference backend embeds llama.cpp through this vendor shim. We do not commit the multi-hundred-megabyte source tree; this directory holds:
- `PINNED.json` — the exact upstream tag, tarball URL, and SHA-256 that CxLLM trusts. Bumping is a single-commit operation: update both `version` and `sha256` together, ideally with a co-located CI run that proves reproducibility.
- `fetch.sh` — downloads the tarball, verifies its SHA-256, and extracts it to `src/llama.cpp-<ver>/` (gitignored). Refuses to run while `PINNED.json` still has the placeholder all-zeros sha.
- `build.sh` — invokes `cmake … --install` into `dist/cxllm-arch/llama-cpp/`, with backend toggles via `--backend {cpu,vulkan,cuda,hip,opencl}` (may be passed multiple times). A sketch of the fetch-and-verify flow follows this list.
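For illustration, here is a minimal sketch of the fetch-and-verify flow in the spirit of `fetch.sh` — not the shipped script. The `version` and `sha256` field names come from the notes above; the `url` field name and the exact placeholder check are assumptions:

```sh
#!/usr/bin/env sh
# Hypothetical sketch of fetch.sh's core logic; not the shipped script.
set -eu

VERSION=$(jq -r .version PINNED.json)
URL=$(jq -r .url PINNED.json)          # "url" field name is assumed
WANT=$(jq -r .sha256 PINNED.json)

# Refuse to run while the pin still carries the all-zeros placeholder.
if printf '%s' "$WANT" | grep -Eq '^0{64}$'; then
  echo "PINNED.json still has the placeholder sha; refusing to fetch." >&2
  exit 1
fi

TARBALL="llama.cpp-${VERSION}.tar.gz"
curl -fsSL -o "$TARBALL" "$URL"

# GNU coreutils sha256sum; on macOS substitute `shasum -a 256`.
GOT=$(sha256sum "$TARBALL" | awk '{print $1}')
if [ "$GOT" != "$WANT" ]; then
  echo "sha256 mismatch: got $GOT, want $WANT" >&2
  exit 1
fi

mkdir -p src
tar -xzf "$TARBALL" -C src   # yields src/llama.cpp-<ver>/ (gitignored)
```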
Run via the top-level Makefile:

```sh
make cxos-vendor-llama          # fetch + verify
make cxos-vendor-llama-build    # CPU only
make cxos-vendor-llama-build BACKENDS="vulkan cuda"
```
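And a hedged sketch of how `build.sh` could translate the multi-flag `--backend` values into CMake toggles. The `GGML_*` option names match recent upstream llama.cpp but are assumptions here; check the pinned tree, since older tags spelled them `LLAMA_*`:

```sh
#!/usr/bin/env sh
# Hypothetical sketch of build.sh's backend mapping; not the shipped script.
set -eu

VERSION=$(jq -r .version PINNED.json)
CMAKE_FLAGS=""
for b in "$@"; do                      # e.g. ./build.sh vulkan cuda
  case "$b" in
    cpu)    ;;                         # CPU path needs no extra toggle
    vulkan) CMAKE_FLAGS="$CMAKE_FLAGS -DGGML_VULKAN=ON" ;;
    cuda)   CMAKE_FLAGS="$CMAKE_FLAGS -DGGML_CUDA=ON" ;;
    hip)    CMAKE_FLAGS="$CMAKE_FLAGS -DGGML_HIP=ON" ;;
    opencl) CMAKE_FLAGS="$CMAKE_FLAGS -DGGML_OPENCL=ON" ;;
    *)      echo "unknown backend: $b" >&2; exit 1 ;;
  esac
done

# $CMAKE_FLAGS is deliberately unquoted so each -D flag splits into its own arg.
cmake -S "src/llama.cpp-${VERSION}" -B build $CMAKE_FLAGS
cmake --build build -j
cmake --install build --prefix dist/cxllm-arch/llama-cpp
```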
CxLLM-Arch's `Core/CMakeLists.txt` consumes the install prefix produced here when `CXLLM_USE_LLAMA_CPP=ON` (the default for production builds).
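From the shell, that consumption can be pictured as pointing Core's configure step at the vendored prefix; the `CMAKE_PREFIX_PATH` wiring below is an assumption about how `Core/CMakeLists.txt` locates the install:

```sh
# Sketch: configure Core against the vendored prefix (paths assumed).
cmake -S Core -B build/core \
  -DCXLLM_USE_LLAMA_CPP=ON \
  -DCMAKE_PREFIX_PATH="$PWD/cxos/vendor/llama-cpp/dist/cxllm-arch/llama-cpp"
```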
Trust model: upstream tarballs are not GPG-signed, so we anchor trust on the SHA-256 in `PINNED.json`. Bumps are reviewed and reproduced in CI before merging.
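A bump might look like the following single-commit sketch; the tag, the tarball URL shape, and the `url` field are illustrative assumptions, not prescriptive:

```sh
#!/usr/bin/env sh
# Hypothetical bump helper: re-pin version, url, and sha256 together.
set -eu

NEW="b1234"   # placeholder tag; use the real upstream release tag
URL="https://github.com/ggml-org/llama.cpp/archive/refs/tags/${NEW}.tar.gz"

curl -fsSL -o /tmp/llama.tar.gz "$URL"
SHA=$(sha256sum /tmp/llama.tar.gz | awk '{print $1}')

jq --arg v "$NEW" --arg u "$URL" --arg s "$SHA" \
   '.version = $v | .url = $u | .sha256 = $s' PINNED.json > PINNED.json.tmp
mv PINNED.json.tmp PINNED.json

git add PINNED.json
git commit -m "vendor: bump llama.cpp to ${NEW}"
```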