# cxos/vendor/llama-cpp/ — pinned llama.cpp / ggml inference engine

CxLLM-Arch's Core inference backend embeds [llama.cpp](https://github.com/ggerganov/llama.cpp)
through this vendor shim. We do **not** commit the multi-hundred-megabyte
source tree; this directory holds:

* **`PINNED.json`** — exact upstream tag, tarball URL, and SHA-256
  CxLLM trusts. Bumping is a single-commit operation: update both
  `version` and `sha256` together, ideally with a co-located CI run
  that proves reproducibility.
* **`fetch.sh`** — downloads the tarball, verifies SHA-256, and extracts
  to `src/llama.cpp-<ver>/` (gitignored). Refuses to run when
  `PINNED.json` still has the placeholder all-zeros SHA.
* **`build.sh`** — invokes `cmake … --install` into
  `dist/cxllm-arch/llama-cpp/`, with backend toggles via
  `--backend {cpu,vulkan,cuda,hip,opencl}` (repeatable flag).

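The pin-then-verify flow above can be sketched in a few lines of POSIX shell. This is an illustration, not the real `fetch.sh`: `verify_tarball` and its messages are hypothetical; only the pinned-SHA-256 check and the all-zeros refusal come from the description above.

```sh
# Hedged sketch of fetch.sh's verification step; the real script's
# structure and output may differ.
verify_tarball() {
  pinned_sha="$1"   # the sha256 field from PINNED.json
  tarball="$2"      # path to the downloaded tarball

  # Refuse to run against the placeholder all-zeros sha.
  case "$pinned_sha" in
    *[!0]*) ;;  # contains a non-zero character: a real pin
    *) echo "PINNED.json still holds the placeholder sha; refusing" >&2
       return 1 ;;
  esac

  # Verify the download against the pinned digest before extracting.
  actual_sha="$(sha256sum "$tarball" | awk '{print $1}')"
  if [ "$actual_sha" != "$pinned_sha" ]; then
    echo "sha256 mismatch: got $actual_sha, want $pinned_sha" >&2
    return 1
  fi
  echo "sha256 verified: $tarball"
}
```

With a gate like this in place, bumping the pin reduces to editing `version` and `sha256` together and letting CI re-run the fetch.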
Run via the top-level Makefile:

```sh
make cxos-vendor-llama                               # fetch + verify
make cxos-vendor-llama-build                         # CPU only
make cxos-vendor-llama-build BACKENDS="vulkan cuda"
```

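Inside `build.sh`, each `--backend` value has to translate into an upstream CMake toggle. A minimal sketch of that mapping, assuming upstream llama.cpp's `GGML_*` cache variables — both the variable names and the `backend_flag` helper are assumptions about the script, not its actual contents:

```sh
# Hypothetical helper turning one --backend value into a cmake -D option.
backend_flag() {
  case "$1" in
    cpu)    echo "" ;;                 # CPU path needs no extra toggle
    vulkan) echo "-DGGML_VULKAN=ON" ;;
    cuda)   echo "-DGGML_CUDA=ON" ;;
    hip)    echo "-DGGML_HIP=ON" ;;
    opencl) echo "-DGGML_OPENCL=ON" ;;
    *)      echo "unknown backend: $1" >&2; return 1 ;;
  esac
}
```

Because `--backend` is repeatable, a wrapper like this would accumulate one `-D` option per flag before the single `cmake … --install` invocation.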
CxLLM-Arch's `Core/CMakeLists.txt` consumes the install prefix produced
here when `CXLLM_USE_LLAMA_CPP=ON` (the default for production builds).

Trust model: upstream tarballs are not GPG-signed, so we anchor trust on
the SHA-256 recorded in `PINNED.json`. Bumps are reviewed and reproduced
in CI before merging.