Installation¶
Verify:
System requirements¶
- Python 3.11+ (3.12 and 3.13 supported)
- Linux, macOS, and Windows
- No GPU required - base install runs entirely on CPU
- No API keys required - works offline
The base install is ~106 MB and includes per-country PII detectors, policy engine, four jurisdiction packs (NDPA-2023, POPIA, Kenya DPA, Ghana DPA), the Pipeline framework primitive, arche.sign (Ed25519 + did:key + JWS), arche.credentials.sd_jwt (SD-JWT-VC), and the SQLite audit log.
No heavy ML/DPI dependencies are loaded by import arche - enforced by CI (see .github/workflows/arche-core-budget.yml).
Optional extras¶
The base install is the framework. Heavy capabilities ship as opt-in extras so a Nigerian fintech doesn't pay the cost of an OCR stack they'll never use.
pip install arche-core[doc] # docling - PDF, DOCX, PPTX, XLSX, HTML
pip install arche-core[doc-ocr] # adds easyocr for scanned PDFs / images
pip install arche-core[doc-vlm] # adds transformers for VLM backends
Enables Pipeline.process_file(path) for non-text inputs.
pip install arche-core[detect] # GLiNER2-PII via ONNX (~250 MB)
pip install arche-core[presidio] # Microsoft Presidio recognizer plug-in
pip install arche-core[resolve] # Splink + DuckDB for billion-row ER
Soft-PII (detect), Western-only PII corpus baseline (presidio), and probabilistic entity resolution (resolve). All optional - arche's African-context detectors live in the base install.
pip install arche-core[llm] # openai + anthropic SDKs
pip install arche-core[litellm] # LiteLLM proxy router
Powers the LLM-anchored extraction path; not required for any of the headline v0.2 workflows.
Using uv (recommended)¶
uv is a fast Python package manager. Install uv itself first if you haven't:
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
Pick the right uv command for the job¶
| You want to... | Use | Result |
|---|---|---|
Add arche-core as a project dependency (writes pyproject.toml + uv.lock) |
uv add 'arche-core[detect]' |
Pinned in your project; collaborators get the same version |
| Install into the active environment ad-hoc | uv pip install 'arche-core[detect]' |
Works inside any active venv / conda env, no project file edits |
| Run once without polluting any environment | uvx --from 'arche-core[detect]' python -m my_script |
Ephemeral env; deps disappear after the process exits |
| Install as a global CLI tool with extras | uv tool install 'arche-core[detect]' |
arche (when v0.2.0a2 ships the CLI back) available everywhere |
| Develop inside this monorepo | uv sync --all-packages |
Resolves the whole workspace including arche-core, arche-graph, arche-live |
Shell quoting
The [extra] syntax must be quoted in zsh, bash, and fish - unquoted brackets are interpreted as glob patterns and either silently expand to nothing or error out.
uv add 'arche-core[detect]' # ok in all shells
uv add "arche-core[detect]" # ok in all shells
uv add arche-core[detect] # zsh: "no matches found"
PowerShell does not need the quotes but accepts them.
Installing with extras¶
The base install (no extras) is the framework only. Each extra layers on optional capabilities. Combine multiple in one bracket pair:
# One extra
uv add 'arche-core[detect]' # GLiNER soft-PII (~250 MB)
uv add 'arche-core[doc]' # docling for PDF / DOCX / PPTX / XLSX / HTML
uv add 'arche-core[presidio]' # Microsoft Presidio recognizer plug-in
uv add 'arche-core[resolve]' # Splink + DuckDB (billion-row ER)
uv add 'arche-core[llm]' # openai + anthropic SDKs
uv add 'arche-core[litellm]' # LiteLLM proxy router
# Several at once - single resolve, one transaction
uv add 'arche-core[detect,doc]'
uv add 'arche-core[detect,doc,llm]'
# The convenience superset - pdf, docx, detect, presidio, resolve, llm
uv add 'arche-core[all]'
# Deprecated v0.1 names still resolve (with a DeprecationWarning):
uv add 'arche-core[gliner]' # use [detect] instead
uv add 'arche-core[pii]' # use [presidio] instead
uv add 'arche-core[splink]' # use [resolve] instead
Workspace development (this monorepo)¶
git clone https://github.com/Plehthore/arche
cd arche
uv sync --all-packages
uv run pytest packages/arche-core/tests
uv sync --all-packages resolves the workspace's full dependency graph including arche-core, arche-graph, arche-adapters, and the api / demo members. Extras defined inside each member's pyproject.toml are part of the resolve.
To add an extra at sync time without editing pyproject:
uv sync --all-packages --extra detect
uv sync --all-packages --extra detect --extra doc
uv sync --all-packages --all-extras # pull every extra of every member
To run a one-off command with the extras available:
uv run --extra detect python -c "from arche.extract import extract; print(extract('Jane at +234 803...', backend='gliner'))"
Picking the right extra for your use case¶
| Use case | Install |
|---|---|
| Resolve names + IDs from text (Pan-African PII Taxonomy) | base install - nothing extra needed |
| Run on PDFs / DOCX / scanned invoices | uv add 'arche-core[doc]' (add [doc-ocr] for image PDFs) |
| Soft-PII / job-titles / orgs / unknown places via NER | uv add 'arche-core[detect]' |
| Western PII corpora baseline (US SSN, IBAN, etc.) | uv add 'arche-core[presidio]' |
| Million-row entity resolution / dedup | uv add 'arche-core[resolve]' |
| LLM-anchored extraction with provider fallback | uv add 'arche-core[llm]' |
| Everything at once (~1 GB) | uv add 'arche-core[all]' |
What's next¶
- Quick Start - five minutes from install to a signed redacted document.
- Sign, share, extract tutorial - the headline verifiability workflow.