3 changes: 2 additions & 1 deletion .gitignore
@@ -1 +1,2 @@
.venv
.venv
.DS_Store
134 changes: 0 additions & 134 deletions README-test-uv.md

This file was deleted.

176 changes: 115 additions & 61 deletions README.md
@@ -1,106 +1,160 @@
## Configuring Python in CBS Remote Access (RA)
## What this is for

**Python is not installed by default at CBS RA (yet).** To activate Python, contact the CBS microdata team at [`microdata@cbs.nl`](mailto:microdata@cbs.nl).
CBS Remote Access (RA) only lets you use Python packages that CBS has approved and installed in advance. To get a package approved, you submit a pip requirements file (the `requirements.txt` format) listing exactly which packages (and versions) you need, and CBS installs them for you.

### Default Python Packages
This repository helps you build that file correctly, without needing to understand Python packaging in depth. You will:

By default, some packages are available in Python at CBS RA, such as `pandas`, `pyreadstat` or `matplotlib`.
1. Write down which packages you want, in a simple text file (`requirements.in`).
2. Run two commands that figure out the exact versions that work together.
3. Get a ready-to-send file (`environment0000.txt`) to email to CBS.
4. Use the same setup on your own computer, outside of CBS RA, while you work.

If you require additional packages or specific versions, follow the steps below to create and submit your own Python environment.
You only need to follow these steps once per project (and again whenever you want to add a new package).

---

### Creating a Custom Python Environment
## Basic configuration

Follow these instructions to set up and submit a customized Python environment. You need to use a **Windows** computer.
### Step 0: Install `uv`

#### Step 1: Check Existing Environment
`uv` is the tool that does the heavy lifting (figuring out compatible package
versions). Install it once, following the official instructions:
https://docs.astral.sh/uv/getting-started/installation/

- Check if `environment0000.txt` (replace `0000` with your actual project number) already contains the required packages and suitable versions.
- **If yes:** Send this file directly to CBS.
- **If no:** Continue to Step 2.
If you've never used a terminal before: a terminal is just a window where you
type commands instead of clicking buttons. On Windows, open "PowerShell" or
"Command Prompt"; on Mac, open "Terminal" (both are pre-installed). The
installation instructions above include a single command to paste in and run.

#### Step 2: Create the Environment (Windows + Conda)
### Step 1: Download this repository to your computer

Install conda locally (only needed if you do not already have Conda installed):
- Follow the official Conda installation instructions [here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html#regular-installation).
- If you're unfamiliar with command-line tools, consider installing [Anaconda](https://www.anaconda.com/products/individual) instead.
Important: save it to a normal folder on your computer's hard drive — **not** a
folder that syncs to the cloud (OneDrive, pCloud Drive, Dropbox, Google Drive, etc).
Cloud-sync folders can silently break the setup in step 3.
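If you are unsure whether the folder you picked is synced, a rough check like the one below can flag the common cases. It is a heuristic sketch, not part of this repository: the helper name `looks_cloud_synced` and the list of folder-name markers are assumptions, and a synced folder with an unusual name (or one mounted as a plain drive letter) will not be detected.

```python
from pathlib import Path

# Assumed folder-name fragments used by common sync clients (heuristic, not exhaustive).
CLOUD_MARKERS = ("onedrive", "dropbox", "google drive", "googledrive", "my drive", "pcloud", "clouddocs")


def looks_cloud_synced(path) -> bool:
    """Return True if the path text contains a marker typical of a cloud-sync folder."""
    lowered = str(path).lower()
    return any(marker in lowered for marker in CLOUD_MARKERS)


if __name__ == "__main__":
    here = Path.cwd()
    if looks_cloud_synced(here):
        print(f"Warning: {here} looks like a cloud-sync folder; move the repository.")
    else:
        print(f"{here} does not look like a cloud-sync folder.")
```

Run it from the repository folder, for example with `uv run python check_folder.py` (the file name is hypothetical).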

On your local Windows machine:
If you're comfortable with git:

```sh
conda create -n 0000 python
conda activate 0000
conda install pip
pip install package_name
git clone <repo-url>
cd cbs_python
```

Replace `package_name` with the packages you need (e.g., `pip install numpy`). If you want to install all the packages in the requirements.txt file in this repository, use `pip install -r requirements.txt`.
Otherwise, download the repository as a ZIP from its webpage and unzip it into a
local folder.

**Note:** If using Jupyter Notebook or Spyder, install these explicitly, e.g.:
### Step 2: Say which packages you want

```sh
pip install jupyter spyder
```

#### Step 3: Export the Environment
Open `requirements.in` in a text editor. It's a plain list of package names, one
per line, already organized into groups (data handling, visualization, etc.), with
a short comment next to each one explaining what it's for.

Export the environment into a requirements file:
- To add a package, add a new line with its name.
- To remove one, delete its line (or put a `#` in front of it to keep it for later).

```sh
pip freeze > C:\temp\environment0000.txt
```

Check `environment0000.txt` for local paths (`file://`). If found, regenerate using:

```sh
pip list --format=freeze > C:\temp\environment0000.txt
```
You don't need to write version numbers — the next step figures those out for you.
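Because `#` starts a comment, only the text in front of any `#` on a line counts as a package. The short sketch below lists the entries that are still active in `requirements.in`, which can help double-check your edits. The function name `active_packages` is hypothetical, and it assumes plain package lines (it would mis-handle a line whose URL itself contains `#`).

```python
from pathlib import Path


def active_packages(text: str) -> list[str]:
    """Return the non-comment entries of a requirements.in file, in order."""
    packages = []
    for line in text.splitlines():
        # Drop everything after '#': whole-line comments and trailing notes alike.
        entry = line.split("#", 1)[0].strip()
        if entry:
            packages.append(entry)
    return packages


if __name__ == "__main__":
    path = Path("requirements.in")
    if path.exists():
        for name in active_packages(path.read_text(encoding="utf-8")):
            print(name)
```

A line such as `# polars  # kept for later` is skipped entirely, while `pandas  # data frames` yields `pandas`.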

#### Step 4: Verify Environment
### Step 3: Let `uv` work out the exact versions

Validate your environment by removing and recreating it:
Open a terminal in the repository folder and run:

```sh
conda remove -n 0000 --all
conda create -n 0000
conda activate 0000
conda install pip
pip install -r C:\temp\environment0000.txt
uv init --bare # only needed the very first time
uv add --bounds exact -r requirements.in
```

Test thoroughly before submission by running python and importing your packages one by one.
This checks that all the packages you listed actually work together, and writes
the result into two files that you don't need to edit by hand:

#### Step 5: Submit Your Environment
- `pyproject.toml` — the exact version of each package you asked for in
`requirements.in` (e.g. `pandas==2.3.3`), so the choice is recorded and won't
silently change later.
- `uv.lock` — every other package those packages depend on internally, also
pinned to an exact version, so the same complete set can be reproduced
identically on any computer.

Send your verified `environment0000.txt` (replace 0000 by your project number) to CBS via email.
Together they're the "recipe" that `uv run` (step 5) and the CBS export (step 4)
both read from.

If this command fails with an error, see "Advanced configuration" below — most
failures come from specific packages (PyTorch Geometric, flash-attn, etc.) that
need a bit of extra setup in `pyproject.toml`.

### Step 4: Create the file to send to CBS

---

## Using Python at CBS RA
CBS RA runs Windows, so generate a Windows-specific version of the package list:

We recommend using Python through Visual Studio Code (VS Code), which is installed by default:
```sh
uv pip compile pyproject.toml --python-version 3.12 --python-platform windows --no-annotate --no-header -o environment0000.txt
```

- In VS Code, select the Python interpreter in the bottom-right corner of the editor.
Rename `environment0000.txt` so `0000` matches your project number. This file can
be installed with plain `pip` (no `uv` needed), which is what CBS RA will do.
Email it to CBS.
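Before emailing the file, a quick automated look can catch the two problems that make it unusable at CBS RA: a package without an exact `==` version, or a local `file://` path that only exists on your computer. The sketch below is a hypothetical helper, not a tool shipped with this repository; it skips comment lines and pip option lines such as `--find-links`, and its simple package-name pattern is an assumption.

```python
import re

# A requirement line that starts with "name==version" (optional [extras]); markers may follow.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==\S+")


def check_environment_file(text: str) -> list[str]:
    """Return human-readable problems found in an environment0000.txt-style file."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        # Skip blanks, comments and pip options such as --find-links.
        if not entry or entry.startswith("#") or entry.startswith("-"):
            continue
        if "file://" in entry:
            problems.append(f"line {number}: local path: {entry}")
        elif not PINNED.match(entry):
            problems.append(f"line {number}: no exact version: {entry}")
    return problems
```

An empty list from `check_environment_file(Path("environment1234.txt").read_text(encoding="utf-8"))` (project number hypothetical) means neither problem was found.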

You can also use Python through Jupyter in RA. To do so, open an Anaconda terminal in RA and run:
### Step 5: Use the same packages on your own computer

```sh
conda activate 0000
jupyter notebook --notebook-dir=H:
uv run jupyter lab
```

This opens Jupyter in your shared directory (`H:`).
`uv run <command>` runs any command (Jupyter, a script, etc.) using exactly the
packages you listed, installing anything missing automatically. This way, what you
test on your own computer matches what you'll have access to in CBS RA.
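Importing each package once is a quick final check that the environment works. A minimal sketch, assuming you save it as a script (file name and module list are placeholders) and run it with `uv run python <script>.py` so it uses the project's packages. Import names can differ from package names: `scikit-learn`, for instance, is imported as `sklearn`.

```python
import importlib


def failed_imports(modules: list[str]) -> dict[str, str]:
    """Try to import each module; return {module: error} for the ones that fail."""
    failures = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or an error raised while the module loads
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # Placeholder list: use the import names of the packages in your requirements.in.
    MODULES = ["pandas", "matplotlib", "sklearn"]
    for module, error in failed_imports(MODULES).items():
        print(f"FAILED {module}: {error}")
    print("Done.")
```

Only failures are printed, so a run that prints nothing but `Done.` means every listed module imported.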

---

## Contact
## Advanced configuration

A few packages need extra settings in `pyproject.toml`, added by hand, before
`uv add` / `uv lock` / `uv pip compile` will work. These settings only need to
be added once — after that, steps 3 and 4 above work normally.

This documentation is maintained by the [ODISSEI Social Data Science (SoDa)](https://odissei-data.nl/nl/soda/) team.
### Packages installed from a URL instead of the normal package index

For technical questions or suggestions:
Some packages aren't on the normal package index and need to be installed from
a URL instead — for example the PyTorch Geometric (PyG) wheels used by
`pyg-lib`, `torch-sparse`, etc. Add the URL under `[tool.uv]` (used by `uv add`
and `uv lock`) and under `[tool.uv.pip]` (used by `uv pip compile`):

```toml
[tool.uv]
find-links = ["https://data.pyg.org/whl/torch-2.9.0+cpu.html"]

[tool.uv.pip]
find-links = ["https://data.pyg.org/whl/torch-2.9.0+cpu.html"]
emit-find-links = true
```

- `[tool.uv] find-links` makes `uv add`/`uv lock` (step 3) look at that URL.
- `[tool.uv.pip] find-links` makes `uv pip compile` (step 4) look at that URL.
Both are needed — they're read by different commands.
- `emit-find-links = true` makes `uv pip compile` write the `--find-links`
line at the top of the generated `environment0000.txt`. That way, whoever
installs that file with plain `pip install -r environment0000.txt` (e.g.
CBS RA) automatically knows where to find these packages too — without it,
`pip` would fail to find them.

If you don't add this, `uv` may report that no matching version exists for a
package, even though it's listed correctly in `requirements.in`.
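To confirm that `emit-find-links = true` took effect, you can list the `--find-links` URLs written into the generated `environment0000.txt`. The helper below is hypothetical and uses only the standard library; it accepts the `--find-links URL` form as well as pip's `-f URL` and `--find-links=URL` variants.

```python
def find_links(text: str) -> list[str]:
    """Return the URLs given to --find-links (or -f) in a requirements-style file."""
    urls = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] in ("--find-links", "-f") and len(parts) > 1:
            urls.append(parts[1])
        elif parts[0].startswith("--find-links="):
            urls.append(parts[0].split("=", 1)[1])
    return urls
```

If a file that needs PyG wheels returns an empty list, the `[tool.uv.pip]` settings were most likely not picked up by `uv pip compile`.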

### Packages that fail to build with "ModuleNotFoundError: No module named 'torch'"

Some packages (e.g. `torch-cluster`, `torch-scatter`, `torch-sparse`,
`torch-spline-conv`, `flash-attn`) are compiled from source, and their build
script imports `torch` without declaring it as something it needs in order to
build — so `uv` builds them in a clean environment that doesn't have `torch`
yet, and the build fails with `ModuleNotFoundError: No module named 'torch'`.

The error message includes the fix. Add the affected package(s) under
`[tool.uv.extra-build-dependencies]` in `pyproject.toml`:

```toml
[tool.uv.extra-build-dependencies]
flash-attn = ["torch"]
torch-sparse = ["torch"]
torch-scatter = ["torch"]
torch-cluster = ["torch"]
torch-spline-conv = ["torch"]
```

- File an issue in the project's issue tracker, or
- Contact [Javier Garcia-Bernardo](https://github.com/jgarciab).
This tells `uv` to install `torch` into the temporary build environment first,
so the package's build script can find it. Without this, both `uv add` and
`uv pip compile` fail with the same error.