Capture a Flower federated learning run and emit an RO-Crate describing it — a machine-readable, FAIR provenance record of who ran what, with which software, configuration, and data flow, and what came out.
The run is modelled as a Process Run Crate-style `CreateAction`, following the Federated Learning RO-Crate profile:
- `instrument`: the software that did the work, i.e. Flower, the ML framework(s) (with declared and installed versions), and the aggregation strategy (FedAvg, FedProx, …) with its hyperparameters
- `object`: the run configuration as `PropertyValue` inputs
- `result`: the final aggregated model and a per-round metrics / federation log file
- `agent` / `author` / `license`: provenance, i.e. a `Person` entity (ORCID used as its `@id` when provided), a root license, and the agent on the action
Final performance metrics are attached to the output model as schema.org `PropertyValue`s, optionally with semantic identifiers via a user-declared metric→URI mapping.
⚠️ flwrCrate supports Flower's message-based ServerApp API only (Flower ≥ 1.29: `flwr.serverapp`, `@app.main()`, and `strategy.start(...)` returning a `Result`). Apps written against the classic API (`server_fn`, `ServerAppComponents`, `flwr.server.strategy`) are not supported.
Check an app's compatibility in seconds:

    grep -rn "flwr.serverapp\|strategy.start\|server_fn\|ServerAppComponents" <app>/<pkg>/server_app.py

- `flwr.serverapp` and `strategy.start` → ✅ compatible
- `server_fn` or `ServerAppComponents` → ❌ classic API, not supported (yet)
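
The same check can be scripted, for example to screen several apps at once. The sketch below is not part of flwrCrate: `classify_server_app` is a hypothetical helper that applies the markers from the grep pattern above to the text of a `server_app.py`.

```python
import re

# Markers copied from the grep pattern above.
CLASSIC_MARKERS = re.compile(r"server_fn|ServerAppComponents")
MESSAGE_MARKERS = ("flwr.serverapp", "strategy.start")


def classify_server_app(source: str) -> str:
    """Return 'classic', 'message', or 'unknown' for the given source text."""
    # Classic markers take precedence: such an app needs porting first.
    if CLASSIC_MARKERS.search(source):
        return "classic"
    # The message API is assumed only when both markers are present.
    if all(marker in source for marker in MESSAGE_MARKERS):
        return "message"
    return "unknown"
```

To use it on a file, pass `pathlib.Path("<app>/<pkg>/server_app.py").read_text()` as `source`. A result of `"unknown"` means neither set of markers was found and the file needs a manual look.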
Other requirements: Python ≥ 3.9, rocrate ≥ 0.13 (tomli on Python < 3.11).

    git clone https://github.com/faizollah/flwrCrate.git
    pip install -e flwrCrate

Install it into the same environment that runs your Flower app (the one `flwr run` uses).
Integration is three touchpoints in your `server_app.py`: a context manager, one `wrap_evaluate(...)`, and one `record_result(...)`.

Before:

    @app.main()
    def main(grid: Grid, context: Context) -> None:
        ...
        strategy = FedAvg(fraction_evaluate=fraction_evaluate)
        result = strategy.start(
            grid=grid,
            initial_arrays=arrays,
            train_config=ConfigRecord({"lr": lr}),
            num_rounds=num_rounds,
            evaluate_fn=global_evaluate,
        )
        if context.run_config["save-model"]:
            torch.save(result.arrays.to_torch_state_dict(), "final_model.pt")

After:

    from flwrcrate import FLCrateTracker

    @app.main()
    def main(grid: Grid, context: Context) -> None:
        ...
        strategy = FedAvg(fraction_evaluate=fraction_evaluate)
        with FLCrateTracker(
            context, strategy,
            output_dir="/absolute/path/to/your-app/fl_crate_out",        # absolute!
            pyproject_path="/absolute/path/to/your-app/pyproject.toml",  # absolute!
            app_name="My federated run",
            author={"name": "Your Name", "orcid": "https://orcid.org/0000-0000-0000-0000"},
            license="https://spdx.org/licenses/MIT.html",
        ) as tracker:
            result = strategy.start(
                grid=grid,
                initial_arrays=arrays,
                train_config=ConfigRecord({"lr": lr}),
                num_rounds=num_rounds,
                evaluate_fn=tracker.wrap_evaluate(global_evaluate),  # or None
            )
            torch.save(result.arrays.to_torch_state_dict(), "final_model.pt")
            tracker.record_result(result, model_path="final_model.pt")
        # On clean exit the RO-Crate is written to <output_dir>/ro-crate/

That is the whole integration. The crate is built on exit, including on failure; a failed run is recorded with `FailedActionStatus` and the error message, so partial runs are still documented.
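
The failure behaviour described above relies on a standard property of Python context managers: `__exit__` runs whether or not the body raised. The sketch below illustrates the pattern only. `RunRecorder` and `run_and_record` are hypothetical names, the status IRIs and the `error` key are assumptions based on schema.org, and none of this is flwrCrate's actual implementation.

```python
from datetime import datetime, timezone


class RunRecorder:
    """Hypothetical stand-in for a tracker; records the outcome of a run."""

    def __init__(self) -> None:
        self.action = {"@type": "CreateAction"}

    def __enter__(self) -> "RunRecorder":
        self.action["startTime"] = datetime.now(timezone.utc).isoformat()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Called on both clean exit and failure, so a record always exists.
        if exc_type is None:
            self.action["actionStatus"] = "https://schema.org/CompletedActionStatus"
        else:
            self.action["actionStatus"] = "https://schema.org/FailedActionStatus"
            self.action["error"] = str(exc)
        return False  # never swallow the exception; the caller still sees it


def run_and_record(fn) -> dict:
    """Run fn inside the recorder and return the recorded action (demo only)."""
    recorder = RunRecorder()
    try:
        with recorder:
            fn()
    except Exception:
        pass  # swallowed here only so the demo can inspect the record
    return recorder.action
```

Returning `False` from `__exit__` is the relevant design choice: the run is documented, but the original exception still propagates to whatever launched it.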
If your app passes no `evaluate_fn` to `strategy.start(...)` (common for client-side-only evaluation), skip `wrap_evaluate` entirely. Per-round metrics are then read from the `Result` object's client-side aggregates (`train_metrics_clientapp` / `evaluate_metrics_clientapp`) by `record_result(...)`:

    with FLCrateTracker(context, strategy, ...) as tracker:
        result = strategy.start(grid=grid, initial_arrays=arrays,
                                train_config=..., num_rounds=...)
        tracker.record_result(result)   # don't forget this line!

Don't forget `record_result(result)`. Without it the crate is still written, but it will contain no performance metrics, no end time, and no model, only the static capture from the start of the run.
flwr run installs your app to ~/.flwr/apps/<hash>/ and executes the
ServerApp from there (in simulation, inside a Ray worker that does not inherit
your shell environment). Relative paths therefore resolve against the
installed copy, not your project: a relative output_dir "loses" the crate in
~/.flwr/apps/..., and a relative pyproject_path fails to find your
config — silently dropping the framework/dependency capture. Always pass both
as absolute paths.
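
One way to catch a relative path early is to validate it before constructing the tracker. The helper below is a hypothetical sketch, not a flwrCrate function; it only checks that a path is absolute and raises otherwise.

```python
from pathlib import Path


def require_absolute(path: str, label: str) -> Path:
    """Return path as a Path, or raise if it is not absolute."""
    candidate = Path(path).expanduser()  # allow "~/..." style paths
    if not candidate.is_absolute():
        raise ValueError(f"{label} must be an absolute path, got {path!r}")
    return candidate
```

For example, `require_absolute("fl_crate_out", "output_dir")` raises a `ValueError`, while an absolute path is returned unchanged as a `Path` that can be passed on as `output_dir` or `pyproject_path`.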
| Parameter | Required | Description |
|---|---|---|
| `context` | yes | The Flower `Context` (gives access to `run_config`) |
| `strategy` | yes | Your strategy instance (FedAvg, FedProx, …); class, module, and hyperparameters are captured |
| `output_dir` | recommended | Where to write outputs. Use an absolute path. Default: `./fl_crate_out` |
| `pyproject_path` | recommended | Path to your app's `pyproject.toml`. Use an absolute path. Default: `"pyproject.toml"` |
| `app_name` | no | Human-readable name for the crate's root dataset |
| `author` | no | `"Name"` or `{"name": ..., "orcid": ..., "affiliation": ...}`; becomes a `Person` entity (ORCID as `@id`) |
| `license` | no | License for the crate root, e.g. an SPDX URL `"https://spdx.org/licenses/MIT.html"` |
| `agent` | no | Who executed the run, same format as `author`. Defaults to the author |
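
To make the two accepted `author` formats concrete, the sketch below shows how a plain name or a dict could be normalised into a JSON-LD `Person` entity with the ORCID as `@id`. It is an illustration under assumptions: `person_entity` is a hypothetical function, and the `#person-<slug>` fallback identifier used when no ORCID is given is invented here, not taken from flwrCrate.

```python
import re
from typing import Dict, Union


def person_entity(author: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Normalise "Name" or {"name": ..., "orcid": ..., ...} to a Person entity."""
    if isinstance(author, str):
        author = {"name": author}
    name = author["name"]
    # Assumed fallback: a local identifier derived from the name.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    entity = {
        "@id": author.get("orcid") or f"#person-{slug}",
        "@type": "Person",
        "name": name,
    }
    if "affiliation" in author:
        entity["affiliation"] = author["affiliation"]
    return entity
```

Using the ORCID as `@id` means the same person resolves to the same identifier across crates, which a local `#person-...` identifier cannot offer.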
Metric names are taken verbatim from your `MetricRecord`s, so any analysis (classification, regression, anomaly detection, …) works without hardcoded names. To attach semantic identifiers, declare a mapping in your app's `pyproject.toml`:

    [tool.flwrcrate.metric-uris]
    accuracy = "https://schema.org/Accuracy"
    rmse     = "http://www.wikidata.org/entity/Q1374913"

Mapped metrics get a `propertyID`; unmapped metrics still emit a plain `PropertyValue` and log a warning.

Outputs are written under `<output_dir>`:

    <output_dir>/
    ├── captured_metadata.json     # full capture: config, frameworks, strategy, timing, final metrics
    ├── metrics_log.json           # per-round metrics + federation details (referenced by the crate)
    └── ro-crate/
        ├── ro-crate-metadata.json
        ├── final_model.pt         # if a model_path was given
        └── metrics_log.json
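
The metric-to-URI mapping described above can be sketched as follows. This is an illustration, not flwrCrate's code: `metric_property_values` is a hypothetical function, the `#metric-<name>` identifiers follow the naming used in this README, and the metric values in the usage note are made up.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("metric_sketch")


def metric_property_values(metrics: Dict[str, float],
                           uris: Dict[str, str]) -> List[dict]:
    """Build one PropertyValue per metric, adding propertyID when mapped."""
    entities = []
    for name, value in metrics.items():
        entity = {
            "@id": f"#metric-{name}",
            "@type": "PropertyValue",
            "name": name,  # taken verbatim, no hardcoded metric names
            "value": value,
        }
        if name in uris:
            entity["propertyID"] = uris[name]
        else:
            # Unmapped metrics are still emitted, just without a semantic ID.
            logger.warning("no URI mapped for metric %r", name)
        entities.append(entity)
    return entities
```

Calling it with `{"accuracy": 0.91, "f1": 0.88}` and a mapping that only contains `accuracy` yields two entities: the first carries a `propertyID`, the second does not and triggers a warning.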
The crate's @graph contains, linked together:
| Entity | Type | Content |
|---|---|---|
| `./` | `Dataset` | Root: name, author, license, `conformsTo` the FL profile, `mentions` the run |
| `#fl-run` | `CreateAction` | The run: agent, `startTime`/`endTime`, `actionStatus`, `instrument`/`object`/`result` |
| `#flower` | `SoftwareApplication` | Flower with its installed version |
| `#framework-*` | `SoftwareApplication` | Every declared dependency (minus an infrastructure deny-list): `softwareRequirements` = the declared version spec, `softwareVersion` = the actually-installed version |
| `#fl-strategy` | `SoftwareApplication` | The aggregation strategy with its hyperparameters as `PropertyValue`s |
| `#param-*` | `PropertyValue` | Run configuration inputs (the action's `object`) |
| `#metric-*` | `PropertyValue` | Final-round metrics, attached to the output model (with `propertyID` when mapped) |
| `final_model.pt`, `metrics_log.json` | `File` | The run's outputs (the action's `result`) |
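
The `#framework-*` row distinguishes the declared version spec from the version actually installed. A minimal sketch of that distinction, using only the standard library, is shown below. `describe_dependency` is a hypothetical helper with a deliberately simple requirement parser; it is not how flwrCrate parses dependencies.

```python
import re
from importlib import metadata
from typing import Dict, Optional

# name, optional [extras], then the version spec up to any ";" marker
_REQUIREMENT = re.compile(
    r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[[^\]]*\])?\s*([^;]*)"
)


def describe_dependency(requirement: str) -> Dict[str, Optional[str]]:
    """Split a requirement string and look up the installed version."""
    match = _REQUIREMENT.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name, spec = match.group(1), match.group(2).strip()
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        installed = None  # declared but not installed in this environment
    return {
        "name": name,
        "softwareRequirements": spec or None,  # what pyproject.toml declares
        "softwareVersion": installed,          # what is actually installed
    }
```

Recording both values matters for reproducibility: a spec such as `>=1.29` says what was allowed, while the installed version says what actually ran.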
Per-round metric history and federation details (participant counts,
federation options) live in metrics_log.json, which the crate references as
a run output — keeping the metadata file lean while preserving the full time
series.
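
Because the entities are linked by `@id`, the graph can be navigated with plain dictionary lookups. The sketch below resolves the run's `instrument` references; `instruments_of_run` is a hypothetical helper, and the miniature graph is hand-written for illustration rather than real flwrCrate output.

```python
from typing import Dict, List


def instruments_of_run(graph: List[dict], run_id: str = "#fl-run") -> List[dict]:
    """Resolve the entities referenced by the run action's instrument property."""
    by_id: Dict[str, dict] = {entity["@id"]: entity for entity in graph}
    refs = by_id[run_id].get("instrument", [])
    if isinstance(refs, dict):  # JSON-LD allows a single reference or a list
        refs = [refs]
    return [by_id[ref["@id"]] for ref in refs if ref["@id"] in by_id]


# Hand-written miniature graph, for illustration only.
example_graph = [
    {"@id": "#fl-run", "@type": "CreateAction",
     "instrument": [{"@id": "#flower"}, {"@id": "#fl-strategy"}]},
    {"@id": "#flower", "@type": "SoftwareApplication", "name": "Flower"},
    {"@id": "#fl-strategy", "@type": "SoftwareApplication", "name": "FedAvg"},
]

names = [entity["name"] for entity in instruments_of_run(example_graph)]
```

For a real crate, load `ro-crate-metadata.json` with `json.load` and pass its `"@graph"` list in place of `example_graph`.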
Crates validate against the official
rocrate-validator. Note:
ro-crate-py ≥ 0.15 writes RO-Crate 1.2, while rocrate-validator 0.10
ships only a 1.1 profile, producing a single false-positive MUST 5.3
(conformsTo version). All other REQUIRED checks pass.
| App | Strategy | Stack | Notes |
|---|---|---|---|
| `@flwrlabs/quickstart-pytorch` | FedAvg | PyTorch, TorchVision | server- and client-side metrics |
| `@flwrlabs/quickstart-sklearn` | FedAvg | scikit-learn | no server-side evaluate fn |
| `@chongshenng/fed-engines` | FedProx | PyTorch, HF datasets | anomaly-detection metrics (balanced accuracy, per-class recall), client-side only |
👉 See a real generated crate before you run anything: `examples/quickstart-pytorch/` contains the integrated `server_app.py` and the `ro-crate-metadata.json` it produced.
- Classic API not supported. Apps on `server_fn` / `ServerAppComponents` (still common in published Flower apps) need porting to the message API first. Supporting the classic API is an open design question.
- Participant count under recent Flower. `num-supernodes` moved from the app's `pyproject.toml` to `~/.flwr/config.toml` (the SuperLink connection config), so participant-count capture from pyproject returns only run-config-derived hints. Reading the active Flower config is a planned enhancement.
- Dataset identifiers and ethics/governance lineage are not reachable from Flower and must be supplied by the user (e.g. via config); this is relevant for regulated domains.
- Per-client metadata is out of scope by design. Flower exposes only per-round aggregates, which aligns with FL's privacy premise.
- 0.4.0: renamed to flwrCrate (dist/import `flwrcrate`); config key is now `[tool.flwrcrate.metric-uris]` (the legacy `[tool.fedacrate.*]` key is still read).
- 0.3.0: run discoverable from the crate root (`mentions`); aggregation strategy emitted as a `SoftwareApplication` with hyperparameters; per-round metrics + federation details in `metrics_log.json`; all declared dependencies captured (deny-list instead of allow-list); author/license/agent provenance scaffolding.
- 0.2.0: initial working version.
Run the tests from a development install:

    pip install -e ".[dev]"   # installs pytest
    pytest                     # run the full unit + integration suite

The suite is split into fast unit tests (dependency parsing, metric handling, slug/person helpers, crate assembly) and integration tests that drive the whole `FLCrateTracker` lifecycle and assert a complete, correct `ro-crate-metadata.json`, without needing Flower, Ray, or any ML framework installed. CI runs it on Python 3.10–3.12 (see the badge above).
Developed within the ELIXIR Fed-A-Crate project (WP6) toward milestone M6.7, building on the Federated Learning RO-Crate profile by the eScienceLab (The University of Manchester).