OTA-over-LoRa: signed full/delta firmware updates (.mota) for ESP32 & nRF52 🤖🤖#2864
vk496 wants to merge 24 commits into
… tooling (.mota reference, EndF/target/vector generators)
…pect/serve/keygen)
…eeder/console)

Extends the USB-serial folder relay to WiFi so an ESP32 companion can both serve .mota and be operated headlessly:

- motatool `serve --tcp <host[:port]>`: a TcpTransport sibling of the serial transport (default port 5001). SeederCore/Folder are reused unchanged, because the COUNT/DESCRIBE/READ protocol is transport-agnostic.
- ESP32 companion: a dedicated OTA seeder port (5001) for `serve --tcp`, plus an OTA text console on 5002 (`nc <ip> 5002` -> `ota status|ls|announce|...`, the same handle_ota_command CLI serial nodes have). Both run alongside the phone-app port (5000); all three coexist.
- WiFi.setSleep(false): ESP32 STA mode's modem power-save periodically sleeps the modem/CPU and stalls the SX1262 SPI+DIO servicing, leaving LoRa deaf while WiFi is associated. Disabling it restores the radio (HW-validated: a V3 WiFi companion is then discovered over LoRa and discovers its peers).
- docs: serving .mota over WiFi (protocol §10.2 + user guide).
Discovery was hard to use: a node only advertised at boot, so a peer that ran `ota ls` minutes later saw "no neighbours".

- First self-advert ~8s after boot, then a short burst (~1 min), then re-announce at a random 3-10 min interval so a long-running node stays discoverable without all nodes beaconing in lockstep. The beacon is tiny, lowest-priority and duty-gated, so a few-minute cadence is cheap.
- `ota ls` now shows the raw target id (`hw XXXXXXXX`) when the env name is not in this build's OtaTargets.h table, instead of a blank "[other hw]".
Switch these existing variants from `nrf52_base` to `rak4631_hw` (which adds ENABLE_OTA + the in-place flash store + the EndF post-build hook), so they get OTA-over-LoRa. Scope is limited to variants already covered by the Adafruit_nRF52_Bootloader_OTAFIX in-place apply — no new variants are added. OtaTargets.h is regenerated to include their target ids.
|
Nice :-). I see -- not knowing programming or the protocol in depth -- some issues:
Does it really go multi-hop, or only zero-hop? (If multi-hop, careful thought is needed on how to keep the network burden down and how to stop any remote malicious person from flooding the whole network with malicious updates. I opt for zero-hop.) I would further opt to advertise only manually by default. I tried to follow your links to |
Thanks, links fixed. By default I set a maximum of 3 hops, but I agree that this should be changeable dynamically (will be merged in the following commits).
Advertising manually would prevent propagation of the firmware. I don't know what config should be by default, but beacons are intended to be small and cheap. Advertising one beacon (20 bytes IIRC) every 3-10min randomly should not have big impact in the mesh |
|
> Advertising manually would prevent propagation of the firmware. I
> don't know what config should be by default, but beacons are intended
> to be small and cheap. Advertising one beacon (20 bytes IIRC) every
> 3-10min randomly should not have big impact in the mesh
Ehm, many meshes cut down on forwarding adverts because flood routing
them caused overload. That was the reason the EVO firmware was born.
A flood message every 3..10 min: that's way too much.
Weekly is a good option, I think. We do not have firmware updates to
deploy that frequently.
So you want to make auto-update the default, rather than a manual
update over LoRa, given that other OTAs are also manual?
|
|
I don't mind following whatever the community agrees on. You can always change the default behaviour. The only downside of a weekly interval is that you need to wait a week to learn which firmwares are available in the surroundings (
Yes, the default behaviour is supported. Your target node must discover the firmware from your relay node and start pulling it. The problem is that you need days (or even weeks in heavy-traffic scenarios) to pull it. With just 1 device that is fine, but if you have 7 nodes, it's a headache. Hence the idea of DHT/P2P firmware sharing. Every node can share its own full firmware installed in flash, plus mOTA (if available). Special nodes like an ESP32 with WiFi can relay a folder full of mOTA from a host, so they don't need to store them in order to share them. The goal is: if you have N identical nodes, push once and let the firmware spread |
|
Hey vk496, noticed you built the .mota OTA-over-LoRa protocol with merkle tree verification and delta updates across ESP32 and nRF52, that's rlly rlly cool, especially the DHT style propagation for nodes that are hard to physically reach. We're building an open core OTA platform with a similar goal, pushing updates to fleets you can't easily access, though our approach goes through WiFi/cellular rather than mesh. Would you be down for a quick call to talk through how you implemented the delta updates and merkle verification? Always good to compare notes with someone solving the same class of problem from a different angle. |
…ved-set change

The discovery beacon previously re-announced at a random 3-10 min interval. Replace that with a fixed, user-configurable cadence:

- OtaManager::advert_mins(): re-advertise every N minutes after the boot burst; 0 disables periodic re-advertise (boot burst only). Default 24h.
- Persisted in NodePrefs (CommonCLI) and runtime-tunable: `ota config advert <minutes>` (0..10080; 0 = off), and shown in `ota config`.
- When periodic advert is disabled, the scheduler still re-checks the config on a slow timer, so a later `ota config advert <mins>` takes effect live.

Also advertise immediately whenever the served set changes (when a motatool folder is attached to / detached from the ESP32 WiFi seeder) so peers learn about newly-available firmware without waiting for the next interval (the `ota folder` serial path already announced on attach).

Docs: protocol beacon-cadence note + user-guide `ota config advert`.
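As a rough illustration of the cadence rules in this commit message, here is a minimal Python sketch (not the firmware's C++ code). The 0..10080 range, the 24h default and the "0 disables periodic re-advertise but keeps re-checking the config" behaviour come from the text above; the function names and the 60-second re-check period are assumptions made for the example.

```python
from typing import Tuple

ADVERT_MAX_MINS = 10080         # upper bound of `ota config advert <minutes>` (one week)
DEFAULT_ADVERT_MINS = 24 * 60   # "Default 24h"
RECHECK_SECS = 60               # hypothetical slow-timer period while periodic advert is off


def clamp_advert_mins(value: int) -> int:
    """Clamp a requested cadence to the documented 0..10080 range."""
    return max(0, min(ADVERT_MAX_MINS, value))


def next_wakeup(now_s: float, advert_mins: int) -> Tuple[float, bool]:
    """Return (time of next scheduler wake-up, whether a beacon is sent then).

    With advert_mins == 0 no beacon is sent, but the scheduler still wakes up
    on a slow timer so that a later config change takes effect live.
    """
    if advert_mins == 0:
        return now_s + RECHECK_SECS, False
    return now_s + advert_mins * 60, True
```

With the default cadence, `next_wakeup(0, DEFAULT_ADVERT_MINS)` schedules a beacon 86400 seconds later; with cadence 0 the scheduler only re-checks the setting.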
Hi. If you want quicker interaction, feel free to join the Discord thread: https://discord.com/channels/1495203904898728149/1518163443750797332 |
Hey, thanks for sharing, the link doesn't seem to be working for me, getting an invalid invite error. Mind sending a fresh one? |
https://discord.gg/9sRhx5wvJ (OTA over LoRa in the |
…rd RAM guard

Bound OTA-over-LoRa duty cycle across repeaters with one runtime-tunable, persisted limit (OtaManager::max_hops, `ota config hops <0..8>`, default 3):

- Accept-gate: a node ignores OTA that arrived from more than max_hops hops away (neither processes nor relays it). 0 = direct only.
- Forward-cap: relay a flood only while still under max_hops, appending this node's path-hash (hop count increments like the mesh flood routing).
- RAM guard: relay an OTA flood only while more than OTA_FWD_MIN_FREE packet-pool slots stay free, so heavy OTA (best-effort, lowest-priority) can never monopolise the shared pool and starve real traffic; a dropped relay is re-requested by the source.

Persisted in NodePrefs (CommonCLI) and shown in `ota config`. Docs updated.
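The three rules above can be read as one decision per received OTA flood. The following Python sketch shows that reading; it is not the firmware implementation. It assumes `hops` counts the relays a packet has already passed through (so a direct packet has `hops == 0`), and the value given to `OTA_FWD_MIN_FREE` is made up, since the commit only names the constant.

```python
from dataclasses import dataclass

OTA_FWD_MIN_FREE = 4  # hypothetical value; only the constant's name appears in the commit


@dataclass
class Decision:
    accept: bool  # process the packet locally
    relay: bool   # forward the flood to neighbours


def gate_ota_flood(hops: int, max_hops: int, free_pool_slots: int) -> Decision:
    """Apply accept-gate, forward-cap and RAM guard to one OTA flood packet."""
    # Accept-gate: ignore OTA that arrived from more than max_hops hops away.
    if hops > max_hops:
        return Decision(accept=False, relay=False)
    # Forward-cap: relay only while still under max_hops.
    under_cap = hops < max_hops
    # RAM guard: relay only while more than OTA_FWD_MIN_FREE pool slots stay free.
    has_room = free_pool_slots > OTA_FWD_MIN_FREE
    return Decision(accept=True, relay=under_cap and has_room)
```

Under these assumptions `max_hops = 0` accepts direct packets and never relays, which matches the "0 = direct only" note.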
|
Can this be paired with the temp radio command? Faster tx speeds are possible then |
What do you mean? Any link or reference? |
|
https://docs.meshcore.io/cli_commands/#change-the-radio-parameters-for-a-set-duration So you move the radio to a different freq and then can go nuts with sending lora packets. |
…e handlers

Add the device->host WRITE half of the mota-seeder link so a device can capture a .mota it is fetching off-mesh into the host folder (the same --dir used for serving), stored as <mid>.mota, e.g. to grab an exact copy of a device's firmware to build a delta against firmware you don't otherwise have.

- MotaSeederProto: OP_STAT / OP_BEGIN / OP_WRITE / OP_SREAD / OP_FIN (keyed by mid). Resume needs no host bookkeeping: BEGIN 0xFF-fills the file; on reconnect the device SREADs the leaves and re-requests only missing blocks (same as flash resume). Partial = <midhex>.mota.part, published to <midhex>.mota on FIN.
- motatool `serve --dir <folder>` now also handles the storage ops on the same folder/connection (SeederCore gains the store dir; serve_loop frames the variable-length WRITE). Host round-trip test added.

Firmware side (FolderMotaStore + `ota pull <#> <dest>` + pause/resume) follows.
I don't think this blocks mOTA in any way. The protocol doesn't know about the radio, only whether nodes are available or not. It's up to the user, IMHO |
…firmware side

FolderMotaStore: an OtaStore that streams an in-transit .mota straight to the host folder over the seeder link (OP_BEGIN/WRITE/SREAD/STAT/FIN) instead of RAM/flash; the device holds no local staging for it. Wired as a selectable pull destination:

- OtaContext gains a folder_dest (registered by the app while a motatool `serve` link is up) + its human id (e.g. "tcp 192.168.4.5").
- `ota pull <#> <dest>` now takes a MANDATORY destination; `ota pull <#>` with none lists the choices (flash always; folder + its link id when connected).
- The ESP32 WiFi seeder registers/clears the folder destination on connect/close, so the same connection both serves the folder and accepts pulls into it.

Captures an exact copy of a device's firmware over the mesh to build a delta against. Pause/resume on a mid-pull disconnect follows.
If an `ota pull <#> folder` block-write fails mid-transfer (the motatool seeder link dropped), the fetch enters a new PAUSED state instead of failing or falling back to RAM/flash: progress stays on the host, the manager stops requesting, and loop()/stall-detection leave it untouched (it waits indefinitely). On reconnect the ESP32 seeder re-registers the folder destination and, if PAUSED, calls resumeStaged(): OP_STAT re-attaches the host's partial, the leaves are re-read, and only the missing blocks are re-requested (a brand-new/absent file restarts from 0). `ota status` reports the paused state.
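One way to picture the "re-read the leaves, re-request only the missing blocks" step is the Python sketch below. It rests on an assumption the commit messages do not spell out: that a leaf slot still holding the 0xFF fill written at BEGIN marks a block that was never stored. The 32-byte leaf width and the function name are likewise hypothetical.

```python
from typing import List

LEAF_SIZE = 32                  # hypothetical leaf-hash width in bytes
ERASED = b"\xff" * LEAF_SIZE    # what a never-written slot looks like after a 0xFF fill


def missing_blocks(leaf_region: bytes) -> List[int]:
    """Return the indices of blocks whose leaf slot is still erased (assumed = missing)."""
    if len(leaf_region) % LEAF_SIZE != 0:
        raise ValueError("leaf region is not a whole number of leaves")
    count = len(leaf_region) // LEAF_SIZE
    return [
        i
        for i in range(count)
        if leaf_region[i * LEAF_SIZE:(i + 1) * LEAF_SIZE] == ERASED
    ]
```

Under this assumption a brand-new partial (every slot erased) reports every block as missing, which corresponds to the "restarts from 0" case.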
Document `ota pull <#> <dest>` (flash|folder, destination mandatory), the folder pull that captures a device's exact firmware to the host as <mid>.mota (for delta-building), the paused/resume-on-reconnect behaviour, and that a `motatool serve` link doubles as the pull-to-folder store. (protocol §10, user guide, motatool README)
|
Giving the admin a hint to switch to a different frequency would be a good reminder. |
Yes, but why? Being alone is not the intention of this OTA. Changing the radio to gain speed is out of scope (you can go to the node and flash it through BLE/WiFi/USB if speed matters). It's not that I don't want to do that; I just don't understand your use case |
Route both begin() and reopen() through a pure mota_nrf52_stage_plan() that bottom-aligns a received .mota below the filesystem region (trailer ending at FS_START, where the bootloader scans) and refuses any size that would overrun that ceiling or overlap the running image — the single place the FS/prefs-safe bounds are enforced. Add compile-time static_asserts pinning the flash-layout ordering (app < staging ceiling < bootloader; in-place workspace ends at/below the ceiling) so an inconsistent constant fails the build instead of silently corrupting user prefs. Cover the pure planner in test/test_ota/test_ota_flashplan.cpp.
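The planner described here is a pure function of sizes and addresses, so its core check is easy to sketch. The Python version below is only an analogue of `mota_nrf52_stage_plan()`: the parameter names and the example addresses in the usage note are illustrative assumptions, not the real flash map, and the real planner may enforce more (for instance erase-page alignment).

```python
from typing import Optional


def stage_plan(size: int, app_end: int, fs_start: int) -> Optional[int]:
    """Return the staging start address, or None when the image must be refused.

    The image is aligned so that its last byte (the trailer) ends exactly at
    fs_start, the ceiling below the filesystem region.
    """
    if size <= 0 or app_end > fs_start:
        return None          # nonsensical size or inconsistent layout
    start = fs_start - size  # trailer ends at fs_start
    if start < app_end:
        return None          # would overlap the running image
    return start
```

For example, with a hypothetical `app_end = 0x60000` and `fs_start = 0xED000`, a 0x1000-byte image is staged at 0xEC000, while anything larger than 0x8D000 bytes is refused.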
…-diff capture

Protocol: OTA_REQ and OTA_GET_MANIFEST now carry a want_mask bitmap, so a fetcher asks for specific fragments (all on the first request, only the still-missing holes on a retry) instead of a whole block/manifest window. The WANT_MANIFEST and FETCHING retry loops re-ask only on a no-progress tick, so a lost fragment costs one fragment to recover and a re-request can't collide with an in-flight multi-fragment burst on half-duplex radios.

Warm-start (motatool folder-capture only): new OTA_GET_LEAVES/OTA_LEAVES let `ota pull <#> folder validate` bulk-fetch the target's merkle leaves, authenticate them against the manifest root, diff a similar seed build already staged in the destination, and pull DATA over LoRa only for the blocks that differ. Leaves are bitmap-fragmented + no-progress retry-gated like the manifest, capped at OTA_DIFF_MAX_BLOCKS so the want_mask stays a fixed uint16, and the diff runs a bounded batch per loop tick so it never starves the mesh loop. motatool `serve --seed <build.mota>` injects the seed payload into the destination .part on OP_BEGIN.
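A small Python sketch of the want_mask idea follows. It assumes bit `i` of the 16-bit mask stands for fragment `i` (the bit order is not stated above) and uses hypothetical helper names; it only illustrates "ask for everything first, then only the holes, and only after a tick without progress".

```python
from typing import Set

MASK_BITS = 16  # the text says the want_mask stays a fixed uint16


def want_mask(total_fragments: int, received: Set[int]) -> int:
    """Build the bitmap of fragments still wanted (assumed: bit i = fragment i)."""
    if not 0 < total_fragments <= MASK_BITS:
        raise ValueError("fragment count must fit the 16-bit mask")
    mask = 0
    for i in range(total_fragments):
        if i not in received:
            mask |= 1 << i
    return mask


def should_reask(previous_mask: int, current_mask: int) -> bool:
    """Re-request only on a no-progress tick: holes remain and none were filled."""
    return current_mask != 0 and current_mask == previous_mask
```

The first request for a five-fragment window is `want_mask(5, set())`; after fragments 0, 1 and 3 arrive, a retry asks only for fragments 2 and 4.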
|
In a congested mesh, sending the firmware on the same frequency is a no go. Use case is a tree repeater at the top of the mountain in the snow. |
|
I see. When do you suggest that reminder? I feel that it's just a note/idea that can be in the docs instead of the firmware |
|
First question is if there is a way to push the update and not use the torrent scheme. If there's a command like Use Case: |
|
The idea behind DHT is to not interrupt normal operations. Some nodes (like one on the mountain) could be the only node capable of relaying hops. You assume that switching to a different radio setting is better because you will be alone, but if everyone else did this, you would have the same issue. And that assumes you need to do it for just 1 node and not 7. The way I solve this is by making OTA non-priority traffic and, ideally, upgradable without user intervention (you flash your home node and just let it propagate to your other nodes). Anyway, with the current PR you can do what you want: switch your remote and local nodes to a different radio setting and start pulling the mOTA from the CLI. In the worst case it will not finish in time and some blocks remain to be pulled; switch again, finish the pull and apply the update |
Add an admin-only `ota stats` reply: one dense line with the running firmware's merkle content-id (mid) AND its EndF body_hash (only body_hash was surfaced before), version, served-set count + digest, live fetch state/progress, and policy — snprintf-bounded to the 160-byte reply. The remote CLI path is already admin-gated, so it's admin-only over the mesh (send it from the app's repeater command screen, or the WiFi/serial OTA console). A new servedDigest() accessor exposes the beacon set-digest. HW-verified on RAK4631: reports the fw identity + live fetch state (incl. during a warm-start capture).
The EndF post-build stamper read the version via _cppdef('FIRMWARE_VERSION'), which only sees -D build flags — but MeshCore authors FIRMWARE_VERSION as a header #define in each example, so the stamper found nothing and defaulted fw_version to 0. Every .mota / OTA advert then reported v0.0.0 while `ver` (which reads the header at runtime) showed the real version.
Read the header value as a fallback WITHOUT moving where MeshCore authors it: honor a -D override first (the header's #ifndef guard invites it, for release builds), else read the #define from the example this env builds (via build_src_filter), falling back to the repo-wide value when unambiguous. Purely additive to our own EndF tooling — no MeshCore source changed.
HW-verified: RAK4631 now reports `fw v1.17.0` in ota stats/ls (was v0.0.0), matching `ver`.
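The fallback order described in this commit (a -D override first, then the example's header, then a repo-wide value only when unambiguous) can be sketched in Python as below. This is not the actual stamper: `resolve_version` and its arguments are hypothetical, and the regular expression assumes the header authors the version as a quoted string.

```python
import re
from typing import Iterable, Optional

# Assumes a line of the form: #define FIRMWARE_VERSION "v1.17.0"
_DEFINE = re.compile(r'^\s*#\s*define\s+FIRMWARE_VERSION\s+"([^"]+)"', re.MULTILINE)


def resolve_version(
    build_flag: Optional[str],
    example_source: Optional[str],
    repo_sources: Iterable[str],
) -> Optional[str]:
    """Pick the firmware version using the documented fallback order."""
    # 1. A -D build-flag override wins.
    if build_flag:
        return build_flag
    # 2. Otherwise read the #define from the example this env builds.
    if example_source:
        match = _DEFINE.search(example_source)
        if match:
            return match.group(1)
    # 3. Otherwise use the repo-wide value, but only if every header agrees.
    found = {m.group(1) for src in repo_sources for m in _DEFINE.finditer(src)}
    return found.pop() if len(found) == 1 else None
```

Returning `None` when the repo-wide headers disagree keeps the ambiguity visible to the caller instead of silently picking one value.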
… switch)

Spell out how the warm-start seed is actually applied (a user asked): the seed is the --seed <file> given to `motatool serve`, NOT a file dropped into the --dir destination (which starts empty). motatool stamps that one seed into the fresh .part on every `… folder validate` begin, so it's always the named file, with no guessing. `validate` is the switch (a plain folder pull fetches from 0); a re-pull re-begins fresh (never resumes a stale partial); a mismatched/absent seed just falls back to fetching those blocks over the radio (correct result, only slower).
TL;DR
Eventually upgradable. Low priority.
Firmware OTA over LoRa. DHT/BitTorrent-style propagation. Supports ESP32 and nRF52. nRF52 requires a special bootloader to apply OTA. Serves a folder with multiple mOTA through WiFi/serial (motatool).
Tested with Heltec V3, RAK4631, T114
Test firmware: https://github.com/vk496/MeshCore/releases/tag/dev-latest
User Quick Start: docs/ota_user_guide.md
Dev specs: docs/ota_protocol.md

What this adds
Over-the-air firmware updates that travel over the existing LoRa mesh — no
internet, BLE, or USB needed at the target node. A node discovers update sources
among its neighbours, fetches a signed image block-by-block, verifies it, and
applies it. Images can be full or delta against a known base (deltas are
far smaller — essential on LoRa's tiny bandwidth).
Motivation: repeaters/sensors are often deployed somewhere physically awkward to
reach. Today updating them means going to the device. This lets you push a signed
update through the mesh itself.
How it works
- `.mota` container: a signed update package with manifest + payload + a merkle tree over fixed-size blocks, plus a 56-byte `EndF` self-identity trailer baked into every firmware (target id, version, hw id, image hash). Spec in docs/ota_protocol.md.
- … `OTA_ADV` beacon (seeder + count + set-digest); interested peers send `OTA_QUERY` and build a catalog from `OTA_HAVE`. Jitter + overhear-suppression keep it quiet.
- … `DATA` fragments + a merkle `PROOF`, reassembled and verified per block; resumable across reboots.
- … no bootloader change); nRF52 applies the delta in place via a companion bootloader (see Dependencies). Refuses on hw-id mismatch (brick-safety).
- `autofetch` and `autoinstall` policies are opt-in and conservative by default.
- `ota status | ls | get <#> | install | cancel | announce | self | …`.

What's in the PR (layered for review)
The 13 commits build up in dependency order — container format → vendored decoder
→ transfer protocol → platform apply → folder relay → CLI/node integration →
build enablement + tooling → host packager → tests → docs → WiFi serve →
discovery tuning → nRF52 variant enablement. Each is independently reviewable.
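The per-block check mentioned under "How it works" (DATA fragments plus a merkle PROOF, verified per block) follows the usual merkle-proof pattern. The Python sketch below shows that general pattern only: SHA-256, the sibling ordering and the duplication of the last node on odd levels are assumptions for the example, and the real hash and layout are whatever docs/ota_protocol.md specifies.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # assumed hash, for illustration only


def merkle_root_and_proof(blocks: List[bytes], index: int) -> Tuple[bytes, List[bytes]]:
    """Seeder side: compute the root and the sibling path for one block."""
    if not blocks or not 0 <= index < len(blocks):
        raise ValueError("need at least one block and a valid index")
    level = [_h(b) for b in blocks]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])      # duplicate the last node on odd levels
        proof.append(level[index ^ 1])   # sibling of the current node
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_block(block: bytes, index: int, proof: List[bytes], root: bytes) -> bool:
    """Fetcher side: recompute the path from one block up to the trusted root."""
    node = _h(block)
    for sibling in proof:
        node = _h(node + sibling) if index % 2 == 0 else _h(sibling + node)
        index //= 2
    return node == root
```

The point of the scheme is that a fetcher holding only the authenticated root can check each block as it arrives, so a corrupted or forged block is rejected without waiting for the whole image.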
Also included:
- `motatool`: a portable C++17 host tool to build/verify/inspect/serve/keygen `.mota` (cross-checked byte-for-byte against the Python reference).
- … `motatool`) can serve a folder of `.mota` to peers over USB serial or WiFi/TCP (ESP32 companion: dedicated seeder port 5001 + an OTA text console on 5002, both coexisting with the phone-app port).
- … `pio test -e native`), `motatool` ctest, and a Python reference suite; generated tables (`OtaTargets.h`, `mota_vectors.h`) are committed.
Platform support
… bootloader supports (RAK4631, Heltec T114, LilyGo T-Echo family, ThinkNode,
Wio-Tracker, Xiao nRF52, ProMicro, T1000-E, …). No new variants are introduced.
Dependencies
nRF52 in-place apply requires a companion bootloader change (`Adafruit_nRF52_Bootloader_OTAFIX`, separate repo/PR; link TBD). ESP32 needs nothing extra. Full/verify/serve paths need no bootloader on any platform.
Testing
- … `motatool` ctest, Python reference suite all pass.
- … Heltec V3 (ESP32-S3): mesh discovery, full + delta fetch, signature/hash verify, and apply (ESP32 A/B + nRF52 in-place).
Notes / scope
- `detools` 0.53.0 decoder is vendored (isolated, decoder-only, 3rd-party); the delta codec is never reimplemented.
- `NodePrefs` fields are versioned/back-compatible.

Checklist