Field transfer on mesh adaptation — design¶
Status: design / spec for implementation. Phase-1 (REMAP) not yet built.
A working band-aid is in ddt.py (see §7). This doc is the entry point for
the implementation session.
1. Problem¶
When an adaptive mover repositions mesh nodes, every field defined on the mesh
must be brought onto the new node layout. Today this is done by user code
(the harness hand-remaps T after calling the mover), which structurally
cannot transfer variables the user does not know exist — solver history
(DuDt.psi_star), projection/Hessian work-vars, RBF proxies, etc.
The concrete failure (parallel adaptive AdvDiffusionSLCN, see memory
project_slcn_psistar_seam_remap): a 30–50% temperature spike at np-partition
seams, traced to the SLCN history psi_star being re-recorded by a rank-local
evaluate of the field at its own (on-vertex) node coordinates, which
mis-locates at a process boundary. Root cause is hidden-variable transfer,
not the φ-solve / remap of T / monotone clamp / implicit solve (all
exonerated by a long isolation; see the memory).
Goal: move field transfer out of user code and into the adapt operation, with a per-variable policy so hidden variables fail safe (transferred needlessly) rather than silently wrong (stale).
2. Transfer-semantics taxonomy — four kinds¶
policy |
post-adapt value = |
for |
|---|---|---|
REMAP (default) |
old field re-sampled at the new node positions |
any Eulerian stored quantity; all DuDt history |
CARRY |
unchanged DOF value (belongs to node/material identity) |
genuinely Lagrangian fields (nodes move with material) |
REINIT |
nothing — recomputed from source before next use |
stateless work-vars (gradient/Hessian projections, RBF proxies) |
ALE |
CARRY + a lazy |
history of a material derivative (adv-diff, NS, VE), as an opt-in refinement of REMAP |
REMAP is the safe universal default — literally “the previous values are those at the original mesh points,” correct to first order or better for any Eulerian quantity. ALE is a refinement, never forced.
2a. CRITICAL: DuDt history is REMAP, never REINIT¶
DuDt.psi_star is persistent, unrecoverable state, not a recomputable
work-var. For order ≥ 2 the older terms (psi_star[1], …) are accumulated
upstream history — each is the field traced back along the velocity field of
an earlier step. Once Stokes overwrites V with the new step’s velocity
and the older solution is gone, those terms cannot be rebuilt. The
shift-history step (psi_star[i] = φ·psi_star[i-1] + (1-φ)·psi_star[i],
ddt.py ~2035) reads every term, so all must be correctly transferred
before update_pre_solve.
⇒ history terms get REMAP (or ALE). Never REINIT. REINIT is strictly for variables that hold no memory and are recomputed from scratch each solve.
The order-1 case looks recomputable (the re-record rebuilds psi_star[0]
from u_Field) — which is why the band-aid (§7) sufficed at θ=1 — but that is
a coincidence of order 1 and must not be generalised.
3. Interface — per-variable policy + operator hook¶
Two levels, because ALE/history transfer is not expressible per-variable (it moves a field and its history coherently under one mesh velocity — “T and Tdot together”):
MeshVariable.remesh_policy— enumREMAP(default) /CARRY/REINITfor standalone fields and the common case. The framework stamps this on hidden vars it creates (projections/proxies →REINIT); the user only flags their ownCARRYfields. Default-unflagged ⇒ REMAP ⇒ a forgotten variable fails safe.Operator
on_remesh(ctx)hook — solvers /DuDts register one and handle their own field+history set coherently (the SLCNDuDtdecides REMAP-all-history vs ALE-carry+v_mesh). Operator-managed vars are marked so the generic per-variable pass defers to the hook.
The adapt op (a new mesh.adapt / inside smooth_mesh_interior /
OT_adapt / follow_metric) becomes the single owner of the transfer dance
the harness does by hand today:
snapshot old coords + old field data (for REMAP/ALE vars)
move nodes to new_X (the mover; see §5)
for each registered var: apply remesh_policy
REMAP -> var.data = interp(old field) at new positions # robust, see §4
CARRY -> leave
REINIT -> leave / mark stale (recomputed on next use)
for each registered operator: operator.on_remesh(ctx) # ALE etc.
ctx (a RemeshContext) carries: old coords, new coords, total displacement
Δx, a bound interpolator, dt, and a scratch slot where operators stash
v_mesh.
3a. REMAP is robust by construction¶
REMAP evaluates the old field at the new node positions. Those are
generic interior points of the old mesh, so global_evaluate locates them
fine — this is exactly the T-remap that already works and that FIXED_DEFORM
proved clean (np4-vs-serial 7e-4). It is not the degenerate “evaluate a
field at its own vertices” that the band-aid had to dodge (that is the
per-step re-record, a different operation). So the general REMAP does not
inherit the band-aid’s on-vertex fragility.
Implementation: snapshot old field on the old mesh; after the move, do the proven deform-back → evaluate-at-new → deform-forward dance once for all REMAP vars together (not per-var, not per-deform).
5. Audit: deferring the update for multi-shot movers (OT n_outer≥12)¶
Findings (smoothing.py, discretisation_mesh.py):
_deform_meshdoes NOT touch field data (discretisation_mesh.py:2001). It writes coords, callsnuke_coords_and_rebuild, rebuilds the_coordsview, sets every registered solveris_setup=False, invalidates the evaluation hash + DMInterpolation cache, bumps_topology_version, syncs submeshes. Field.datais untouched. ⇒ field transfer is already one-shot and external to the mover; there is no per-inner-deform field remap to eliminate.Each mover (
_winslow_elliptic~1570,_winslow_equidistribute~1861,_winslow_anisotropic~2639) calls_deform_meshonce per outer step (inside thefor outer in range(n_outer)loop). So OT paysnuke_coords_and_rebuild×n_outer.nuke_coords_and_rebuild(discretisation_mesh.py:1757) per call:clearDS/createDS;createCoordinateSpace+dm_force_coordinate_field; kd-tree rebuild (_build_kd_tree_index);_get_mesh_sizes(centroids / radii / search_lengths);copyDS; invalidate normals + face control points; and refill the coord cache for every registered variable (for _var in self.vars: self._get_coords_for_var(_var), line 1890).The mover’s inner loop consumes the rebuilt state: it calls
uw.function.evaluate(metric, Dcoords)(needs the kd-tree of the just-deformed mesh) andsolver.solve()(needs the DS) every outer step. ⇒ the kd-tree + DS rebuild are genuinely needed per inner step and cannot be naively deferred.
So the deferral opportunity is narrower than “defer the rebuild”:
The avoidable waste is the per-variable coord-cache refill (line 1890): it refills all registered vars (T, V, P, every
psi_star, …) on every inner deform, but the mover only needs its own work-vars (metric, φ, gradients). Refilling the user fields’ caches n_outer× is pure waste.That refill is a parallel-correctness BUGFIX (#130): it was made eager to avoid a subset-of-ranks collective deadlock in
_get_coords_for_basis(lazy rbf refill). So any deferral must keep collective access correct — defer the non-mover vars’ refill to a single full refill at sweep exit (safe, since user fields aren’t accessed mid-sweep), and refill only the mover’s work-vars inner-loop.
Options for point 2 (ranked):
Scope the per-var refill (low risk, real win). Give
nuke_coords_and_rebuildan optional “active vars” set; during a mover sweep refill only the mover’s work-vars; do one full refill at the end. Saves(n_outer−1) × n_user_varsrefills. No context manager strictly needed — the mover already owns its sweep.Light vs full rebuild split. Separate “geometry rebuild” (DS + coords, needed per inner step) from “search/eval rebuild” (kd-tree, centroids, per-var cache). But the mover’s inner
evaluate(metric,…)needs the kd-tree, so this only helps if the metric eval is reworked to not need the kd-tree (e.g. evaluate-at-own-nodes as a data read) — larger change.mesh.batch_deform()context manager. Suppresses the non-mover-var refill + defers a single full rebuild of the search/eval state to exit. Cleanest API but must respect the #130 collective constraint and ensure no stale kd-tree/centroid read happens mid-sweep (the mover does read them, so “defer the kd-tree” is unsafe — the CM can only defer the per-var cachefield-side state, i.e. option 1 wrapped in a CM).
Recommendation: option 1 (scoped refill), optionally surfaced as a context manager later. MA is one-shot so it is unaffected either way.
6. ALE = CARRY + lazy v_mesh correction¶
ALE leaves the history on its (now-moved) nodes — CARRY, no re-interpolation,
no interpolation diffusion — and corrects for the arbitrary mesh motion via
v_mesh = Δx_total/dt entering the next step’s characteristic: the SL
trace-back runs along (v − v_mesh) so it samples the carried history at the
correct relative-upstream point. The correction is lazy (applied at the
next solve, not baked in at adapt) and v_mesh is a one-step pulse (zero
again until the next adapt). It must use the total displacement across the
whole mover sweep (X_final − X_orig), not per inner deform.
The
v_meshcorrection is exactly what distinguishes ALE from plain CARRY: pure CARRY is only correct when nodes move with the material; a re-meshing adapt is arbitrary motion, so the correction restores correctness.First-order equal to REMAP (built-in cross-check: validate ALE ≈ REMAP on a smooth case, then trust it on the steep one), but avoids re-interpolating the unrecoverable higher-order history each adapt — the whole point.
Fits the deferred-update theme: CARRY is the zero-cost thing during the sweep; the
v_meshcorrection is accumulated and applied once on consumption.
ALE is a DDt property, scoped to the advective flavors (SemiLagrangian,
Lagrangian), not per-solver. The material-derivative + characteristic
logic lives in the DDt; put it there and it works uniformly for adv-diff,
Navier–Stokes, and VE-Stokes-via-DFDt with the solver doing nothing special.
The adapt op supplies v_mesh. The Eulerian DDt has no advection ⇒ REMAP.
A reset/re-mesh adapt (large discontinuous displacement) is not ALE-able ⇒
REMAP.
7. The band-aid (already landed) and its relationship to the true fix¶
ddt.py SemiLagrangian._record_psi_star_from_field_data() + a guarded call:
under MPI, when psi_fn is a single mesh-variable component on this mesh, the
per-step “record current field into psi_star[0]” copies the field’s nodal
data directly instead of evaluating it at its own (on-vertex) coords.
Serial keeps the validated shifted-evaluate path bit-identical (verified
6.66e-16); result: parallel adaptive adv-diff 0.137 → 6e-4.
Relationship to the true fix:
It is orthogonal to REMAP: it fixes the per-step re-record, not the adapt-time transfer. Keep it — it is the correct parallel-safe re-record, and it is what later lets the (recomputable, order-1)
psi_star[0]avoid a redundant REMAP. It is order-1 only (touches nopsi_star[1+]); higher order needs REMAP of the full history stack regardless.A cleanup later: consider promoting the direct nodal copy to the standard re-record (un-gate from MPI) after re-validating serial — it is more correct than the shifted-evaluate, not just a band-aid.
8. Validation gaps / cautions to carry into the build¶
Order-2 untested. The harness only ran θ=1 / order-1, so higher-order history transfer is unexercised. Build a 2nd-order adaptive case and verify the full
psi_starstack transfers (it is the load-bearing path for §2a).Swarm proxy staleness.
SwarmVariable._proxy_stale(swarm.py) is a lazy flag not set by_deform_mesh. Under adaptation a proxyMeshVariablemay read stale on the new mesh. Proxies should beREINIT(re-project from the swarm) and the adapt must trigger_proxy_stale = True(or the re-projection) — otherwise a different silent-staleness bug.Vector/tensor DuDt (NS / VE flux history) under parallel adaptation is not covered by the band-aid and must be validated under Phase-1 REMAP.
Solver stability is paramount: any change touching the DS rebuild / solver assembly path needs the tier-A/B suite + a bit-identical serial probe.
9. Phasing¶
Phase 1 — REMAP in the adapt op. Policy enum + framework stamping + adapt op owns snapshot/move/transfer (all REMAP, incl. full history stack). Replaces the harness’s manual transfer; makes parallel adaptation correct for all hidden vars and all DDt orders. Keep the band-aid. Add the scoped per-var refill (§5 option 1). Validate: parallel adapt sweep np4-vs-serial (target ≤ no-adapt baseline), order-2 case, tier-A serial bit-identical.
Phase 2 — ALE on the advective DDt.
on_remeshhook: CARRY history + stash total-Δxv_meshfor the next(v − v_mesh)trace-back. Validate against Phase-1 REMAP via the first-order agreement; benefits adv-diff + VE-Stokes; removes the interpolation-diffusion cost on 12-shot OT.
File map¶
file |
site |
role |
|---|---|---|
|
|
coords/cache rebuild; scoped-refill change (§5) |
|
movers @1570/1861/2639 (per-outer |
adapt op owns transfer; mover sweep refill scope |
|
|
history policy = REMAP/ALE; |
|
|
use DDt; nothing solver-specific for ALE |
|
|
proxy REINIT + adapt-triggered staleness (§8) |