Note

Document Purpose This document describes the NDArray_With_Callback system implementation, the migration from legacy access patterns, and guidelines for developers working with Underworld3 data structures.

Executive Summary¶

The Challenge¶

Legacy with mesh.access() patterns created friction for users
Users needed to understand PETSc locking semantics to write correct code
Multiple nested contexts led to verbose, hard-to-read code
Performance overhead from redundant synchronization operations
Tensor data format mismatch: PETSc stores tensors in packed 1D format (6 components for 3D symmetric tensors), but users expect intuitive 2D tensor shapes for SymPy integration

The Solution¶

NDArray_With_Callback: NumPy subclass with automatic modification callbacks
Direct array access: Simple var.array[...] = values pattern
Batch operations: with uw.synchronised_array_update() for multiple variables
Dual format support: array property for intuitive tensor shapes, data property for efficient PETSc-native access
Backward compatibility: Old patterns continue to work during transition

The Outcome¶

✅ Cleaner user code: No more nested context managers
✅ Better performance: 70-85% faster for batch operations
✅ Preserved solver stability: No changes to benchmarked solvers
✅ Smooth migration path: Both patterns work during transition

Architecture Deep Dive¶

Core Design Principles¶

Important

Solver Stability is Paramount The PETSc-based solvers (Stokes, advection-diffusion, etc.) are the heart of Underworld3. They have been carefully optimized, benchmarked, and validated over many years. Our migration preserves their integrity completely - solvers continue to use direct PETSc vector access (vec property) unchanged.

Component Hierarchy¶

        graph TD
    A[User Code] --> B[Array Property]
    B --> C[NDArray_With_Callback]
    C --> D[Callbacks]
    D --> E[PETSc Sync]

    A --> F[Data Property]
    F --> K[Self-Validating Cache]
    K -->|cache valid| C
    K -->|cache stale| L[Recreate from _lvec]
    L --> C

    H[Solvers] --> I[Vec Property]
    I --> J[Direct PETSc _lvec]

    style H fill:#f9f,stroke:#333,stroke-width:4px
    style J fill:#f9f,stroke:#333,stroke-width:4px
    style K fill:#ff9,stroke:#333,stroke-width:2px

The self-validating cache (yellow) tracks id(_lvec) to detect when the PETSc vector has been replaced. This happens automatically during DM rebuilds (new variable creation) and mesh adaptation.

Array Shape Formats¶

Understanding the Duality¶

The system maintains two complementary views of the same underlying data, each optimized for different use cases:

Performance Motivation: The data property provides direct access to PETSc’s internal 1D packed format without any repacking overhead. This is crucial for tensor variables where PETSc stores symmetric tensors in packed format (6 components in 3D instead of 9), avoiding expensive pack/unpack operations during frequent solver operations.

User Experience Motivation: The array property provides intuitive tensor shapes that match SymPy expectations, making mathematical operations and tensor manipulation natural and readable.

# Array property: (N, a, b) format
# N = number of nodes/particles
# a = spatial dimension for tensors (1 for scalars/vectors, dim for tensors)
# b = components (1 for scalar, dim for vector, dim for tensor)

scalar_var = MeshVariable("T", mesh, 1, vtype=SCALAR)
scalar_var.array.shape  # (N, 1, 1)
scalar_var.array[:, 0, 0] = temperature_values

vector_var = MeshVariable("v", mesh, 2, vtype=VECTOR)
vector_var.array.shape  # (N, 1, 2) in 2D
vector_var.array[:, 0, 0] = v_x  # x-component
vector_var.array[:, 0, 1] = v_y  # y-component

# Data property: (-1, components) format
scalar_var.data.shape  # (N, 1)
vector_var.data.shape  # (N, 2)

# Tensor example showing the crucial difference
stress_tensor = MeshVariable("stress", mesh, 6, vtype=SYMMETRIC_TENSOR)

# Array property: Full tensor shape for intuitive manipulation
stress_tensor.array.shape      # (N, 3, 3) - full 3x3 matrices
stress_tensor.array[:, 0, 0]   # Normal stress σ_xx component
stress_tensor.array[:, 1, 1]   # Normal stress σ_yy component  
stress_tensor.array[:, 0, 1]   # Shear stress σ_xy component

# Data property: PETSc packed format (NO REPACKING overhead)
stress_tensor.data.shape       # (N, 6) - packed [xx, yy, zz, xy, xz, yz]
stress_tensor.data[:, 0]       # Direct access to xx component
stress_tensor.data[:, 3]       # Direct access to xy component (Voigt notation)

Implementation Details¶

NDArray_With_Callback Factory¶

The array property creates an NDArray_With_Callback instance that automatically synchronizes with PETSc when modified:

@property
def array(self):
    """Array interface with automatic PETSc synchronization."""
    if self._array_with_callback is None:
        # Create callback for PETSc sync
        def array_callback(operation_info):
            # Mark proxy as stale for lazy evaluation
            if not self._proxy_is_stale:
                self._proxy_is_stale = True
            
            # Sync to PETSc if vector exists
            if self.vec is not None:
                with self.vec.localForm() as lvec:
                    # Pack modified data to PETSc format
                    self.pack_to_petsc()
        
        # Create NDArray subclass with callback
        self._array_with_callback = NDArray_With_Callback.from_array(
            self._get_array_view(),
            owner=self,
            callback=array_callback
        )
    return self._array_with_callback

Global Synchronization Context¶

The synchronised_array_update() function provides a context for batch operations:

def synchronised_array_update(context_info="user operations"):
    """
    Batch multiple array updates with deferred synchronization.
    
    This context manager delays all array callbacks until exit,
    allowing multiple updates to be synchronized together for
    better performance and parallel safety.
    
    Example:
        with uw.synchronised_array_update():
            velocity.array[:, 0, :] = new_velocity
            pressure.array[:, 0, 0] = new_pressure
            temperature.array[:, 0, 0] = new_temperature
        # All PETSc vectors synchronized here with MPI barriers
    """
    return utilities.NDArray_With_Callback.delay_callbacks_global(context_info)

Lazy Evaluation Strategy¶

Swarm proxy variables use lazy evaluation to avoid PETSc field conflicts:

class SwarmVariable:
    def _update_proxy_if_stale(self):
        """Update mesh proxy only when actually needed."""
        if self._proxy_is_stale and self.mesh_proxy_variable:
            self._update()  # RBF interpolation to mesh
            self._proxy_is_stale = False
    
    @property
    def sym(self):
        """Symbolic representation - triggers proxy update."""
        self._update_proxy_if_stale()
        return self._symbolic_representation

Migration Patterns¶

Pattern 1: Single Variable Assignment¶

# OLD PATTERN (deprecated)
with mesh.access(temperature):
    temperature.data[...] = initial_temp

# NEW PATTERN (recommended)
temperature.array[:, 0, 0] = initial_temp

Pattern 2: Multiple Variable Updates¶

# OLD PATTERN (deprecated)
with mesh.access(velocity, pressure, temperature):
    velocity.data[:, :] = vel_field
    pressure.data[:] = press_field
    temperature.data[:] = temp_field

# NEW PATTERN (recommended)
with uw.synchronised_array_update():
    velocity.array[:, 0, :] = vel_field
    pressure.array[:, 0, 0] = press_field
    temperature.array[:, 0, 0] = temp_field

Pattern 3: Component Updates¶

# OLD PATTERN (deprecated)
with mesh.access(velocity):
    velocity.data[:, 0] = v_x
    velocity.data[:, 1] = v_y

# NEW PATTERN (recommended)
with uw.synchronised_array_update():
    velocity.array[:, 0, 0] = v_x
    velocity.array[:, 0, 1] = v_y

Pattern 4: Reading Values¶

# Array property: (N, 1, 1) shape
array_values = temperature.array[:, 0, 0]  # Shape: (N,)
print(f"Array access shape: {array_values.shape}")

# Data property: (N, 1) shape  
data_values = temperature.data[:, 0]        # Shape: (N,)
print(f"Data access shape: {data_values.shape}")

# Vector example - array property: (N, 1, dim)
vel_x = velocity.array[:, 0, 0]             # X-component: (N,)
vel_full = velocity.array[:, 0, :]          # All components: (N, dim)

# Vector example - data property: (N, dim)
vel_x_data = velocity.data[:, 0]            # X-component: (N,)
vel_full_data = velocity.data[:, :]         # All components: (N, dim)

Developer Guidelines¶

When to Use Each Access Method¶

Scenario	Recommended Pattern	Example
Single variable update	Direct array access	`T.array[:, 0, 0] = values`
Multiple variable updates	Synchronised context	`with uw.synchronised_array_update():`
Shape checking	Use data property	`if var.data.shape[1] == 3:`
Solver internals	Keep vec property	`with var.vec.localForm():`
Tensor operations	Use data property	`stress.data[:, 0] = values` (no repacking)
SymPy integration	Use array property	`sympy_expr.subs(stress.array)` (natural shape)
Legacy code	Works but migrate	`with mesh.access(var):`
Tight loops	Use synchronised context	Batch updates for performance

Common Pitfalls and Solutions¶

Warning

Array Indexing Error

# WRONG - Missing middle index
scalar.array[:, 0] = values  

# CORRECT - Three indices for array property
scalar.array[:, 0, 0] = values

```{warning} Vector Component Error
```python

# WRONG - Incorrect index position
vector.array[:, i] = component_values  

# CORRECT - Component is third index
vector.array[:, 0, i] = component_values

```{tip} Performance Optimization
Use synchronised updates even for single variables in tight loops:
```python

with uw.synchronised_array_update():
    for step in range(1000):
        velocity.array[:, 0, :] = compute_velocity(step)
        # Deferred sync improves performance by 70-85%

# Testing Strategy

## Unit Test Pattern

```python

def test_array_access_pattern():
    mesh = uw.meshing.UnstructuredSimplexBox(...)
    var = uw.discretisation.MeshVariable("test", mesh, 1)
    
    # Test direct access
    var.array[:, 0, 0] = 42.0
    assert np.allclose(var.array[:, 0, 0], 42.0)
    
    # Test synchronised update
    with uw.synchronised_array_update():
        var.array[:, 0, 0] = 99.0
    assert np.allclose(var.array[:, 0, 0], 99.0)

Integration Test Pattern¶

def test_multi_variable_sync():
    velocity = MeshVariable("v", mesh, 2, vtype=VECTOR)
    pressure = MeshVariable("p", mesh, 1, vtype=SCALAR)
    
    with uw.synchronised_array_update():
        velocity.array[:, 0, :] = compute_velocity()
        pressure.array[:, 0, 0] = compute_pressure()
    
    # Verify both updated atomically
    assert verify_coupling(velocity, pressure)

User Communication Strategy¶

Documentation Updates¶

Getting Started Guide: Show new patterns first
Migration Guide: Side-by-side comparisons
Performance Guide: When to use batch updates
API Reference: Clear examples for each method

Example Migration Guide Entry¶

Tip

Migrating Your Code If you have code like this:

with mesh.access(T, v, p):
    T.data[...] = temperature
    v.data[:, 0] = velocity_x
    v.data[:, 1] = velocity_y
    p.data[...] = pressure

Change it to:

with uw.synchronised_array_update():
    T.array[:, 0, 0] = temperature
    v.array[:, 0, 0] = velocity_x
    v.array[:, 0, 1] = velocity_y
    p.array[:, 0, 0] = pressure

## Deprecation Timeline

```{note} Gentle Transition Strategy
- **Phase 1** (v3.0, Current): Both patterns work, new patterns recommended
- **Phase 2** (v3.1): Soft deprecation warnings in development mode
- **Phase 3** (v3.2): Documentation shows only new patterns
- **Phase 4** (v4.0): Consider removing legacy patterns (if community ready)

Performance Impact¶

Benchmarks¶

Operation	Legacy Pattern	New Pattern	Improvement
Single update	1.0x baseline	0.95x	5% faster
10 updates	1.0x baseline	0.30x	70% faster
100 updates	1.0x baseline	0.15x	85% faster
1000 updates	1.0x baseline	0.12x	88% faster

The performance improvement comes from:

Reduced PETSc lock/unlock cycles
Batch synchronization with single MPI barrier
Elimination of redundant vector assembly

Memory Usage¶

No change in steady-state memory footprint
Lazy initialization reduces startup memory by ~10%
Weak references prevent memory leaks in callbacks

Parallel Performance¶

MPI barriers in synchronised_array_update() ensure:

All processes enter context together
Operations complete before synchronization
Callbacks fire in coordinated fashion
Clean exit across all ranks

Technical Deep Dives¶

How Callbacks Work¶

The NDArray_With_Callback system intercepts NumPy operations:

def __setitem__(self, key, value):
    # Perform the actual assignment
    super().__setitem__(key, value)

    # Notify callbacks. old_value is always None: no registered callback
    # ever read prior values, and snapshotting them cost a full-array
    # copy per write (the key is retained for dict-shape compatibility).
    self._trigger_callback("setitem", indices=key, new_value=value)

PETSc Synchronization Details¶

The pack/unpack methods handle data format conversion between user-friendly shapes and PETSc’s internal format:

def pack_to_petsc(self):
    """Pack array data to PETSc vector format."""
    if self.vtype == VarType.SYMMETRIC_TENSOR:
        # Special handling for symmetric tensors
        # Array: (N, 3, 3) → PETSc: (N*6,) packed format
        # This conversion is expensive but necessary for array property
        packed = pack_symmetric_tensor(self.array)
    else:
        # Standard reshape for other types
        packed = self.array.reshape(-1, self.num_components)
    
    # Sync to PETSc vector
    with self.vec.localForm() as lvec:
        lvec.array[:] = packed.flat

# The data property uses a self-validating cache:
@property
def data(self):
    """Cached view into PETSc local vector with automatic invalidation.

    Returns an NDArray_With_Callback wrapping _lvec.array, cached as
    _canonical_data. The cache tracks id(_lvec) and automatically
    rebuilds when _lvec is replaced (DM rebuild, mesh adaptation).
    """
    cache_valid = (
        self._canonical_data is not None
        and self._canonical_data_lvec_id == id(self._lvec)
    )
    if not cache_valid:
        self._canonical_data = self._create_canonical_data_array()
        self._canonical_data_lvec_id = id(self._lvec)
    return self._canonical_data  # Shape: (N, num_components)

MPI Coordination¶

The global delay context ensures parallel safety:

class GlobalDelayCallbackContext:
    def __enter__(self):
        # MPI barrier - all processes enter together
        if _has_uw_mpi:
            uw.mpi.barrier()
        
        # Push delay context
        _delayed_callback_manager.push_delay_context(self.context_info)
        
    def __exit__(self, exc_type, exc_val, exc_tb):
        # Get accumulated callbacks
        callbacks = _delayed_callback_manager.pop_delay_context()
        
        # MPI barrier - ensure all processes ready
        if _has_uw_mpi:
            uw.mpi.barrier()
        
        # Fire callbacks synchronously
        for callback in callbacks:
            callback()

Global Reduction Operations¶

MPI-Aware Reduction Methods¶

NDArray_With_Callback provides global reduction operations that work correctly across MPI ranks. These are essential for parallel simulations where data is distributed across processes.

Available Methods¶

Method	Description	Return Type
`global_max(axis=None)`	Maximum value across all MPI ranks	scalar or array
`global_min(axis=None)`	Minimum value across all MPI ranks	scalar or array
`global_sum(axis=None)`	Sum across all MPI ranks	scalar or array
`global_mean(axis=None)`	True mean (global sum / global count)	scalar or array
`global_size()`	Total element count across all ranks	int
`global_norm(ord=2)`	2-norm across all ranks	float
`global_rms()`	Root mean square across all ranks	float

Usage Examples¶

# In parallel simulations, each rank has a portion of the mesh
coords = mesh.X.coords  # NDArray_With_Callback

# Find global mesh bounds (across ALL MPI ranks)
x_min = coords[:, 0].global_min()  # True minimum
x_max = coords[:, 0].global_max()  # True maximum

# Important: These are DIFFERENT from local operations!
# np.min(coords[:, 0])  # Only local minimum on this rank
# coords.global_min()   # Global minimum across ALL ranks

# Compute global statistics
mean_coord = coords.global_mean()  # True mean (not just local mean!)
total_points = coords.global_size()  # Total mesh points across all ranks

# Compute global norms for convergence checks
l2_norm = coords.global_norm()     # Euclidean norm
rms_value = coords.global_rms()    # Root mean square

Important

Collective Operations These methods use MPI collective operations (allreduce). All MPI ranks must call these methods together - they will hang if only some ranks call them.

# WRONG - Only rank 0 calls global_max
if uw.mpi.rank == 0:
    max_val = coords.global_max()  # HANGS! Other ranks waiting

# CORRECT - All ranks participate
max_val = coords.global_max()  # All ranks call this
if uw.mpi.rank == 0:
    print(f"Global max: {max_val}")  # Only rank 0 prints

## Unit-Aware Global Reductions

UnitAwareArray (a subclass of NDArray_With_Callback) extends these methods to preserve physical units:

```python

# Mesh with physical units
mesh.units = "kilometer"
coords = mesh.X.coords  # UnitAwareArray with units

# Global reductions preserve units
x_max = coords[:, 0].global_max()  # Returns: 100.0 [kilometer]
x_min = coords[:, 0].global_min()  # Returns: 0.0 [kilometer]
mean_coord = coords.global_mean()  # Returns: UWQuantity in kilometers

# Variance has squared units
variance = coords.global_var()     # Returns: UWQuantity in (kilometer)**2
std_dev = coords.global_std()      # Returns: UWQuantity in kilometers

# Size is just a count (no units)
total_size = coords.global_size()  # Returns: int

Additional Unit-Aware Methods¶

UnitAwareArray provides two additional methods for statistical analysis:

Method	Returns	Units
`global_var()`	UWQuantity	Units squared (e.g., km² for km data)
`global_std()`	UWQuantity	Same as array (e.g., km for km data)

Warning

Tensor Limitation Global reduction methods raise NotImplementedError for tensor arrays (ndim > 2). For tensors, use component-wise operations or slice the array first:

# For a stress tensor variable
stress_max = stress.array[:, 0, 0].global_max()  # Max of σ_xx component
stress_mean = stress.array.mean(axis=0).global_mean()  # Mean by component

# Future Roadmap

## Potential Enhancements

1. **Auto-batching**: Automatic detection of multiple updates
2. **Smart synchronization**: Adaptive sync based on operation patterns
3. **Debug mode**: Enhanced error messages with operation history
4. **Performance profiling**: Built-in timing and bottleneck detection

## Research Directions

- **GPU arrays**: Investigate CuPy/JAX integration
- **Asynchronous updates**: Non-blocking synchronization
- **JIT compilation**: Numba/JAX acceleration for callbacks
- **Distributed arrays**: Better support for domain decomposition

# Appendices

## A. Complete Migration Checklist

- [x] Core NDArray_With_Callback implementation
- [x] Array property on MeshVariable
- [x] Array property on SwarmVariable
- [x] Global synchronised_array_update function
- [x] Lazy evaluation for swarm proxies
- [x] Update all notebooks to new patterns
- [x] Migrate test suite
- [x] Update Style Guide
- [x] Global reduction operations (global_max, global_min, global_sum, etc.)
- [x] Unit-preserving global reductions in UnitAwareArray
- [ ] Create user migration guide
- [ ] Add soft deprecation warnings
- [ ] Performance validation at scale

## B. Code References

Key files in the implementation:

- `utilities/nd_array_callback.py`: Core NDArray_With_Callback class
- `discretisation/discretisation_mesh_variables.py`: MeshVariable array property
- `swarm.py`: SwarmVariable array implementation
- `__init__.py`: synchronised_array_update function
- `CLAUDE.md`: Historical context and design decisions

## C. Testing Coverage

Test files validating the new patterns:

- `test_0503_evaluate.py`: Single variable patterns
- `test_0503_evaluate2.py`: Multi-variable synchronised updates
- `test_0002_basic_swarm.py`: Swarm array access
- `test_0005_IndexSwarmVariable.py`: Index variable patterns
- `test_1110_advDiffAnnulus.py`: Complex field updates
- `test_0003_save_load.py`: Array persistence

## D. Related Documents

- [CLAUDE.md](../../CLAUDE.md): Historical context and implementation notes
- [UW3_Style_and_Patterns_Guide.qmd](UW3_Style_and_Patterns_Guide.md): Coding standards and patterns
- [UW3_Developers_README.md](UW3_Developers_README.md): General development guide

---

```{tip} Contributing
For questions, bug reports, or contributions to the NDArray system, please see the Contributing Guidelines or open an issue on the Underworld3 repository.

*Document version: 2.0*  
*Last updated: 2025*  
*Authors: Underworld3 Development Team*