Symbol Disambiguation in Underworld3¶
Date: 2025-12-15
Status: IMPLEMENTED
Replaces: Invisible whitespace hack (\hspace{})
Executive Summary¶
Underworld3 needs to distinguish between symbolic variables that have the same display name but belong to different meshes (e.g., solver1.v vs solver2.v). This document explains the clean, SymPy-native mechanism that replaced the previous invisible whitespace hack.
The Problem¶
When users create variables on multiple meshes with the same name:
mesh1 = uw.meshing.StructuredQuadBox(...)
mesh2 = uw.meshing.StructuredQuadBox(...)
v1 = uw.discretisation.MeshVariable("v", mesh1, 2)
v2 = uw.discretisation.MeshVariable("v", mesh2, 2)
These variables must be symbolically distinct so that:
v1.sym != v2.sym (different SymPy objects)
v1.sym + v2.sym keeps both terms (doesn’t simplify to 2*v)
Each variable can be independently substituted in expressions
The JIT compiler can map each symbol to the correct mesh’s data
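The underlying issue can be reproduced with plain SymPy, independent of Underworld3. The sketch below uses ordinary `sympy.Symbol` objects as stand-ins for the mesh variable symbols (an assumption for illustration only) and shows that two symbols requested under the same name compare equal, so their sum collapses.

```python
import sympy

# Two symbols requested under the same name, standing in for
# variables called "v" on two different meshes.
v_a = sympy.Symbol("v")
v_b = sympy.Symbol("v")

# SymPy identifies plain symbols by name and assumptions only,
# so these are indistinguishable.
print(v_a == v_b)   # True
print(v_a + v_b)    # 2*v
```

Any disambiguation scheme therefore has to change what SymPy treats as the identity of the symbol, not only its name.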
The Old Solution (Deprecated)¶
Previously, Underworld3 used invisible LaTeX whitespace to make names unique:
# In discretisation_mesh_variables.py (OLD CODE - REMOVED)
if mesh.instance_number > 1:
    invisible = rf"\hspace{{ {mesh.instance_number/10000}pt }}"
    self.symbol = f"{{ {invisible} {symbol} }}"
This created symbols like { \hspace{ 0.0002pt } v } which:
❌ Made printed output ugly and confusing
❌ Complicated LaTeX rendering
❌ Broke serialization/deserialization
❌ Was hard to debug and understand
❌ Required cleanup regex in mpi.py for printing
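To see where the cluttered names came from, the removed string formatting can be replayed outside Underworld3. This is a sketch only: `legacy_symbol_name` is a hypothetical helper written for this illustration, and plain `sympy.Symbol` objects stand in for the real mesh variable symbols.

```python
import sympy


def legacy_symbol_name(symbol: str, instance_number: int) -> str:
    """Hypothetical helper that replays the removed naming logic."""
    if instance_number > 1:
        invisible = rf"\hspace{{ {instance_number/10000}pt }}"
        return f"{{ {invisible} {symbol} }}"
    return symbol


name_mesh1 = legacy_symbol_name("v", 1)
name_mesh2 = legacy_symbol_name("v", 2)

print(name_mesh1)   # v
print(name_mesh2)   # { \hspace{ 0.0002pt } v }

# The names differ, so the symbols are distinct, but only because the
# disambiguating data was smuggled into the display string.
distinct = sympy.Symbol(name_mesh1) != sympy.Symbol(name_mesh2)
print(distinct)     # True
```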
The New Solution¶
We use SymPy’s native mechanisms for symbol identity:
1. For UWexpression (Symbol Subclass)¶
Pattern: Override _hashable_content() to include a unique ID, following the same pattern as sympy.Dummy.
class UWexpression(Symbol):
    __slots__ = ('_uw_id',)

    def __new__(cls, name, *args, _unique_name_generation=False, **kwargs):
        # Determine unique ID
        if _unique_name_generation:
            uw_id = UWexpression._expr_count
        else:
            uw_id = None
        # CRITICAL: Use __xnew__ to bypass SymPy's internal cache
        # (The cache doesn't know about _uw_id)
        obj = Symbol.__xnew__(cls, name)
        obj._uw_id = uw_id
        return obj

    def _hashable_content(self):
        """Include _uw_id in hash for disambiguation."""
        base_content = Symbol._hashable_content(self)
        if self._uw_id is not None:
            return base_content + (self._uw_id,)
        return base_content
Why __xnew__?
SymPy’s Symbol.__new__ has an internal cache keyed by (cls, name, assumptions). This cache runs before our _hashable_content() is called. Using Symbol.__xnew__ bypasses this cache, ensuring each call creates a fresh object that we can customize.
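The difference between the cached and uncached constructors can be observed directly in SymPy. A minimal sketch, assuming SymPy's cache is enabled (the default); the variable names are illustrative only:

```python
from sympy import Symbol

# The normal constructor goes through SymPy's cache.
cached_a = Symbol("v")
cached_b = Symbol("v")

# Symbol.__xnew__ is the uncached constructor that Dummy also uses.
fresh_a = Symbol.__xnew__(Symbol, "v")
fresh_b = Symbol.__xnew__(Symbol, "v")

# Whether this is the same object depends on the cache being enabled.
print(cached_a is cached_b)

# The uncached constructor always builds a new object ...
print(fresh_a is fresh_b)    # False

# ... which still compares equal until _hashable_content() is extended.
print(fresh_a == cached_a)   # True
```

Bypassing the cache is therefore necessary but not sufficient: the fresh objects only become distinct once the extra ID is part of the hashable content.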
2. For UnderworldFunction (Creates UndefinedFunction)¶
Pattern: Pass _uw_id as a keyword argument to UndefinedFunction(). SymPy automatically uses kwargs in __eq__ and __hash__ for function classes.
class UnderworldFunction(sympy.Function):
    def __new__(cls, name, meshvar, vtype, component=0, ...):
        mesh = meshvar.mesh
        uw_id = mesh.instance_number if mesh.instance_number > 1 else None
        # SymPy uses _uw_id in __eq__ and __hash__ automatically!
        ourcls = sympy.core.function.UndefinedFunction(
            fname,
            bases=(UnderworldAppliedFunction,),
            _uw_id=uw_id,  # This makes functions distinct
            **options
        )
        ourcls.meshvar = weakref.ref(meshvar)
        return ourcls
How SymPy Identity Works¶
For Symbols¶
SymPy symbols use _hashable_content() for identity:
# Simplified sketch; in SymPy, __hash__ and __eq__ are inherited from Basic
class Symbol:
    def __hash__(self):
        return hash(self._hashable_content())

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        return self._hashable_content() == other._hashable_content()

    def _hashable_content(self):
        return (self.name,) + tuple(sorted(self.assumptions0.items()))
By adding _uw_id to _hashable_content(), we make symbols with the same name but different IDs distinct.
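The effect can be checked with a small standalone subclass. This is a sketch of the pattern, not the Underworld3 code: `TaggedSymbol` and its `tag` argument are hypothetical names standing in for `UWexpression` and `_uw_id`.

```python
from sympy import Symbol


class TaggedSymbol(Symbol):
    """Hypothetical stand-in for UWexpression: a Symbol with an optional tag."""

    __slots__ = ("_tag",)

    def __new__(cls, name, tag=None):
        # Uncached constructor, so every call yields a fresh object.
        obj = Symbol.__xnew__(cls, name)
        obj._tag = tag
        return obj

    def _hashable_content(self):
        base = Symbol._hashable_content(self)
        return base if self._tag is None else base + (self._tag,)


v_mesh1 = TaggedSymbol("v", tag=1)
v_mesh2 = TaggedSymbol("v", tag=2)

print(v_mesh1 == v_mesh2)                    # False: tags differ
print(v_mesh1 == TaggedSymbol("v", tag=1))   # True: same name and tag
print(str(v_mesh1), str(v_mesh2))            # v v (display name untouched)
print(len((v_mesh1 + v_mesh2).args))         # 2: the sum keeps both terms
```

The tag never reaches the printed name, which is the property the `\hspace{}` approach could not provide.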
For Functions (UndefinedFunction)¶
SymPy’s UndefinedFunction stores kwargs and includes them in equality:
# Inside sympy.core.function (simplified signature)
class UndefinedFunction:
    def __new__(cls, name, **kwargs):
        # Non-assumption kwargs are stored on the new class (as _kwargs)
        # and compared in __eq__/__hash__
        ...
When we pass _uw_id=mesh.instance_number, SymPy automatically makes functions from different meshes distinct.
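This behaviour can be confirmed with SymPy alone. In the sketch below the function name `"v"` and the `_uw_id` values are illustrative, and no Underworld3 base class is involved:

```python
import sympy
from sympy.core.function import UndefinedFunction

x = sympy.Symbol("x")

# Same name, different extra keyword: stand-ins for "v" on two meshes.
f_mesh1 = UndefinedFunction("v", _uw_id=None)
f_mesh2 = UndefinedFunction("v", _uw_id=2)
f_mesh2_again = UndefinedFunction("v", _uw_id=2)

print(f_mesh1 == f_mesh2)         # False: the keyword differs
print(f_mesh2 == f_mesh2_again)   # True: same name and keyword
print(f_mesh1(x) == f_mesh2(x))   # False: applied functions differ too

# Both terms survive in a sum even though they print identically.
print(len((f_mesh1(x) + f_mesh2(x)).args))   # 2
```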
The sympy.Dummy Precedent¶
Our approach mirrors sympy.Dummy, which uses this exact pattern:
# From sympy/core/symbol.py (abridged)
class Dummy(Symbol):
    _count = 0
    __slots__ = ('dummy_index',)

    def __new__(cls, name=None, dummy_index=None, **assumptions):
        if dummy_index is None:
            dummy_index = Dummy._count
            Dummy._count += 1
        cls._sanitize(assumptions, cls)
        obj = Symbol.__xnew__(cls, name, **assumptions)  # Bypass cache!
        obj.dummy_index = dummy_index
        return obj

    def _hashable_content(self):
        return Symbol._hashable_content(self) + (self.dummy_index,)
The key insight: Dummy symbols with the same name are distinct because dummy_index is included in _hashable_content().
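A short check of the `Dummy` behaviour, using only public SymPy API:

```python
from sympy import Dummy

d1 = Dummy("v")
d2 = Dummy("v")

print(d1.name == d2.name)                 # True: same name
print(d1 == d2)                           # False: different dummy_index
print(d1.dummy_index != d2.dummy_index)   # True
print(len((d1 + d2).args))                # 2: the sum keeps both terms
```

One difference worth noting: SymPy prints a `Dummy` with a leading underscore (`_v`), whereas the `_uw_id` mechanism described here leaves the display name unchanged.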
Implementation Details¶
Files Modified¶
src/underworld3/function/expressions.py
    Added __slots__ = ('_uw_id',) to UWexpression
    Modified __new__ to use Symbol.__xnew__() and assign _uw_id
    Added _hashable_content() override
    Added __getnewargs_ex__() for pickling support
src/underworld3/function/_function.pyx
    Modified UnderworldFunction.__new__ to pass _uw_id to UndefinedFunction()
    Applied same fix to derivative function classes
src/underworld3/discretisation/discretisation_mesh_variables.py
    Removed the \hspace{} hack (lines 199-201)
    Added comment explaining new mechanism
When _uw_id Is Assigned¶
| Object Type | When |
|---|---|
| UWexpression | When _unique_name_generation=True |
| UnderworldFunction | When mesh.instance_number > 1 |
| Derivative functions | Same as parent MeshVariable |
Backward Compatibility¶
Symbols created on the first mesh (instance_number == 1) have _uw_id = None
This matches previous behavior where no disambiguation was needed
Expressions created without _unique_name_generation=True are shared by name (singleton pattern)
Testing¶
Verification Script¶
import underworld3 as uw
from underworld3.function._function import UnderworldAppliedFunction

# Create two meshes
mesh1 = uw.meshing.StructuredQuadBox(elementRes=(4, 4))
mesh2 = uw.meshing.StructuredQuadBox(elementRes=(4, 4))

# Create variables with same name
v1 = uw.discretisation.MeshVariable("v", mesh1, 2)
v2 = uw.discretisation.MeshVariable("v", mesh2, 2)

# Test 1: Symbols are distinct
assert v1.sym != v2.sym, "Symbols should be different"

# Test 2: Display names are clean
assert '\\hspace' not in str(v1.sym), "No invisible whitespace"
assert '\\hspace' not in str(v2.sym), "No invisible whitespace"

# Test 3: Expression keeps both terms
expr = v1.sym + v2.sym
atoms = expr.atoms(UnderworldAppliedFunction)
assert len(atoms) == 4, "Should have 4 atoms (2 components × 2 variables)"

# Test 4: Meshvar accessible via weakref
assert v1.sym[0,0].func.meshvar().mesh is mesh1

print("✅ All disambiguation tests passed!")
Test Files¶
tests/test_symbol_disambiguation_prototype.py - Comprehensive unit tests
Core solver tests verify no regressions
Benefits¶
| Aspect | Old (\hspace{}) | New (_uw_id) |
|---|---|---|
| Display names | Ugly, cluttered | Clean |
| LaTeX rendering | Artifacts possible | Perfect |
| Debugging | Confusing | Clear |
| Serialization | Problematic | Works correctly |
| SymPy integration | Hack/workaround | Native mechanism |
| Code complexity | Regex cleanup needed | Simple |
Troubleshooting¶
Issue: Symbols from same mesh are incorrectly distinct¶
Cause: _unique_name_generation=True when it shouldn’t be.
Solution: Only use _unique_name_generation=True for truly ephemeral expressions that need unique identity regardless of name.
Issue: Symbols from different meshes are incorrectly equal¶
Cause: mesh.instance_number not incrementing properly.
Solution: Check that mesh creation increments Mesh._instance_count. First mesh is 1, subsequent meshes should be 2, 3, etc.
Issue: Pickle/unpickle changes symbol identity¶
Cause: __getnewargs_ex__() not returning _uw_id.
Solution: Ensure __getnewargs_ex__ includes _uw_id in kwargs:
def __getnewargs_ex__(self):
    return ((self.name,), {'_uw_id': self._uw_id})
Conclusion¶
The _uw_id mechanism provides clean, maintainable symbol disambiguation using SymPy’s native identity system. It eliminates the need for invisible whitespace hacks while ensuring correct behavior for multi-mesh simulations.
The related UWCoordinate isolation fix ensures coordinates are also mesh-specific, preventing cache pollution bugs. While this creates some friction for expression portability between meshes, it’s the safer default that prevents subtle bugs. Explicit substitution provides a clear path for mesh transfer when needed.