PI-DeepONets with Zero Coordinate Shift (ZCS) ============================================= Zero Coordinate Shift (ZCS) is a low-level technique for maximizing memory and time efficiency of physics-informed DeepONets (`Leng et al., 2023 `__). In this tutorial, we will explain how to activate ZCS in an existing DeepXDE script. Usually, ZCS can reduce GPU memory consumption and wall time for training by an order of magnitude. Prerequisite ------------ Your current script can be easily equipped with ZCS if you are using - TensorFlow 2.x, PyTorch or Paddle as the backend (**use PyTorch for best performance**), - ``dde.data.PDEOperatorCartesianProd`` as the data class, and - ``dde.nn.DeepONetCartesianProd`` as the network class. Usage ----- Switching to ZCS requires two steps. Step 1: Replacing the classes in the following table ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +-------------------------------------------+------------------------------------------+ | **FROM** | **TO** | +===========================================+==========================================+ | ``deepxde.data.PDEOperatorCartesianProd`` | ``deepxde.zcs.PDEOperatorCartesianProd`` | +-------------------------------------------+------------------------------------------+ | ``deepxde.Model`` | ``deepxde.zcs.Model`` | +-------------------------------------------+------------------------------------------+ Step 2: Changing the PDE equation(s) to ZCS format ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In DeepXDE, the user function for the PDE equation(s) is declared as .. code:: python def pde(x, u, v): # ... To use ZCS, we first create a ``deepxde.zcs.LazyGrad`` object, passing ``x`` and ``u`` as the arguments. The derivatives of ``u`` w.r.t. ``x`` at any higher ``orders`` can then be computed by ``LazyGrad.compute(orders)``. For example, the Laplace equation (:math:`u_{xx}+u_{yy}=0`) can be coded as .. code:: python def pde(x, u, v): grad_u = dde.zcs.LazyGrad(x, u) du_xx = grad_u.compute((2, 0)) du_yy = grad_u.compute((0, 2)) return du_xx + du_yy **Note**: ``deepxde.zcs.LazyGrad`` is smart enough to avoid re-calculating any lower-order derivatives if a higher-order one has been calculated based on them. For example, in the above function, if you add ``du_x = grad_u.compute((1, 0))`` *after* ``du_xx = grad_u.compute((2, 0))``, ``du_x`` will be returned instantly from a cache inside ``grad_u`` without extra computation. These are all you need! Example 1: diffusion reaction ----------------------------- In this example, we activate ZCS in the demo of diffusion reaction equation. The PDE is :math:`u_{t} - D u_{xx} + k u^2 -v=0`, which is implemented in `the original script `__ as .. code:: python def pde(x, u, v): D = 0.01 k = 0.01 du_t = dde.grad.jacobian(u, x, j=1) du_xx = dde.grad.hessian(u, x, j=0) return du_t - D * du_xx + k * u ** 2 - v In `the ZCS script `__, we change the PDE to (along with replacing the classes in Step 1) .. code:: python def pde(x, u, v): D = 0.01 k = 0.01 grad_u = dde.zcs.LazyGrad(x, u) du_t = grad_u.compute((0, 1)) du_xx = grad_u.compute((2, 0)) return du_t - D * du_xx + k * u ** 2 - v The GPU memory and wall time we measured on a Nvidia V100 (with CUDA 12.2) are reported below. For these measurements, we have increased the number of points in the domain from 200 to 4000, as 200 is likely to be insufficient for real applications. Time is measured for 1000 iterations. =========== ========== ============ ============= **BACKEND** **METHOD** **GPU / MB** **TIME / s** =========== ========== ============ ============= PyTorch Aligned 5779 186 \ Unaligned 5873 117 \ ZCS 655 11 TensorFlow Aligned 9205 73 (with jit) \ Unaligned 11694 70 (with jit) \ ZCS 591 35 (no jit\ :sup:`†`) Paddle Aligned 5805 197 \ Unaligned 6923 385 \ ZCS 1353 15 =========== ========== ============ ============= \ :sup:`†`\ ZCS with Jit is on our TODO list. Example 2: Stokes flow ---------------------- The Problem ^^^^^^^^^^^ In this example, we use a PI-DeepONet to approach the system of Stokes for fluids. The domain is a 2D square full of liquid, with its lid moving horizontally at a given variable speed. The full equations and boundary conditions are .. math:: \begin{aligned} \mu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) - \frac{\partial p}{\partial x}=0, \quad & x\in (0,1), y\in(0, 1);\\ \mu\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}\right) - \frac{\partial p}{\partial y}=0, \quad & x\in (0,1), y\in(0, 1);\\ \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}=0, \quad & x\in (0,1), y\in(0, 1);\\ u(x,1)=u_1(x), v(x,1)=0, \quad & x\in(0, 1);\\ u(x,0)=v(x,0)=p(x,0)=0, \quad & x\in(0, 1);\\ u(0,y)=v(0,y)=0, \quad & y\in(0, 1);\\ u(1,y)=v(1,y)=0, \quad & y\in(0, 1). \end{aligned} We attempt to learn an operator mapping from :math:`u_1(x)` to :math:`\{u, v, p\}(x, y)`, with :math:`u_1(x)` sampled from a Gaussian process. The true solution for validation is computed using `FreeFEM++ `__ following `this tutorial `__. PDE implementation ^^^^^^^^^^^^^^^^^^ Without ZCS, `the script with aligned points `__ implements the PDE as .. code:: python def pde(xy, uvp, aux): mu = 0.01 # first order du_x = dde.grad.jacobian(uvp, xy, i=0, j=0) dv_y = dde.grad.jacobian(uvp, xy, i=1, j=1) dp_x = dde.grad.jacobian(uvp, xy, i=2, j=0) dp_y = dde.grad.jacobian(uvp, xy, i=2, j=1) # second order du_xx = dde.grad.hessian(uvp, xy, component=0, i=0, j=0) du_yy = dde.grad.hessian(uvp, xy, component=0, i=1, j=1) dv_xx = dde.grad.hessian(uvp, xy, component=1, i=0, j=0) dv_yy = dde.grad.hessian(uvp, xy, component=1, i=1, j=1) motion_x = mu * (du_xx + du_yy) - dp_x motion_y = mu * (dv_xx + dv_yy) - dp_y mass = du_x + dv_y return motion_x, motion_y, mass Accordingly, `the script with ZCS `__ implements the PDE as .. code:: python def pde(xy, uvp, aux): mu = 0.01 u, v, p = uvp[..., 0:1], uvp[..., 1:2], uvp[..., 2:3] grad_u = dde.zcs.LazyGrad(xy, u) grad_v = dde.zcs.LazyGrad(xy, v) grad_p = dde.zcs.LazyGrad(xy, p) # first order du_x = grad_u.compute((1, 0)) dv_y = grad_v.compute((0, 1)) dp_x = grad_p.compute((1, 0)) dp_y = grad_p.compute((0, 1)) # second order du_xx = grad_u.compute((2, 0)) du_yy = grad_u.compute((0, 2)) dv_xx = grad_v.compute((2, 0)) dv_yy = grad_v.compute((0, 2)) motion_x = mu * (du_xx + du_yy) - dp_x motion_y = mu * (dv_xx + dv_yy) - dp_y mass = du_x + dv_y return motion_x, motion_y, mass Both of them should be self-explanatory. Results ^^^^^^^ After 50,000 iterations of training, the relative errors for both velocity and pressure should converge to around 10%. The following figure shows the true and the predicted solutions for :math:`u_1(x)=x(1-x)`. Note that ZCS does not affect the accuracy of the resultant model -- it just gears up the training while saving GPU memory. You may want to decrease the number of iterations for a quicker run. .. image:: stokes.png :width: 600 The memory and time measurements on a Nvidia A100 (80 GB, CUDA 12.2) are given below. Note that the wall time is measured for 100 iterations. =========== ========== ================= ================ **BACKEND** **METHOD** **GPU / MB** **TIME / s** =========== ========== ================= ================ PyTorch Aligned 70630 431 \ ZCS 4067 17 TensorFlow Aligned Failed\ :sup:`†` Failed\ :sup:`†` \ ZCS 8632 81 =========== ========== ================= ================ \ :sup:`†`\ ``Aligned`` failed with TensorFlow (v2.15.0) because graphing by ``@tf.function`` (either with ``jit_compile`` on or off) got stuck on both the two machines we tested on. If you manage to run it successfully, please report the results in an issue. Related code ------------ - `Diffusion reaction equation with aligned points `__ - `Diffusion reaction equation with unaligned points `__ - `Stokes flow with aligned points `__ - PI-DeepONets with Zero Coordinate Shift (ZCS) - `Diffusion reaction equation with aligned points using ZCS `__ - `Stokes flow with aligned points using ZCS `__