PI-DeepONets with Zero Coordinate Shift (ZCS)

Zero Coordinate Shift (ZCS) is a low-level technique for maximizing the memory and time efficiency of physics-informed DeepONets (Leng et al., 2023). In this tutorial, we explain how to activate ZCS in an existing DeepXDE script. ZCS typically reduces GPU memory consumption and training wall time by an order of magnitude.

Prerequisite

Your current script can be easily equipped with ZCS if you are using

  • TensorFlow 2.x, PyTorch, or Paddle as the backend (use PyTorch for best performance; see the backend-selection sketch after this list),

  • dde.data.PDEOperatorCartesianProd as the data class, and

  • dde.nn.DeepONetCartesianProd as the network class.
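
For reference, below is a minimal sketch of selecting the PyTorch backend via the DDE_BACKEND environment variable, which must be set before deepxde is imported (the variable can also be set in the shell instead of in Python).

# A minimal sketch: select the PyTorch backend before importing deepxde.
import os

os.environ["DDE_BACKEND"] = "pytorch"

import deepxde as dde

print(dde.backend.backend_name)  # expected: "pytorch"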

Usage

Switching to ZCS requires two steps.

Step 1: Replacing the classes in the following table

FROM                                        TO
deepxde.data.PDEOperatorCartesianProd       deepxde.zcs.PDEOperatorCartesianProd
deepxde.Model                               deepxde.zcs.Model
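
For concreteness, here is a hedged sketch of this replacement in a typical script. The names pde_problem, func_space, eval_pts, net, and the num_function value are placeholders for objects already defined in your existing script, not part of the ZCS API.

# Before (standard PI-DeepONet setup):
# data = dde.data.PDEOperatorCartesianProd(pde_problem, func_space, eval_pts, num_function=100)
# model = dde.Model(data, net)

# After (ZCS-enabled setup, same constructor arguments):
data = dde.zcs.PDEOperatorCartesianProd(pde_problem, func_space, eval_pts, num_function=100)
model = dde.zcs.Model(data, net)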

Step 2: Changing the PDE equation(s) to ZCS format

In DeepXDE, the user function for the PDE equation(s) is declared as

def pde(x, u, v):
    # ...

To use ZCS, we first create a deepxde.zcs.LazyGrad object, passing x and u as the arguments. Derivatives of u with respect to x of any order can then be computed with LazyGrad.compute(orders), where orders is a tuple giving the derivative order along each coordinate dimension. For example, the Laplace equation (\(u_{xx}+u_{yy}=0\)) can be coded as

def pde(x, u, v):
    grad_u = dde.zcs.LazyGrad(x, u)
    du_xx = grad_u.compute((2, 0))  # d2u/dx2
    du_yy = grad_u.compute((0, 2))  # d2u/dy2
    return du_xx + du_yy

Note: deepxde.zcs.LazyGrad avoids re-calculating any lower-order derivative once a higher-order one has been computed from it. For example, in the function above, if you add du_x = grad_u.compute((1, 0)) after du_xx = grad_u.compute((2, 0)), du_x is returned instantly from a cache inside grad_u without extra computation.
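
A small sketch illustrating this caching behaviour (the residual returned here is purely illustrative):

def pde(x, u, v):
    grad_u = dde.zcs.LazyGrad(x, u)
    du_xx = grad_u.compute((2, 0))  # computes du_x internally and caches it
    du_x = grad_u.compute((1, 0))   # served from the cache, no extra autograd pass
    return du_xx + du_x             # illustrative residual only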

These are all you need!

Example 1: diffusion reaction

In this example, we activate ZCS in the demo of the diffusion-reaction equation. The PDE is \(u_{t} - D u_{xx} + k u^2 - v = 0\), which is implemented in the original script as

def pde(x, u, v):
    D = 0.01
    k = 0.01
    du_t = dde.grad.jacobian(u, x, j=1)  # du/dt (t is the second input coordinate)
    du_xx = dde.grad.hessian(u, x, j=0)  # d2u/dx2 (x is the first input coordinate)
    return du_t - D * du_xx + k * u ** 2 - v

In the ZCS script, along with replacing the classes in Step 1, we change the PDE to

def pde(x, u, v):
    D = 0.01
    k = 0.01
    grad_u = dde.zcs.LazyGrad(x, u)
    du_t = grad_u.compute((0, 1))   # first-order derivative w.r.t. t (second coordinate)
    du_xx = grad_u.compute((2, 0))  # second-order derivative w.r.t. x (first coordinate)
    return du_t - D * du_xx + k * u ** 2 - v

The GPU memory and wall time measured on an NVIDIA V100 (CUDA 12.2) are reported in the table below. For these measurements, we increased the number of domain points from 200 to 4000, as 200 is likely insufficient for real applications. Time is measured over 1000 iterations.
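
A hedged sketch of this change, assuming the Interval/TimeDomain geometry of the original demo; the boundary and initial point counts here are illustrative, and the boundary and initial conditions are omitted for brevity.

geom = dde.geometry.Interval(0, 1)
timedomain = dde.geometry.TimeDomain(0, 1)
geomtime = dde.geometry.GeometryXTime(geom, timedomain)
# num_domain raised from 200 (original demo) to 4000 for this benchmark
pde_problem = dde.data.TimePDE(geomtime, pde, [], num_domain=4000, num_boundary=40, num_initial=20)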

BACKEND      METHOD      GPU / MB    TIME / s
PyTorch      Aligned     5779        186
PyTorch      Unaligned   5873        117
PyTorch      ZCS         655         11
TensorFlow   Aligned     9205        73 (with jit)
TensorFlow   Unaligned   11694       70 (with jit)
TensorFlow   ZCS         591         35 (no jit)
Paddle       Aligned     5805        197
Paddle       Unaligned   6923        385
Paddle       ZCS         1353        15

ZCS with JIT compilation is on our TODO list.

Example 2: Stokes flow

The Problem

In this example, we use a PI-DeepONet to solve the Stokes system for a fluid. The domain is a 2D square filled with liquid, with its lid moving horizontally at a given, variable speed. The full equations and boundary conditions are

\[\begin{split}\begin{aligned}
\mu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) - \frac{\partial p}{\partial x}=0, \quad & x\in (0,1), y\in(0, 1);\\
\mu\left(\frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2}\right) - \frac{\partial p}{\partial y}=0, \quad & x\in (0,1), y\in(0, 1);\\
\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}=0, \quad & x\in (0,1), y\in(0, 1);\\
u(x,1)=u_1(x), v(x,1)=0, \quad & x\in(0, 1);\\
u(x,0)=v(x,0)=p(x,0)=0, \quad & x\in(0, 1);\\
u(0,y)=v(0,y)=0, \quad & y\in(0, 1);\\
u(1,y)=v(1,y)=0, \quad & y\in(0, 1).
\end{aligned}\end{split}\]

We attempt to learn an operator mapping from \(u_1(x)\) to \(\{u, v, p\}(x, y)\), with \(u_1(x)\) sampled from a Gaussian process. The true solution for validation is computed using FreeFEM++ following this tutorial.
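
A hedged sketch of such an operator-learning setup; the sensor count, layer sizes, and length scale below are illustrative rather than the exact values used by the example script.

# u1(x) sampled from a Gaussian random field (GRF) function space
func_space = dde.data.GRF(length_scale=0.2)
# Branch net: u1 sampled at 50 sensors; trunk net: (x, y) coordinates; outputs: u, v, p
net = dde.nn.DeepONetCartesianProd(
    [50, 128, 128, 128],
    [2, 128, 128, 128],
    "tanh",
    "Glorot normal",
    num_outputs=3,
    multi_output_strategy="independent",
)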

PDE implementation

Without ZCS, the script with aligned points implements the PDE as

def pde(xy, uvp, aux):
    mu = 0.01
    # first order
    du_x = dde.grad.jacobian(uvp, xy, i=0, j=0)
    dv_y = dde.grad.jacobian(uvp, xy, i=1, j=1)
    dp_x = dde.grad.jacobian(uvp, xy, i=2, j=0)
    dp_y = dde.grad.jacobian(uvp, xy, i=2, j=1)
    # second order
    du_xx = dde.grad.hessian(uvp, xy, component=0, i=0, j=0)
    du_yy = dde.grad.hessian(uvp, xy, component=0, i=1, j=1)
    dv_xx = dde.grad.hessian(uvp, xy, component=1, i=0, j=0)
    dv_yy = dde.grad.hessian(uvp, xy, component=1, i=1, j=1)
    motion_x = mu * (du_xx + du_yy) - dp_x
    motion_y = mu * (dv_xx + dv_yy) - dp_y
    mass = du_x + dv_y
    return motion_x, motion_y, mass

Accordingly, the script with ZCS implements the PDE as

def pde(xy, uvp, aux):
    mu = 0.01
    u, v, p = uvp[..., 0:1], uvp[..., 1:2], uvp[..., 2:3]
    grad_u = dde.zcs.LazyGrad(xy, u)
    grad_v = dde.zcs.LazyGrad(xy, v)
    grad_p = dde.zcs.LazyGrad(xy, p)
    # first order
    du_x = grad_u.compute((1, 0))
    dv_y = grad_v.compute((0, 1))
    dp_x = grad_p.compute((1, 0))
    dp_y = grad_p.compute((0, 1))
    # second order
    du_xx = grad_u.compute((2, 0))
    du_yy = grad_u.compute((0, 2))
    dv_xx = grad_v.compute((2, 0))
    dv_yy = grad_v.compute((0, 2))
    motion_x = mu * (du_xx + du_yy) - dp_x
    motion_y = mu * (dv_xx + dv_yy) - dp_y
    mass = du_x + dv_y
    return motion_x, motion_y, mass

Both of them should be self-explanatory.

Results

After 50,000 training iterations, the relative errors of both velocity and pressure converge to around 10%. The following figure shows the true and predicted solutions for \(u_1(x)=x(1-x)\). Note that ZCS does not affect the accuracy of the resulting model; it only accelerates training while saving GPU memory. You may want to decrease the number of iterations for a quicker run.

[Figure stokes.png: true and predicted solutions for \(u_1(x)=x(1-x)\)]
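
For reference, a hedged sketch of compiling and training the ZCS model; the learning rate, decay schedule, and any callbacks of the actual script may differ.

model = dde.zcs.Model(data, net)
model.compile("adam", lr=0.001)
# Reduce the number of iterations for a quicker, less accurate run
losshistory, train_state = model.train(iterations=50000)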

The memory and time measurements on an NVIDIA A100 (80 GB, CUDA 12.2) are given below. Note that the wall time is measured over 100 iterations.

BACKEND      METHOD      GPU / MB    TIME / s
PyTorch      Aligned     70630       431
PyTorch      ZCS         4067        17
TensorFlow   Aligned     Failed      Failed
TensorFlow   ZCS         8632        81

The aligned method failed with TensorFlow (v2.15.0) because graph compilation with @tf.function (with jit_compile either on or off) got stuck on both machines we tested. If you manage to run it successfully, please report the results in an issue.