Helmholtz equation over a 2D square domain: Hyper-parameter optimization

Finding proper hyper-parameters for PINN architectures is a common issue for practitioners. To address this concern, we apply hyper-parameter optimization (HPO) via Gaussian process (GP)-based Bayesian optimization.

This example is based on the paper: Hyper-parameter tuning of physics-informed neural networks: Application to Helmholtz problems, [Neurocomputing].

Note that this script can easily be adapted to other examples (both forward and inverse problems).

More scripts are available in the HPOMax GitHub repository.

Problem setup

We consider the same setting as in Helmholtz equation over a 2D square domain.

We apply GP-based Bayesian optimization via scikit-optimize (see the documentation) over 50 calls. We use Expected Improvement as the acquisition function, and the minimum test error reached during each training run serves as the (outer) loss function for the HPO.

We optimize the following hyper-parameters:

  • Learning rate \(\alpha\);

  • Depth \(L - 1\): number of dense layers;

  • Width \(N\): number of nodes per layer;

  • Activation function \(\sigma\).

We define each configuration as

\[\lambda := [\alpha, L-1, N, \sigma]\]

(ordered as in the code below) and start with the initial setting \(\lambda_0 := [10^{-3}, 4, 50, \sin]\), i.e., a learning rate of \(10^{-3}\), 4 dense layers with 50 nodes each, and the sine activation.
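
For illustration only (this snippet is not part of the script), a configuration is simply a Python list in this order, which the model-building function below unpacks:

config = [1e-3, 4, 50, "sin"]  # learning rate, depth, width, activation
learning_rate, num_dense_layers, num_dense_nodes, activation = config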

Implementation

We highlight the most important parts of the code. At each call, the HPO builds a model and trains it. Therefore, we define:

def create_model(config):
    # Define the model
    return model

which builds the model for a given configuration \(\lambda\). Next, we define:

def train_model(model, config):
    # Train the model
    # Define the metric we seek to optimize
    return error

which returns the HPO loss for a given configuration. In our case, we minimize the smallest test error reached during training. We are now ready to define the search space and the default parameters:

dim_learning_rate = Real(low=1e-4, high=5e-2, name="learning_rate", prior="log-uniform")
dim_num_dense_layers = Integer(low=1, high=10, name="num_dense_layers")
dim_num_dense_nodes = Integer(low=5, high=500, name="num_dense_nodes")
dim_activation = Categorical(categories=["sin", "sigmoid", "tanh"], name="activation")

dimensions = [
    dim_learning_rate,
    dim_num_dense_layers,
    dim_num_dense_nodes,
    dim_activation,
]

default_parameters = [1e-3, 4, 50, "sin"]
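
As an optional illustration (not part of the original script), a few random configurations can be drawn from this search space with scikit-optimize's Space class, to see what gp_minimize will explore:

from skopt.space import Space

# Draw a handful of random configurations from the search space; each sample is
# [learning_rate, num_dense_layers, num_dense_nodes, activation].
space = Space(dimensions)
for sample in space.rvs(n_samples=3, random_state=0):
    print(sample)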

Next, we define the fitness function, which is passed to gp_minimize:

@use_named_args(dimensions=dimensions)
def fitness(learning_rate, num_dense_layers, num_dense_nodes, activation):

    config = [learning_rate, num_dense_layers, num_dense_nodes, activation]
    global ITERATION

    print(ITERATION, "it number")
    # Print the hyper-parameters.
    print("learning rate: {0:.1e}".format(learning_rate))
    print("num_dense_layers:", num_dense_layers)
    print("num_dense_nodes:", num_dense_nodes)
    print("activation:", activation)
    print()

    # Create the neural network with these hyper-parameters and train it;
    # train_model returns the HPO loss for this configuration.
    model = create_model(config)
    error = train_model(model, config)

    if np.isnan(error):
        error = 10**5

    ITERATION += 1
    return error

The test error can be nan for some configurations. In that case, we replace it with a large penalty value of 10**5 so that the optimizer steers away from such configurations. Finally, we apply the GP-based HPO and plot the convergence results:

n_calls = 50
ITERATION = 0

search_result = gp_minimize(
    func=fitness,
    dimensions=dimensions,
    acq_func="EI",  # Expected Improvement.
    n_calls=n_calls,
    x0=default_parameters,
    random_state=1234,
)

print(search_result.x)

plot_convergence(search_result)
plot_objective(search_result, show_points=True, size=3.8)
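
As a usage note (not part of the original script), the best configuration returned by gp_minimize can be printed with its hyper-parameter names, and the figures rendered with matplotlib:

# search_result.x holds the best point found and search_result.fun its objective value.
names = ["learning_rate", "num_dense_layers", "num_dense_nodes", "activation"]
print("Best configuration:", dict(zip(names, search_result.x)))
print("Best test error:", search_result.fun)
plt.show()  # requires `import matplotlib.pyplot as plt`, as in the complete code below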

Complete code

"""Backend supported: tensorflow.compat.v1, tensorflow, pytorch, paddle"""

import deepxde as dde
from matplotlib import pyplot as plt
import numpy as np
import skopt
from skopt import gp_minimize
from skopt.plots import plot_convergence, plot_objective
from skopt.space import Real, Categorical, Integer
from skopt.utils import use_named_args

# This example uses 'gp_minimize' from the 'skopt' (scikit-optimize) package.
# However, skopt 0.9.0 (the latest version at the time of writing) still relies on
# 'np.int', which has been deprecated since NumPy 1.20. Monkey-patch it to avoid the error.
np.int = int

if dde.backend.backend_name == "pytorch":
    sin = dde.backend.pytorch.sin
elif dde.backend.backend_name == "paddle":
    sin = dde.backend.paddle.sin
else:
    from deepxde.backend import tf

    sin = tf.sin

# General parameters
d = 2  # spatial dimension
n = 2  # frequency parameter
k0 = 2 * np.pi * n  # wavenumber
precision_train = 10  # training points per wavelength
precision_test = 30  # test points per wavelength
iterations = 10000


def pde(x, y):
    # Residual of the Helmholtz equation -Δy - k0^2 y = f, with the source term f
    # chosen so that the exact solution is sin(k0 x1) sin(k0 x2).
    dy_xx = dde.grad.hessian(y, x, i=0, j=0)
    dy_yy = dde.grad.hessian(y, x, i=1, j=1)
    f = (d - 1) * k0**2 * sin(k0 * x[:, 0:1]) * sin(k0 * x[:, 1:2])
    return -dy_xx - dy_yy - k0**2 * y - f


def func(x):
    # Exact solution, used as the reference for the test metric.
    return np.sin(k0 * x[:, 0:1]) * np.sin(k0 * x[:, 1:2])


def transform(x, y):
    # Output transform enforcing the homogeneous Dirichlet boundary conditions exactly,
    # so no boundary conditions need to be passed to dde.data.PDE.
    res = x[:, 0:1] * (1 - x[:, 0:1]) * x[:, 1:2] * (1 - x[:, 1:2])
    return res * y


def boundary(_, on_boundary):
    # Not used here: the Dirichlet condition is enforced via the output transform.
    return on_boundary


def create_model(config):
    learning_rate, num_dense_layers, num_dense_nodes, activation = config

    geom = dde.geometry.Rectangle([0, 0], [1, 1])
    k0 = 2 * np.pi * n
    wave_len = 1 / n

    hx_train = wave_len / precision_train
    nx_train = int(1 / hx_train)

    hx_test = wave_len / precision_test
    nx_test = int(1 / hx_test)

    data = dde.data.PDE(
        geom,
        pde,
        [],
        num_domain=nx_train**d,
        num_boundary=2 * d * nx_train,
        solution=func,
        num_test=nx_test**d,
    )

    net = dde.maps.FNN(
        [d] + [num_dense_nodes] * num_dense_layers + [1],
        activation,
        "Glorot uniform",
    )

    net.apply_output_transform(transform)

    model = dde.Model(data, net)
    model.compile("adam", lr=learning_rate, metrics=["l2 relative error"])
    return model


def train_model(model, config):
    losshistory, train_state = model.train(iterations=iterations)
    train = np.array(losshistory.loss_train).sum(axis=1).ravel()
    test = np.array(losshistory.loss_test).sum(axis=1).ravel()
    metric = np.array(losshistory.metrics_test).sum(axis=1).ravel()

    # The HPO objective is the smallest total test loss reached during training;
    # the L2 relative error (`metric`) could be used instead.
    error = test.min()
    return error


# HPO setting
n_calls = 50
dim_learning_rate = Real(low=1e-4, high=5e-2, name="learning_rate", prior="log-uniform")
dim_num_dense_layers = Integer(low=1, high=10, name="num_dense_layers")
dim_num_dense_nodes = Integer(low=5, high=500, name="num_dense_nodes")
dim_activation = Categorical(categories=["sin", "sigmoid", "tanh"], name="activation")

dimensions = [
    dim_learning_rate,
    dim_num_dense_layers,
    dim_num_dense_nodes,
    dim_activation,
]

default_parameters = [1e-3, 4, 50, "sin"]


@use_named_args(dimensions=dimensions)
def fitness(learning_rate, num_dense_layers, num_dense_nodes, activation):
    config = [learning_rate, num_dense_layers, num_dense_nodes, activation]
    global ITERATION

    print(ITERATION, "it number")
    # Print the hyper-parameters.
    print("learning rate: {0:.1e}".format(learning_rate))
    print("num_dense_layers:", num_dense_layers)
    print("num_dense_nodes:", num_dense_nodes)
    print("activation:", activation)
    print()

    # Create the neural network with these hyper-parameters and train it;
    # train_model returns the HPO loss for this configuration.
    model = create_model(config)
    error = train_model(model, config)

    if np.isnan(error):
        error = 10**5

    ITERATION += 1
    return error


ITERATION = 0

search_result = gp_minimize(
    func=fitness,
    dimensions=dimensions,
    acq_func="EI",  # Expected Improvement.
    n_calls=n_calls,
    x0=default_parameters,
    random_state=1234,
)

print(search_result.x)

plot_convergence(search_result)
plot_objective(search_result, show_points=True, size=3.8)
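
# Optional addition: the skopt plotting helpers do not call `show()` themselves,
# so render the convergence and objective figures explicitly.
plt.show()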