deepxde.optimizers.config module

deepxde.optimizers.config.set_LBFGS_options(maxcor=100, ftol=0, gtol=1e-08, maxiter=15000, maxfun=None, maxls=50)

Sets the hyperparameters of L-BFGS.

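The L-BFGS optimizer used depends on the backend: scipy.optimize.minimize, tfp.optimizer.lbfgs_minimize, torch.optim.LBFGS, or the Paddle L-BFGS optimizer.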

I find empirically that torch.optim.LBFGS and scipy.optimize.minimize are better than tfp.optimizer.lbfgs_minimize in terms of the final loss value.

  • maxcor (int) – maxcor (scipy), num_correction_pairs (tfp), history_size (torch), history_size (paddle). The maximum number of variable metric corrections used to define the limited memory matrix. (The limited memory BFGS method does not store the full Hessian but uses this many terms in an approximation to it.)

  • ftol (float) – ftol (scipy), f_relative_tolerance (tfp), tolerance_change (torch), tolerance_change (paddle). The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= ftol.

  • gtol (float) – gtol (scipy), tolerance (tfp), tolerance_grad (torch), tolerance_grad (paddle). The iteration stops when max{|proj g_i| : i = 1, …, n} <= gtol, where proj g_i is the i-th component of the projected gradient.

  • maxiter (int) – maxiter (scipy), max_iterations (tfp), max_iter (torch), max_iter (paddle). Maximum number of iterations.

  • maxfun (int) – maxfun (scipy), max_eval (torch), max_eval (paddle). Maximum number of function evaluations. If None, defaults to maxiter * 1.25.

  • maxls (int) – maxls (scipy), max_line_search_iterations (tfp). Maximum number of line search steps (per iteration). For torch, maxls=0 disables line search; any other value enables it with a default of 25 steps.
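The two stopping tests and the maxfun fallback described above can be sketched in plain Python. This is only an illustration of the formulas as written in the parameter descriptions, not the code any backend actually runs; the helper names ftol_satisfied, gtol_satisfied, and default_maxfun are hypothetical, and truncating maxiter * 1.25 to an integer is an assumption.

```python
def ftol_satisfied(f_prev, f_next, ftol):
    """Relative-decrease test: (f^k - f^{k+1}) / max{|f^k|, |f^{k+1}|, 1} <= ftol."""
    scale = max(abs(f_prev), abs(f_next), 1.0)
    return (f_prev - f_next) / scale <= ftol


def gtol_satisfied(proj_grad, gtol):
    """Gradient test: max{|proj g_i| : i = 1, ..., n} <= gtol."""
    return max(abs(g) for g in proj_grad) <= gtol


def default_maxfun(maxiter, maxfun=None):
    """Fallback for maxfun=None: maxiter * 1.25 (integer truncation is an assumption)."""
    return int(maxiter * 1.25) if maxfun is None else maxfun


# With the default ftol=0, the first test only fires once the loss stops decreasing.
print(ftol_satisfied(1.0, 0.5, ftol=0))           # False: the loss still decreased
print(ftol_satisfied(1.0, 1.0, ftol=0))           # True: no decrease at all
print(gtol_satisfied([1e-9, -5e-9], gtol=1e-8))   # True: largest component is 5e-9
print(default_maxfun(15000))                      # 18750
```

Read this way, the default ftol=0 leaves gtol, maxiter, and maxfun as the criteria that usually end a run.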


If L-BFGS stops earlier than expected, set the default float type to ‘float64’ by calling deepxde.config.set_default_float("float64").
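A minimal usage sketch, assuming DeepXDE is installed and imported under the common alias dde. The option values below are arbitrary placeholders chosen for illustration, not recommendations, and the calls are placed before the model is built on the assumption that the settings are read when the model is compiled.

```python
import deepxde as dde
from deepxde.optimizers.config import set_LBFGS_options

# Switch to double precision so that small loss decreases are not lost to round-off.
dde.config.set_default_float("float64")

# Placeholder values for illustration only; omitted arguments keep their defaults.
set_LBFGS_options(maxcor=50, gtol=1e-10, maxiter=5000)

# A model compiled afterwards with the L-BFGS optimizer would use these options, e.g.
# model.compile("L-BFGS"); model.train()   # `model` is a hypothetical dde.Model
```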

deepxde.optimizers.config.set_hvd_opt_options(compression=None, op=None, backward_passes_per_step=1, average_aggregated_gradients=False)

Sets the parameters of hvd.DistributedOptimizer.

The default parameters are the same as for hvd.DistributedOptimizer.

  • compression – Compression algorithm used to reduce the amount of data sent and received by each worker node. Defaults to not using compression.

  • op – The reduction operation to use when combining gradients across different ranks. Defaults to Average.

  • backward_passes_per_step (int) – Number of backward passes to perform before calling hvd.allreduce. This allows accumulating updates over multiple mini-batches before reducing and applying them.

  • average_aggregated_gradients (bool) – Whether to average the aggregated gradients that have been accumulated over multiple mini-batches. If True, divides gradient updates by backward_passes_per_step. Only applicable for backward_passes_per_step > 1.
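The interaction between backward_passes_per_step and average_aggregated_gradients can be illustrated with a small single-process sketch. This is not Horovod's implementation and performs no allreduce; aggregate_gradients is a hypothetical helper that only mimics the local accumulation step described above, with gradients represented as plain lists of floats.

```python
def aggregate_gradients(micro_batch_grads, backward_passes_per_step=1,
                        average_aggregated_gradients=False):
    """Sum per-parameter gradients over the backward passes of one step.

    micro_batch_grads holds one gradient list per backward pass. When averaging
    is enabled and more than one pass is used, the sum is divided by
    backward_passes_per_step.
    """
    if len(micro_batch_grads) != backward_passes_per_step:
        raise ValueError("expected one gradient list per backward pass")
    # Accumulate each parameter's gradient across the backward passes.
    total = [sum(per_param) for per_param in zip(*micro_batch_grads)]
    if average_aggregated_gradients and backward_passes_per_step > 1:
        total = [g / backward_passes_per_step for g in total]
    return total


grads = [[1.0, 2.0], [3.0, 4.0]]  # two backward passes, two parameters
print(aggregate_gradients(grads, backward_passes_per_step=2))
# [4.0, 6.0]
print(aggregate_gradients(grads, backward_passes_per_step=2,
                          average_aggregated_gradients=True))
# [2.0, 3.0]
```

Without averaging, the accumulated update grows with the number of passes, so the effective step size scales with backward_passes_per_step unless the learning rate is adjusted.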