The Apparent Pareto Front of
Physics-Informed Neural Networks

F. M. Rohrhofer, S. Posch, C. Gößnitzer and B. C. Geiger,
Data vs. Physics: The Apparent Pareto Front
of Physics-Informed Neural Networks,
in IEEE Access, vol. 11, 2023

Exam Presentation
Simulation Intelligence Course
SDIC Master Degree, University of Trieste (UniTS)

Marco Tallone
June 2025

Table of Contents

  1. MULTI-OBJECTIVE OPTIMIZATION IN PINNs
  2. CASE STUDY EXAMPLES
  3. EFFECT OF PARAMETERS ON LOSS RESIDUALS
  4. OPTIMAL CHOICE OF LOSS WEIGHTS

OBJECTIVE

Understand how parameters of the physical system and coefficients of differential equations affect the multi-objective optimization problem and the optimal choice of loss weights in PINNs

Multi-Objective Optimization in PINNs

PINNs Formulation

Consider the general parametrized PDE of the form:

\[ \frac{\partial}{\partial t} u(t, x) + \mathcal{F}[u, \lambda] = 0, \quad x \in \Omega, \quad t \in [0, T] \tag{2} \]

PINNs aim to approximate the solution $u(t, x)$ by a neural network $u_\theta(t, x)$,
where $\theta$ are the trainable parameters of the network

MO Optimization Problem

PINN training can be seen as a multi-objective (MO) optimization problem, since several loss functions, related to either data or physics, must be minimized simultaneously

$\Rightarrow$ gradient-based optimization of a linear scalarized MO loss function:

\[ \min_{\theta} \sum_i \alpha_i \mathcal{L}_i(\theta) \tag{1} \]

where $\alpha_i$ are loss weights that balance the contributions of each loss function $\mathcal{L}_i(\theta)$
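As a minimal sketch of the linear scalarization in Eq. (1), consider two toy quadratic "losses" with different minimizers (the functions and weights below are illustrative, not from the paper): gradient descent on the weighted sum converges to a compromise that moves as the weights $\alpha_i$ change.

```python
import numpy as np

# Two toy objectives with different minimizers (illustrative stand-ins
# for the data and physics losses).
def L1(theta):  # minimized at theta = 0
    return theta ** 2

def L2(theta):  # minimized at theta = 1
    return (theta - 1.0) ** 2

def grad_total(theta, alphas):
    # d/dtheta [a1 * theta^2 + a2 * (theta - 1)^2]
    return 2 * alphas[0] * theta + 2 * alphas[1] * (theta - 1.0)

def minimize(alphas, lr=0.01, steps=2000):
    theta = 0.5
    for _ in range(steps):
        theta -= lr * grad_total(theta, alphas)
    return theta

# The scalarized optimum is the weighted average a2 / (a1 + a2):
# changing the weights moves the compromise between the objectives.
print(minimize([1.0, 1.0]))  # -> ~0.5
print(minimize([9.0, 1.0]))  # -> ~0.1
```

Here the stationary point can be computed in closed form, which makes explicit how the weights trade one objective against the other.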

MO Optimization Problem

In vanilla (unweighted) PINNs, $\alpha_i = 1, \quad \forall i$

PROBLEM

possible optimization failures, with stalled minimization of some loss terms and poor predictions

\[\downarrow\]

💡 IDEA: understand how physical parameters affect the individual loss terms $\mathcal{L}_i(\theta)$ and find optimal weights $\alpha_i$ that balance the contributions and compensate for their different scales

Data and Physics Loss Functions

Data Loss Function

\[ \mathcal{L}_{\mathcal{D}}(\theta) = \frac{1}{N_{\mathcal{D}}} \sum_{i=1}^{N_{\mathcal{D}}} |e_i|^2 \tag{3a} \] \[ e_i = u_\theta(t_i, x_i) - u_i \tag{3b} \]

with data residuals $e_i$ evaluated
on a labelled dataset (IC and BC): \[ \{(t_i, x_i), u_i\}_{i=1}^{N_{\mathcal{D}}} \]

Physics Loss Function

\[ \mathcal{L}_{\mathcal{F}}(\theta) = \frac{1}{N_{\mathcal{F}}} \sum_{i=1}^{N_{\mathcal{F}}} |f_i|^2 \tag{4a} \] \[ f_i = \frac{\partial}{\partial t} u_\theta(t_i, x_i) + \mathcal{F}[u_\theta, \lambda]\quad \tag{4b} \]

with physics residuals $f_i$ evaluated
on an unlabelled dataset (collocation points): \[ \{(t_i, x_i)\}_{i=1}^{N_{\mathcal{F}}} \]
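The two losses in Eqs. (3)-(4) can be sketched numerically. The snippet below is illustrative only: it assumes $\mathcal{F}[u] = -\kappa\,\partial_{xx} u$ (the diffusion operator used later in the talk), takes a fixed trial function as a stand-in for the network output, and replaces automatic differentiation with central finite differences.

```python
import numpy as np

kappa = 1.0  # assumed diffusivity for this sketch

def u_theta(t, x):
    # Stand-in for the network output; here it happens to be the exact
    # solution of the heat problem with kappa = L = 1, so both losses
    # should (nearly) vanish.
    return np.exp(-np.pi**2 * t) * np.sin(np.pi * x)

def physics_residual(t, x, h=1e-4):
    # f = u_t - kappa * u_xx, via central finite differences
    u_t = (u_theta(t + h, x) - u_theta(t - h, x)) / (2 * h)
    u_xx = (u_theta(t, x + h) - 2 * u_theta(t, x) + u_theta(t, x - h)) / h**2
    return u_t - kappa * u_xx

# Data loss on labelled IC points, Eq. (3)
x_d = np.linspace(0.0, 1.0, 16)
t_d = np.zeros_like(x_d)
u_d = np.sin(np.pi * x_d)                       # labels from the IC
L_data = np.mean((u_theta(t_d, x_d) - u_d) ** 2)

# Physics loss on random collocation points, Eq. (4)
rng = np.random.default_rng(0)
t_f, x_f = rng.random(64), rng.random(64)
L_phys = np.mean(physics_residual(t_f, x_f) ** 2)

print(L_data, L_phys)  # both ~0 for this consistent trial function
```

Since the trial function solves the PDE and matches the labels exactly, both objectives are (numerically) zero at once; for a generic network output they are not, which is what makes the training multi-objective.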

Data and Physics Loss Functions

For simplicity: consider only data loss and physics loss terms
and a single loss weight $\alpha$ to balance between the two:

\[ \mathcal{L}(\theta; \alpha) = \alpha \mathcal{L}_{\mathcal{D}}(\theta) + (1 - \alpha) \mathcal{L}_{\mathcal{F}}(\theta), \quad \alpha \in ]0, 1[ \tag{5} \]

with:

  • $\alpha \to 1$ favoring data loss
  • $\alpha \to 0$ favoring physics loss

Apparent Pareto Front

Pareto Front: set of all possible Pareto optimal solutions attained by continuous adjustment of loss weights $\alpha$

Gradient descent converges to local optima $\Rightarrow$ approximations of Pareto optimal solutions

$\downarrow$

Apparent Pareto Front: set of all (sub-optimal) loss values achievable with gradient-based optimization for different values of $\alpha$

Case Study Examples

Diffusion Example

Consider the classical heat equation:

\[ \frac{\partial}{\partial t} u(t, x) - \kappa \frac{\partial^2}{\partial x^2} u(t, x) = 0, \quad x \in [0, L], \quad t \in [0, T] \tag{6} \]

where $u$ is the temperature, $\kappa$ is the thermal diffusivity, and $L$ is the length of the rod,
then define the following initial (IC) and boundary conditions (BC):

\[ \text{IC}: \quad u(0, x) = \sin(\pi \frac{x}{L}), \quad x \in [0, L] \tag{7} \] \[ \text{BC}: \quad u(t, 0) = u(t, L) = 0, \quad t \in [0, T] \tag{8} \]

Diffusion Example

For simplicity, we define the characteristic diffusive time scale as: \[ \tau = \frac{L^2}{\kappa} \tag{9} \] Hence, consider simulation time in multiples of $\tau$: $\quad T = \lambda \tau, \quad \lambda \in \mathbb{R}^+$

The problem is well-posed, with a unique analytical solution (used as ground truth):

\[ u(t, x) = e^{-\pi^2 t / \tau} \sin(\pi \frac{x}{L}) \tag{10} \]
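A quick numerical sanity check (a sketch, not the paper's code; the values of $\kappa$ and $L$ are arbitrary) confirms that the closed form (10) satisfies the PDE (6) and the boundary conditions (8) for a generic parameterization.

```python
import numpy as np

kappa, L = 0.7, 2.0          # assumed parameterization for this check
tau = L**2 / kappa           # characteristic diffusive time scale, Eq. (9)

def u(t, x):
    # analytical solution, Eq. (10)
    return np.exp(-np.pi**2 * t / tau) * np.sin(np.pi * x / L)

# interior grid points (avoid the domain edges for the derivatives)
t = np.linspace(0.0, tau, 7)[1:-1]
x = np.linspace(0.0, L, 9)[1:-1]
tt, xx = np.meshgrid(t, x)

# PDE residual u_t - kappa * u_xx by central finite differences
h = 1e-4
u_t = (u(tt + h, xx) - u(tt - h, xx)) / (2 * h)
u_xx = (u(tt, xx + h) - 2 * u(tt, xx) + u(tt, xx - h)) / h**2
residual = u_t - kappa * u_xx

print(np.max(np.abs(residual)))                            # ~0 up to FD error
print(np.max(np.abs(u(t, 0.0))), np.max(np.abs(u(t, L))))  # BC: both ~0
```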

Diffusion Example

[Figure: diffusion example solution]

Diffusion Example

Data Loss Function

\[ \mathcal{L}_{\mathcal{D}}(\theta) = \frac{1}{N_{\mathcal{D}}} \sum_{i=1}^{N_{\mathcal{D}}} |u_\theta(t_i, x_i) - u(t_i, x_i)|^2 \]

on IC and BC Dataset ✖️: $\{(t_i, x_i), u(t_i, x_i)\}_{i=1}^{N_{\mathcal{D}}}$

Physics Loss Function

\[ \mathcal{L}_{\mathcal{F}}(\theta) = \frac{1}{N_{\mathcal{F}}} \sum_{i=1}^{N_{\mathcal{F}}} \left| \frac{\partial u_\theta(t_i, x_i)}{\partial t} - \kappa \frac{\partial^2 u_\theta(t_i, x_i)}{\partial x^2} \right|^2 \]

on Collocation Dataset ⚪: $\{(t_i, x_i)\}_{i=1}^{N_{\mathcal{F}}}$

$\Rightarrow$ changing $\kappa$ and $L$, we can study how different system
parameterizations affect data loss and physics loss terms

Navier-Stokes Example

Consider the Navier-Stokes equations in the case of a 2D incompressible steady-state flow:

\[ (\mathbf{u} \cdot \nabla) \mathbf{u} + \frac{1}{\rho} \nabla p - \nu \nabla^2 \mathbf{u} = 0 \quad \text{(conservation of momentum)} \tag{11a} \] \[ \nabla \cdot \mathbf{u} = 0 \quad \text{(continuity equation)}^{\textcolor{red}{*}} \tag{11b} \]

where:

  • $\mathbf{u} = (u, v)$: velocity vector field
  • $p$: pressure field
  • $\rho$: fluid density (constant)
  • $\nu$: kinematic viscosity (constant)

$^{\textcolor{red}{*}}$ can be directly hard-coded in PINNs

Navier-Stokes Example

To obtain a ground truth for comparison, we consider a particular case of laminar flow behind a 2D grid (Kovasznay flow) at constant $\rho$. This has the analytical solution $(12)$:

\[ u(x, y) = u_0 \left(1 - e^{\gamma \frac{x}{L}} \cos\left(\frac{2 \pi y}{L}\right)\right) \]
\[ v(x, y) = \frac{u_0\gamma}{2 \pi} e^{\gamma \frac{x}{L}} \sin\left(\frac{2 \pi y}{L}\right) \]
\[ p(x, y) = u_0^2 e^{2 \gamma \frac{x}{L}} + C \]

where:

  • $L$: characteristic spacing of the 2D grid
  • $u_0$: the mean velocity in the $x$ direction
  • $C$: an arbitrary constant
  • $\gamma = \frac{1}{2\nu} - \sqrt{\frac{1}{4\nu^2} + 4\pi^2}$
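The Kovasznay solution above can be checked against the continuity equation (11b): the sketch below (illustrative; the values of $L$ and $\text{Re}$ are assumptions) evaluates $\nabla \cdot \mathbf{u}$ by central finite differences and verifies that it vanishes.

```python
import numpy as np

L, Re = 1.0, 40.0            # assumed grid spacing and Reynolds number
u0 = 1.0 / L                 # mean x-velocity, as in the slides
nu = u0 * L / Re             # kinematic viscosity from Re = u0 L / nu
gamma = 1 / (2 * nu) - np.sqrt(1 / (4 * nu**2) + 4 * np.pi**2)

def u(x, y):
    # x-component of the Kovasznay velocity field, Eq. (12)
    return u0 * (1 - np.exp(gamma * x / L) * np.cos(2 * np.pi * y / L))

def v(x, y):
    # y-component of the Kovasznay velocity field, Eq. (12)
    return u0 * gamma / (2 * np.pi) * np.exp(gamma * x / L) * np.sin(2 * np.pi * y / L)

x, y = np.meshgrid(np.linspace(-L, L, 11), np.linspace(-L, L, 11))
h = 1e-6
div = ((u(x + h, y) - u(x - h, y)) / (2 * h)
       + (v(x, y + h) - v(x, y - h)) / (2 * h))

print(np.max(np.abs(div)))   # ~0: the flow is divergence-free
```

This is exactly the constraint that can be hard-coded into the PINN instead of being learned.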

Navier-Stokes Example

Other assumptions:

  • fixed Reynolds number: $\text{Re} = \frac{u_0 L}{\nu}$
  • $u_0=\frac{1}{L}$ $\Rightarrow$ same flow for any grid size $L$
  • fixed grid $x \in [-L, L], y \in [-L, L]$

⚠️ Note:
flow is steady state
$\Rightarrow$ no initial conditions (IC)

[Figure: Navier-Stokes example solution]

Navier-Stokes Example

Data Loss Function

\[ \mathcal{L}_{\mathcal{D}}(\theta) = \frac{1}{N_{\mathcal{D}}} \sum_{i=1}^{N_{\mathcal{D}}} |\mathbf{u}_\theta(x_i, y_i) - \mathbf{u}(x_i, y_i)|^2 \]

on BC Dataset ✖️: $\{(x_i, y_i), \mathbf{u}(x_i, y_i)\}_{i=1}^{N_{\mathcal{D}}}$

Physics Loss Function

\[ \mathcal{L}_{\mathcal{F}}(\theta) = \frac{1}{N_{\mathcal{F}}} \sum_{i=1}^{N_{\mathcal{F}}} \left| (\mathbf{u}_{\theta, i} \cdot \nabla) \mathbf{u}_{\theta, i} + \frac{1}{\rho} \nabla p_{\theta, i} - \nu \nabla^2 \mathbf{u}_{\theta, i} \right|^2 \]

on Collocation Dataset ⚪:
$\{(x_i, y_i)\}_{i=1}^{N_{\mathcal{F}}}$

$\Rightarrow$ changing $L$ we study how different parameterizations affect data loss and physics loss

⚠️ Note: pressure is entirely learned by the PINN and inferred from physics residuals only

Effect of Parameters on Loss Residuals

Parameters in Heat Equation

Consider the input ranges scaled to be in unitary interval: \[ \hat{t} = \frac{t}{T} \quad\quad \text{and} \quad\quad \hat{x} = \frac{x}{L} \] Recalling that $T = \lambda \tau = \lambda \frac{L^2}{\kappa}$, we rewrite physics residuals:

\[ \begin{aligned} f_i &= \frac{1}{T} \frac{\partial}{\partial \hat{t}} u_{\theta}(\hat{t}_i, \hat{x}_i) - \frac{\kappa}{L^2} \frac{\partial^2}{\partial \hat{x}^2} u_{\theta}(\hat{t}_i, \hat{x}_i)\\ &= \frac{\kappa}{L^2} \left( \frac{1}{\lambda} \frac{\partial}{\partial \hat{t}} u_{\theta}(\hat{t}_i, \hat{x}_i) - \frac{\partial^2}{\partial \hat{x}^2} u_{\theta}(\hat{t}_i, \hat{x}_i) \right) \end{aligned} \]

Parameters in Heat Equation

Physics residuals are scaled by a factor of $\frac{\kappa}{L^2}$

Note that data residuals are always of order one, $e_i \sim 1$, due to the chosen IC.

$\Rightarrow$ data and physics residuals scale differently according to ratio:

\[ \frac{f_i}{e_i} \sim \frac{\kappa}{L^2} \tag{17} \]

with:

  • $\frac{\kappa}{L^2} \ll 1$ scale of data residuals predominant
  • $\frac{\kappa}{L^2} \gg 1$ scale of physics residuals predominant
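The scaling argument (17) can be made concrete with a short numeric sketch (illustrative, not the paper's code): take a fixed smooth trial function of the *scaled* inputs as a stand-in for the network, and measure how the raw physics residual changes with $\kappa$ and $L$ at fixed $\lambda$.

```python
import numpy as np

lam = 1.0  # fixed simulation time multiple, T = lam * L^2 / kappa

def residual_scale(kappa, L, n=50):
    # Trial function of the scaled inputs: u = exp(-t_hat) * sin(pi x_hat).
    # Its raw-coordinate derivatives follow from the chain rule:
    #   u_t  = -(1/T) u,   u_xx = -(pi/L)^2 u
    T = lam * L**2 / kappa
    t_hat, x_hat = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
    u = np.exp(-t_hat) * np.sin(np.pi * x_hat)
    u_t = -(1.0 / T) * u
    u_xx = -(np.pi / L) ** 2 * u
    return np.mean(np.abs(u_t - kappa * u_xx))   # mean |f_i|

r1 = residual_scale(kappa=1.0, L=1.0)
r2 = residual_scale(kappa=0.01, L=1.0)  # kappa / L^2 smaller by 100x
print(r1 / r2)  # -> ~100: physics residuals scale with kappa / L^2
```

Since the trial function (and hence $e_i$) is unchanged between the two runs, the factor of $100$ in the residual ratio is exactly the change in $\frac{\kappa}{L^2}$, as Eq. (17) predicts.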

Parameters in Navier-Stokes Equations

Consider the input ranges scaled to be in $[-1,+1]$: \[ \hat{x} = \frac{x}{L} \quad\quad \text{and} \quad\quad \hat{y} = \frac{y}{L} \quad\quad \text{thus} \quad\quad \hat{\nabla} = L \nabla \] Then, consider velocities and pressures in terms of scaled quantities: \[ \hat{\mathbf{u}} = \frac{\mathbf{u}}{u_0} \quad\quad \text{and} \quad\quad \hat{p} = \frac{p}{\rho u_0^2} \] Recalling $u_0 = \frac{1}{L}$, we rewrite physics residuals:

\[ \begin{aligned} f_i &= \frac{u_0^2}{L} \left( \hat{\mathbf{u}}_{\theta} \cdot \hat{\nabla} \right) \hat{\mathbf{u}}_{\theta} + \frac{u_0^2}{L} \hat{\nabla} \hat{p}_{\theta} - \nu \frac{u_0}{L^2} \hat{\nabla}^2 \hat{\mathbf{u}}_{\theta}\\ &= \frac{1}{L^3} \left( \left(\hat{\mathbf{u}}_{\theta} \cdot \hat{\nabla} \right) \hat{\mathbf{u}}_{\theta} + \hat{\nabla} \hat{p}_{\theta} - \frac{\nu}{u_0 L} \hat{\nabla}^2 \hat{\mathbf{u}}_{\theta} \right) \end{aligned} \]

where $\frac{\nu}{u_0 L} = \frac{1}{\text{Re}}$ is fixed by assumption

Parameters in Navier-Stokes Equations

Physics residuals are scaled by a factor of $\frac{1}{L^3}$

Data residuals are scaled by a factor of $e_i \sim \frac{1}{L}$ since $e_i = u_0 ( \hat{\mathbf{u}}_{\theta}(\hat{x}_i, \hat{y}_i) - \hat{\mathbf{u}}_i)$

$\Rightarrow$ data and physics residuals scale differently according to ratio:

\[ \frac{f_i}{e_i} \sim \frac{1}{L^2} \tag{22} \]

with:

  • $\frac{1}{L^2} \ll 1$ scale of data residuals predominant
  • $\frac{1}{L^2} \gg 1$ scale of physics residuals predominant

Apparent Pareto Front
and Optimal Choice of Loss Weights

PINN Set-Up

$1$. PINNs parameters and training details:

  • Network structure: $4$ hidden layers, $50$ neurons each
  • Activations: $\tanh()$ in hidden layers, linear in output
  • Optimizer: ADAM, learning rate $10^{-2}$, $10^5$ training epochs
  • Loss weights: $\alpha \in \{0.001, 0.01, 0.1, 0.5, 0.9, 0.99, 0.999\}$

$2$. Training datasets sizes:

  • Diffusion: $N_{\mathcal{D}} = 384$ boundary points, $N_{\mathcal{F}} = 1024$ collocation points
  • Navier-Stokes: $N_{\mathcal{D}} = 512$ boundary points, $N_{\mathcal{F}} = 1024$ collocation points

$3$. Evaluation metric: relative $\ell_2$ error on independent test set of $N_{\mathcal{T}} = 1024$ points:

\[ \varepsilon_u = \frac{||u - u_{\theta}||_2}{||u||_2} \tag{23} \]
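The evaluation metric (23) is a one-liner; the sketch below uses illustrative vectors (not data from the paper).

```python
import numpy as np

def rel_l2_error(u_true, u_pred):
    # relative l2 error, Eq. (23): ||u - u_theta||_2 / ||u||_2
    return np.linalg.norm(u_true - u_pred) / np.linalg.norm(u_true)

u = np.array([1.0, 2.0, 2.0])      # illustrative ground truth, ||u|| = 3
u_hat = np.array([1.0, 2.0, 1.0])  # illustrative prediction
print(rel_l2_error(u, u_hat))      # -> 1/3
```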

Diffusion Example Results

[Figure: diffusion example results for $\frac{\kappa}{L^2} \ll 1$ (data scale predominant) and $\frac{\kappa}{L^2} \gg 1$ (physics scale predominant)]

Navier-Stokes Example Results

[Figure: Navier-Stokes example results for $\frac{1}{L^2} \ll 1$ (data scale predominant) and $\frac{1}{L^2} \gg 1$ (physics scale predominant)]