
Adjoint optimization in RCWA

Performing adjoint optimization in Khepri. For starters, we just optimize a uniform slab.

Background

Inverse design techniques are very popular in electrodynamics these days. Inverse design aims to produce designs $\vec p$ that minimize some function $f(\vec p)$, often called the loss or figure of merit. I usually rely on heuristic methods, which are, in a nutshell, smart ways to randomly sample the space containing $\vec p$, usually $\mathbb{R}^N$. Among inverse design methods we also find adjoint optimization. This method optimizes designs much like a neural network optimizes its parameters during training. Adjoint optimization, however, is extremely efficient because it does not rely on automatic differentiation. In brief, the method uses two simulations, one forward and one adjoint, to compute the gradient of $f(\vec p)$ with respect to the design parameters.

The chain rule

The first thing to do in order to perform gradient-based optimization is to find an expression for the derivative of the figure of merit with respect to the parameters:

$$\frac{df}{dp} = \frac{\partial f}{\partial E}\frac{\partial E}{\partial p}+\frac{\partial f}{\partial E^\star}\frac{\partial E^\star}{\partial p}=2\Re\left[\frac{\partial f}{\partial E}\frac{\partial E}{\partial p}\right].$$

In this expression, $f$ is usually an analytical function of the fields, such as the norm of the Poynting vector or the field intensity over some spatial extent, so the first factor $\frac{\partial f}{\partial E}$ can be computed analytically or through automatic differentiation. The second factor, however, demands more attention and requires the two simulations.

It is now useful to write Maxwell's equations in the frequency domain,

$$ \left[\nabla\times\nabla\times - k_0^2\varepsilon_r(\vec r)\right] E(\vec r) = -i\mu_0\omega J(\vec r), $$

leading to a linear system of the form

$$ \Omega E=\alpha J. $$

Taking the derivative of this equation with respect to the design parameters gives

$$ \frac{d\Omega}{dp}E+\Omega\frac{dE}{dp}=0, $$

as the source term is independent of the parameters. This yields an expression for the field derivative,

$$ \frac{dE}{dp}=-\Omega^{-1}\frac{d\Omega}{dp}E. $$

By substitution, the full gradient becomes

$$ \frac{df}{dp} = 2\Re\left[-\frac{\partial f}{\partial E}\Omega^{-1}\frac{d\Omega}{dp}E\right]. $$

The adjoint simulation can be identified in the above expression as

$$ E_\text{ADJ}^T=-\frac{\partial f}{\partial E}\Omega^{-1}, $$

or equivalently, since $\Omega$ is symmetric for reciprocal materials,

$$ \Omega E_\text{ADJ}=-\frac{\partial f}{\partial E}^T. $$

This adjoint simulation uses the right-hand side of the last equation as its source, with the same geometry as the forward simulation.
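As a concrete example, and the one used in the implementation below: for a figure of merit $f=|E_x(\vec r_0)|^2=E_x(\vec r_0)E_x^\star(\vec r_0)$ measured at a probe point $\vec r_0$, we get $$ \frac{\partial f}{\partial E_x} = E_x^\star(\vec r_0), $$ so the adjoint source is simply the complex conjugate of the forward field at the probe. This is why the probed field is conjugated (np.conj) before being injected into the adjoint simulation in the code below.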

Implementation using Khepri

We will first tackle a simple problem: optimizing the dielectric constant of a uniform slab ($\varepsilon_s$) to maximize its transmission ($S_{12}$). The incident beam is at normal incidence and polarized along $x$.

First, the forward simulation is set up:

import numpy as np
from khepri.crystal import Crystal  # NOTE: import path assumed here, adjust to your Khepri version

wl = 1.1 # Wavelength doesn't matter but this one leads to a pretty graph
def forward_cl(eps):
  clfwd = Crystal((1,1)) # Only the (0,0) diffraction order
  clfwd.add_layer_uniform("U", eps, 0.7) # The slab, of fixed height
  clfwd.set_device("U", True)
  clfwd.set_source(wl, 1, 0) # Simple x-polarized source
  clfwd.solve()
  return clfwd
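
As a quick sanity check, we can already probe the transmitted field behind the slab for a single value of $\varepsilon$, using the same fields_volume call and probe position ($z=0.7+0.5$) as the optimization loop further below:

# Probe |E_x|^2 a distance 0.5 behind the slab for eps = 8
cl = forward_cl(8)
probe = cl.fields_volume(np.array([0]), np.array([0]), [0.7 + 0.5])[0][0, 0, 0]
print(np.abs(probe)**2)  # this is the figure of merit we will maximize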

There is not much to say about it: it is pretty standard, the only parameter being the $\varepsilon_s$ of the uniform slab. We also define a function that builds the adjoint crystal.

# Adjoint simulation
def adjoint_cl(eps):
  cl = Crystal((1,1))
  cl.add_layer_uniform("U", eps, 0.7) # Same slab as in the forward simulation
  cl.add_layer_uniform("B", 1, 0.5)   # Air buffer in front of the slab (see below)
  cl.set_device("BU", True)
  cl.set_source(wl, np.nan, 0)        # Placeholder: the incident field is overridden later
  cl.solve()
  return cl

The only original part is a $0.5a$ air buffer called B in front of the crystal; we will see why it exists in a few code blocks. The last function we need computes the field $E_x$ where we need it. Notice how we supply the incident wavevector ourselves rather than through Crystal.set_source: building the incident field with incident and normalize=False lets us impose a specific value of $E_x$.

from khepri.alternative import incident
def get_field(cl, zmin, zmax, ex):
  x = y = np.array([0])
  x, y = np.meshgrid(x, y)
  z = np.linspace(zmin, zmax, 50)
  # Build the incident plane wave ourselves, with (conjugated) amplitude ex
  efield = incident((1,1), np.conj(ex), 0,
    k_vector=(0, 0, 2 * np.pi / wl), normalize=False)
  hfield = np.zeros_like(efield)
  # Call fields_volume with the override of the incident fields.
  E, _ = cl.fields_volume(x, y, z, incident_fields=(efield, hfield))
  return E[:, 0, :, 0] # Only keep E_x

Finally, we can put everything together in the optimization loop:

eps = 8 # Initial value for epsilon
# Initialize some storage lists for the optimization history
profile, eps_vals, grads = [], [], []
epochs, lr = 300, 0.05
for i in range(epochs):
  # We start with the forward simulation, Ex=1.
  # Pretty standard, we store the fields in the slab
  # from 0.01 to 0.7
  Efwd = get_field(clfwd := forward_cl(eps), 0.01, 0.7, 1)
  # We also get the probe whose intensity we will maximize.
  # The probe is placed 0.5 after the slab.
  fwd_probe = clfwd.fields_volume(np.array([0]), np.array([0]), [0.7+0.5])[0][0, 0, 0]
  # We use the probed field as the source of the adjoint simulation
  Ebck = get_field(adjoint_cl(eps), 0.5, 0.5+0.7, fwd_probe)
  # Flipping the adjoint fields upside-down
  Ebck = np.flipud(Ebck)*1j
  # Compute df/dp
  dfdx = np.mean(-2*np.real(Efwd*Ebck)) * 4 * np.pi**2/wl**2
  # Gradient ascent
  eps += lr * dfdx
  eps_vals.append(eps)
  grads.append(lr*dfdx)                 # store the update step
  profile.append(np.abs(fwd_probe)**2)  # figure of merit

From the above code, we see that we start with a value of $\varepsilon=8$, chosen because it is far from the optimum. The optimization is performed by gradient ascent, maximizing the field intensity at the probe point. Different archive arrays are used to track the gradients, the figure of merit and the value of epsilon.
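
The right subplot below also charts the figure of merit over a wide range of $\varepsilon$, stored in an array E_eps that is not computed in the loop above. A minimal sketch of that brute-force sweep, reusing forward_cl and the same probe position (the sweep itself is our reconstruction):

# Brute-force sweep of the probed field over eps. Only feasible because the
# problem has a single parameter and each solve is cheap.
eps_range = np.linspace(1, 12, 50)
E_eps = np.array([
  forward_cl(e).fields_volume(np.array([0]), np.array([0]), [0.7+0.5])[0][0, 0, 0]
  for e in eps_range
])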

We can plot the different performance archives using matplotlib.

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(6, 3))
ax1.plot(profile, label='$|E|^2$ at the probe')
ax1.plot(grads, label='Gradient step')
ax1.legend()
ax2.plot(np.linspace(1, 12, 50), np.abs(E_eps)**2, label='$f(p)$ sweep')
ax2.plot(eps_vals, profile, 'r.', markersize=1)
ax2.plot(eps_vals, grads, 'g-')
ax2.legend()

We obtain the following figures. In the left subplot, we show the figure of merit $|E|^2$ as a function of the optimizer iterations, or epochs. As the optimizer iterates, this metric rises steadily to reach an optimal value of $0.25$, which corresponds to roughly $100\%$ transmission. On the right-hand side, the figure of merit is charted for a wide range of $\varepsilon$; this curve therefore represents $f(p)$. We can do this because the problem has only one parameter and is extremely quick to evaluate. The red dots represent the trajectory of the optimizer on this curve.

Adjoint optimization results for our slab.

Implementation of depth-varying dielectric profile

Now that we have a working implementation for the uniform slab, we can extend it to a depth-varying profile $\varepsilon(z)$. Surprisingly, there is not much to change from the previous example; see the sketch below.
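
A minimal sketch of the required changes, under our own assumptions: the slab is split into N thin uniform slices (one add_layer_uniform call per slice), each with its own $\varepsilon$, and the gradient is averaged slice by slice instead of over the whole slab. Splitting Efwd and Ebck along their first axis assumes that axis is $z$, as in get_field above.

N = 25  # number of slices across the slab; divides the 50 z samples evenly
eps_vec = np.full(N, 8.0)  # one epsilon per slice, initialized uniformly

# Same overlap integrand as the uniform case, but averaged over the z samples
# falling inside each slice to obtain one gradient component per slice.
overlap = -2 * np.real(Efwd * Ebck) * 4 * np.pi**2 / wl**2
dfdeps = np.array([s.mean() for s in np.array_split(overlap, N, axis=0)])
eps_vec += lr * dfdeps  # flip the sign of the step to minimize transmission instead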

The following figure shows, on the right, the now minimized figure of merit (we minimize transmission this time). The figure on the left shows the evolution, from bottom to top, of the z-varying dielectric profile. We start from an initially uniform slab and end up with a Bragg mirror. Indeed, the resulting period is approximately $\Lambda\approx0.19$ in reduced units, with a thinner high-index region. The distributed Bragg reflector formula predicts a large relative bandgap $$ \frac{\Delta f_0}{f_0}=\frac{4}{\pi}\arcsin{\frac{n_2-n_1}{n_1+n_2}}\approx 0.74, $$ which is centered around, from eyeballing the figure,

$$ \frac{\lambda_\text{gap}}{4} = nd = n_1d_1+n_2d_2\approx0.05\times3.46+0.14\times1.0=0.313, $$ so $\lambda_\text{gap}\approx1.252$, which, given $\Delta f_0$, easily covers our wavelength $\lambda=1.1$.
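
To make the last claim explicit: with the bandwidth above, the gap edges sit roughly at $f_0\left(1\pm\tfrac{1}{2}\tfrac{\Delta f_0}{f_0}\right)$ with $f_0=1/\lambda_\text{gap}\approx0.80$, i.e. $$ f \in [0.80\times0.63,\ 0.80\times1.37] \approx [0.50,\ 1.10], $$ or wavelengths between roughly $0.9$ and $2.0$ in reduced units, so $\lambda=1.1$ falls comfortably inside the gap.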

Adjoint optimization results for our z-varying slab.

We can look at the transmission spectrum to convince ourselves; a sketch of such a sweep follows.
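
A minimal sketch of such a wavelength sweep, under our assumptions: we rebind the module-level wl that the crystal builders read, and we probe $|E_x|^2$ behind the structure as in the optimization loop (for the z-varying result, the multi-layer builder from the sketch above would replace forward_cl):

wavelengths = np.linspace(0.8, 1.6, 100)  # sweep range: our choice
spectrum = []
for wl in wavelengths:   # forward_cl reads the module-level wl at call time
  cl = forward_cl(eps)   # or the z-varying builder with the optimized profile
  probe = cl.fields_volume(np.array([0]), np.array([0]), [0.7+0.5])[0][0, 0, 0]
  spectrum.append(np.abs(probe)**2)

plt.plot(wavelengths, spectrum)
plt.xlabel("Wavelength (reduced units)")
plt.ylabel("$|E_x|^2$ at the probe")
plt.show()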
