Performing adjoint optimization in Khepri. For starters, we are just optimizing a uniform slab.
Inverse design techniques are now very popular in electrodynamics. Inverse design aims to produce designs $\vec p$ that minimize some $f(\vec p)$, often called the loss or figure of merit. I usually rely on heuristic methods, which are, in a nutshell, smart ways to randomly sample the space containing $\vec p$, usually $\mathbb{R}^N$. Among inverse design methods we also find adjoint optimization. This method optimizes designs much like a neural network optimizes its parameters during training. Adjoint optimization is, however, extremely efficient, as it doesn't rely on automatic differentiation: it uses just two simulations, one forward and one adjoint, to compute the gradient of $f(\vec p)$ with respect to the design parameters.
The first step in gradient-based optimization is to find an expression for the derivative of the figure of merit with respect to the parameters:
$$\frac{df}{d\vec p} = \frac{\partial f}{\partial E}\frac{\partial E}{\partial \vec p}+\frac{\partial f}{\partial E^\star}\frac{\partial E^\star}{\partial \vec p}=2\Re\left[\frac{\partial f}{\partial E}\frac{\partial E}{\partial \vec p}\right],$$
where the last equality holds because $f$ is real, so the two terms are complex conjugates of each other.
In this expression, $\frac{\partial f}{\partial E}$ usually comes from an analytical expression, such as the norm of the Poynting vector or the field intensity over some spatial extent. The first factor can therefore be computed analytically or through automatic differentiation. The second factor, however, demands more attention, and this is where the two simulations come in.
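As a concrete example, if the figure of merit is the field intensity at a probe point, $f=|E|^2=E E^\star$, then the first factor is simply
$$\frac{\partial f}{\partial E}=E^\star,\qquad\text{so}\qquad \frac{df}{dp}=2\Re\left[E^\star\frac{\partial E}{\partial p}\right].$$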
It is now useful to write Maxwell's equations in the frequency domain,
$$ \left[\nabla\times\nabla\times{} - k_0^2\varepsilon_r(\vec r)\right] E(\vec r) = -i\mu_0\omega J(\vec r), $$
leading to a linear system of the form
$$ \Omega E=\alpha J. $$
Taking the derivative of this equation with respect to the design parameters, and noting that the source term is independent of them, we obtain
$$ \frac{d\Omega}{dp}E+\Omega\frac{dE}{dp}=0, $$
which yields
$$ \frac{dE}{dp}=-\Omega^{-1}\frac{d\Omega}{dp}E. $$
By substitution, the full gradient becomes
$$ \frac{df}{dp} = 2\Re\left[-\frac{\partial f}{\partial E}\Omega^{-1}\frac{d\Omega}{dp}E\right]. $$
The adjoint simulation can be identified in this expression as
$$ E_\text{ADJ}^T=-\frac{\partial f}{\partial E}\Omega^{-1}, $$
or equivalently, since $\Omega$ is symmetric by reciprocity ($\Omega^T=\Omega$),
$$ \Omega E_\text{ADJ}=-\left(\frac{\partial f}{\partial E}\right)^T. $$
The adjoint simulation therefore uses the right-hand side of the last equation as its source, with the same geometry as the forward simulation.
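To make this identity concrete, here is a minimal numerical sketch on a small made-up symmetric linear system $\Omega(p)x=b$, in plain NumPy rather than Khepri. It compares the adjoint gradient of $f=(c^Tx)^2$ against a finite-difference reference; everything is real-valued for simplicity, whereas the Maxwell operator is complex.

```python
import numpy as np

# Toy verification of the adjoint identity on a small symmetric system
# Omega(p) x = b with figure of merit f = (c^T x)^2.
# All matrices below are made up for illustration.
rng = np.random.default_rng(0)
N = 5
A = rng.standard_normal((N, N))
A = A + A.T + 10 * np.eye(N)         # symmetric and well-conditioned
D = np.diag(rng.standard_normal(N))  # plays the role of dOmega/dp
b = rng.standard_normal(N)
c = rng.standard_normal(N)

def solve(p):
    return np.linalg.solve(A + p * D, b)

p = 0.3
x = solve(p)                               # forward solve
dfdx = 2 * (c @ x) * c                     # analytic df/dx
x_adj = np.linalg.solve(A + p * D, -dfdx)  # adjoint solve (Omega symmetric)
grad_adjoint = x_adj @ (D @ x)             # df/dp = x_adj^T (dOmega/dp) x

# Finite-difference reference for comparison
h = 1e-6
f = lambda q: (c @ solve(q)) ** 2
grad_fd = (f(p + h) - f(p - h)) / (2 * h)
```

Two linear solves (one forward, one adjoint) give the gradient, no matter how many parameters there are; that is the whole appeal of the method.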
We will first tackle a simple problem: optimizing the dielectric constant $\varepsilon_s$ of a uniform slab to maximize its transmission ($S_{12}$). The incident beam is at normal incidence and polarized along $x$.
First, the forward simulation is set up:

```python
wl = 1.1  # The wavelength doesn't matter much, but this one leads to a pretty graph

def forward_cl(eps):
    clfwd = Crystal((1, 1))                 # Only the (0,0) diffraction order
    clfwd.add_layer_uniform("U", eps, 0.7)  # The slab, with fixed height
    clfwd.set_device("U", True)
    clfwd.set_source(wl, 1, 0)              # Simple x-polarized source
    clfwd.solve()
    return clfwd
```
There is not much to say about it: it is pretty standard, the only free parameter being the $\varepsilon_s$ of the uniform slab. We also define a function that builds the adjoint crystal:
```python
# Adjoint simulation
def adjoint_cl(eps):
    cl = Crystal((1, 1))
    cl.add_layer_uniform("U", eps, 0.7)
    cl.add_layer_uniform("B", 1, 0.5)  # Air buffer in front of the crystal
    cl.set_device("BU", True)
    cl.set_source(wl, np.nan, 0)       # Polarization is overridden later
    cl.solve()
    return cl
```
The only original part is a $0.5a$ air buffer called B placed in front of the crystal; we will see why it exists in a few code blocks. The last function we need computes the field $E_x$ where we need it. Notice how we override the incident field set by Crystal.set_source: this is because we need to pass normalize=False to use a specific value of $E_x$.
```python
import numpy as np
from khepri.alternative import incident

def get_field(cl, zmin, zmax, ex):
    x = y = np.array([0])
    x, y = np.meshgrid(x, y)
    z = np.linspace(zmin, zmax, 50)
    efield = incident((1, 1), np.conj(ex), 0,
                      k_vector=(0, 0, 2 * np.pi / wl), normalize=False)
    hfield = np.zeros_like(efield)
    # Call fields_volume with the override of the incident fields.
    E, _ = cl.fields_volume(x, y, z, incident_fields=(efield, hfield))
    return E[:, 0, :, 0]  # Only keep E_x
```
Finally, we can put everything together in the optimization loop:
```python
eps = 8  # Initial value for epsilon
# Storage for the figure of merit, epsilon and gradient histories
profile, eps_vals, grads = [], [], []
epochs, lr = 300, 0.05
for i in range(epochs):
    # Start with the forward simulation, Ex = 1.
    # Pretty standard: we store the fields in the slab, from z = 0.01 to 0.7.
    Efwd = get_field(clfwd := forward_cl(eps), 0.01, 0.7, 1)
    # We also get the probe whose intensity we will maximize.
    # The probe is placed 0.5 after the slab.
    fwd_probe = clfwd.fields_volume(np.array([0]), np.array([0]),
                                    [0.7 + 0.5])[0][0, 0, 0]
    # Use the probed field as the source of the adjoint simulation.
    Ebck = get_field(adjoint_cl(eps), 0.5, 0.5 + 0.7, fwd_probe)
    # Flip the adjoint simulation upside-down to match the forward one.
    Ebck = np.flipud(Ebck) * 1j
    # Compute df/dp
    dfdp = np.mean(-2 * np.real(Efwd * Ebck)) * 4 * np.pi**2 / wl**2
    # Gradient ascent
    eps += lr * dfdp
    eps_vals.append(eps)
    grads.append(0.05 * dfdp)  # store a scaled copy of the gradient
    profile.append(np.abs(fwd_probe)**2)
```
From the above code, we see that we start from a value of $\varepsilon=8$, chosen because it is far from the optimum. The optimization proceeds by gradient ascent, maximizing the field intensity at the probe point. Several arrays are used to track the gradients, the figure of merit, and the value of epsilon.
We can plot the different performance histories using matplotlib:

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(6, 3))
ax1.plot(profile, label='$|E|^2$')
ax1.plot(grads, label='gradient')
# E_eps holds the probe field from a separate brute-force sweep
# of epsilon over [1, 12], computed with the forward simulation.
ax2.plot(np.linspace(1, 12, 50), np.abs(E_eps)**2, label='$f(p)$')
ax2.plot(eps_vals, profile, 'r.', markersize=1)
ax2.plot(eps_vals, grads, 'g-', markersize=1)
ax1.legend()
ax2.legend()
```
We obtain the following figures. The left subplot shows the figure of merit $|E|^2$ as a function of the optimizer iterations, or epochs. As the optimizer iterates, this metric rises steadily to an optimal value of $0.25$, which corresponds to roughly $100\%$ transmission. On the right-hand side, the figure of merit is charted for a wide range of $\varepsilon$; this curve therefore represents $f(p)$. We can afford to do this because the problem has only one parameter and is extremely quick to evaluate. The red dots represent the trajectory of the optimizer along this curve.
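Because a uniform slab in air is just a Fabry-Perot etalon, the $f(p)$ curve on the right can also be sanity-checked analytically, independently of Khepri. The sketch below uses the standard Airy transmission formula with the thickness $0.7$ and wavelength $1.1$ from the setup above:

```python
import numpy as np

# Airy transmission of a uniform dielectric slab in air, normal incidence.
wl, d = 1.1, 0.7

def slab_transmission(eps):
    n = np.sqrt(eps)
    r = (1 - n) / (1 + n)           # air -> slab Fresnel coefficient
    delta = 2 * np.pi * n * d / wl  # single-pass phase through the slab
    t = (1 - r**2) * np.exp(1j * delta) / (1 - r**2 * np.exp(2j * delta))
    return np.abs(t)**2

eps_grid = np.linspace(1, 12, 50)
T = slab_transmission(eps_grid)     # analytic counterpart of f(p)
```

Transmission reaches unity whenever the round-trip phase is a multiple of $2\pi$, which is why $f(p)$ shows several maxima over $\varepsilon\in[1,12]$.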
Now that we have a working implementation for the uniform slab, we can extend it to a $z$-varying permittivity $\varepsilon(z)$. Surprisingly, there is not much to change from the previous example.
The following figure shows, on the right, the now minimized figure of merit (this time we minimize transmission). The figure on the left shows the evolution, from bottom to top, of the $z$-varying dielectric profile. We start from an initial uniform slab and end up with a Bragg mirror. Indeed, the resulting period is approximately $\Lambda\approx0.19$ in reduced units, with a thinner high-index region. The distributed Bragg mirror predicts a large bandgap $$ \frac{\Delta f_0}{f_0}=\frac{4}{\pi}\arcsin{\frac{n_2-n_1}{n_1+n_2}}\approx 0.74, $$ which is centered around, from eyeballing the figure,
$$ \frac{\lambda_\text{gap}}{4} = nd = n_1d_1+n_2d_2\approx0.05\times3.46+0.14\times1.0=0.313, $$ so $\lambda_\text{gap}\approx1.252$, which, given $\Delta f_0$, easily covers our wavelength $\lambda=1.1$.
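These back-of-the-envelope numbers are easy to fumble by hand, so here is a quick numerical evaluation of the two expressions above, with $n_1=1.0$ and $n_2=3.46$ read off the optimized profile:

```python
import numpy as np

# Evaluate the Bragg-mirror estimates from the text.
n1, n2 = 1.0, 3.46                         # indices read off the figure
bandwidth = 4 / np.pi * np.arcsin((n2 - n1) / (n1 + n2))
lambda_gap = 4 * (0.05 * n2 + 0.14 * n1)   # quarter-wave sum from the text
print(f"relative bandwidth {bandwidth:.3f}, gap center {lambda_gap:.3f}")
```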
We can inspect the transmission spectrum to convince ourselves.