.. _muller_potential:

=============================
Learning the Müller potential
=============================

Here we use the Müller potential and samples from it to show how to use potential contrasting to learn a potential energy function that can reproduce a distribution of samples.

1. The Müller potential
-----------------------

The Müller potential is defined as

.. math::

   U(x_1, x_2) = s \cdot \sum_{k = 1}^{4} A_k \cdot \exp\left( \alpha_k (x_1 - a_k)^2 + \beta_k (x_1 - a_k)(x_2 - b_k) + \gamma_k (x_2 - b_k)^2 \right),

where :math:`(A_1, A_2, A_3, A_4) = (-200, -100, -170, 15)`, :math:`(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (-1, -1, -6.5, 0.7)`, :math:`(\beta_1, \beta_2, \beta_3, \beta_4) = (0, 0, 11, 0.6)`, :math:`(\gamma_1, \gamma_2, \gamma_3, \gamma_4) = (-10, -10, -6.5, 0.7)`, :math:`(a_1, a_2, a_3, a_4) =  (1, 0, -0.5, -1)`, and :math:`(b_1, b_2, b_3, b_4) = (0, 0.5, 1.5, 1)`.
:math:`s` is a scaling parameter and is set to :math:`0.05` in this tutorial.

Now let us define the Müller potential in a ``Python`` function.

.. code-block:: python

   import torch
   
   def compute_Muller_potential(scale, x):
       A = (-200.0, -100.0, -170.0, 15.0)
       beta = (0.0, 0.0, 11.0, 0.6)
       alpha_gamma = (
           x.new_tensor([-1.0, -10.0]),
           x.new_tensor([-1.0, -10.0]),
           x.new_tensor([-6.5, -6.5]),
           x.new_tensor([0.7, 0.7]),
       )
   
       ab = (
           x.new_tensor([1.0, 0.0]),
           x.new_tensor([0.0, 0.5]),
           x.new_tensor([-0.5, 1.5]),
           x.new_tensor([-1.0, 1.0]),
       )
   
       U = 0
       for i in range(4):
           diff = x - ab[i]
           U = U + A[i] * torch.exp(
               torch.sum(alpha_gamma[i] * diff**2, -1) + beta[i] * torch.prod(diff, -1)
           )
   
       U = scale * U
       return U
		
		
Because the potential function is defined over a two dimensional space, we can
visualize it using a two dimensional heatmap.

.. code-block:: python

   import matplotlib.pyplot as plt
   from matplotlib import cm
   import numpy as np
   
   def generate_grid(x1_min, x1_max, x2_min, x2_max, size=100):
       x1 = torch.linspace(x1_min, x1_max, size)
       x2 = torch.linspace(x2_min, x2_max, size)
       grid_x1, grid_x2 = torch.meshgrid(x1, x2, indexing="ij")
       grid = torch.stack([grid_x1, grid_x2], dim=-1)
       x = grid.reshape((-1, 2))
       return x
   
   
   x1_min, x1_max = -1.5, 1.0
   x2_min, x2_max = -0.5, 2.0
   
   grid_size = 100
   x_grid = generate_grid(x1_min, x1_max, x2_min, x2_max, grid_size)
   fig, axes = plt.subplots()
   scale = 0.05
   U = compute_Muller_potential(scale, x_grid)
   U = U.reshape(100, 100)
   U[U > 9] = 9
   U = U.T
   plt.contourf(
       U,
       levels=np.linspace(-9, 9, 19),
       extent=(x1_min, x1_max, x2_min, x2_max),
       cmap=cm.viridis_r,
   )
   plt.xlabel(r"$x_1$", fontsize=24)
   plt.ylabel(r"$x_2$", fontsize=24)
   plt.colorbar()
   axes.set_aspect("equal")
   plt.tight_layout()
   plt.savefig("./data/mp.png")
   plt.close()
   		
The heatmap of the potential looks like the following. It shows that the
potential has three meta-stable states.

.. image:: ../../examples/MullerPotential/data/mp.png

The Müller potential defines a Boltzmann distribution:
:math:`p(x_1, x_2) \propto \exp\left( -U(x_1, x_2) \right)`.
Samples from the Boltzmann distribution are stored in the file
``./data/samples.csv`` and can be loaded as the following.

.. code-block:: python

   x_data = np.loadtxt("./data/samples.csv", delimiter=",")
   x_data = torch.from_numpy(x_data)
   n_data = x_data.shape[0]

We can visualize these samples by plotting them the two dimensional space.

.. code-block:: python

   fig = plt.figure()
   fig.clf()
   plt.plot(x_data[::10, 0].numpy(), x_data[::10, 1].numpy(), ".", alpha=0.5)
   plt.xlim((x1_min, x1_max))
   plt.ylim((x2_min, x2_max))
   plt.xlabel(r"$x_1$", fontsize=24)
   plt.ylabel(r"$x_2$", fontsize=24)
   axes.set_aspect("equal")
   plt.tight_layout()
   plt.savefig("./data/mp_samples.png")
   plt.close()
      
.. image:: ../../examples/MullerPotential/data/mp_samples.png
   
..
   We can also draw sample from the potential based on its Boltzmann distribution
   :math:`p(x_1, x_2) \propto \exp\left( -U(x_1, x_2) \right)`.
   Here we use a temperature replica exchange Monte Carlo approach to draw samples
   from it.
   .. note::
   
      The main purpose of the following Monte Carlo sampling code is to draw samples
      from the Müller potential so that we can use the samples as training data
      in potential contrasting. You don't have to understand the Monte Carlo sampling
      code for understanding potential contrasting. You can skip it if you find it
      confusing at this point and just keep in mind that running it provides samples
      from the Müller potential and the samples are saved in the numpy array ``x_samples``.
      
   .. code-block:: python
		   
      ## draw samples from the Müller potential using temperature replica exchange
      ## Monte Carlo sampling
      ############################################################################
      
      num_reps = 10 # number of replicas
      scales = torch.linspace(0.0, scale, num_reps)
      
      num_steps = 110000
      x_record = []
      accept_rate = 0
      x = torch.stack((x1_min + torch.rand(num_reps)*(x1_max - x1_min),
                       x2_min + torch.rand(num_reps)*(x2_max - x2_min)),
                      dim = -1)
      energy = compute_Muller_potential(1.0, x)
      
      for k in range(num_steps):
          if (k + 1) % 10000 == 0:
              print("steps: {} out of {} total steps".format(k, num_steps))
              
          ## sampling within each replica
          delta_x = torch.normal(0, 1, size = (num_reps, 2))*0.3
          x_p = x + delta_x
          energy_p = compute_Muller_potential(1.0, x_p)
      
          ## accept based on energy
          accept_prop = torch.exp(-scales*(energy_p - energy))
          accept_flag = torch.rand(num_reps) < accept_prop
      
          ## considering the bounding effects
          accept_flag = accept_flag & torch.all(x_p > x_p.new_tensor([x1_min, x2_min]), -1) \
                                    & torch.all(x_p < x_p.new_tensor([x1_max, x2_max]), -1)
          
          x_p[~accept_flag] = x[~accept_flag]
          energy_p[~accept_flag] = energy[~accept_flag]    
          x = x_p
          energy = energy_p
      
          ## calculate overall accept rate
          accept_rate = accept_rate + (accept_flag.float() - accept_rate)/(k+1)    
          
          ## exchange
          if k % 10 == 0:
              for i in range(1, num_reps):
                  accept_prop = torch.exp((scales[i] - scales[i-1])*(energy[i] - energy[i-1]))
                  accept_flag = torch.rand(1) < accept_prop
                  if accept_flag.item():
                      tmp = x[i].clone()
                      x[i] = x[i-1]
                      x[i-1] = tmp
      
                      tmp = energy[i].clone()
                      energy[i] = energy[i-1]
                      energy[i-1] = tmp
   		   
              if k >= 10000:
                  x_record.append(x.clone().numpy())
   
      x_record = np.array(x_record)
      
      x_samples = x_record[:,-1,:]
      

In the following, we pretend that we don't know the energy function of the Müller potential
and only have the samples from it. We want to use the samples as training
data and use the potential contrasting method to learn a potential energy function that should be close to the
Müller potential energy function.
	   
2. Define a noise distribution and generate noise samples
-----------------------------------------------------------

To use potential contrasting, we need to define a noise distribution and generate noise samples from it.
Because the Müller potential has just two dimensions, we can use as the noise
distribution a uniform distribution over the coordinates' range.
The following code block defines the noise distribution and generates noises
samples.

.. code-block:: python

   import math
   
   def compute_log_q(x, x1_limits = (x1_min, x1_max), x2_limits = (x2_min, x2_max)):
       x1_min, x1_max = x1_limits
       x2_min, x2_max = x2_limits
       log_q = math.log(1/(x1_max - x1_min) * 1/(x2_max - x2_min))
       return torch.ones(x.shape[0])*log_q
   
   n_noise = n_data
   x1_noise = torch.rand(n_noise)*(x1_max - x1_min) + x1_min
   x2_noise = torch.rand(n_noise)*(x2_max - x2_min) + x2_min
   x_noise = torch.stack((x1_noise, x2_noise), dim = 1)

	   
3. Learn an energy function using potential contrasting
--------------------------------------------------------

Now we are ready to learn a potential energy function that can reproduce the data distribution.
First we need to parameterize the function used to represent the potential energy.
Here we use a two-dimensional cubic spline model, i.e.,

.. math::

   U(x_1, x_2; \boldsymbol{\theta}) = \sum_{i=1}^{M_1}\sum_{j=1}^{M_2} \theta_{ij} g_{ij}(x_1, x_2),

where :math:`g_{ij}(x_1, x_2) = h_{1i}(x_1) h_{2j}(x_2), i = 1, ..., M_1, j = 1, ..., M_2`.
:math:`h_{1i}(x_1), i = 1, ..., M_1` are a set of cubic spline basis functions for the coordinate :math:`x_1` and
:math:`h_{2j}(x_2), j = 1, ..., M_2` for the coordinate :math:`x_2`.
:math:`\theta_{ij}, i = 1, ..., M_1, j = 1, ..., M_2` are parameters that will be learned in potential contrasting.
There are many different ways to define basis functions for a cubic spline and we use B-spline basis functions.
We use the following function to compute values of B-spline basis functions for a given set of inputs :math:`x`.

.. code-block:: python

   import PC
   
   def compute_2d_cubic_spline_basis(
       x, M1=10, M2=10, x1_limits=(x1_min, x1_max), x2_limits=(x2_min, x2_max)
   ):
       x1_min, x1_max = x1_limits
       x2_min, x2_max = x2_limits
   
       ## degree of spline
       k = 3
   
       num_knots_x1 = M1 - k - 2
       num_knots_x2 = M2 - k - 2
   
       ## knots of cubic spline
       knots_x1 = torch.linspace(x1_min, x1_max, num_knots_x1 + 2)[1:-1]
       knots_x2 = torch.linspace(x2_min, x2_max, num_knots_x2 + 2)[1:-1]
   
       boundary_knots_x1 = torch.tensor([x1_min, x1_max])
       boundary_knots_x2 = torch.tensor([x2_min, x2_max])
   
       basis_x1 = PC.spline.bs(x[:, 0], knots_x1, boundary_knots_x1)
       basis_x2 = PC.spline.bs(x[:, 1], knots_x2, boundary_knots_x2)
   
       basis = basis_x1[:, :, None] * basis_x2[:, None, :]
       basis = basis.reshape(-1, M1 * M2)
       return basis
   
Once we specify the basis functions :math:`g_{ij}(x_1, x_2)`, learning the
energy function :math:`U(x_1, x_2; \boldsymbol{\theta})` reduces to fitting
the parameters :math:`\boldsymbol{\theta}`. We will use the noise contrastive
estimation (implemented in the function ``PC.NCE``) to do that.

.. code-block:: python
		
   basis_data = compute_2d_cubic_spline_basis(x_data)
   basis_noise = compute_2d_cubic_spline_basis(x_noise)
   log_q_data = compute_log_q(x_data)
   log_q_noise = compute_log_q(x_noise)
   theta, dF = PC.NCE(log_q_noise, log_q_data, basis_noise, basis_data)

After fitting the parameters :math:`\boldsymbol{\theta}`, we can visulize the
learned energy function by computing and plotting its values on aa two-dimensional
grid.
   
.. code-block:: python

   basis_grid = compute_2d_cubic_spline_basis(x_grid)
   U_grid = torch.matmul(basis_grid, theta)
   U_grid = U_grid.reshape((grid_size, grid_size))
   
   U_grid = U_grid - U_grid.min() + U.min()
   U_grid[U_grid > 9] = 9
   fig, axes = plt.subplots()
   plt.contourf(
       U_grid.T.numpy(),
       levels=np.linspace(-9, 9, 19),
       extent=(x1_min, x1_max, x2_min, x2_max),
       cmap=cm.viridis_r,
   )
   plt.xlabel(r"$x_1$", fontsize=24)
   plt.ylabel(r"$x_2$", fontsize=24)
   plt.colorbar()
   axes.set_aspect("equal")
   plt.tight_layout()
   plt.savefig("./data/learned_potential.png")
   plt.close()
   		
.. image:: ../../examples/MullerPotential/data/learned_potential.png

We see that the learned potential energy function matches the Müller potential well.