Notes on the Large Deviation Principle
Published:
Large Deviation Principle
Definition:
The large deviation principle (LDP) states that the probability density $p(\mathbf{x})$, which depends implicitly on a size parameter $n$, decays exponentially in $n$ with a rate function $I(\mathbf{x})$ defined by: \(\lim_{n\to\infty}-\frac{1}{n}\ln p(\mathbf{x})=I(\mathbf{x})\)
Note that this limit does not always exist. For now we assume that it does.
Conversely, if we have the rate function, we can write the probability density for large $n$, to leading exponential order, as: \(p(\mathbf{x})\approx e^{-nI(\mathbf{x})}\)
So when the large deviation principle applies, the rate function captures the leading exponential behavior of $p(\mathbf{x})$, and we can compute the rate function instead of $p(\mathbf{x})$ to describe the system.
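As a quick numerical check of the definition, here is a minimal sketch using an example that is not part of these notes: the sample mean of $n$ i.i.d. Exp(1) variables, whose density is $p(x)=n^n x^{n-1}e^{-nx}/\Gamma(n)$ and whose rate function is $I(x)=x-1-\ln x$. The function names are illustrative.

```python
import math

def rate_exact(x):
    # Rate function of the sample mean of i.i.d. Exp(1) variables (assumed example).
    return x - 1.0 - math.log(x)

def rate_finite_n(x, n):
    # -(1/n) ln p(x), with p(x) = n^n x^(n-1) e^(-n x) / Gamma(n).
    log_p = n * math.log(n) + (n - 1) * math.log(x) - n * x - math.lgamma(n)
    return -log_p / n

if __name__ == "__main__":
    x = 2.0
    for n in (10, 100, 1000, 10000):
        print(n, rate_finite_n(x, n), rate_exact(x))
```

The finite-$n$ estimate approaches $I(x)$ with a correction of order $\ln n/n$, which is the sub-exponential prefactor that the LDP ignores.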
How to evaluate:
We can compute $p(\mathbf{x})$ directly and convert it to the rate function (which is not very fun), or use one of the following results.
Gärtner-Ellis Theorem
The scaled cumulant generating function (SCGF) $\lambda(k)$ is defined as: \(\begin{align*} \lambda(k)&=\lim_{n\to\infty}\frac{1}{n}\ln E[e^{nk\mathbf{x}}]\\ &=\lim_{n\to\infty}\frac{1}{n}\ln\int p(\mathbf{x})e^{nk\mathbf{x}}d\mathbf{x}\\ \text{if LDP works}&=\lim_{n\to\infty}\frac{1}{n}\ln\int e^{n\left[k\mathbf{x}-I(\mathbf{x})\right]}d\mathbf{x}\\ \text{Laplace approx.}&=\lim_{n\to\infty}\frac{1}{n}\ln e^{n\sup_\mathbf{x}\{k\mathbf{x}-I(\mathbf{x})\}}\\ &=\sup_\mathbf{x}\{k\mathbf{x}-I(\mathbf{x})\} \end{align*}\)
Conversely, when $\lambda(k)$ exists and is differentiable: \(I(\mathbf{x})=\sup_k\{k\mathbf{x}-\lambda(k)\}\)
Varadhan Theorem
The above result generalizes to: \(\begin{align} \Lambda[f]&=\lim_{n\to\infty}\frac{1}{n}\ln E\left[e^{nf(\mathbf{x})}\right]\\ &=\sup_{\mathbf{x}}\left\{f(\mathbf{x})-I(\mathbf{x})\right\} \end{align}\) This reduces to the Gärtner-Ellis case when $f(\mathbf{x})=k\mathbf{x}$.
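To see the Legendre-Fenchel transform at work, here is a minimal sketch on an assumed example: the sample mean of $n$ i.i.d. Gaussian variables with mean $\mu$ and variance $\sigma^2$, for which $\lambda(k)=\mu k+\frac{1}{2}\sigma^2k^2$ and $I(x)=\frac{(x-\mu)^2}{2\sigma^2}$. The supremum is approximated by a maximum over a finite grid of $k$ values, so the grid must contain the optimal $k$.

```python
import numpy as np

def legendre_fenchel(scgf, ks, xs):
    # I(x) = sup_k [k x - lambda(k)], approximated by a max over the grid ks.
    return np.max(xs[:, None] * ks[None, :] - scgf(ks)[None, :], axis=1)

mu, sigma = 1.0, 2.0  # illustrative parameters

def gaussian_scgf(k):
    # SCGF of the Gaussian sample mean (assumed example).
    return mu * k + 0.5 * sigma**2 * k**2

ks = np.linspace(-5.0, 5.0, 20001)
xs = np.linspace(-3.0, 5.0, 9)
I_numeric = legendre_fenchel(gaussian_scgf, ks, xs)
I_exact = (xs - mu) ** 2 / (2.0 * sigma**2)

if __name__ == "__main__":
    for x, a, b in zip(xs, I_numeric, I_exact):
        print(f"x={x:+.1f}  numeric={a:.6f}  exact={b:.6f}")
```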
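A numerical sanity check can make this statement concrete. The sketch below assumes $\mathbf{x}$ is a one-dimensional Gaussian sample mean with $I(x)=x^2/2$ and picks an arbitrary test function $f(x)=x-x^4/4$. Neither choice comes from the notes. The expectation is evaluated on a grid with a log-sum-exp to avoid overflow.

```python
import numpy as np

def f(x):
    # Arbitrary test function chosen for illustration.
    return x - 0.25 * x**4

def varadhan_lhs(f, n, xs):
    # (1/n) ln E[exp(n f(x))] for x ~ N(0, 1/n), i.e. rate function I(x) = x^2 / 2.
    dx = xs[1] - xs[0]
    log_integrand = n * (f(xs) - 0.5 * xs**2) + 0.5 * np.log(n / (2.0 * np.pi))
    m = log_integrand.max()
    return (m + np.log(np.sum(np.exp(log_integrand - m)) * dx)) / n

xs = np.linspace(-6.0, 6.0, 200001)
sup_value = np.max(f(xs) - 0.5 * xs**2)  # right-hand side: sup_x {f(x) - I(x)}

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, varadhan_lhs(f, n, xs), sup_value)
```

The left-hand side converges to the supremum as $n$ grows, with a correction of order $1/n$.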
The Contraction Principle
When $\mathbf{y}=g(\mathbf{x})$ and $\mathbf{x}$ satisfies the LDP, we can derive the rate function of $\mathbf{y}$ from that of $\mathbf{x}$: \(\begin{align*} p(\mathbf{y})&=\int_{\mathbf{x}:g(\mathbf{x})=\mathbf{y}}d\mathbf{x}p(\mathbf{x})\\ &\approx\int_{\mathbf{x}:g(\mathbf{x})=\mathbf{y}}d\mathbf{x}e^{-nI(\mathbf{x})}\\ &\approx\exp\left(-n\inf_{\mathbf{x}:g(\mathbf{x})=\mathbf{y}}I(\mathbf{x})\right) \end{align*}\) So: \(I(\mathbf{y})=\inf_{\mathbf{x}:g(\mathbf{x})=\mathbf{y}}I(\mathbf{x})\)
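As a small worked case, assume $\mathbf{x}=(x_1,x_2)$ with $I(\mathbf{x})=\frac{1}{2}(x_1^2+x_2^2)$ (two independent Gaussian sample means) and $g(\mathbf{x})=x_1+x_2$. The constrained infimum is attained at $x_1=x_2=y/2$, which gives $I(y)=y^2/4$. The sketch below checks this by minimizing over a grid along the constraint. It is specific to this choice of $g$, and the names are illustrative.

```python
import numpy as np

def rate_x(x1, x2):
    # Assumed rate function of x = (x1, x2): two independent Gaussian sample means.
    return 0.5 * (x1**2 + x2**2)

def contracted_rate(y, x1_grid):
    # I(y) = inf over {x : x1 + x2 = y} of I(x); the constraint fixes x2 = y - x1.
    return np.min(rate_x(x1_grid, y - x1_grid))

x1_grid = np.linspace(-5.0, 5.0, 10001)

if __name__ == "__main__":
    for y in (-2.0, 0.0, 1.5, 3.0):
        print(y, contracted_rate(y, x1_grid), y**2 / 4.0)
```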
Application in statistical mechanics
The LDP is useful in statistical mechanics because we often care about an observable $O_n(\mathbf{x})$ of a system with $n$ degrees of freedom. As $n\to\infty$, the probability density $p(\mathbf{x})$ typically decays exponentially away from $\langle\mathbf{x}\rangle$.
- We use $\Lambda_n$ for the space on which $\mathbf{x}$ is defined. Its volume should grow exponentially with $n$, as $|\Lambda_n|=C^n$, where $C$ is some constant.
- The energy function of the system is $H_n(\mathbf{x})$, which should grow linearly with $n$. So as $n\to\infty$, the energy per degree of freedom $h_n(\mathbf{x})=\frac{1}{n}H_n(\mathbf{x})$ should converge to a function. Then $\epsilon=h_n(\mathbf{x})$ should follow the LDP as $\mathbf{x}$ does, and its rate function can be derived using the contraction principle.
Entropy
With a uniform prior on $\Lambda_n$, $p(\mathbf{x})\propto C^{-n}$, so: \(\begin{align*} p(\epsilon)&=\int_{\left\{\mathbf{x}\in\Lambda_n:h_n(\mathbf{x})=\epsilon\right\}}p(\mathbf{x})d\mathbf{x}\\ &\propto C^{-n}\int_{\left\{\mathbf{x}\in\Lambda_n:h_n(\mathbf{x})=\epsilon\right\}}d\mathbf{x}\\ &=C^{-n}\Omega(\epsilon) \end{align*}\) We can compute the rate function of $\epsilon$: \(\begin{align*} I(\epsilon)&=\lim_{n\to\infty}-\frac{1}{n}\ln p(\epsilon)\\ &=\ln C-\lim_{n\to\infty}\frac{1}{n}\ln\Omega(\epsilon) \end{align*}\)
Here we find that the rate function equals minus the entropy density, up to the constant $\ln C$. We absorb the constant into the definition of the entropy by writing: \(s(\epsilon)=\lim_{n\to\infty}\frac{1}{n}\ln p(\epsilon)=-I(\epsilon)\)
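For a concrete case, take $n$ independent two-level units with energies 0 or 1, so that $C=2$, $\epsilon$ is the excited fraction, and $\Omega(\epsilon)=\binom{n}{n\epsilon}$ counts states rather than measuring a volume. This discrete model is an assumed example, not one from the notes. By Stirling's formula, $I(\epsilon)=\ln 2+\epsilon\ln\epsilon+(1-\epsilon)\ln(1-\epsilon)$, and the sketch compares this to the finite-$n$ value $\ln C-\frac{1}{n}\ln\Omega(\epsilon)$.

```python
import math

def log_omega(n, k):
    # ln of the number of configurations with exactly k excited units out of n.
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def rate_finite_n(eps, n, C=2.0):
    # ln C - (1/n) ln Omega(eps), evaluated at the nearest integer count k.
    k = round(eps * n)
    return math.log(C) - log_omega(n, k) / n

def rate_limit(eps):
    # n -> infinity limit obtained from Stirling's formula.
    return math.log(2.0) + eps * math.log(eps) + (1.0 - eps) * math.log(1.0 - eps)

if __name__ == "__main__":
    for n in (100, 10000, 1000000):
        print(n, rate_finite_n(0.3, n), rate_limit(0.3))
```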
Free Energy
The free energy of the system is closely related to the SCGF: \(\begin{align*} \lambda(k)=\lim_{n\to\infty}\frac{1}{n}\ln E\left[e^{nk\epsilon}\right] \end{align*}\) Note that $n\epsilon=H_n(\mathbf{x})$, and the partition function of the system is: \(Z_n(\beta)=E\left[e^{-\beta H_n(\mathbf{x})}\right]\) where $\beta=\frac{1}{k_BT}$. Setting $k=-\beta$, we can rewrite the SCGF as: \(F(\beta)=\lambda(-\beta)=\lim_{n\to\infty}\frac{1}{n}\ln E\left[e^{-\beta H_n(\mathbf{x})}\right]=\lim_{n\to\infty}\frac{1}{n}\ln Z_n(\beta)\) Here $F(\beta)$ plays the role of the free energy per degree of freedom, up to the conventional factor $-1/\beta$ and an additive constant from the normalization of the prior.
The maximum entropy and minimum free energy principles can be derived by introducing an observable $m=M(\mathbf{x})$ and computing its entropy and free energy with the contraction principle.
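The relation between the partition function and the SCGF can be checked on a toy model. The sketch assumes $n$ independent two-level units with energies 0 or 1 and a uniform prior, so that $\frac{1}{n}\ln Z_n(\beta)=\ln\frac{1+e^{-\beta}}{2}$ for every $n$ and $\lambda(k)=\ln\frac{1+e^{k}}{2}$. It computes $Z_n$ by brute-force enumeration and then recovers the rate function of $\epsilon$ from $\lambda(k)$ by a Legendre-Fenchel transform on a grid. The model and all names are illustrative.

```python
import itertools
import math
import numpy as np

def log_Z_per_site_bruteforce(beta, n):
    # (1/n) ln E[exp(-beta H_n)], H_n = number of excited units, uniform prior.
    total = sum(math.exp(-beta * sum(c)) for c in itertools.product((0, 1), repeat=n))
    return math.log(total / 2**n) / n

def F_exact(beta):
    return math.log((1.0 + math.exp(-beta)) / 2.0)

def scgf(k):
    # lambda(k) for the same model; F(beta) = lambda(-beta).
    return np.log((1.0 + np.exp(k)) / 2.0)

def rate_from_scgf(eps, ks):
    # I(eps) = sup_k [k eps - lambda(k)], approximated on the grid ks.
    return np.max(ks * eps - scgf(ks))

def rate_exact(eps):
    return math.log(2.0) + eps * math.log(eps) + (1.0 - eps) * math.log(1.0 - eps)

if __name__ == "__main__":
    ks = np.linspace(-10.0, 10.0, 200001)
    print(log_Z_per_site_bruteforce(0.7, 10), F_exact(0.7))
    print(rate_from_scgf(0.3, ks), rate_exact(0.3))
```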
Application to a Continuous Time Markov process
Consider a continuous-time random variable $X(t)$. At a given time $t$, $X(t)=x$ has a probability distribution $p_t(x)$. Discretizing $t$ with step $\delta t$, the joint distribution factorizes through the transition probability $\pi_{\delta t}$: \(p(x_{t+\delta t},x_t)=\pi_{\delta t}(x_{t+\delta t}|x_t)p(x_t)\) So: \(\begin{align*} p(x,t+\delta t)&=\int dx_t\,\pi_{\delta t}(x|x_t)p(x_t,t)\\ &=\pi_{\delta t}p(x,t) \end{align*}\) Taking the limit $\delta t\to0$, $\pi_{\delta t}$ approaches $e^{G\delta t}\approx1+G\delta t$, where the generator $G$ satisfies: \(\frac{\partial p(x,t)}{\partial t}=\int dx'G(x,x')p(x',t)\)
Given the transition probability $\pi(x|x')$ and the time average $S=\frac{1}{t}\int_0^t x(s)ds$, the SCGF can be written as:
\(\begin{align*} \lambda(k)&=\lim_{t\to\infty}\frac{1}{t}\ln E\left[e^{tkS}\right]\\ &=v[\tilde{G}(k)] \end{align*}\) where $\tilde{G}(k)(x,x')=G(x,x')+kx\,\delta(x,x')$ is the tilted generator and $v[\tilde{G}(k)]$ is its dominant eigenvalue (the one with the largest real part) at each $k$. We can then derive the rate function of $S$ from $\lambda(k)$ via the Legendre-Fenchel transform.
NOTE: The tilted generator depends on the observable $S$, so it has to be rebuilt for each observable.
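To make the eigenvalue recipe concrete, here is a minimal sketch for an assumed two-state chain with $x\in\{0,1\}$, jump rate $a$ from 0 to 1 and $b$ from 1 to 0, so that $S$ is the fraction of time spent in state 1. The tilt is taken to be $k$ times the value of $x$ on the diagonal, which is the standard form for a time-averaged observable. For this model the dominant eigenvalue has the closed form $\lambda(k)=\frac{1}{2}\left[(k-a-b)+\sqrt{(k-a-b)^2+4ak}\right]$. The model and the names are illustrative.

```python
import numpy as np

a, b = 1.0, 2.0                        # illustrative jump rates 0 -> 1 and 1 -> 0
G = np.array([[-a, b],
              [a, -b]])                # column convention: dp/dt = G p
x_values = np.array([0.0, 1.0])        # value of x in each state

def tilted_generator(G, x_values, k):
    # G~(k) = G + k * diag(x) for the time average of x.
    return G + k * np.diag(x_values)

def scgf(k):
    # Dominant eigenvalue (largest real part) of the tilted generator.
    return np.max(np.linalg.eigvals(tilted_generator(G, x_values, k)).real)

def scgf_exact(k):
    tr = k - a - b
    return 0.5 * (tr + np.sqrt(tr**2 + 4.0 * a * k))

def rate(s, ks):
    # I(s) = sup_k [k s - lambda(k)], approximated on the grid ks.
    return max(k * s - scgf(k) for k in ks)

if __name__ == "__main__":
    ks = np.linspace(-10.0, 10.0, 2001)
    for s in (0.1, a / (a + b), 0.6):
        print(s, rate(s, ks))
```

The rate function vanishes at the stationary mean $a/(a+b)$, as expected, since $\lambda'(0)$ equals the mean of $S$.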
With the tilted generator, we can parameterize the probability density function with a neural network and optimize it using the variational principle, which takes the Rayleigh-quotient form below when $\tilde{G}$ is symmetric:
\[\begin{align*} \rho(\mathbf{x})&=\operatorname*{arg\,max}_\psi\left\{\frac{\langle\psi|\tilde{G}|\psi\rangle}{\langle\psi|\psi\rangle}\right\} \end{align*}\]
Large Deviation Principle for Quantum Steady States
In quantum mechanics, an open system can be described by the Lindblad master equation: \(\frac{\partial\hat{\rho}}{\partial t}=\mathcal{L}(\hat{\rho})\\ \mathcal{L}=\frac{1}{i\hbar}\left[\hat{H},(\cdot)\right]+\sum_i\left(J_i(\cdot)J_i^\dagger-\frac{1}{2}\left\{J_i^\dagger J_i,(\cdot)\right\}\right)\) If the system is not coupled to the environment, the equation reduces to the quantum Liouville equation: \(\frac{\partial \hat{\rho}}{\partial t}=\frac{1}{i\hbar}\left[\hat{H},\hat{\rho}\right]\)
According to “Unravelling the large deviation statistics of Markovian open quantum systems”, the tilted generator is: \(\mathcal{L}_\mathbf{\lambda}^\dagger[X] = i[H, X] + \sum_i \left( e^{-\lambda_i} J_i^\dagger X J_i - \frac{1}{2} \left[ X J_i^\dagger J_i + J_i^\dagger J_i X \right] \right),\)
The SCGF of the jump counts in the steady state is then the largest eigenvalue of this operator, as a function of the counting fields $\mathbf{\lambda}$ (not to be confused with the SCGF $\lambda(k)$ above).
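One way to evaluate this numerically for a small system is to vectorize the tilted generator into a matrix and diagonalize it. The sketch below does this for an assumed example, a resonantly driven two-level atom with $H=\frac{\Omega}{2}\sigma_x$ and a single jump operator $J=\sqrt{\gamma}\,\sigma_-$, with $\hbar=1$. It uses the row-major identity $\mathrm{vec}(AXB)=(A\otimes B^T)\mathrm{vec}(X)$. With the $e^{-\lambda_i}$ weighting, minus the derivative of the dominant eigenvalue at $\mathbf{\lambda}=0$ is the mean jump rate, which for this model is $\gamma\Omega^2/(\gamma^2+2\Omega^2)$. The model and function names are illustrative and not taken from the cited paper.

```python
import numpy as np

def tilted_lindblad_adjoint(H, jumps, lams):
    # Matrix of X -> i[H,X] + sum_i ( e^{-lam_i} J_i^† X J_i - 1/2 {J_i^† J_i, X} ),
    # acting on the row-major vectorization of X.
    d = H.shape[0]
    I = np.eye(d)
    L = 1j * (np.kron(H, I) - np.kron(I, H.T))
    for J, lam in zip(jumps, lams):
        Jd = J.conj().T
        JdJ = Jd @ J
        L = L + np.exp(-lam) * np.kron(Jd, J.T)
        L = L - 0.5 * (np.kron(I, JdJ.T) + np.kron(JdJ, I))
    return L

def scgf(H, jumps, lams):
    # Dominant eigenvalue (largest real part) of the tilted generator.
    return np.max(np.linalg.eigvals(tilted_lindblad_adjoint(H, jumps, lams)).real)

Omega, gamma = 1.0, 1.0                                  # illustrative parameters
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_minus = np.array([[0, 0], [1, 0]], dtype=complex)  # basis ordering (|e>, |g>)
H = 0.5 * Omega * sigma_x
jumps = [np.sqrt(gamma) * sigma_minus]

h = 1e-4
mean_rate = -(scgf(H, jumps, [h]) - scgf(H, jumps, [-h])) / (2.0 * h)
mean_rate_exact = gamma * Omega**2 / (gamma**2 + 2.0 * Omega**2)

if __name__ == "__main__":
    print(scgf(H, jumps, [0.0]), mean_rate, mean_rate_exact)
```

Dense diagonalization scales as $d^6$ in the Hilbert-space dimension $d$, so this is only practical for small systems.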
Questions:
- Since $\mathcal{L}_\mathbf{\lambda}^\dagger$ is a superoperator acting on operators rather than on state vectors, what is the variational principle for finding its largest eigenvalue?
- How can a transport system be expressed with a Lindblad master equation? Is this approach general?
- Is there a simple quantum non-equilibrium system with an exact solution that I can use as a benchmark?