Given a collection of documents, we want to classify them by topic.
Documents on different topics use different words.
A single document can contain several topics.
Each topic has its own word distribution.
Each document is characterized by a topic distribution.
A document is generated by repeatedly drawing a topic from its topic distribution and then sampling a word from that topic's word distribution.
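This generative process can be sketched in a few lines of Python (the values of `K`, `V`, and the hyperparameters are illustrative assumptions; NumPy's Dirichlet and categorical samplers stand in for the model's distributions):

```python
import numpy as np

rng = np.random.default_rng(0)

K, V = 3, 10                                # topics / vocabulary size (assumed)
alpha = np.full(K, 0.1)                     # Dirichlet prior over topic proportions
beta = rng.dirichlet(np.full(V, 0.1), K)    # per-topic word distributions, shape (K, V)

def generate_document(n_words):
    """Sample one document following the LDA generative process."""
    theta = rng.dirichlet(alpha)            # topic distribution of this document
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)          # draw a topic from the topic distribution
        w = rng.choice(V, p=beta[z])        # draw a word from that topic's word distribution
        words.append(w)
    return words
```

Each document is then just a list of vocabulary indices, which is the data representation used below.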
Data representation
When an index is omitted, the symbol denotes the vector or matrix ranging over all values of that index.
$$ \begin{align} & p(w_{m,n}=v \mid z_{m,n}=k, \beta) = \beta_{k,v}\\ \text{where } &\sum_{v=1}^V \beta_{k,v} = 1 \quad (k=1,\dots,K) \end{align} $$
Given the data $w=\{{w}_1,\dots,{w}_M\}$, the posterior of interest is
$$ p(\theta, z \mid w, \alpha, \beta) = \dfrac{p(\theta, z, w \mid \alpha, \beta)}{p(w \mid \alpha, \beta)} $$
The numerator is tractable: $$ p(\theta, z, w \mid \alpha, \beta) = \prod_{m=1}^M \left[ p(\theta_m \mid \alpha) \prod_{n=1}^{N_m} \left[p(z_{m,n} \mid \theta_m) p(w_{m,n} \mid z_{m,n}, \beta)\right]\right] $$
The denominator is intractable: $$ \begin{align} p(w \mid \alpha, \beta) &=\prod_{m=1}^M \int \sum_{z_m} p(\theta_m, z_m, w_m \mid \alpha, \beta) \mathrm{d}\theta_m\\ &= \prod_{m=1}^M \dfrac{\Gamma\left(\sum_{k=1}^K \alpha_k\right)}{\prod_{k=1}^K \Gamma(\alpha_k)} \int \left(\prod_{k=1}^K \theta_{m,k}^{\alpha_k-1}\right) \left( \prod_{n=1}^{N_m} \sum_{k=1}^K \prod_{v=1}^V (\theta_{m,k} \beta_{k,v})^{\mathbb{1}[w_{m,n} = v]} \right)\mathrm{d}\theta_m \end{align} $$ The sum over topics inside the product over words couples $\theta_m$ and $\beta$, so the integral has no closed form.
Approximate the posterior by the following factorized distribution: $$ \begin{align} q(\theta, z \mid \gamma, \phi) &= \prod_{m=1}^{M} q(\theta_m, z_m \mid \gamma_m, \phi_m)\\ q(\theta_m, z_m \mid \gamma_m, \phi_m)&=q(\theta_m \mid \gamma_m)\prod_{n=1}^{N_m}q(z_{m,n} \mid \phi_{m,n}) \end{align} $$
Learn $\gamma$ and $\phi$ so that $q$ is close to the posterior:
$$ (\gamma^\star, \phi^\star) = \arg\min_{\gamma, \phi} \mathrm{KL}(q(\theta, z \mid \gamma, \phi) || p(\theta, z \mid w, \alpha, \beta)) $$ where $q(\theta, z \mid \gamma, \phi) = \prod_{m=1}^M q(\theta_m, z_m \mid \gamma_m, \phi_m)$
Since both $q$ and the posterior factorize over documents, it suffices to learn each $\gamma_m$ and $\phi_m$ separately:
$$ (\gamma_m^\star, \phi_m^\star) = \arg\min_{\gamma_m, \phi_m} \mathrm{KL}(q(\theta_m, z_m \mid \gamma_m, \phi_m) || p(\theta_m, z_m \mid w_m, \alpha, \beta)) $$
The following identity holds: $$ \mathrm{KL}(q(\theta_m, z_m \mid \gamma_m, \phi_m) || p(\theta_m, z_m \mid w_m, \alpha, \beta)) = \log p(w_m \mid \alpha, \beta) - \mathbb{E}_{q_m} \left[\log p(w_m, z_m, \theta_m \mid \alpha, \beta)\right] + \mathbb{E}_{q_m} \left[\log q(\theta_m, z_m \mid \gamma_m, \phi_m)\right] $$
The left-hand side cannot be evaluated because it involves the intractable posterior $p(\theta_m, z_m\mid w_m,\alpha, \beta)$. On the right-hand side, $\log p(w_m \mid \alpha, \beta)$ does not depend on $\gamma_m, \phi_m$, and the two expectations are tractable, so minimizing the KL divergence is equivalent to maximizing a quantity we can actually compute (the evidence lower bound).
To see this, first derive a lower bound via Jensen's inequality: $$ \begin{align} \log p(w_m \mid \alpha, \beta) &= \log \int \sum_{z_m} p(w_m, \theta_m, z_m \mid \alpha, \beta) \mathrm{d}\theta_m\\ &=\log \int \sum_{z_m} \dfrac{p(w_m, \theta_m, z_m \mid \alpha, \beta)}{q(\theta_m, z_m \mid \gamma_m, \phi_m)} q(\theta_m, z_m \mid \gamma_m, \phi_m) \mathrm{d}\theta_m\\ &\geq \int \sum_{z_m} q(\theta_m, z_m \mid \gamma_m, \phi_m) \log \left( \dfrac{p(w_m, \theta_m, z_m \mid \alpha, \beta)}{q(\theta_m, z_m \mid \gamma_m, \phi_m)}\right) \mathrm{d}\theta_m \end{align} $$
Also, $$ \begin{align} \log p(w_m \mid \alpha, \beta) = \int \sum_{z_m} q(\theta_m, z_m \mid \gamma_m, \phi_m) \log p(w_m \mid \alpha, \beta) \mathrm{d}\theta_m \end{align} $$ so (left-hand side) $-$ (right-hand side) of the inequality above equals $$ \begin{align} &\int \sum_{z_m} q(\theta_m, z_m \mid \gamma_m, \phi_m) \log \left( \dfrac{q(\theta_m, z_m \mid \gamma_m, \phi_m)p(w_m \mid \alpha, \beta)}{p(w_m, \theta_m, z_m \mid \alpha, \beta)}\right) \mathrm{d}\theta_m\\ =&\int \sum_{z_m} q(\theta_m, z_m \mid \gamma_m, \phi_m) \log \left( \dfrac{q(\theta_m, z_m \mid \gamma_m, \phi_m)}{p(\theta_m, z_m \mid w_m, \alpha, \beta)}\right) \mathrm{d}\theta_m\\ =& \mathrm{KL}(q(\theta_m, z_m \mid \gamma_m, \phi_m) || p(\theta_m, z_m \mid w_m, \alpha, \beta)) \end{align} $$
Learn $\gamma_m$ and $\phi_m$ so that $q_m$ is close to the posterior:
$$ \begin{align} (\gamma_m^\star, \phi_m^\star) &= \arg\min_{\gamma_m, \phi_m} \mathrm{KL}(q(\theta_m, z_m \mid \gamma_m, \phi_m) || p(\theta_m, z_m \mid w_m, \alpha, \beta))\\ &= \arg\max_{\gamma_m, \phi_m}\left( \mathbb{E}_{q_m} \left[\log p(w_m, z_m, \theta_m \mid \alpha, \beta)\right] - \mathbb{E}_{q_m} \left[\log q(z_m, \theta_m \mid \gamma_m, \phi_m)\right]\right) \end{align} $$
The objective splits into five expectations: $$ \begin{align} &\mathbb{E}_{q_m} \left[\log p(w_m, z_m, \theta_m \mid \alpha, \beta)\right] - \mathbb{E}_{q_m} \left[\log q(z_m, \theta_m \mid \gamma_m, \phi_m)\right]\\ =& \mathbb{E}_{q_m} \left[\log p(\theta_m \mid \alpha)\right] + \mathbb{E}_{q_m} \left[\log p(z_m \mid \theta_m)\right] + \mathbb{E}_{q_m} \left[\log p(w_m \mid z_m, \beta)\right] \\ & - \mathbb{E}_{q_m} \left[\log q(\theta_m \mid \gamma_m)\right] - \mathbb{E}_{q_m} \left[\log q(z_m \mid \phi_m)\right] \end{align} $$
First term: $$ \begin{align} \mathbb{E}_{q_m} \left[\log p(\theta_m \mid \alpha)\right] &= \mathbb{E}_{q(\theta_m \mid \gamma_m)} \left[\log \Gamma\left(\sum_{k=1}^{K}\alpha_k\right) - \sum_{k=1}^K \log\Gamma(\alpha_k) + \sum_{k=1}^K (\alpha_k - 1) \log \theta_{m,k}\right]\\ &= \log \Gamma\left(\sum_{k=1}^{K}\alpha_k\right) - \sum_{k=1}^K \log\Gamma(\alpha_k) + \sum_{k=1}^K (\alpha_k - 1) \left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right) \end{align} $$ using the Dirichlet identity $\mathbb{E}_{q(\theta_m \mid \gamma_m)}\left[\log \theta_{m,k}\right] = \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right)$.
Second term: $$ \begin{align} \mathbb{E}_{q_m} \left[\log p(z_m \mid \theta_m)\right] &= \mathbb{E}_{q_m}\left[\sum_{n=1}^{N_m} \log p(z_{m,n} \mid \theta_m)\right]\\ &= \mathbb{E}_{q(\theta_m \mid \gamma_m)}\left[ \sum_{n=1}^{N_m} \sum_{k=1}^K q(z_{m,n}=k \mid \phi_{m,n}) \log p(z_{m,n}=k \mid \theta_m)\right]\\ &=\mathbb{E}_{q(\theta_m \mid \gamma_m)}\left[ \sum_{n=1}^{N_m} \sum_{k=1}^K \phi_{m,n,k} \log \theta_{m,k}\right]\\ &=\sum_{n=1}^{N_m} \sum_{k=1}^K \phi_{m,n,k} \left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right) \end{align} $$
Third term: $$ \begin{align} \mathbb{E}_{q_m} \left[\log p(w_m \mid z_m, \beta)\right] &= \sum_{n=1}^{N_m} \mathbb{E}_{q(z_{m,n} \mid \phi_{m,n})} \left[\log p(w_{m,n} \mid z_{m,n}, \beta)\right] \\ &= \sum_{n=1}^{N_m} \sum_{k=1}^K \phi_{m,n,k} \log p(w_{m,n} \mid z_{m,n}=k, \beta)\\ &= \sum_{n=1}^{N_m} \sum_{k=1}^K \sum_{v=1}^{V} \phi_{m,n,k} \mathbb{1}[w_{m,n} = v] \log \beta_{k, v} \end{align} $$
Fourth term: $$ \begin{align} \mathbb{E}_{q_m} \left[\log q(\theta_m \mid \gamma_m)\right] &= \mathbb{E}_{q(\theta_m \mid \gamma_m)} \left[\log \Gamma\left(\sum_{k=1}^{K}\gamma_{m,k}\right) - \sum_{k=1}^K \log \Gamma(\gamma_{m,k}) + \sum_{k=1}^K (\gamma_{m,k} - 1) \log \theta_{m,k}\right]\\ &= \log \Gamma\left(\sum_{k=1}^{K}\gamma_{m,k}\right) - \sum_{k=1}^K \log \Gamma(\gamma_{m,k}) + \sum_{k=1}^K (\gamma_{m,k} - 1) \left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right) \end{align} $$
Fifth term: $$ \begin{align} \mathbb{E}_{q_m} \left[\log q(z_m \mid \phi_m)\right] &= \sum_{n=1}^{N_m} \mathbb{E}_{q(z_{m,n} \mid \phi_{m,n})} \left[\log q(z_{m,n} \mid \phi_{m,n})\right] \\ &=\sum_{n=1}^{N_m} \sum_{k=1}^K \phi_{m,n,k} \log \phi_{m,n,k} \end{align} $$
Collecting the five terms, the per-document lower bound is $$ \begin{align} &L_m(\gamma, \phi; \alpha, \beta) \\ =& \log \Gamma\left(\sum_{k=1}^{K}\alpha_k\right) - \sum_{k=1}^K \log\Gamma(\alpha_k) + \sum_{k=1}^K (\alpha_k - 1) \left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right) \\ &+ \sum_{n=1}^{N_m} \sum_{k=1}^K \phi_{m,n,k} \left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right) \\ &+ \sum_{n=1}^{N_m} \sum_{k=1}^K \sum_{v=1}^{V} \phi_{m,n,k} \mathbb{1}[w_{m,n} = v] \log \beta_{k, v}\\ &- \left( \log \Gamma\left(\sum_{k=1}^{K}\gamma_{m,k}\right) - \sum_{k=1}^K \log\Gamma(\gamma_{m,k}) + \sum_{k=1}^K (\gamma_{m,k} - 1) \left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right)\right)\\ &- \sum_{n=1}^{N_m} \sum_{k=1}^K \phi_{m,n,k} \log \phi_{m,n,k} \end{align} $$
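As a sanity check, this bound can be evaluated numerically. A sketch (the function name and the use of `scipy.special` are my additions; it assumes all $\phi_{m,n,k} > 0$):

```python
import numpy as np
from scipy.special import gammaln, digamma

def lower_bound(doc, alpha, beta, gamma, phi):
    """Evaluate L_m(gamma, phi; alpha, beta) for one document.

    doc   : length-N array of word ids
    alpha : (K,) Dirichlet parameter
    beta  : (K, V) topic-word distributions
    gamma : (K,) variational Dirichlet parameter
    phi   : (N, K) variational topic assignments (all entries > 0)
    """
    dig = digamma(gamma) - digamma(gamma.sum())         # E[log theta_k]
    L = gammaln(alpha.sum()) - gammaln(alpha).sum() + ((alpha - 1) * dig).sum()
    L += (phi * dig).sum()                              # E[log p(z | theta)]
    L += (phi * np.log(beta[:, doc].T)).sum()           # E[log p(w | z, beta)]
    L -= (gammaln(gamma.sum()) - gammaln(gamma).sum()
          + ((gamma - 1) * dig).sum())                  # -E[log q(theta)]
    L -= (phi * np.log(phi)).sum()                      # -E[log q(z)]
    return L
```

Monitoring this value across iterations is a useful check: each of the coordinate updates below must not decrease it.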
Maximize with respect to $\phi$ subject to $\sum_{k=1}^K \phi_{m,n,k} = 1$ for each $m,n$.
The Lagrangian is
$$ \begin{align} &L_m(\phi) \\ =&\sum_{n=1}^{N_m} \sum_{k=1}^K \phi_{m,n,k} \left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right) \\ +& \sum_{n=1}^{N_m} \sum_{k=1}^K \sum_{v=1}^{V} \phi_{m,n,k} \mathbb{1}[w_{m,n} = v] \log \beta_{k, v}\\ -& \sum_{n=1}^{N_m} \sum_{k=1}^K \phi_{m,n,k} \log \phi_{m,n,k}\\ +& \sum_{n=1}^{N_m}\lambda_n \left(\sum_{k=1}^K \phi_{m,n,k}-1\right) \end{align} $$
Differentiating with respect to $\phi_{m,n,k}$:
$$ \begin{align} &\dfrac{\partial}{\partial \phi_{m,n,k}}L_m(\phi) \\ = &\Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) + \sum_{v=1}^{V} \mathbb{1}[w_{m,n} = v] \log \beta_{k, v}- \log \phi_{m,n,k} - 1 + \lambda_n=0 \end{align} $$
Rearranging:
$$ \begin{align} \phi_{m,n,k} \propto \exp\left[\Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) + \sum_{v=1}^{V} \mathbb{1}[w_{m,n} = v] \log \beta_{k, v} \right] \end{align} $$
$\lambda_n$ is determined by the normalization $\sum_{k=1}^K \phi_{m,n,k} = 1$.
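The update can be sketched as follows (names are illustrative; `digamma` comes from `scipy.special`, and per-word normalization plays the role of $\lambda_n$):

```python
import numpy as np
from scipy.special import digamma

def update_phi(doc, gamma, beta):
    """Update phi_{m,n,k} for one document given gamma_m and beta.

    doc   : length-N array of word ids
    gamma : (K,) variational Dirichlet parameter of the document
    beta  : (K, V) topic-word distributions
    Returns an (N, K) array whose rows sum to 1.
    """
    # log phi ∝ E[log theta_k] + log beta_{k, w_n}
    log_phi = (digamma(gamma) - digamma(gamma.sum())) + np.log(beta[:, doc].T)
    log_phi -= log_phi.max(axis=1, keepdims=True)   # subtract max for numerical stability
    phi = np.exp(log_phi)
    return phi / phi.sum(axis=1, keepdims=True)     # normalization = lambda_n
```

Working in log space and subtracting the row maximum before exponentiating avoids underflow when $\log\beta_{k,v}$ is very negative.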
The terms of the lower bound that involve $\gamma_{m,k}$ are
$$ \begin{align} &L_m(\gamma) \\ =& \sum_{k=1}^K (\alpha_k - 1) \left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right) \\ &+ \sum_{n=1}^{N_m} \sum_{k=1}^K \phi_{m,n,k} \left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right) \\ &- \left( \log \Gamma\left(\sum_{k=1}^{K}\gamma_{m,k}\right) - \sum_{k=1}^K\log\Gamma(\gamma_{m,k}) + \sum_{k=1}^K (\gamma_{m,k} - 1) \left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right)\right) \end{align} $$
Rearranging:
$$ \begin{align} &L_m(\gamma) \\ =& \sum_{k=1}^K \left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right) \left(\alpha_k - \gamma_{m,k} + \sum_{n=1}^{N_m} \phi_{m,n,k}\right) \\ &- \left( \log \Gamma\left(\sum_{k=1}^{K}\gamma_{m,k}\right) - \sum_{k=1}^K \log\Gamma(\gamma_{m,k}) \right) \end{align} $$
Differentiating with respect to $\gamma_{m,k}$:
$$ \begin{align} &\dfrac{\partial}{\partial \gamma_{m,k}}L_m(\gamma) \\ = &- \Psi(\gamma_{m,k}) + \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right)\\ &+ \Psi^\prime(\gamma_{m,k}) \left(\alpha_k - \gamma_{m,k} + \sum_{n=1}^{N_m} \phi_{m,n,k}\right) \\ &- \Psi^\prime\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right)\sum_{k^\prime=1}^K \left(\alpha_{k^\prime} - \gamma_{m,k^\prime} + \sum_{n=1}^{N_m} \phi_{m,n,k^\prime}\right) \\ &- \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) + \Psi(\gamma_{m,k})\\ =& \Psi^\prime(\gamma_{m,k}) \left(\alpha_k - \gamma_{m,k} + \sum_{n=1}^{N_m} \phi_{m,n,k}\right) \\ &- \Psi^\prime\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right)\sum_{k^\prime=1}^K \left(\alpha_{k^\prime} - \gamma_{m,k^\prime} + \sum_{n=1}^{N_m} \phi_{m,n,k^\prime}\right) \\ \end{align} $$
Whatever values $\Psi^\prime$ takes, both terms vanish if every coefficient $\alpha_k - \gamma_{m,k} + \sum_{n=1}^{N_m} \phi_{m,n,k}$ is zero, so the gradient is zero at
$$ \begin{align} \gamma_{m,k} = \alpha_k + \sum_{n=1}^{N_m} \phi_{m,n,k} \end{align} $$
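This closed-form update is a one-liner (a sketch; `phi` is the $(N_m, K)$ array of variational assignments for document $m$):

```python
import numpy as np

def update_gamma(alpha, phi):
    """gamma_{m,k} = alpha_k + sum_n phi_{m,n,k}.

    alpha : (K,) Dirichlet parameter
    phi   : (N_m, K) variational topic assignments of one document
    """
    return alpha + phi.sum(axis=0)  # prior pseudo-counts + expected topic counts
```

In the E-step, `update_phi` and `update_gamma` are alternated for each document until $\gamma_m$ stops changing.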
The $\beta$-dependent part of the lower bound, plus Lagrange multipliers for the constraints $\sum_{v=1}^V \beta_{k,v} = 1$, is $$ \begin{align} L(\beta) = \sum_{m=1}^M \sum_{n=1}^{N_m} \sum_{k=1}^K \sum_{v=1}^{V} \phi_{m,n,k} \mathbb{1}[w_{m,n} = v] \log \beta_{k, v} + \sum_{k=1}^K \lambda_k \left(\sum_{v=1}^V \beta_{k,v} - 1\right) \end{align} $$
Differentiating with respect to $\beta_{k,v}$: $$ \begin{align} \dfrac{\partial}{\partial \beta_{k,v}}L(\beta) = \sum_{m=1}^M \sum_{n=1}^{N_m} \dfrac{1}{\beta_{k,v}}\phi_{m,n,k} \mathbb{1}[w_{m,n} = v] + \lambda_k \end{align} $$
Setting the gradient to zero and normalizing ($\lambda_k$ fixes the scale) gives $$ \begin{align} \beta_{k,v} \propto \sum_{m=1}^M \sum_{n=1}^{N_m} \phi_{m,n,k} \mathbb{1}[w_{m,n}=v] \end{align} $$
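The M-step for $\beta$ can then be sketched as accumulating expected word counts (names are illustrative; it assumes every topic receives some mass so that normalization is well defined):

```python
import numpy as np

def update_beta(docs, phis, K, V):
    """beta_{k,v} ∝ sum_{m,n} phi_{m,n,k} 1[w_{m,n}=v].

    docs : list of word-id arrays, one per document
    phis : list of (N_m, K) arrays of variational assignments
    """
    beta = np.zeros((K, V))
    for doc, phi in zip(docs, phis):
        for n, v in enumerate(doc):
            beta[:, v] += phi[n]          # accumulate expected counts per topic
    return beta / beta.sum(axis=1, keepdims=True)   # lambda_k: normalize each row
```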
The $\alpha$-dependent part of the lower bound is
$$ \begin{align} &L(\alpha) \\ =& \sum_{m=1}^M \left[\log \Gamma\left(\sum_{k=1}^{K}\alpha_k\right) - \sum_{k=1}^K \log\Gamma(\alpha_k) + \sum_{k=1}^K (\alpha_k - 1) \left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right)\right] \end{align} $$
The gradient of the lower bound with respect to $\alpha_k$ is
$$ \begin{align} &\dfrac{\partial}{\partial\alpha_k}L(\alpha) \\ =& M \left(\Psi\left(\sum_{k^\prime=1}^{K}\alpha_{k^\prime}\right) - \Psi(\alpha_k) \right) + \sum_{m=1}^M\left( \Psi(\gamma_{m,k}) - \Psi\left(\sum_{k^\prime=1}^K \gamma_{m,k^\prime}\right) \right) \end{align} $$
Since the gradient cannot be set to zero in closed form, $L(\alpha)$ is maximized by a gradient method.
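A minimal sketch of this step, using plain gradient ascent with a fixed step size (the step size and the clipping that keeps $\alpha$ positive are my assumptions; Newton's method exploiting the Hessian structure is also commonly used):

```python
import numpy as np
from scipy.special import digamma

def alpha_gradient(alpha, gammas):
    """Gradient of L(alpha), given the per-document gamma_m stacked as an (M, K) array."""
    M = gammas.shape[0]
    g = M * (digamma(alpha.sum()) - digamma(alpha))
    g += (digamma(gammas) - digamma(gammas.sum(axis=1, keepdims=True))).sum(axis=0)
    return g

def update_alpha(alpha, gammas, lr=1e-3, n_steps=100):
    """Maximize L(alpha) by gradient ascent, clipping to keep alpha positive."""
    for _ in range(n_steps):
        alpha = np.maximum(alpha + lr * alpha_gradient(alpha, gammas), 1e-8)
    return alpha
```

The full algorithm alternates the per-document updates of $\phi_m$ and $\gamma_m$ (E-step) with the updates of $\beta$ and $\alpha$ (M-step) until the lower bound converges.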