# The concept of LSE

## Background

Given a set of values $\{x_n\}_{n=1}^N$, we want to compute

$$z = \log \sum_{n=1}^N \exp(x_n)$$

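This sum is the denominator of the softmax function:

$$\frac{e^{x_j}}{\sum_{i=1}^n e^{x_i}}$$

Taking the logarithm of the softmax gives

$$
\begin{aligned}
\log \left( \frac{e^{x_j}}{\sum_{i=1}^n e^{x_i}} \right) &= \log(e^{x_j}) - \log \left( \sum_{i=1}^n e^{x_i} \right)\\
&= x_j - \log \left( \sum_{i=1}^n e^{x_i} \right)
\end{aligned}
$$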

## Numerical overflow

    >>> import math
    >>> math.e**1000
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: (34, 'Result too large')
    >>> math.e**-1000
    0.0
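
The same failure shows up when the log of a sum of exponentials is computed literally. The sketch below is only an illustration: `naive_logsumexp` and `describe_failure` are hypothetical helper names, not something from a library. Large inputs overflow inside `math.exp`, while very negative inputs underflow to `0.0` and then break `math.log`.

```python
import math


def naive_logsumexp(xs):
    """Literal transcription of log(sum(exp(x))), with no rescaling."""
    return math.log(sum(math.exp(x) for x in xs))


def describe_failure(xs):
    """Return the name of the exception raised by the naive version, or None."""
    try:
        naive_logsumexp(xs)
    except (OverflowError, ValueError) as err:
        return type(err).__name__
    return None


# Small inputs are fine.
print(naive_logsumexp([1.0, 2.0, 3.0]))

# exp(1000) does not fit in a double, so math.exp raises OverflowError.
print(describe_failure([1000.0, 1000.0, 1000.0]))

# exp(-1000) underflows to 0.0, the sum is 0.0, and math.log(0.0) raises ValueError.
print(describe_failure([-1000.0, -1000.0, -1000.0]))
```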


## Computing LSE

$$\log \text{SumExp}(x_1, \ldots, x_n) = \log \left( \sum_{i=1}^{n} e^{x_i} \right)$$

$$
\begin{aligned}
\log \left( \sum_{i=1}^{n} e^{x_i}\right) &= \log \left( \sum_{i=1}^{n} e^{x_i - c} e^c \right)\\
&= \log \left( e^c \sum_{i=1}^{n} e^{x_i - c} \right)\\
&= \log \left( \sum_{i=1}^{n} e^{x_i - c} \right) + \log(e^c)\\
&= \log \left( \sum_{i=1}^{n} e^{x_i - c} \right) + c
\end{aligned}
$$
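
The last line of this derivation translates almost directly into code. The following is a minimal sketch, assuming $c = \max_i x_i$ (the derivation holds for any constant $c$; the maximum is one common choice because it makes every exponent non-positive). The function name `logsumexp` and the handling of infinite inputs are choices made for this illustration.

```python
import math


def logsumexp(xs):
    """Compute log(sum(exp(x))) as c + log(sum(exp(x - c))) with c = max(xs)."""
    c = max(xs)  # assumption: shift by the maximum; raises ValueError if xs is empty
    if math.isinf(c):
        # c = +inf: the result is +inf; c = -inf: every term is exp(-inf) = 0.
        # Returning c directly avoids computing inf - inf, which would give nan.
        return c
    # Every exponent x - c is <= 0, so exp() cannot overflow, and the largest
    # term is exp(0) = 1, so the sum is at least 1 and log() is well defined.
    return c + math.log(sum(math.exp(x - c) for x in xs))


print(logsumexp([1000.0, 1000.0, 1000.0]))     # equals 1000 + log(3)
print(logsumexp([-1000.0, -1000.0, -1000.0]))  # equals -1000 + log(3)
```

Both calls succeed even though `math.exp(1000)` overflows and `math.exp(-1000)` underflows to zero.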

$$
\begin{aligned}
\text{loss}(\text{Softmax}(x_j, x_1, \ldots, x_n)) &= x_j - \log(\text{SumExp}(x_1, \ldots, x_n))\\
&= x_j - \log \left( \sum_{i=1}^{n} e^{x_i} \right)\\
&= x_j - \log \left( \sum_{i=1}^{n} e^{x_i - c} \right) - c
\end{aligned}
$$

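For example, with $c = 1000$ each shifted term is $e^0 = 1$, so the sum is 3:

$$
\begin{aligned}
\text{loss}(\text{Softmax}(1000, [1000, 1000, 1000])) &= 1000 - \log(3) - 1000\\
&= -\log(3)
\end{aligned}
$$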
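
This worked example can be checked numerically. The sketch below assumes $c = \max_i x_i$ and uses a hypothetical helper `log_softmax(xs, j)` that returns $x_j - \log \sum_i e^{x_i - c} - c$ for the entry at index `j`.

```python
import math


def log_softmax(xs, j):
    """Log of the softmax probability of xs[j], using the shifted formula."""
    c = max(xs)  # assumption: shift by the maximum
    # Subtract c from x_j first so the large values cancel before the log term.
    return (xs[j] - c) - math.log(sum(math.exp(x - c) for x in xs))


value = log_softmax([1000.0, 1000.0, 1000.0], 0)
print(value, -math.log(3))  # the two numbers agree
```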

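LSE is a mathematical technique: it sums the exponentials of a set of values and takes the logarithm of the result. It is commonly used to address numerical stability when normalizing probabilities (for example, in the softmax function). LSE is defined as

$$\text{LSE}(x_1, \ldots, x_n) = \log(\exp(x_1) + \ldots + \exp(x_n))$$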

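LSE can be shifted by a constant:

$$\text{LSE}(x_1, \ldots, x_n) = c + \text{LSE}(x_1 - c, \ldots, x_n - c)$$

where $c$ is a constant. This identity is exact: it follows from factoring $e^c$ out of the sum, as in the derivation above, and is not a first-order Taylor approximation of the form $f(x) \approx f(c) + f'(c)(x - c)$. Choosing $c = \max\{x_1, \ldots, x_n\}$ makes every exponent non-positive, so no term can overflow.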

LSE is bounded below and above in terms of the maximum:

$$\max\{x_1, \ldots, x_n\} \leq \text{LSE}(x_1, \ldots, x_n) \leq \max\{x_1, \ldots, x_n\} + \log(n)$$
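
These bounds are easy to spot-check. The sketch below is only a numerical illustration, not a proof: it redefines a shifted `logsumexp` (assuming $c$ is the maximum) and a hypothetical helper `within_bounds` with a small floating-point tolerance.

```python
import math


def logsumexp(xs):
    """Shifted log-sum-exp with c = max(xs); assumes finite, non-empty input."""
    c = max(xs)
    return c + math.log(sum(math.exp(x - c) for x in xs))


def within_bounds(xs, tol=1e-12):
    """Check max(xs) <= LSE(xs) <= max(xs) + log(n), up to rounding error."""
    lse = logsumexp(xs)
    lower = max(xs)
    upper = lower + math.log(len(xs))
    return lower - tol <= lse <= upper + tol


print(within_bounds([1.0, 2.0, 3.0]))
# All entries equal: the upper bound max + log(n) is attained.
print(within_bounds([1000.0, 1000.0, 1000.0]))
# One entry dominates: LSE is numerically equal to the maximum.
print(within_bounds([0.0, -1000.0]))
```

The lower bound is approached when one value dominates the rest, and the upper bound is reached when all values are equal.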