🌐 Linear Regression: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p + \epsilon$
🌐 Logistic Regression: $P(y|x) = \sigma(w^Tx)$
🌐 Support Vector Machine: $\min_{w,b} \frac{1}{2} ||w||^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i(w^Tx_i + b))$
🌐 Perceptron: $y = \mathbf{1}(w^Tx + b > 0)$
🌐 Adaline: $y = \phi(w^Tx + b)$, where $\phi$ is a linear activation function.
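To make the logistic-regression formula above concrete, here is a minimal NumPy sketch that fits $w$ by batch gradient descent; the synthetic data, learning rate, and iteration count are illustrative assumptions rather than part of the definitions above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, n_iter=1000):
    """Batch gradient descent on the mean log-loss of P(y|x) = sigma(w^T x)."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend a bias column
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ w)                          # predicted probabilities
        grad = X.T @ (p - y) / len(y)               # gradient of the mean log-loss
        w -= lr * grad
    return w

# toy usage with synthetic, linearly separable data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
w = fit_logistic_regression(X, y)
```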
Naive Bayes
🌐 Naive Bayes Classifier: $p(y|x) = p(y)\prod_{i=1}^{n}p(x_i|y)$
🌐 Gaussian Naive Bayes: $p(x_i|y) = \mathcal{N}(x_i|\mu_{iy},\sigma_{iy}^2)$
🌐 Multinomial Naive Bayes: $p(x|y) = \frac{(\sum_{j=1}^{n} x_{j})!}{\prod_{j=1}^{n} x_{j}!} \prod_{j=1}^{n} \theta_{yj}^{x_{j}}$, where $x_j$ is the count of feature $j$
🌐 Bernoulli Naive Bayes: $p(x_i|y) = \theta_{iy}^{x_i} (1-\theta_{iy})^{1-x_i}$
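The Gaussian Naive Bayes entry above combines the class prior with per-feature normal likelihoods; a minimal NumPy sketch of fitting and predicting follows (the variance floor of 1e-9 is an illustrative assumption for numerical stability).

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and variances."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    means  = {c: X[y == c].mean(axis=0) for c in classes}
    vars_  = {c: X[y == c].var(axis=0) + 1e-9 for c in classes}  # small floor for stability
    return classes, priors, means, vars_

def predict_gaussian_nb(model, X):
    classes, priors, means, vars_ = model
    scores = []
    for c in classes:
        # log p(y) + sum_i log N(x_i | mu_iy, sigma_iy^2)
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * vars_[c]) + (X - means[c]) ** 2 / vars_[c], axis=1)
        scores.append(np.log(priors[c]) + log_lik)
    return classes[np.argmax(np.stack(scores, axis=1), axis=1)]
```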
Tree-Based Models
🌐 Decision Tree: $h(x) = \sum_{i=1}^{m} y_i \mathbf{1}(x \in R_i)$
🌐 Random Forest: $h(x) = \frac{1}{T} \sum_{t=1}^{T} h_t(x)$
Clustering
🌐 k-Means Clustering: $J = \sum_{i=1}^{n} \min_{j=1}^{k} ||x_i - \mu_j||^2$, where $\mu_j$ is the mean of the $j$-th cluster.
🌐 Hierarchical Clustering: $D_{ij} = ||x_i - x_j||$, $\min$ or $\max$ linkage between clusters.
🌐 Gaussian Mixture Model (GMM): $p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x|\mu_k, \Sigma_k)$
🌐 DBSCAN: $\text{Cluster}(x) = \text{Density}(x, \epsilon, minPts)$
🌐 Isolation Forest (anomaly score): $s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}}$, where $h(x)$ is the path length of $x$ in an isolation tree and $c(n)$ is the average path length for $n$ samples
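For the k-means objective $J$ above, here is a minimal NumPy sketch of Lloyd's algorithm (random initialization, iteration cap, and the empty-cluster handling are illustrative assumptions).

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean updates to reduce J."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to the nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centers as cluster means (keep the old center if a cluster is empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```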
Neural Networks
🌐 Artificial Neural Network: $a^{(1)}=x,a^{(l)}=\sigma(z^{(l)}),z^{(l)}=w^{(l)} a^{(l-1)} + b^{(l)}$
🌐 Convolutional Neural Network: $y_{i,j} = \sigma(\sum_{m=1}^{M} \sum_{k=1}^{K} w_{m,k} x_{i+k-1,j+m-1} + b_m)$
🌐 Recurrent Neural Network (RNN): $h_t = f(h_{t-1}, x_t)$, where $f$ is a recurrent function.
🌐 Long Short-Term Memory (LSTM) RNN: $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$, $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$, $o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$, $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)$, $h_t = o_t \odot \tanh(c_t)$
🌐 Autoencoder: $\mathrm{encoder}: h = f(x;\theta), \mathrm{decoder}: r = g(h;\theta')$
🌐 Artificial Neural Network Backpropagation: $\delta^{(L)} = \nabla_a J \odot \sigma'(z^{(L)})$, $\delta^{(l)} = ((w^{(l+1)})^T \delta^{(l+1)}) \odot \sigma'(z^{(l)})$
Gradient Boosting and Ensemble
🌐 Gradient Boosting: $F(x) = \sum_{m=1}^{M} \gamma_m f_m(x)$
🌐 Gradient Boosting Decision Tree: $y = \sum_{m=1}^{M} \gamma_m f_m(x)$, where $f_m$ is a decision tree and $\gamma_m$ is its weight (the step size found by line search, usually shrunk by a learning rate).
🌐 XGBoost: $\text{obj}(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t)}) + \sum_{i=1}^{t} \Omega(f_i)$
🌐 LightGBM: $\text{obj}(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{j=1}^{T} \Omega(f_j)$
🌐 AdaBoost: $H(x) = \text{sign}\left(\sum_{t=1}^T \alpha_t h_t(x)\right)$,
🌐 Bagging: $h(x) = \frac{1}{T} \sum_{t=1}^T h_t(x)$,
🌐 Stacking: $\text{Meta-model}(M_1(x), M_2(x), ..., M_n(x))$
Dimensionality Reduction
🌐 Principal Component Analysis (PCA): $z = U^T(x-\mu)$, where $U$ is the eigenvector matrix.
🌐 Independent Component Analysis (ICA): $x = As$, where $A$ is the mixing matrix and $s$ is the independent source signals.
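As a concrete illustration of the PCA projection $z = U^T(x-\mu)$ above, here is a minimal NumPy sketch via eigendecomposition of the sample covariance matrix (the function name and return layout are illustrative assumptions).

```python
import numpy as np

def pca(X, n_components):
    """Project centered data onto the top eigenvectors of the sample covariance matrix."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    U = eigvecs[:, order]                             # columns are principal directions
    return Xc @ U, U, mu                              # scores z = U^T (x - mu), one row per sample
```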
Classification Metrics
🌐 Receiver Operating Characteristic (ROC) Curve: $\text{TPR} = \frac{TP}{TP+FN}$, $\text{FPR} = \frac{FP}{FP+TN}$
🌐 Precision-Recall Curve: $Precision = \frac{TP}{TP+FP}$, $Recall = \frac{TP}{TP+FN}$
🌐 Confusion Matrix: $\begin{pmatrix}TN & FP\\FN & TP\end{pmatrix}$
🌐 F1 Score: $F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$
🌐 Area Under Curve (AUC): $\text{AUC} = \int_0^1 \text{TPR}(\text{FPR}) \, d\text{FPR}$ (the area under the ROC curve)
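A minimal sketch computing the classification metrics above directly from confusion-matrix counts (the function name and the zero-division guards are illustrative assumptions).

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall (TPR), FPR, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall    = tp / (tp + fn) if tp + fn else 0.0   # same as TPR
    fpr       = fp / (fp + tn) if fp + tn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "fpr": fpr, "f1": f1}
```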
Regression Metrics
🌐 Mean Absolute Error (MAE): $\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
🌐 Mean Squared Error (MSE): $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
🌐 Root Mean Squared Error (RMSE): $\text{RMSE} = \sqrt{\text{MSE}}$
🌐 R2 Score: $R^2 = 1 - \frac{\text{MSE}}{\text{Var}(y)}$
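The regression metrics above translate directly into a short NumPy sketch; note that $R^2$ here follows the same $1 - \text{MSE}/\text{Var}(y)$ form as the entry above.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, and R^2 as defined above."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    r2 = 1.0 - mse / np.var(y_true)
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}
```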
Regularization Techniques
🌐 L1 Regularization (Lasso): $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} L(y^{(i)}, \hat{y}^{(i)}) + \lambda \sum_{j=1}^{n} |\theta_j|$
🌐 L2 Regularization (Ridge): $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} L(y^{(i)}, \hat{y}^{(i)}) + \frac{\lambda}{2} \sum_{j=1}^{n} \theta_j^2$
🌐 Elastic Net Regularization: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} L(y^{(i)}, \hat{y}^{(i)}) + \lambda_1 \sum_{j=1}^{n} |\theta_j| + \frac{\lambda_2}{2} \sum_{j=1}^{n} \theta_j^2$
🌐 Dropout: $y = \frac{1}{1-p} x \odot m$, where $m_i \sim \text{Bernoulli}(1-p)$ is a binary mask and $p$ is the dropout probability (inverted-dropout scaling).
Overfitting Control
🌐 Early Stopping: Stop training when the validation loss stops decreasing,
Kernel Methods
🌐 Radial Basis Function (RBF): $K(x, x') = \exp \left(-\frac{\|x - x'\|^2}{2\sigma^2} \right)$,
🌐 Polynomial Kernel: $K(x, x') = (x^T x' + c)^d$,
Regression Models
🌐 Locally Weighted Linear Regression (LWLR): $\min_{\theta \in \mathbb{R}^d} \sum_{i=1}^N w_i(x) (y_i - \theta^T x_i)^2$, where $w_i(x) = \exp\left(-\frac{\|x - x_i\|^2}{2\tau^2}\right)$ weights each training point by its proximity to the query point $x$,
Classification Models
🌐 K-Nearest Neighbors: $y = \mathrm{mode}(\{y_i : x_i \in \mathcal{N}_k(x)\})$
🌐 Softmax Regression: $P(y|x) = \frac{\exp(w_y^T x + b_y)}{\sum_{j=1}^k \exp(w_j^T x + b_j)}$,
🌐 Sigmoid Kernel: $K(x, x') = \tanh (\kappa x^T x' + c)$,
🌐 Kernel SVM: $f(x) = \sum_{i=1}^N \alpha_i y_i K(x, x_i) + b$,
Unsupervised Learning Models
🌐 Gaussian Process: $p(f(x)|X, y, x) = \mathcal{N}(m(x), \sigma^2(x))$,
🌐 Latent Dirichlet Allocation (LDA): $p(w|\theta, \beta) = \sum_{z=1}^K p(w|z, \beta)p(z|\theta)$,
🌐 t-Distributed Stochastic Neighbor Embedding (t-SNE): $p_{j|i} = \frac{\exp(-||x_i - x_j||^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-||x_i - x_k||^2 / 2\sigma_i^2)}$,
🌐 Spectral Clustering: $L = D - W$, where $D$ is the degree matrix and $W$ is the adjacency matrix,
🌐 Collaborative Filtering: $R \approx U^TV$, where $R$ is the rating matrix, $U$ is the user matrix, and $V$ is the item matrix,
🌐 Matrix Factorization: $R \approx \sum_{k=1}^K u_{ik}v_{kj}$,
Collaborative Filtering
Time Series Analysis
Recommendation Systems
Generative Models
Natural Language Processing
🌐 Word2Vec: $f(w) = \text{Embedding}(w)$,
🌐 GloVe: $F(w_i, w_j, \tilde{w}_k) = w_i^T \tilde{w}_k + b_i + \tilde{b}_k - \log X_{ij}$,
🌐 FastText: $f(w) = \sum_{g \in ngrams(w)} \text{Embedding}(g)$,
🌐 ELMo: $h(x) = \sum_{j=1}^L \gamma_j h_j(x)$,
🌐 Attention Mechanism: $\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^T \exp(e_{ik})}$,
🌐 Self-Attention (scaled dot-product): $e_{ij} = \frac{q_i^T k_j}{\sqrt{d_k}}$, with $q_i = W_Q x_i$ and $k_j = W_K x_j$,
🌐 BERT: $\text{MaskedLM}(\text{Transformer}(x))$,
🌐 GPT: $p(x) = \prod_{t=1}^T p(x_t | x_{\le t})$,
Cluster Evaluation
🌐 Silhouette Score: $s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$,
Feature Engineering
🌐 One-hot Encoding: $x_i = \begin{cases} 1 & \text{if } c = c_i, \\ 0 & \text{otherwise} \end{cases}$,
🌐 Feature Standardization: $x' = \frac{x - \mu}{\sigma}$,
🌐 Label Encoding: Convert categorical variables into integer labels.
Model Evaluation
🌐 Cross-Validation: $\text{CV} = \frac{1}{k} \sum_{i=1}^k L_{test}^{(i)}$,
🌐 K-Fold Cross-Validation: $\text{CV} = \frac{1}{k} \sum_{i=1}^k L_{test}^{(i)}$
🌐 Stratified K-Fold Cross-Validation: $\text{CV}_{\text{stratified}} = \frac{1}{k} \sum_{i=1}^k L_{test}^{(i)}(\text{same class distribution})$
🌐 Time Series Cross-Validation: $\text{CV}_{\text{time}} = \frac{1}{k} \sum_{i=1}^k L_{test}^{(i)}(\text{time order})$
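The cross-validation formulas above average per-fold test losses; here is a minimal NumPy sketch of plain k-fold splitting and scoring, where `fit` and `loss` are hypothetical user-supplied callables and the shuffling seed is an illustrative assumption.

```python
import numpy as np

def k_fold_indices(n, k, seed=0, shuffle=True):
    """Yield (train_idx, test_idx) pairs for plain k-fold cross-validation."""
    idx = np.arange(n)
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

def cross_val_score(fit, loss, X, y, k=5):
    """CV = (1/k) * sum of per-fold test losses, as in the formula above."""
    losses = []
    for train, test in k_fold_indices(len(X), k):
        model = fit(X[train], y[train])
        losses.append(loss(model, X[test], y[test]))
    return float(np.mean(losses))
```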
Activation Functions
🌐 Leaky ReLU: $f(x) = \begin{cases} x & \text{if } x > 0, \\ \alpha x & \text{otherwise} \end{cases}$,
🌐 Parametric ReLU (PReLU): $f(x) = \begin{cases} x & \text{if } x > 0, \\ \alpha_i x & \text{otherwise} \end{cases}$,
Feature Importance
🌐 Permutation Importance: $\text{importance} = \frac{\text{error}_{\text{permuted}} - \text{error}_{\text{baseline}}}{\text{error}_{\text{baseline}}}$ (the relative increase in error after permuting the feature)
Sequence Models
🌐 Hidden Markov Model (HMM): $\alpha_t(j) = p(o_1,o_2,...,o_t, q_t = j|\lambda)$, $\beta_t(j) = p(o_{t+1}, o_{t+2}, ..., o_T |q_t = j, \lambda)$, $\gamma_t(j) = p(q_t = j|O,\lambda)$, $\epsilon_t(i,j) = p(q_t = i,q_{t+1} = j|O,\lambda)$
Loss Functions
🌐 Mean Squared Logarithmic Error (MSLE): $L(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^N (\log(1 + y_i) - \log(1 + \hat{y}_i))^2$,
🌐 Huber Loss: $L(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & \text{for } |y - \hat{y}| \le \delta, \\ \delta \left(|y - \hat{y}| - \frac{1}{2}\delta\right) & \text{otherwise} \end{cases}$,
🌐 Hinge Loss: $L(y, \hat{y}) = \max(0, 1 - y \cdot \hat{y})$,
🌐 KL Divergence: $D_{KL}(P || Q) = \sum_{i=1}^N P(i) \log \frac{P(i)}{Q(i)}$,
🌐 Categorical Cross-Entropy Loss: $L(y, \hat{y}) = -\sum_{i=1}^C y_i \log \hat{y}_i$,
🌐 Binary Cross-Entropy Loss: $L(y, \hat{y}) = -\sum_{i=1}^N [y_i \log \hat{y}_i + (1 - y_i)\log (1 - \hat{y}_i)]$,
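A minimal NumPy sketch of three of the losses above (Huber, hinge, and binary cross-entropy); the clipping constant guarding $\log(0)$ is an illustrative assumption.

```python
import numpy as np

def huber(y, y_hat, delta=1.0):
    """Quadratic near zero, linear in the tails."""
    r = np.abs(y - y_hat)
    return np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta))

def hinge(y, y_hat):
    """y in {-1, +1}, y_hat is a raw score."""
    return np.maximum(0.0, 1.0 - y * y_hat)

def binary_cross_entropy(y, p, eps=1e-12):
    """y in {0, 1}, p is a predicted probability; eps guards log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```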
Graph Embeddings
🌐 DeepWalk: $f(v) = \text{SkipGram}(v, v_1, \dots, v_{2k})$,
🌐 Node2Vec: $f(v) = \text{SkipGram}(v, v_1, \dots, v_{2k})$ with biased random walks,
🌐 GraphSAGE: $f(v) = \text{SAGE}(v, N(v))$,
Common ML Optimizers
🌐 Stochastic Gradient Descent with Momentum: $v_t = \gamma v_{t-1} + \eta \nabla f(x_t)$, $x_{t+1} = x_t - v_t$,
🌐 AdaGrad: $g_t = g_{t-1} + (\nabla f(x_t))^2$, $x_{t+1} = x_t - \frac{\eta}{\sqrt{g_t + \epsilon}} \nabla f(x_t)$,
🌐 Adam: $m_t = \beta_1 m_{t-1} + (1 - \beta_1) \nabla f(x_t)$, $v_t = \beta_2 v_{t-1} + (1 - \beta_2) (\nabla f(x_t))^2$, $\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$, $\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$, $x_{t+1} = x_t - \frac{\eta \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$,
🌐 AdaMax: $m_t = \beta_1 m_{t-1} + (1 - \beta_1) \nabla f(x_t)$, $v_t = \max(\beta_2 v_{t-1}, |\nabla f(x_t)|)$, $x_{t+1} = x_t - \frac{\eta m_t}{v_t}$,
🌐 Nadam: $m_t = \beta_1 m_{t-1} + (1 - \beta_1) \nabla f(x_t)$, $v_t = \beta_2 v_{t-1} + (1 - \beta_2) (\nabla f(x_t))^2$, $\hat{m}_t = \frac{m_t}{1 - \beta_1^t} + \frac{(1 - \beta_1) \nabla f(x_t)}{1 - \beta_1^t}$, $\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$, $x_{t+1} = x_t - \frac{\eta \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$,
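The Adam update above maps directly onto a short loop; here is a minimal NumPy sketch where `grad` is a hypothetical user-supplied gradient function and the default hyperparameters are the commonly quoted ones.

```python
import numpy as np

def adam_minimize(grad, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, n_steps=1000):
    """Adam: bias-corrected first and second moment estimates of the gradient."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, n_steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# toy usage: minimize f(x) = ||x - 3||^2, whose gradient is 2(x - 3)
x_star = adam_minimize(lambda x: 2 * (x - 3.0), x0=np.zeros(2), lr=0.05, n_steps=2000)
```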
Evaluation Metrics
🌐 ROUGE: $R = \frac{\sum_{s \in \text{reference}} \sum_{n \in \text{grams}} \min \left(c_n(s), c_n(\text{candidate}) \right)}{\sum_{s \in \text{reference}} \sum_{n \in \text{grams}} c_n(s)}$,
🌐 BLEU: $BP \cdot \exp \left(\sum_{n=1}^N w_n \log p_n \right)$,
Normalization
🌐 Batch Normalization: $\hat{x}_i = \frac{x_i - E[x_i]}{\sqrt{Var[x_i] + \epsilon}}$, $y_i = \gamma \hat{x}_i + \beta$
Data Augmentation
Hyperparameter Tuning
Others
🌐 EM Algorithm: $Q(\theta, \theta^{(t)}) = E_{Z|X,\theta^{(t)}}[\log p(X, Z|\theta)]$,
🌐 Bias-Variance Tradeoff: $\text{Error} = \text{Bias}^2 + \text{Variance} + \text{Noise}$,
🌐 Transfer Learning: $\text{Performance}(\text{new}) = \text{Pretrained Model}(\text{similar}) + \Delta \text{Performance}$
🌐 Stratified Sampling: $\text{Sampled Data} = \text{Sample}(\text{Class}_1, \dots, \text{Class}_n)$
Value Functions and Bellman Equations
🌐 Markov Decision Process: $\mathcal{M} = (S, A, P, R, \gamma)$
🌐 State transition function: $P(s'|s, a)$
🌐 Reward function: $R(s, a, s')$ or $R(s, a)$,
🌐 State-value function: $V^\pi(s) = E[\sum_{t=0}^\infty \gamma^t R_t | S_0 = s, \pi]$,
🌐 Action-value function: $Q^\pi(s, a) = E[\sum_{t=0}^\infty \gamma^t R_t | S_0 = s, A_0 = a, \pi]$,
🌐 Bellman equation for $V^\pi$: $V^\pi(s) = \sum_a \pi(a|s) \sum_{s'} P(s'|s, a) \left[ R(s, a, s') + \gamma V^\pi(s') \right]$,
🌐 Bellman equation for $Q^\pi$: $Q^\pi(s, a) = \sum_{s'} P(s'|s, a) \left[ R(s, a, s') + \gamma \sum_{a'} \pi(a'|s') Q^\pi(s', a') \right]$,
🌐 Optimal state-value function: $V^\star(s) = \max_a Q^\star(s, a)$
🌐 Optimal action-value function: $Q^\star(s, a) = \sum_{s'} P(s'|s, a) \left[ R(s, a, s') + \gamma \max_{a'} Q^\star(s', a') \right]$
Policy Improvement
🌐 Policy improvement: $\pi'(s) = \arg\max_a Q^\pi(s, a)$,
🌐 Policy iteration: $(1) \text{Policy Evaluation} \rightarrow (2) \text{Policy Improvement} \rightarrow (3) \text{Repeat until convergence}$,
Value-based Algorithms
🌐 Value iteration: $V_{t+1}(s) = \max_a \sum_{s'} P(s'|s, a) \left[ R(s, a, s') + \gamma V_t(s') \right]$,
🌐 Temporal Difference (TD) learning: $V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right]$,
🌐 Q-learning: $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t) \right]$,
🌐 SARSA: $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]$,
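A minimal tabular sketch of the Q-learning update above, with an ε-greedy action rule for exploration; `Q` is assumed to be a (states × actions) NumPy array and the step sizes are illustrative.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """One tabular Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def epsilon_greedy(Q, s, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))
```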
Deep RL Algorithms
🌐 Deep Q-Network (DQN) loss: $\min_\theta \sum_{(s, a, r, s', d) \in \mathcal{D}} \left[ r + (1 - d)\gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right]^2$,
🌐 Experience replay buffer: $\mathcal{D}$, a collection of tuples $(s, a, r, s', d)$ used to train the DQN,
🌐 Target network: $Q(s, a; \theta^-)$, a separate network with parameters $\theta^-$ that are periodically updated from the main network,
🌐 Double DQN (DDQN) loss: $\min_\theta \sum_{(s, a, r, s', d) \in \mathcal{D}} \left[ r + (1 - d)\gamma Q(s', \arg\max_{a'} Q(s', a'; \theta); \theta^-) - Q(s, a; \theta) \right]^2$,
🌐 Distributed DQN (DDQN): $Q(S_t, A_t; \theta) \leftarrow Q(S_t, A_t; \theta) + \alpha \left[ R_{t+1} + \gamma \max_a Q(S_{t+1}, a; \theta') - Q(S_t, A_t; \theta) \right]$,
🌐 Dueling DQN: $Q(s, a; \theta, \alpha, \beta) = V(s; \theta) + A(s, a; \alpha) - \frac{1}{|A|} \sum_{a'} A(s, a'; \beta)$,
🌐 Prioritized Experience Replay: $p_i = \left| r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right| + \epsilon$,
Policy Gradient Algorithms
🌐 Actor-Critic loss: $L(\theta, \phi) = -E_{\tau \sim \pi_{\theta}}[\sum_{t=0}^T \gamma^t (r(s_t, a_t) - V_{\phi}(s_t)) \nabla_\theta \log \pi_{\theta}(a_t|s_t)]$,
🌐 Advantage Actor-Critic (A2C): $L(\theta, \phi) = -E_{\tau \sim \pi_{\theta}}[\sum_{t=0}^T \gamma^t A_{\phi}(s_t, a_t)]$,
🌐 Proximal Policy Optimization (PPO): $L^{\text{CLIP}}(\theta) = E_t \left[ \min \left( r_t(\theta) A_t, \text{clip}\left( r_t(\theta), 1 - \epsilon, 1 + \epsilon \right) A_t \right) \right]$, where $r_t(\theta) = \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_\text{old}}(a_t|s_t)}$ and $A_t$ is the advantage estimate,
🌐 Soft Actor-Critic (SAC): $J(\theta, \phi) = E_{\tau \sim \pi_{\theta}} \left[ \sum_{t=0}^T \gamma^t \left( r(s_t, a_t) - \alpha \log \pi_{\theta}(a_t|s_t) \right) \right]$,
🌐 Trust Region Policy Optimization (TRPO): $\text{maximize } \Delta L(\theta) \text{ subject to } \bar{D}_{KL}(\pi_{\theta_\text{old}} \,\|\, \pi_\theta) \leq \delta$,
🌐 Monte Carlo Policy Gradient: $\nabla J(\theta) = E_{\tau \sim \pi_{\theta}}[\sum_{t=0}^T \nabla_\theta \log \pi_{\theta}(a_t|s_t) R_t]$,
🌐 REINFORCE: $\theta \leftarrow \theta + \alpha \sum_{t=0}^T \nabla_\theta \log \pi_{\theta}(a_t|s_t) (R_t - b)$,
🌐 Natural Policy Gradient: $\tilde{\nabla}_{\theta} J(\theta) = F^{-1}(\theta) \nabla_{\theta} J(\theta)$, where $F(\theta)$ is the Fisher information matrix,
Deterministic Policy Gradient
Reinforcement Learning For Games
🌐 TD-Gammon: $\delta_t = r_{t+1} + \gamma V(s_{t+1}; \theta) - V(s_t; \theta)$,
Exploration Strategies
🌐 ε-greedy exploration: $\pi(a|s) = \begin{cases} 1 - \epsilon + \frac{\epsilon}{|A|} & \text{if } a = \arg\max_{a'} Q(s, a') \\ \frac{\epsilon}{|A|} & \text{otherwise} \end{cases}$,
🌐 Boltzmann exploration: $\pi(a|s) = \frac{\exp(Q(s, a) / \tau)}{\sum_{a'} \exp(Q(s, a') / \tau)}$,
Bandit Algorithms
🌐 Multi-Armed Bandit: $A_t = \arg\max_{a \in A} \left( Q_t(a) + c \sqrt{\frac{\log t}{N_t(a)}} \right)$,
🌐 Upper Confidence Bound (UCB): $A_t = \arg\max_{a \in A} \left( \hat{\mu}_a + \sqrt{\frac{2 \log t}{n_a}} \right)$,
🌐 Thompson Sampling: $A_t = \arg\max_a \theta^\star_a$, $\theta^\star_a \sim \text{Beta}(\alpha_a, \beta_a)$,
🌐 Contextual Bandit: $A_t = \arg\max_{a \in A} \left( \theta^T x_{t, a} \right)$,
🌐 Linear UCB: $A_t = \arg\max_{a \in A} \left( \theta^T x_{t, a} + \alpha \sqrt{x_{t, a}^T V^{-1} x_{t, a}} \right)$,
🌐 LinUCB: $A_t = \arg\max_{a \in A} \left( \theta^T x_{t, a} + \alpha \sqrt{x_{t, a}^T A_a^{-1} x_{t, a}} \right)$,
🌐 EXP3: $\pi_t(a) = \frac{(1 - \gamma) \hat{w}_{t-1}(a) + \gamma / K}{\sum_{a'=1}^K \left((1 - \gamma) \hat{w}_{t-1}(a') + \gamma / K\right)}$,
Advanced RL Algorithms
🌐 Entropy-regularized objective: $J(\theta) = E_{\tau \sim \pi_{\theta}} \left[ \sum_{t=0}^T \gamma^t \left( r(s_t, a_t) + \alpha H(\pi_{\theta}(\cdot|s_t)) \right) \right]$,
🌐 DDPG: $\nabla_{\theta^\mu} J = \mathbb{E}_{s_t \sim D} \left[\nabla_{a} Q(s, a|\theta^Q) \, \nabla_{\theta^\mu} \mu(s|\theta^\mu)\right]$,
🌐 Monte Carlo Tree Search (MCTS): $Q(s, a) = \frac{\sum_{i=1}^n R_i(s, a)}{N(s, a)}$,
🌐 Upper Confidence Bound for Trees (UCT): $a^* = \arg\max_a \left( Q(s, a) + c \sqrt{\frac{\log N(s)}{N(s, a)}} \right)$,
Temporal Difference Variants
🌐 Expected SARSA: $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma E_{\pi}[Q(S_{t+1}, A_{t+1})] - Q(S_t, A_t) \right]$,
🌐 Dyna-Q: $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t) \right]$,
🌐 Model learning: $P(s'|s, a) \leftarrow P(s'|s, a) + \alpha \left[ 1 - P(s'|s, a) \right]$,
🌐 R-learning: $\rho \leftarrow \rho + \beta \left[ R_{t+1} - \rho + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t) \right]$,
🌐 Average Reward SARSA: $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} - \bar{R}_t + Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]$,
Other
Optimization Problems
🌐 Optimization problem: $\min_{x \in \mathcal{X}} f(x)$
🌐 Feasible set: $\mathcal{X} = \{x \in \mathbb{R}^n : g_i(x) \le 0, i = 1, \ldots, m\}$
🌐 Linear Programming: $\min_{x \in \mathbb{R}^n} c^T x$ subject to $Ax \le b$
🌐 Quadratic Programming: $\min_{x \in \mathbb{R}^n} \frac{1}{2} x^T Q x + c^T x$ subject to $Ax \le b$
🌐 Constrained Optimization: $\min_{x \in X} f(x)$ subject to $g(x) \le 0$ and $h(x) = 0$
Lagrangian and duality
🌐 Lagrangian: $L(x, \lambda, \nu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \nu_j h_j(x)$
🌐 Lagrange multipliers (sensitivity): $\lambda^\star_i = -\frac{\partial p^\star}{\partial u_i}$, where $p^\star(u)$ is the optimal value when the $i$-th constraint is relaxed to $g_i(x) \le u_i$
Convexity
🌐 Convex function: $f(\alpha x + (1-\alpha)y) \le \alpha f(x) + (1-\alpha) f(y)$
🌐 Concave function: $f(\alpha x + (1-\alpha)y) \ge \alpha f(x) + (1-\alpha) f(y)$
Optimality Conditions
🌐 First-order condition: $\nabla f(x^\star) = 0$
🌐 Second-order condition: $H(x^\star) \succ 0$
🌐 KKT conditions: $\begin{cases} \nabla f(x^\star) + \sum_{i=1}^m \lambda_i^\star \nabla g_i(x^\star) = 0 \\ g_i(x^\star) \le 0,\ \lambda_i^\star \ge 0,\ \lambda_i^\star g_i(x^\star) = 0,\ i=1,\ldots,m \\ x^\star \in \mathcal{X} \end{cases}$
Gradient and Hessian
🌐 Gradient: $\nabla f(x) = \left(\frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_n}(x)\right)$, Hessian matrix: $H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$,
Optimization Algorithms
🌐 Newton's method: $x^{(k+1)} = x^{(k)} - H^{-1}(x^{(k)})\nabla f(x^{(k)})$,
🌐 Quasi-Newton method: $x^{(k+1)} = x^{(k)} - B_k^{-1}\nabla f(x^{(k)})$,
🌐 Conjugate gradient: $x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}$,
Descent and search methods
🌐 Steepest descent: $p^{(k)} = -\nabla f(x^{(k)})$,
🌐 Line search: $\alpha_k = \arg\min_{\alpha > 0} f(x^{(k)} + \alpha p^{(k)})$,
🌐 Armijo rule: $f(x^{(k)} + \alpha p^{(k)}) \le f(x^{(k)}) + c\alpha \nabla f(x^{(k)})^T p^{(k)}$,
🌐 Wolfe conditions: $\begin{cases} f(x^{(k)} + \alpha p^{(k)}) \le f(x^{(k)}) + c_1\alpha \nabla f(x^{(k)})^T p^{(k)} \\ \nabla f(x^{(k)} + \alpha p^{(k)})^T p^{(k)} \ge c_2 \nabla f(x^{(k)})^T p^{(k)} \end{cases}$,
🌐 Golden section search: $\frac{a_{n+1} - a_n}{a_n - a_{n-1}} = \frac{b_{n+1} - b_n}{b_n - b_{n-1}} = \phi$,
🌐 Bisection method: $x^{(k+1)} = \frac{a^{(k)} + b^{(k)}}{2}$,
Root-finding and fixed-point methods
🌐 Secant method: $x^{(k+1)} = x^{(k)} - \frac{f(x^{(k)}) (x^{(k)} - x^{(k-1)})}{f(x^{(k)}) - f(x^{(k-1)})}$
🌐 Fixed-point iteration: $x^{(k+1)} = g(x^{(k)})$
🌐 Banach fixed-point theorem: $||g(x) - g(y)|| \le L||x - y||$
🌐 Lipschitz constant: $L = \sup_{x \neq y} \frac{||g(x) - g(y)||}{||x - y||}$
Iterative methods and approximations
🌐 Successive approximations: $x^{(k)} = x^{(0)} + \sum_{i=1}^k \Delta x^{(i)}$
🌐 Perturbed optimization: $\min_{x \in \mathcal{X}} f(x) + \epsilon g(x)$
Homotopy and Saddle Point methods
🌐 Homotopy method: $\min_{x \in \mathcal{X}} (1 - \lambda)f(x) + \lambda g(x)$
🌐 Saddle point: $\nabla_x L(x^\star, \lambda^\star) = 0, \nabla_\lambda L(x^\star, \lambda^\star) = 0$
🌐 Primal-dual method: $x^{(k+1)} = \arg\min_{x \in \mathcal{X}} L(x, \lambda^{(k)})$
Penalty and Barrier Methods
🌐 Penalty function: $P(x) = f(x) + \sum_{i=1}^m \phi(g_i(x))$
🌐 Barrier function: $B(x) = f(x) - \mu \sum_{i=1}^m \log(-g_i(x))$
ADMM
🌐 ADMM update: $x^{(k+1)} = \arg\min_x \mathcal{L}_\rho(x, z^{(k)}, \lambda^{(k)})$, $z^{(k+1)} = \arg\min_z \mathcal{L}_\rho(x^{(k+1)}, z, \lambda^{(k)})$, $\lambda^{(k+1)} = \lambda^{(k)} + \rho(g(x^{(k+1)}) - z^{(k+1)})$
Proximal Gradient Methods
🌐 Proximal gradient: $x^{(k+1)} = \text{prox}_{\alpha h}(x^{(k)} - \alpha \nabla f(x^{(k)}))$,
🌐 Proximal operator: $\text{prox}_h(x) = \arg\min_y \left(h(y) + \frac{1}{2} ||y-x||^2\right)$,
🌐 FISTA update: $y^{(k+1)} = x^{(k)} - \alpha \nabla f(x^{(k)})$,
🌐 FISTA update: $x^{(k+1)} = \text{prox}_{\alpha h}(y^{(k+1)})$,
🌐 ISTA update: $x^{(k+1)} = \text{prox}_{\alpha h}(x^{(k)} - \alpha \nabla f(x^{(k)}))$,
🌐 Nesterov acceleration: $x^{(k+1)} = \text{prox}_{\alpha h}(y^{(k)} - \alpha \nabla f(y^{(k)}))$,
🌐 Nesterov momentum: $y^{(k+1)} = x^{(k+1)} + \frac{k}{k+3}(x^{(k+1)} - x^{(k)})$,
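The proximal-gradient/ISTA updates above specialize neatly to the lasso, where the proximal operator of the $\ell_1$ term is soft-thresholding; here is a minimal NumPy sketch (the step size $1/L$ uses the spectral norm of $A$, and the iteration count is an illustrative assumption).

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (element-wise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_lasso(A, b, lam, n_iter=500):
    """ISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1 with step size 1/L."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - grad / L, lam / L)
    return x
```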
Linear Minimization and Frank-Wolfe Method
🌐 Frank-Wolfe method: $s^{(k)} = \arg\min_{s \in \mathcal{X}} \nabla f(x^{(k)})^T s$,
🌐 Frank-Wolfe update: $x^{(k+1)} = x^{(k)} + \gamma_k(s^{(k)} - x^{(k)})$,
Convergence and subdifferential calculus
🌐 Convergence rate: $\frac{f(x^{(k)}) - f(x^\star)}{f(x^{(0)}) - f(x^\star)} \le \rho^k$,
🌐 Subdifferential: $\partial f(x) = \{v \in \mathbb{R}^n : f(y) \ge f(x) + v^T(y-x), \forall y \in \mathbb{R}^n\}$,
🌐 Subgradient method: $x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}$,
🌐 Projected subgradient: $x^{(k+1)} = \mathcal{P}_{\mathcal{X}}(x^{(k)} - \alpha_k g^{(k)})$,
🌐 Concave-convex procedure: $x^{(k+1)} = \arg\min_x \left(\nabla f(x^{(k)})^T(x - x^{(k)}) + h(x)\right)$,
🌐 Smoothing approximation: $f_\epsilon(x) = \inf_{y \in \mathbb{R}^n} \left(f(y) + \frac{1}{2\epsilon} ||x - y||^2\right)$,
🌐 Augmented Lagrangian: $\mathcal{L}_\rho(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \frac{\rho}{2} \sum_{i=1}^m g_i(x)^2$,
🌐 Projected gradient: $x^{(k+1)} = \mathcal{P}_{\mathcal{X}}(x^{(k)} - \alpha_k \nabla f(x^{(k)}))$,
Time Value
🌐 Compound interest: $A = P(1 + \frac{r}{n})^{nt}$
🌐 Continuous compounding: $A = Pe^{rt}$
🌐 Present value: $PV = \frac{FV}{(1 + r)^n}$
🌐 Future value: $FV = PV(1 + r)^n$
🌐 Simple interest: $I = Prt$
🌐 Annuity formula: $PV = \frac{PMT}{r}(1 - (1 + r)^{-n})$
🌐 Perpetuity formula: $PV = \frac{PMT}{r}$
Options
🌐 Black model: $C(F, K, T, r, \sigma) = e^{-rT}[FN(d_1) - KN(d_2)]$
🌐 Real options valuation: $\text{ROV} = f(S, K, T, r, \sigma, q)$
🌐 Black-Scholes-Merton: $C(S, K, T, r, \sigma, q) = Se^{-qT}N(d_1) - Ke^{-rT}N(d_2)$
🌐 Black-Scholes-Merton put: $P(S, K, T, r, \sigma, q) = Ke^{-rT}N(-d_2) - Se^{-qT}N(-d_1)$
🌐 Black-Scholes formula: $C(S, t) = SN(d_1) - Ke^{-r(T-t)}N(d_2)$
🌐 Black-Scholes: $C(S, K, T, r, \sigma) = SN(d_1) - Ke^{-rT}N(d_2)$
🌐 Black-Scholes put: $P(S, K, T, r, \sigma) = Ke^{-rT}N(-d_2) - SN(-d_1)$
🌐 $d_1$: $d_1 = \frac{1}{\sigma \sqrt{T}}\left[\ln\left(\frac{S}{K}\right) + \left(r + \frac{\sigma^2}{2}\right)T\right]$
🌐 $d_2$: $d_2 = d_1 - \sigma \sqrt{T}$
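The Black-Scholes call and put formulas, together with $d_1$ and $d_2$ above, translate directly into a short sketch; this assumes SciPy is available for the standard normal CDF.

```python
from math import log, sqrt, exp
from scipy.stats import norm

def black_scholes_call(S, K, T, r, sigma):
    """European call under Black-Scholes, directly from the d1/d2 formulas above."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)

def black_scholes_put(S, K, T, r, sigma):
    """European put, via the same d1/d2 (equivalently from put-call parity)."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return K * exp(-r * T) * norm.cdf(-d2) - S * norm.cdf(-d1)
```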
Greeks
🌐 Call rho: $\rho_C = KTe^{-rT}N(d_2)$
🌐 Put rho: $\rho_P = -KTe^{-rT}N(-d_2)$
🌐 Put-call parity: $C - P = S - Ke^{-rT}$
🌐 Binomial option pricing: $C_n = \frac{1}{(1+r)^n} \sum_{i=0}^n \binom{n}{i} p^i (1-p)^{n-i} \max(S(1+u)^i(1+d)^{n-i} - K, 0)$
🌐 Risk-neutral probability: $p = \frac{1+r-d}{u-d}$
🌐 Implied volatility: $\sigma_{\text{implied}} = \text{IV}(C, S, K, T, r)$
🌐 Option elasticity: $\text{Elasticity} = \frac{\Delta_C \cdot S}{C}$
🌐 Put-call ratio: $\text{PCR} = \frac{\text{Volume}_\text{Put}}{\text{Volume}_\text{Call}}$
🌐 Moneyness: $\text{Moneyness} = \frac{S}{K}$
🌐 Greeks neutralization: $\text{Neutralize} = -\sum \text{Greeks}$
🌐 Call delta: $\Delta_C = N(d_1)$
🌐 Put delta: $\Delta_P = -N(-d_1)$,
🌐 Call gamma: $\Gamma_C = \frac{N'(d_1)}{S\sigma\sqrt{T}}$,
🌐 Put gamma: $\Gamma_P = \Gamma_C$,
🌐 Call theta: $\Theta_C = -\frac{S\sigma N'(d_1)}{2\sqrt{T}} - rKe^{-rT}N(d_2)$,
🌐 Put theta: $\Theta_P = -\frac{S\sigma N'(-d_1)}{2\sqrt{T}} + rKe^{-rT}N(-d_2)$,
🌐 Call vega: $V_C = S\sqrt{T} N'(d_1)$,
🌐 Put vega: $V_P = V_C$,
Risk Measures
🌐 GARCH model: $\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$,
🌐 Risk-adjusted return: $\text{RAR} = \frac{\text{Return} - r_f}{\text{Volatility}}$,
🌐 Sharpe ratio: $\text{Sharpe} = \frac{\text{Return} - r_f}{\text{StandardDeviation}}$,
🌐 Sortino ratio: $\text{Sortino} = \frac{\text{Return} - r_f}{\text{DownsideDeviation}}$,
🌐 Treynor ratio: $\text{Treynor} = \frac{\text{Return} - r_f}{\text{Beta}}$,
🌐 Information ratio: $\text{IR} = \frac{\text{Return} - \text{Benchmark}}{\text{TrackingError}}$,
🌐 Jensen's alpha: $\text{Alpha} = \text{Return} - [r_f + \beta (\text{MarketReturn} - r_f)]$,
Portfolio Management
🌐 CAPM formula: $E(R_i) = R_f + \beta_i(E(R_m) - R_f)$,
🌐 Sharpe ratio: $\frac{E(R_p) - R_f}{\sigma_p}$,
🌐 Covariance of assets: $\beta_i = \frac{\text{Cov}(R_i, R_m)}{\text{Var}(R_m)}$,
🌐 Efficient frontier: $\sigma_p = \sqrt{w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1w_2\rho_{12}\sigma_1\sigma_2}$,
🌐 Markowitz portfolio: $w_i = \sum_{j=1}^n \frac{C_{ij}^{-1}(R_j - R_f)}{\sum_{j=1}^n \sum_{k=1}^n C_{jk}^{-1}(R_j - R_f)(R_k - R_f)}$,
🌐 Covariance matrix: $\Sigma = \begin{bmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$,
🌐 Gordon growth model: $P_0 = \frac{D_1}{r - g}$,
Corporate Finance
Other
🌐 Dividend discount model: $P_0 = \sum_{t=1}^{\infty} \frac{D_t}{(1 + r)^t}$,
🌐 IRR formula: $NPV = \sum_{t=0}^n \frac{CF_t}{(1 + IRR)^t} = 0$,
🌐 NPV formula: $NPV = \sum_{t=0}^n \frac{CF_t}{(1 + r)^t}$,
🌐 WACC formula: $WACC = \frac{E}{V}R_e + \frac{D}{V}(1 - T_c)R_d$,
🌐 Dividend payout ratio: $\text{Payout Ratio} = \frac{\text{Dividends}}{\text{Net Income}}$,
🌐 Retention ratio: $\text{Retention Ratio} = 1 - \text{Payout Ratio}$,
🌐 Merton's model: $\text{DistanceToDefault} = \frac{\ln(\frac{V}{D}) + (r + \frac{\sigma_V^2}{2})T}{\sigma_V \sqrt{T}}$,
🌐 Risk-neutral density: $f^*(K) = e^{rT}\frac{\partial^2 C}{\partial K^2}$,
Bond Pricing and Duration
🌐 Effective duration: $\text{Duration} = \frac{V_+ - V_-}{2V \Delta y}$,
🌐 Effective convexity: $\text{Convexity} = \frac{V_+ + V_- - 2V}{V (\Delta y)^2}$,
🌐 Modified duration: $\text{ModDuration} = \frac{1}{1 + \frac{YTM}{m}} \cdot \text{Duration}$,
🌐 Macaulay duration: $\text{MacDuration} = \frac{\sum^n_{i=1} t_i PVCF_i}{V}$,
🌐 DV01: $\text{DV01} = \frac{\Delta V}{\Delta y}$,
🌐 Convexity adjustment: $\text{ConvexityAdj} = \frac{1}{2} \cdot \text{Convexity} \cdot (\Delta y)^2$,
🌐 Yield to maturity: $YTM = f(V, CF, t)$,
Fixed Income
🌐 Option adjusted spread: $OAS = Z - q$,
🌐 Z-spread: $Z = \text{Spread}(\text{YTM})$,
🌐 Forward rate: $f(t_1, t_2) = \frac{1}{t_2 - t_1} \left[\left(\frac{P(t_1)}{P(t_2)}\right)^{\frac{1}{t_2 - t_1}} - 1\right]$,
🌐 Swap rate: $S_t = \frac{\sum_{i=1}^n L(t_i) \Delta t}{\sum_{i=1}^n P(t_i) \Delta t}$,
🌐 Futures price: $F_t = Se^{(r - q)(T - t)}$,
🌐 Caplet pricing: $C_{\text{caplet}} = B(0, T)N(d_1) - KB(0, T_1)N(d_2)$,
🌐 Floorlet pricing: $C_{\text{floorlet}} = KB(0, T_1)N(-d_2) - B(0, T)N(-d_1)$,
🌐 Swaption pricing: $C_{\text{swaption}} = N(d_1) - (K / S)N(d_2)$,
🌐 CDS pricing: $\text{CDS} = \frac{\text{PV}_\text{Protection}}{\text{PV}_\text{Premium}}$,
Derivative Pricing Models
🌐 Cox-Ingersoll-Ross model: $dr_t = a(b - r_t)dt + \sigma\sqrt{r_t}dW_t$,
🌐 Vasicek model: $dr_t = a(b - r_t)dt + \sigma dW_t$,
🌐 Hull-White model: $dr_t = a(b(t) - r_t)dt + \sigma dW_t$,
🌐 Constant elasticity of variance: $dS_t = \mu S_t dt + \sigma S_t^\gamma dW_t$,
🌐 Heston model: $dS_t = \mu S_t dt + \sqrt{\nu_t} S_t dW_t^1$, $d\nu_t = \kappa(\theta - \nu_t) dt + \xi \sqrt{\nu_t} dW_t^2$,
🌐 Bachelier model: $C(S, K, T, \sigma) = (S - K)N(d) + \sigma \sqrt{T}N'(d)$,
🌐 Chen model (three-factor short rate): $dr_t = \kappa(\theta_t - r_t)dt + \sqrt{r_t}\,\sigma_t\,dW_t^1$, $d\theta_t = \nu(\bar{\theta} - \theta_t)dt + \zeta\sqrt{\theta_t}\,dW_t^2$, $d\sigma_t = \mu(\bar{\sigma} - \sigma_t)dt + \eta\sqrt{\sigma_t}\,dW_t^3$,
🌐 Girsanov theorem: $\frac{d\mathbb{Q}^*}{d\mathbb{Q}} = \exp\left(-\int_0^T \theta_t dW_t - \frac{1}{2}\int_0^T \theta_t^2 dt\right)$,
🌐 Martingale pricing: $V_t = \mathbb{E}^\mathbb{Q}\left[e^{-r(T-t)}V_T | \mathcal{F}_t\right]$,
🌐 Breeden-Litzenberger: $\frac{\partial^2 C}{\partial K^2} = e^{rT}f^*(K)$,
Basic Counting Principles
🌐 Factorial: $n! = n(n-1)(n-2)\dots1$,
🌐 Permutations: $_nP_r = \frac{n!}{(n-r)!}$,
🌐 Combinations: $_nC_r = \frac{n!}{r!(n-r)!}$,
🌐 Binomial theorem: $(a+b)^n = \sum_{k=0}^n {n \choose k} a^{n-k}b^k$,
🌐 Pascal's triangle: ${n \choose k} = {n-1 \choose k-1} + {n-1 \choose k}$,
🌐 Vandermonde's identity: $\sum_{k=0}^r {m \choose k} {n \choose r-k} = {m+n \choose r}$.
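The basic counting formulas above are available directly in Python's standard library (Python 3.8+); a short usage sketch follows.

```python
from math import comb, factorial, perm

n, r = 10, 3
print(factorial(n))      # n!
print(perm(n, r))        # nPr = n! / (n-r)!
print(comb(n, r))        # nCr = n! / (r!(n-r)!)

# Pascal's rule check: C(n, r) = C(n-1, r-1) + C(n-1, r)
assert comb(n, r) == comb(n - 1, r - 1) + comb(n - 1, r)
```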
Advanced Counting
🌐 Inclusion-exclusion principle: $|A_1 \cup A_2 \cup \dots \cup A_n| = \sum_{i} |A_i| - \sum_{i<j} |A_i \cap A_j| + \sum_{i<j<k} |A_i \cap A_j \cap A_k| - \dots + (-1)^{n+1} |A_1 \cap \dots \cap A_n|$,
🌐 Double counting: $|A \times B| = |A| \cdot |B|$,
🌐 Permutations with repetition: $\frac{n!}{n_1!n_2!\dots n_k!}$, where $n_i$ are the repetitions of each element,
🌐 Derangement formula: $D_n = n!(1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \dots + (-1)^n \frac{1}{n!})$,
🌐 Necklace counting: $\frac{1}{n} \sum_{d|n} \phi(d) a^{n/d}$, where $a$ is the number of colors,
Generating Functions
🌐 Generating functions: $G(x) = \sum_{n=0}^{\infty} a_n x^n$,
🌐 Exponential generating functions: $E(x) = \sum_{n=0}^{\infty} a_n \frac{x^n}{n!}$
🌐 Ordinary generating functions: $F(x) = \sum_{n=0}^{\infty} a_n x^n$
🌐 Generating function for partitions: $\sum_{n=0}^{\infty} p(n) x^n = \prod_{k=1}^{\infty} \frac{1}{1 - x^k}$, where $p(n)$ is the number of partitions of $n$
Special Numbers
🌐 Catalan numbers: $C_n = \frac{1}{n+1}{2n \choose n}$,
🌐 Stirling numbers: $S(n,k) = S(n-1,k-1) + kS(n-1,k)$,
🌐 Stirling numbers of the first kind: $c(n,k) = c(n-1,k-1) + (n-1)\,c(n-1,k)$ (unsigned), with $s(n,k) = (-1)^{n-k} c(n,k)$
🌐 Bell numbers: $B_n = \sum_{k=0}^n S(n,k)$,
🌐 Euler's totient function: $\phi(n) = n\prod_{p|n}(1 - \frac{1}{p})$,
🌐 Moebius function: $\mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } p^2 \mid n \text{ for some prime } p, \\ (-1)^r & \text{if } n = p_1 p_2 \dots p_r \text{ for distinct primes } p_i \end{cases}$,
Number Theory
🌐 Mobius inversion formula: If $g(n) = \sum_{d|n} f(d)$, then $f(n) = \sum_{d|n} \mu(d) g(\frac{n}{d})$,
🌐 Eulerian numbers: $A(n, k) = (n-k)A(n-1, k-1) + (k+1)A(n-1, k)$,
Combinatorial Enumeration
🌐 Polya's enumeration theorem: If $G$ is a group of permutations acting on a set $X$, then $|X/G| = \frac{1}{|G|} \sum_{g \in G} |\operatorname{fix}(g)|$.
🌐 Pigeonhole principle: If $n$ items are placed into $m$ containers with $n > m$, then at least one container has more than one item,
Graph Combinatorics
🌐 Matching principle: $M_n = (2n+1)M_{n-1} + 3(2n-1)M_{n-2}$,
🌐 Chromatic polynomial: $P(G, k) = \sum_{S \subseteq E(G)} (-1)^{|S|} k^{c(S)}$, where $c(S)$ is the number of connected components of the spanning subgraph $(V, S)$,
🌐 Kirchhoff's theorem: $t(G) = \det(L_{ii})$, where $t(G)$ is the number of spanning trees and $L_{ii}$ is the Laplacian matrix of $G$ with row and column $i$ removed (any $i$ gives the same value),
🌐 König's theorem: $\operatorname{minvertexcover}(G) = \operatorname{maxmatching}(G)$,
🌐 Hall's marriage theorem: If $|A| = |B|$, then there is a perfect matching in bipartite graph $G = (A \cup B, E)$ if and only if $|N(S)| \ge |S|$ for all $S \subseteq A$,
🌐 Dilworth's theorem: The minimum number of chains needed to partition a poset equals the size of its largest antichain,
🌐 Sperner's theorem: The largest antichain in the power set of an $n$-element set has size ${n \choose \lfloor \frac{n}{2} \rfloor}$,
Continuous Distributions
🌐 Probability density function: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$,
🌐 Cumulative distribution function: $F(x) = \int_{-\infty}^x f(t) dt$,
🌐 Exponential distribution: $f(x) = \lambda e^{-\lambda x}$,
🌐 Gamma distribution: $f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}$,
🌐 Beta distribution: $f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$,
🌐 Chi-squared distribution: $f(x) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{(k/2)-1}e^{-x/2}$,
🌐 F-distribution: $f(x) = \frac{\Gamma(\frac{v_1+v_2}{2})}{\Gamma(\frac{v_1}{2})\Gamma(\frac{v_2}{2})}(\frac{v_1}{v_2})^{\frac{v_1}{2}}\frac{x^{\frac{v_1}{2}-1}}{(1+\frac{v_1x}{v_2})^{\frac{v_1+v_2}{2}}}$,
Discrete Distributions
🌐 Probability mass function: $P(X=k)$
🌐 Binomial distribution: $P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$,
🌐 Poisson distribution: $P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}$,
🌐 Geometric distribution: $P(X=k) = (1-p)^{k-1}p$,
🌐 Negative binomial distribution: $P(X=k) = \binom{k-1}{r-1}p^r(1-p)^{k-r}$,
Probability Concepts
🌐 Conditional probability: $P(A|B) = \frac{P(A \cap B)}{P(B)}$
🌐 Bayes' theorem: $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$,
🌐 Expectation of random variable: $E(X) = \sum_{i=1}^n x_iP(x_i)$,
🌐 Variance of random variable: $\text{Var}(X) = E[(X - \mu)^2]$,
🌐 Standard deviation: $\sigma = \sqrt{\text{Var}(X)}$,
🌐 Covariance: $\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$,
🌐 Correlation coefficient: $\rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$,
🌐 Normal approximation: $z = \frac{x - \mu}{\sigma / \sqrt{n}}$,
🌐 Central limit theorem: $\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0,1)$,
🌐 Wiener process: $dW_t = \epsilon \sqrt{dt}$,
🌐 Brownian motion: $W_t \sim N(0, t)$,
🌐 Geometric Brownian motion: $dS_t = \mu S_t dt + \sigma S_t dW_t$,
🌐 Ornstein-Uhlenbeck process: $dX_t = \theta(\mu - X_t) dt + \sigma dW_t$,
🌐 Ito's lemma: for $dX_t = \mu\,dt + \sigma\,dW_t$, $df(t, X_t) = \left(\frac{\partial f}{\partial t} + \mu\frac{\partial f}{\partial x} + \frac{1}{2}\sigma^2 \frac{\partial^2 f}{\partial x^2}\right) dt + \sigma \frac{\partial f}{\partial x} dW_t$,
🌐 Martingale property: $\mathbb{E}[X_{t+1} | X_t] = X_t$,
🌐 Optional stopping theorem: If $\tau$ is a stopping time with $\mathbb{E}[\tau] < \infty$, then $\mathbb{E}[X_\tau] = \mathbb{E}[X_0]$,
🌐 Doob's martingale inequality: $\mathbb{P}(\max_{0 \le t \le T} |X_t| \ge a) \le \frac{\mathbb{E}[X_T^2]}{a^2}$,
🌐 Doob-Meyer decomposition: For any submartingale $X_t$, there exists a unique decomposition $X_t = M_t + A_t$, where $M_t$ is a martingale and $A_t$ is a predictable, increasing process,
🌐 Radon-Nikodym derivative: $\frac{d\mathbb{Q}}{d\mathbb{P}} = \frac{Z_T}{\mathbb{E}[Z_T]}$, where $Z_t = e^{-\int_0^t \theta_u dW_u - \frac{1}{2} \int_0^t \theta_u^2 du}$,
🌐 Girsanov's theorem: Under $\mathbb{Q}$, the process $\tilde{W}_t = W_t + \int_0^t \theta_u du$ is a Brownian motion,
🌐 Feynman-Kac formula: $u(t,x) = \mathbb{E}_x[e^{-\int_t^T r(s) ds} g(X_T)]$, where $X_t$ solves the SDE $dX_t = b(t,X_t)dt + \sigma(t,X_t)dW_t$.
🌐 Markov property: $\mathbb{P}(X_{t+1} | X_t, X_{t-1}, \dots, X_0) = \mathbb{P}(X_{t+1} | X_t)$,
🌐 Poisson process: $\mathbb{P}(N(t+dt) - N(t) = 1) = \lambda dt + o(dt)$,
🌐 Exponential inter-arrival times: $f_T(t) = \lambda e^{-\lambda t}$,
🌐 Merton's jump diffusion: $dS_t = (\mu - \lambda k)S_t dt + \sigma S_t dW_t + (Y - 1)S_t dN_t$,
🌐 Schwartz model: $dS_t = \mu S_t dt + \sigma S_t dW_t$, $d\mu = \alpha(\theta - \mu) dt + \gamma dZ_t$,
🌐 Stratonovich integral: $\int_0^t H_s \circ dW_s = \lim_{\Delta t \to 0} \sum_{i=0}^{n-1} \frac{H_{t_i} + H_{t_{i+1}}}{2}(W_{t_{i+1}} - W_{t_i})$,
🌐 Doob's maximal inequality: $\mathbb{P}(\max_{0 \le t \le T} X_t \ge a) \le \frac{\mathbb{E}[X_T^2]}{a^2}$,
🌐 Kolmogorov's inequality: $\mathbb{P}(\max_{1 \le k \le n} |S_k| \ge a) \le \frac{\mathbb{E}[S_n^2]}{a^2}$,
🌐 Azuma's inequality: If $(X_t)$ is a martingale and $|X_t - X_{t-1}| \le c_t$, then $\mathbb{P}(|X_n - X_0| \ge a) \le 2e^{-\frac{a^2}{2\sum_{t=1}^n c_t^2}}$,
🌐 Borell-TIS inequality: If $(X_t)$ is a Gaussian process with $\mathbb{E}[X_t] = 0$ and $\text{Var}(X_t) = \sigma_t^2$, then $\mathbb{P}(\max_{0 \le t \le T} X_t \ge a) \le e^{-\frac{a^2}{2\sigma_T^2}}$,
Statistical Inference
🌐 Confidence interval: $\bar{x} \pm z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$
🌐 Hypothesis testing: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
🌐 Student's t-test: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Analysis of Variance
🌐 ANOVA F-test: $F = \frac{\text{MSB}}{\text{MSW}}$
Regression Analysis
🌐 Coefficient of determination: $R^2 = 1 - \frac{\text{SSE}}{\text{SST}}$
🌐 Residual sum of squares: $\text{SSE} = \sum_{i=1}^n (y_i - \hat{y_i})^2$
🌐 Total sum of squares: $\text{SST} = \sum_{i=1}^n (y_i - \bar{y})^2$
Descriptive Statistics
🌐 Mean: $\bar{x} = \frac{\sum_{i=1}^n x_i}{n}$
🌐 Standard deviation: $s = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}}$
Ordinary Differential Equations
🌐 First-order ODE: $\frac{dy}{dt} = f(t, y)$
🌐 Second-order ODE: $\frac{d^2y}{dt^2} = f(t, y, \frac{dy}{dt})$
🌐 Linear ODE: $\sum_{i=0}^n a_i(t)\frac{d^iy}{dt^i} = g(t)$
🌐 Homogeneous ODE: $\sum_{i=0}^n a_i(t)\frac{d^iy}{dt^i} = 0$
Partial Differential Equations
🌐 Laplace's equation: $\nabla^2 u = 0$
🌐 Poisson's equation: $\nabla^2 u = f$
🌐 Heat equation: $\frac{\partial u}{\partial t} = \alpha \nabla^2 u$
🌐 Wave equation: $\frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u$
🌐 Transport equation: $\frac{\partial u}{\partial t} + c \frac{\partial u}{\partial x} = 0$
🌐 Schrödinger equation: $i\hbar\frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2 \Psi + V\Psi$
🌐 Klein-Gordon equation: $\left(\frac{\partial^2}{\partial t^2} - c^2 \nabla^2 + m^2 c^4 \right) \phi = 0$
🌐 Navier-Stokes equation: $\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v} \cdot \nabla) \mathbf{v} = -\frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{v} + \mathbf{f}$
🌐 Continuity equation: $\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{v}) = 0$
🌐 Black-Scholes PDE: $\frac{\partial V}{\partial t} + rS\frac{\partial V}{\partial S} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} - rV = 0$
🌐 Fokker-Planck equation: $\frac{\partial p}{\partial t} = -\nabla \cdot (\mathbf{J}p) + \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j} (D_{ij}p)$
🌐 Burgers' equation: $\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}$
🌐 Korteweg–de Vries equation: $\frac{\partial u}{\partial t} + 6u\frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = 0$
Stochastic Differential Equations
🌐 SPDE general form: $\frac{\partial u}{\partial t} = L[u] + f(u, t, x) + g(u, t, x) \dot{W}(t, x)$
🌐 Stochastic heat equation: $\frac{\partial u}{\partial t} = \alpha \nabla^2 u + \sigma u \dot{W}(t, x)$
🌐 Stochastic wave equation: $\frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u + \sigma u \dot{W}(t, x)$
🌐 Stochastic Burgers' equation: $\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2} + \sigma u \dot{W}(t, x)$
🌐 Stochastic Navier-Stokes equation: $\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v} \cdot \nabla) \mathbf{v} = -\frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{v} + \sigma \mathbf{v} \dot{W}(t, x) + \mathbf{f}$,
🌐 Stochastic Schrödinger equation: $i\hbar\frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2 \Psi + V\Psi + \sigma \Psi \dot{W}(t, x)$,
🌐 Stochastic reaction-diffusion: $\frac{\partial u}{\partial t} = \alpha \nabla^2 u + f(u) + g(u) \dot{W}(t, x)$,
🌐 Stochastic Korteweg–de Vries equation: $\frac{\partial u}{\partial t} + 6u\frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = \sigma u \dot{W}(t, x)$,
🌐 Stochastic Fokker-Planck equation: $\frac{\partial p}{\partial t} = -\nabla \cdot (\mathbf{J}p) + \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j} (D_{ij}p) + \sigma p \dot{W}(t, x)$,
🌐 Stochastic Allen-Cahn equation: $\frac{\partial u}{\partial t} = \alpha \nabla^2 u - f(u) + \sigma u \dot{W}(t, x)$,
🌐 Stochastic Ginzburg-Landau equation: $\frac{\partial u}{\partial t} = \alpha \nabla^2 u - f(u) + \sigma \nabla \cdot (u \dot{W}(t, x))$,
🌐 Stochastic Cahn-Hilliard equation: $\frac{\partial u}{\partial t} = -\alpha \nabla^2 \mu + \sigma u \dot{W}(t, x)$, $\mu = -\nabla^2 u + f'(u)$,
🌐 Stochastic Kuramoto-Sivashinsky equation: $\frac{\partial u}{\partial t} = -\alpha \nabla^2 u - \beta \nabla^4 u - \gamma u \frac{\partial u}{\partial x} + \sigma u \dot{W}(t, x)$,
🌐 Stochastic Fisher-KPP equation: $\frac{\partial u}{\partial t} = \alpha \nabla^2 u + \beta u(1 - u) + \sigma u \dot{W}(t, x)$,
🌐 Stochastic Benjamin-Ono equation: $\frac{\partial u}{\partial t} + \frac{\partial^3 u}{\partial x^3} + 6u\frac{\partial u}{\partial x} = \sigma u \dot{W}(t, x)$,
Numerical Methods
🌐 Bisection method: $c = \frac{a+b}{2}$,
🌐 Newton's method: $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$,
🌐 Secant method: $x_{n+1} = x_n - f(x_n)\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}$,
🌐 Fixed-point iteration: $x_{n+1} = g(x_n)$
🌐 Lagrange interpolation: $L(x) = \sum_{i=0}^n f(x_i)\prod_{\substack{j=0 \\ j \neq i}}^n \frac{x-x_j}{x_i-x_j}$
🌐 Divided differences: $f[x_0, x_1, \dots, x_k] = \frac{f[x_1, \dots, x_k] - f[x_0, \dots, x_{k-1}]}{x_k - x_0}$
🌐 Newton's interpolation: $N(x) = f[x_0] + (x-x_0)f[x_0, x_1] + \dots + (x-x_0)\dots(x-x_{n-1})f[x_0, \dots, x_n]$
🌐 Simpson's rule: $\int_a^b f(x) dx \approx \frac{b-a}{6}(f(a) + 4f(\frac{a+b}{2}) + f(b))$
🌐 Trapezoidal rule: $\int_a^b f(x) dx \approx \frac{b-a}{2}(f(a) + f(b))$
🌐 Romberg integration: $R_{i,j} = R_{i-1,j} + \frac{1}{4^j-1}(R_{i-1,j} - R_{i-1,j-1})$
🌐 Gauss quadrature: $\int_a^b f(x) dx \approx \sum_{i=1}^n w_i f(x_i)$
🌐 Jacobi iteration: $x_i^{(k+1)} = \frac{1}{a_{ii}}(b_i - \sum_{\substack{j=1 \\ j \neq i}}^n a_{ij}x_j^{(k)})$
🌐 Gauss-Seidel iteration: $x_i^{(k+1)} = \frac{1}{a_{ii}}(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^n a_{ij}x_j^{(k)})$
🌐 SOR method: $x_i^{(k+1)} = (1-\omega)x_i^{(k)} + \frac{\omega}{a_{ii}}(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^n a_{ij}x_j^{(k)})$
🌐 Euler's method: $y_{n+1} = y_n + hf(t_n, y_n)$
🌐 Midpoint method: $y_{n+1} = y_n + h f(t_n + \frac{h}{2}, y_n + \frac{h}{2}f(t_n, y_n))$
🌐 Runge-Kutta method (4th order): $y_{n+1} = y_n + \frac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4)$, where $k_1 = hf(t_n, y_n)$, $k_2 = hf(t_n + \frac{h}{2}, y_n + \frac{k_1}{2})$, $k_3 = hf(t_n + \frac{h}{2}, y_n + \frac{k_2}{2})$, $k_4 = hf(t_n + h, y_n + k_3)$
🌐 Adams-Bashforth method: $y_{n+1} = y_n + \frac{h}{2}(3f(t_n, y_n) - f(t_{n-1}, y_{n-1}))$
🌐 Finite difference: $\frac{\partial u}{\partial x}(x_i) \approx \frac{u(x_{i+1}) - u(x_{i-1})}{2\Delta x}$
🌐 Central difference: $\frac{\partial^2 u}{\partial x^2}(x_i) \approx \frac{u(x_{i+1}) - 2u(x_i) + u(x_{i-1})}{\Delta x^2}$
🌐 Thomas algorithm: $c_i' = \frac{c_i}{b_i - a_i c_{i-1}'}$, $d_i' = \frac{d_i - a_i d_{i-1}'}{b_i - a_i c_{i-1}'}$, $x_n = d_n'$, $x_i = d_i' - c_i' x_{i+1}$,
🌐 QR factorization: $A = QR$, where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix,
🌐 Singular value decomposition: $A = U\Sigma V^T$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix,
🌐 Power method: $x^{(k+1)} = \frac{Ax^{(k)}}{||Ax^{(k)}||}$,
🌐 Rayleigh quotient: $R(x) = \frac{x^TAx}{x^Tx}$,
🌐 Gram-Schmidt process: $v_i = u_i - \sum_{j=1}^{i-1} \frac{\langle u_i, v_j \rangle}{\langle v_j, v_j \rangle} v_j$,
🌐 LU factorization: $A = LU$, where $L$ is a lower triangular matrix and $U$ is an upper triangular matrix,
🌐 Cholesky factorization: $A = LL^T$, where $A$ is a symmetric positive definite matrix and $L$ is a lower triangular matrix.
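The Runge-Kutta (4th order) entry above is simple to implement; here is a minimal sketch with a toy ODE whose exact solution is known (step size and function are illustrative assumptions).

```python
def rk4_step(f, t, y, h):
    """One classical 4th-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = h * f(t, y)
    k2 = h * f(t + h / 2, y + k1 / 2)
    k3 = h * f(t + h / 2, y + k2 / 2)
    k4 = h * f(t + h, y + k3)
    return y + (k1 + 2 * k2 + 2 * k3 + k4) / 6

# toy usage: dy/dt = -y, y(0) = 1, integrated to t = 1 (exact answer: e^{-1})
y, t, h = 1.0, 0.0, 0.1
for _ in range(10):
    y = rk4_step(lambda t, y: -y, t, y, h)
    t += h
```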
Algebra and Analysis Inequalities
🌐 Triangle inequality: $||x+y|| \le ||x|| + ||y||$
🌐 Cauchy-Schwarz: $|x^T y| \le ||x|| ||y||$
🌐 Arithmetic-Geometric mean: $\frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1x_2\cdots x_n}$
🌐 Jensen's inequality: $f(\frac{\sum_{i=1}^n \alpha_ix_i}{\sum_{i=1}^n \alpha_i}) \le \frac{\sum_{i=1}^n \alpha_if(x_i)}{\sum_{i=1}^n \alpha_i}$
🌐 Hölder's inequality: $\sum_{i=1}^n |x_iy_i| \le \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}}\left(\sum_{i=1}^n |y_i|^q\right)^{\frac{1}{q}}$
🌐 Minkowski inequality: $\left(\sum_{i=1}^n |x_i + y_i|^p\right)^{\frac{1}{p}} \le \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}} + \left(\sum_{i=1}^n |y_i|^p\right)^{\frac{1}{p}}$
🌐 Young's inequality: $ab \le \frac{a^p}{p} + \frac{b^q}{q}$
Concentration Inequalities
🌐 Chebyshev's inequality: $Pr(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$,
🌐 Markov's inequality: $Pr(X \ge a) \le \frac{E[X]}{a}$,
🌐 Bernstein's inequality: $Pr\left(\left|\sum_{i=1}^n (X_i - E[X_i])\right| \ge t\right) \le 2\exp\left(-\frac{t^2}{2(\sum_{i=1}^n \text{Var}(X_i) + \frac{t}{3}\sum_{i=1}^n E[|X_i-E[X_i]|])}\right)$,
🌐 Hoeffding's inequality: $Pr\left(\left|\sum_{i=1}^n (X_i - E[X_i])\right| \ge t\right) \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$,
🌐 Azuma's inequality: $Pr\left(\left|\sum_{i=1}^n (X_i - E[X_i])\right| \ge t\right) \le 2\exp\left(-\frac{t^2}{2\sum_{i=1}^n c_i^2}\right)$,
Optimization Inequalities and Functional Analysis
🌐 Schur's inequality: $a^t(a-b)(a-c) + b^t(b-a)(b-c) + c^t(c-a)(c-b) \ge 0$ for $a, b, c \ge 0$ and $t > 0$,
🌐 Rearrangement inequality: if $a_1 \le \dots \le a_n$ and $b_1 \le \dots \le b_n$, then $\sum_{i=1}^n a_i b_i \ge \sum_{i=1}^n a_i b_{\sigma(i)} \ge \sum_{i=1}^n a_i b_{n+1-i}$ for every permutation $\sigma$,
🌐 Cauchy's interlacing: if $B$ is an $(n-1) \times (n-1)$ principal submatrix of the symmetric matrix $A$, then $\lambda_1(A) \ge \lambda_1(B) \ge \lambda_2(A) \ge \lambda_2(B) \ge \cdots \ge \lambda_{n-1}(B) \ge \lambda_n(A)$,
🌐 Schwarz inequality: $\sum_{i=1}^n a_ib_i \le \sqrt{\sum_{i=1}^n a_i^2} \sqrt{\sum_{i=1}^n b_i^2}$,
🌐 Poincaré inequality: $\int_\Omega |u|^2 \le C \int_\Omega |\nabla u|^2$,
🌐 Sobolev inequality: $\|u\|_{L^{p^*}(\Omega)} \le C \|\nabla u\|_{L^p(\Omega)}$,
🌐 Cramér-Rao inequality: $\text{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)}$,
🌐 Paley-Zygmund inequality: for $X \ge 0$ and $0 \le \theta \le 1$, $\Pr(X > \theta E[X]) \ge (1-\theta)^2 \frac{E[X]^2}{E[X^2]}$,
🌐 Rademacher-Menshov inequality: $E\left[\max_{1 \le k \le n}\left(\sum_{i=1}^k a_i \varphi_i\right)^2\right] \le (\log_2 4n)^2 \sum_{i=1}^n a_i^2$ for an orthonormal system $\{\varphi_i\}$,
🌐 Rayleigh quotient: $\lambda_{\min}(A) \le \frac{x^TAx}{x^Tx} \le \lambda_{\max}(A)$
🌐 Courant-Fisher: $\lambda_k(A) = \min_{U_k} \max_{x \in U_k} \frac{x^TAx}{x^Tx}$
🌐 Gershgorin's theorem: $\lambda(A) \in \bigcup_{i=1}^n B(a_{ii}, R_i)$
🌐 Hadwiger's inequality: $V(K) \le \frac{1}{\sqrt{n}} \left(\frac{2}{e}\right)^{\frac{n-1}{2}} \prod_{i=1}^n r_i$
🌐 Bonferroni's inequality: $\Pr\left(\bigcup_{i=1}^n A_i\right) \le \sum_{i=1}^n \Pr(A_i) - \sum_{1 \le i < j \le n} \Pr(A_i \cap A_j)$
🌐 Boole's inequality: $\Pr\left(\bigcup_{i=1}^n A_i\right) \le \sum_{i=1}^n \Pr(A_i)$
🌐 Nash inequality: $\|u\|_{L^2(\mathbb{R}^n)}^{2 + 4/n} \le C \|\nabla u\|_{L^2(\mathbb{R}^n)}^{2} \|u\|_{L^1(\mathbb{R}^n)}^{4/n}$
🌐 Hardy-Littlewood-Sobolev: $\left|\iint \frac{f(x)\, g(y)}{|x - y|^{\lambda}}\,dx\,dy\right| \le C \|f\|_{L^p} \|g\|_{L^q}$ with $\frac{1}{p} + \frac{1}{q} + \frac{\lambda}{n} = 2$
Approximation Algorithms
🌐 Greedy set cover: $\text{Approx}(\text{SC}) = O(\log n)$,
🌐 Metric TSP: $\text{Approx}(\text{TSP}) \le 1.5 \cdot \text{OPT}$,
🌐 Minimum vertex cover: $\text{Approx}(\text{MVC}) \le 2 \cdot \text{OPT}$,
🌐 Max-cut: $\text{Approx}(\text{MC}) \ge \frac{1}{2} \cdot \text{OPT}$,
🌐 Max-k-cut: $\text{Approx}(\text{Max k-cut}) \ge \frac{k-1}{k} \cdot \text{OPT}$,
🌐 Knapsack: $\text{Approx}(\text{KP}) \ge (1 - \varepsilon) \cdot \text{OPT}$,
🌐 Bin packing: $\text{Approx}(\text{BP}) \le \frac{11}{9} \cdot \text{OPT} + 4$,
🌐 Luby's MIS: computes a maximal independent set in $O(\log n)$ expected rounds,
🌐 FPTAS: $\text{Approx}(\text{FPTAS}) \ge (1 - \varepsilon) \cdot \text{OPT}$,
🌐 Christofides algorithm: $\text{Approx}(\text{TSP-Metric}) \le 1.5 \cdot \text{OPT}$,
🌐 Facility location: $\text{Approx}(\text{FL}) \le 2 \cdot \text{OPT}$,
🌐 Local search: $\text{Approx}(\text{LS}) \ge \text{OPT}$,
🌐 Fractional knapsack: $\text{Approx}(\text{FK}) = \text{OPT}$,
🌐 Greedy matching: $\text{Approx}(\text{GM}) \ge \frac{1}{2} \cdot \text{OPT}$,
🌐 Randomized rounding: $\text{Approx}(\text{RR}) \ge \text{OPT}$,
🌐 Scheduling with rejection: $\text{Approx}(\text{SchedulingRej}) \ge (1 - \varepsilon) \cdot \text{OPT}$,
🌐 Balanced allocation: $\text{E}[\text{Load}] = O(\log \log n)$,
🌐 Multiway cut: $\text{Approx}(\text{MWC}) \le 2 \cdot \text{OPT}$,
Randomized Algorithms
🌐 Karger's min-cut: $\text{Prob}(\text{min-cut}) \ge \frac{1}{\binom{n}{2}}$,
🌐 Fermat primality test: $\Pr(\text{composite}|\text{pass}) \le \frac{1}{2^k}$,
🌐 Miller-Rabin primality: $\Pr(\text{composite}|\text{pass}) \le \frac{1}{4^k}$,
🌐 Rabin-Karp: $\text{Prob}(\text{collision}) \le \frac{1}{M}$,
🌐 Metropolis-Hastings: $\text{Prob}(\text{stationary}) = \pi(x)$,
🌐 Gibbs sampling: $\text{Prob}(\text{sample}) \propto \pi(x)$,
🌐 Coupon collector: $\text{E}[T] = n \cdot \sum_{i=1}^n \frac{1}{i}$,
🌐 PageRank: $\text{Prob}(\text{rank}) = \alpha \cdot \text{E}[X] + (1-\alpha) \cdot \frac{1}{n}$,
🌐 Simulated annealing: $\text{Prob}(\text{near-optimal}) \ge 1 - \exp(-t)$,
🌐 Randomized quicksort: $\text{E}[T(n)] = O(n \log n)$,
🌐 Monte Carlo π: $\text{E}[\hat{\pi}] = \pi$,
🌐 Las Vegas algorithms: $\Pr(\text{correct}) = 1$,
🌐 Power of two choices: $\text{E}[\text{Load}] = O(\log \log n)$,
Graph Algorithms
🌐 Johnson's algorithm: $\text{Approx}(\text{SM}) \le 2 - \frac{1}{k} \cdot \text{OPT}$,
🌐 Minimum spanning tree: $\text{Approx}(\text{MST}) = \text{OPT}$,
🌐 Prim's algorithm: $\text{Approx}(\text{MST-Prim}) = \text{OPT}$,
🌐 Kruskal's algorithm: $\text{Approx}(\text{MST-Kruskal}) = \text{OPT}$,
🌐 Boruvka's algorithm: $\text{Approx}(\text{MST-Boruvka}) = \text{OPT}$,
🌐 Max flow-min cut: $\text{MaxFlow}(G) = \text{MinCut}(G)$,
🌐 Edmonds-Karp algorithm: $\text{MaxFlow}(\text{EK}) = \text{OPT}$,
🌐 Ford-Fulkerson algorithm: $\text{MaxFlow}(\text{FF}) = \text{OPT}$,
🌐 Spectral clustering: $\text{Approx}(\text{SC}) \le \sqrt{2 \cdot \text{OPT}}$,
🌐 Traveling salesman LP: $\text{Approx}(\text{TSP-LP}) \le 2 \cdot \text{OPT}$,
Numerical Algorithms
🌐 SVD image compression: $\text{Approx}(\text{SVD}) \ge \text{OPT}$,
🌐 AKS primality: $\text{Prob}(\text{prime}) = 1$,
🌐 Load balancing: $\text{Approx}(\text{LB}) \le 2 \cdot \text{OPT}$,
🌐 Linear programming rounding: $\text{Approx}(\text{LP}) \ge \text{OPT}$,
Data Structures
🌐 Bloom filter: $\text{Prob}(\text{FP}) \le (1 - e^{-kn/m})^k$,
🌐 Randomized treaps: $\text{E}[\text{Height}] = O(\log n)$,
Online Algorithms
🌐 Ski-rental problem: $\text{CompetitiveRatio}(\text{SR}) \le 2$,
🌐 Online paging: $\text{CompetitiveRatio}(\text{Paging}) \le k+1$,
🌐 Multiplicative weight update: $\text{Regret}(T) = O(\sqrt{T \log n})$,
Other
🌐 Set cover LP: $\text{Approx}(\text{SC-LP}) = O(\log n)$,
🌐 Max-SAT: $\text{Approx}(\text{Max-SAT}) \ge \frac{1}{2} \cdot \text{OPT}$,
🌐 Max k-CNF-SAT: $\text{Approx}(\text{Max k-CNF-SAT}) \ge \frac{k-1}{k} \cdot \text{OPT}$,
🌐 Vertex cover LP: $\text{Approx}(\text{VC-LP}) \le 2 \cdot \text{OPT}$,
🌐 Dilworth's theorem: $\text{ChainPartition}(P) = \text{Width}(P)$,
🌐 Graham's algorithm: $\text{Approx}(\text{Makespan}) \le \frac{4}{3} \cdot \text{OPT}$,
🌐 List scheduling: $\text{Approx}(\text{LS}) \le \left(2 - \frac{1}{m}\right) \cdot \text{OPT}$,
🌐 Dinic's algorithm: $\text{MaxFlow}(\text{Dinic}) = \text{OPT}$,
🌐 Push-relabel algorithm: $\text{MaxFlow}(\text{PR}) = \text{OPT}$,
🌐 Randomized incremental construction: $\text{E}[\text{Complexity}] = O(n \log n)$,
🌐 Yao's minimax principle: the expected cost of the best deterministic algorithm on a worst-case input distribution lower-bounds the worst-case expected cost of any randomized algorithm,
🌐 Nash equilibrium: $u_i(s_i^\star, s_{-i}^\star) \ge u_i(s_i, s_{-i}^\star)$ for every player $i$ and every alternative strategy $s_i$,
Centrality Measures
🌐 Degree centrality: $C_D(v) = \frac{\deg(v)}{n-1}$,
🌐 Closeness centrality: $C_C(v) = \frac{1}{\sum_{u \neq v} d(u, v)}$,
Ranking Algorithms and Coefficients
🌐 Betweenness centrality: $C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$,
🌐 Eigenvector centrality: $C_E(v) = \frac{1}{\lambda} \sum_{t \in N} a_{vt} C_E(t)$,
🌐 Katz centrality: $C_K(v) = \sum_{t=1}^\infty \sum_{j=1}^n \alpha^t (A^t)_{vj}$,
🌐 PageRank: $PR(v) = (1-\alpha) + \alpha \sum_{u \in N} \frac{PR(u)}{L(u)}$,
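The PageRank entry above is usually computed by power iteration; here is a minimal NumPy sketch on a dense adjacency matrix, where the uniform handling of dangling nodes and the $1/n$ normalization of the teleport term are illustrative conventions.

```python
import numpy as np

def pagerank(A, alpha=0.85, n_iter=100):
    """Power iteration for PageRank on an adjacency matrix A (A[i, j] = 1 for edge i -> j)."""
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    # row-stochastic transition matrix; dangling nodes jump uniformly
    P = np.where(out_deg[:, None] > 0, A / np.maximum(out_deg[:, None], 1), 1.0 / n)
    pr = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        pr = (1 - alpha) / n + alpha * (P.T @ pr)
    return pr
```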
Graph Properties and Metrics
🌐 Clustering coefficient: $C(v) = \frac{2|E_N(v)|}{k_v(k_v-1)}$,
🌐 Global clustering coefficient: $C_G = \frac{3 \times \text{number of triangles}}{\text{number of connected triples}}$,
🌐 Transitivity: $T = \frac{\text{number of closed triplets}}{\text{number of triplets}}$,
🌐 Average shortest path length: $L = \frac{1}{n(n-1)} \sum_{s \neq t \in V} d(s, t)$,
Graph properties
🌐 Diameter: $D = \max_{s,t \in V} d(s, t)$
🌐 Eccentricity: $\varepsilon(v) = \max_{t \in V} d(v, t)$
🌐 Radius: $r = \min_{v \in V} \varepsilon(v)$
Community structure
🌐 Modularity: $Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_ik_j}{2m} \right] \delta(c_i, c_j)$
🌐 Assortativity: $r = \frac{\sum_{ij} ij(e_{ij} - q_i q_j)}{\sigma^2_q}$
Subgraphs
🌐 K-core: A maximal subgraph $G'$ in which every vertex has degree at least $k$ within $G'$
Graph measures
🌐 Edge density: $\rho = \frac{2m}{n(n-1)}$
🌐 Small-world coefficient: $\sigma = \frac{C}{C_r} / \frac{L}{L_r}$
Similarity measures
🌐 Jaccard similarity: $J(u, v) = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$
🌐 Cosine similarity: $\cos(u, v) = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)| |N(v)|}}$
Index-based measures
🌐 Adamic-Adar index: $AA(u, v) = \sum_{z \in N(u) \cap N(v)} \frac{1}{\log{\deg(z)}}$
🌐 Resource allocation index: $RA(u, v) = \sum_{z \in N(u) \cap N(v)} \frac{1}{\deg(z)}$
Node degree measures
🌐 Preferential attachment: $PA(u, v) = |N(u)| \cdot |N(v)|$
Path-based measures
Shortest Path and Connected Components
🌐 Shortest path: $d(u, v) = \min_{\text{paths}(u,v)} \text{length}(p)$
🌐 Connected components: Subgraphs in which any two nodes are connected by a path,
Network Models
🌐 Erdős-Rényi model: $G(n, p)$
🌐 Watts-Strogatz model: $G(n, k, p)$
🌐 Barabási-Albert model: $G(n, m)$
Community Detection Algorithms
🌐 Girvan-Newman algorithm: $\text{Community}(G, \text{Betweenness})$
🌐 Louvain method: $\text{Community}(G, \text{Modularity})$
🌐 Label propagation algorithm: $\text{Community}(G, \text{Propagation})$
Cliques and Core-Periphery Structures
🌐 Clique: $K_n$
🌐 $k$-clique community: $C_k$
🌐 Core-periphery structure: $\text{CorePeriphery}(G)$
Rich-Club Coefficient
🌐 Rich-club coefficient: $\phi(k) = \frac{2E_k}{N_k(N_k - 1)}$
Graph Properties and Parameters
🌐 Handshaking lemma: $\sum_{v \in V} \text{deg}(v) = 2|E|$,
🌐 Euler's formula: $|V| - |E| + |F| = 2$,
🌐 Planar graph: $|E| \le 3|V| - 6$,
Graph Degree Characteristics
🌐 Maximum degree: $\Delta(G) = \max_{v \in V} \text{deg}(v)$,
Graph Properties and Parameters
🌐 Minimum degree: $\delta(G) = \min_{v \in V} \deg(v)$
🌐 Chromatic number: $\chi(G) \ge \frac{|V|}{\alpha(G)}$
🌐 Chromatic index: $\chi'(G) \ge \Delta(G)$
🌐 Vertex connectivity: $\kappa(G) = \min\{|S| : S \subseteq V,\ G - S \text{ is disconnected or a single vertex}\}$
🌐 Edge connectivity: $\lambda(G) = \min\{|F| : F \subseteq E,\ G - F \text{ is disconnected}\}$
Graph Cycles and Distances
🌐 Girth: $g(G) = \min_{C \subseteq G} |C|$ (length of a shortest cycle)
🌐 Clique number: $\omega(G) = \max_{K \subseteq G} |K|$
🌐 Independence number: $\alpha(G) = \max_{I \subseteq G} |I|$
🌐 Degree sum formula: $\sum_{v \in V} \deg(v) = 2m$
🌐 Eulerian circuit: a connected graph has one iff $\deg(v)$ is even for all $v \in V$
🌐 Hamiltonian cycle (necessary condition): $c(G - S) \le |S|$ for all $S \subseteq V$, where $c$ counts connected components
🌐 Graph diameter: $diam(G) = \max_{u,v \in V} dist(u,v)$
🌐 Graph radius: $rad(G) = \min_{v \in V} ecc(v)$
🌐 Eccentricity: $ecc(v) = \max_{u \in V} dist(v, u)$
🌐 Wiener index: $W(G) = \frac{1}{2}\sum_{u, v \in V} dist(u, v)$
Graph Matrices and Theorems
🌐 Laplacian matrix: $L(G) = D(G) - A(G)$
🌐 Kirchhoff's matrix theorem: the number of spanning trees is $t(G) = \det(L')$, where $L'$ is $L(G)$ with any one row and the corresponding column removed
🌐 Graph density: $D(G) = \frac{2|E|}{|V|(|V|-1)}$
🌐 Bipartite graph: $\chi(G) \le 2$ (equivalently, $G$ contains no odd cycle)
🌐 Petersen's theorem: every bridgeless $3$-regular graph has a perfect matching
🌐 Ramsey's theorem (recursive bound): $R(s, t) \le R(s-1, t) + R(s, t-1)$
🌐 Erdős-Stone theorem: $\lim_{n \to \infty} \frac{\text{ex}(n, H)}{\binom{n}{2}} = 1 - \frac{1}{\chi(H) - 1}$
🌐 Dirac's theorem: $\delta(G) \ge \frac{|V|}{2} \Rightarrow G \text{ has a Hamiltonian cycle}$
🌐 Ore's theorem: $\deg(u) + \deg(v) \ge |V|$ for every pair of non-adjacent vertices $u, v$ $\Rightarrow$ Hamiltonian cycle
🌐 Graph complement: $\overline{G} = (V, \overline{E})$
🌐 Turán's theorem: $ex(n, K_r) = \left(1-\frac{1}{r-1}\right)\frac{n^2}{2}$
Graph Minors, Planarity, and Coloring
🌐 Graph minors: $H \prec G$ iff $H$ can be obtained from a subgraph of $G$ by contracting edges,
🌐 Kuratowski's theorem: $G$ is planar $\Leftrightarrow$ $G$ contains no subdivision of $K_5$ or $K_{3,3}$ (equivalently, by Wagner's theorem, $K_5, K_{3,3} \nprec G$ as minors),
🌐 Vizing's theorem: $\Delta(G) \le \chi'(G) \le \Delta(G) + 1$,
🌐 Brooks' theorem: $\chi(G) \le \Delta(G)$ unless $G$ is complete or an odd cycle,
Matching and Connectivity
🌐 Hall's marriage theorem: $|N(S)| \ge |S| \Rightarrow \text{perfect matching}$,
🌐 Menger's theorem: the maximum number of internally vertex-disjoint $u$–$v$ paths equals the minimum size of a vertex cut separating $u$ from $v$,
Probabilistic Methods and Algorithms
🌐 Lovász Local Lemma: if each event $A_i$ has $\Pr(A_i) \le p$, depends on at most $d$ of the others, and $e\,p\,(d+1) \le 1$, then $\Pr\left(\bigcap_{i=1}^n \overline{A_i}\right) > 0$,
🌐 Graph isomorphism: $G \cong H \Leftrightarrow A(H) = P\,A(G)\,P^T$ for some permutation matrix $P$,
🌐 Havel-Hakimi algorithm: $\text{realizable}(d) \Leftrightarrow \text{realizable}(d')$,
Topological and Spectral Properties
🌐 Euler's characteristic: $\chi(G) = |V| - |E| + |F|$,
🌐 Spectral radius: $\rho(G) = \max\{\lambda_i : \lambda_i \in \Lambda(G)\}$,
🌐 Chromatic polynomial: $P(G, k) = \sum_{i=0}^{|V|} a_i k^{n-i}$,
🌐 Tutte's theorem: $G \text{ perfect matching} \Leftrightarrow \text{odd}(G-S) \le |S|, \forall S \subseteq V$,
Small Angle Approximations
🌐 Small angle approximation: $\sin x \approx x$
🌐 Small angle cosine: $\cos x \approx 1 - \frac{x^2}{2}$
🌐 Small angle tangent: $\tan x \approx x$
Higher Order Approximations
🌐 Sine higher order: $\sin x \approx x - \frac{x^3}{3!} + \frac{x^5}{5!}$
🌐 Cosine higher order: $\cos x \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!}$
🌐 Exponential approximation: $e^x \approx 1 + x + \frac{x^2}{2}$
🌐 Exponential higher order: $e^x \approx \sum_{n=0}^\infty \frac{x^n}{n!}$
🌐 Natural logarithm approximation: $\ln(1+x) \approx x - \frac{x^2}{2} + \frac{x^3}{3}$
🌐 Natural logarithm higher order: $\ln(1+x) \approx \sum_{n=1}^\infty \frac{(-1)^{n+1}x^n}{n}$
🌐 Binomial approximation: $(1+x)^n \approx 1 + nx$
🌐 Stirling's approximation: $n! \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n$
🌐 Central limit theorem: $\frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \approx N(0,1)$
🌐 Poisson approximation: $P(X=k) \approx e^{-\lambda} \frac{\lambda^k}{k!}$
🌐 Sigmoid approximation: $\frac{1}{1+e^{-x}} \approx \frac{1}{2} + \frac{x}{4} - \frac{x^3}{48} + \frac{x^5}{480}$
🌐 Arc length approximation: $L \approx \sum_{i=1}^n \sqrt{\Delta x_i^2 + \Delta y_i^2}$
🌐 Area under curve approximation: $A \approx \sum_{i=1}^n f(x_i)\Delta x$
🌐 Euler's method: $y_{n+1} \approx y_n + h \cdot f(x_n, y_n)$
🌐 Trapezoidal rule: $\int_a^b f(x) dx \approx \frac{1}{2}h\left[f(x_0) + 2f(x_1) + 2f(x_2) + \dots + f(x_n)\right]$
🌐 Simpson's rule: $\int_a^b f(x) dx \approx \frac{h}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + \dots + f(x_n)\right]$
🌐 Bode's rule: $\int_a^b f(x) dx \approx \frac{2h}{45}\left[7f(x_0) + 32f(x_1) + 12f(x_2) + 32f(x_3) + 7f(x_4)\right]$
🌐 Gaussian quadrature: $\int_a^b f(x) dx \approx \sum_{i=1}^n w_i f(x_i)$
🌐 Newton-Cotes formulas: $\int_a^b f(x) dx \approx \sum_{i=0}^n \alpha_i f(x_i)$
🌐 Laplace's method: $\int e^{-Mf(x)} dx \approx \sqrt{\frac{2\pi}{Mf''(x_0)}}e^{-Mf(x_0)}$
🌐 L'Hôpital's rule: $\lim_{x\to a} \frac{f(x)}{g(x)} \approx \lim_{x\to a} \frac{f'(x)}{g'(x)}$
🌐 Bernoulli's inequality: $(1+x)^n \ge 1+nx$
🌐 Taylor series: $f(x) \approx f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \dots + \frac{f^{(n)}(a)}{n!}(x-a)^n$
🌐 Maclaurin series: $f(x) \approx f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \dots + \frac{f^{(n)}(0)}{n!}x^n$
🌐 Power series: $f(x) \approx \sum_{n=0}^\infty a_n (x-c)^n$
🌐 Riemann sum: $\int_a^b f(x) dx \approx \sum_{i=1}^n f(x_i) \Delta x$
🌐 Linear interpolation: $y \approx y_1 + \frac{y_2 - y_1}{x_2 - x_1} (x - x_1)$
Interpolation and Approximation
🌐 Cubic spline interpolation: $S(x) = a_i + b_i(x-x_i) + c_i(x-x_i)^2 + d_i(x-x_i)^3$
🌐 Pade approximant: $R(x) = \frac{P_n(x)}{Q_m(x)}$
🌐 Continued fraction: $x \approx a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \ddots}}}$
Hyperbolic Functions and Rules of Thumb
🌐 Hyperbolic approximation: $\sinh x \approx x + \frac{x^3}{3!} + \frac{x^5}{5!} + \dots$
🌐 Hyperbolic cosine: $\cosh x \approx 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \dots$
🌐 Hyperbolic tangent: $\tanh x \approx x - \frac{x^3}{3} + \frac{2x^5}{15} - \frac{17x^7}{315}$
🌐 Rule of 72: $\text{Doubling Time} \approx \frac{72}{\text{Interest Rate}}$
Geometry and Statistics Relationships
🌐 Heron's formula: $A = \sqrt{s(s-a)(s-b)(s-c)}$, where $s = \frac{a+b+c}{2}$
Geometry and Trigonometry
🌐 Pythagorean theorem: $a^2 + b^2 = c^2$
🌐 Slope-intercept form: $y = mx + b$
🌐 Sine function: $\sin{\theta} = \frac{opposite}{hypotenuse}$
🌐 Cosine function: $\cos{\theta} = \frac{adjacent}{hypotenuse}$
🌐 Tangent function: $\tan{\theta} = \frac{\sin{\theta}}{\cos{\theta}}$
🌐 Cosecant function: $\csc{\theta} = \frac{1}{\sin{\theta}}$
🌐 Secant function: $\sec{\theta} = \frac{1}{\cos{\theta}}$
🌐 Cotangent function: $\cot{\theta} = \frac{1}{\tan{\theta}}$
🌐 Law of sines: $\frac{\sin{A}}{a} = \frac{\sin{B}}{b} = \frac{\sin{C}}{c}$
🌐 Law of cosines: $c^2 = a^2 + b^2 - 2ab \cos{C}$
Calculus and Analysis
🌐 Euler's formula: $e^{ix} = \cos{x} + i\sin{x}$
Geometry and Measurement
🌐 Area of circle: $A = \pi r^2$
🌐 Circumference of circle: $C = 2\pi r$
🌐 Volume of sphere: $V = \frac{4}{3}\pi r^3$
🌐 Surface area of sphere: $A = 4\pi r^2$
Distance Metrics
🌐 Euclidean Distance: $d(x, y) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}$
Distance Metrics and Similarity Measures
🌐 Minkowski Distance: $d(x, y) = \left(\sum_{i=1}^n |x_i - y_i|^p\right)^{1/p}$
🌐 Cosine Similarity: $\text{cos}(\theta) = \frac{x \cdot y}{\|x\| \|y\|}$
🌐 Jaccard Similarity: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
🌐 Dot Product: $x \cdot y = \sum_{i=1}^n x_i y_i$
🌐 Cross Product: $x \times y = \begin{pmatrix} x_2 y_3 - x_3 y_2, x_3 y_1 - x_1 y_3, x_1 y_2 - x_2 y_1 \end{pmatrix}$
🌐 Angle Between Vectors: $\theta = \arccos{\frac{x \cdot y}{\|x\| \|y\|}}$
🌐 Orthogonal Vectors: $x \cdot y = 0$
🌐 Parallel Vectors: $x = ky$ for some scalar $k$
🌐 Cosine Similarity: $\text{sim}(x, y) = \frac{x^T y}{\|x\|_2 \|y\|_2}$
🌐 Euclidean Distance: $d(x, y) = \|x - y\|_2$
🌐 Manhattan Distance: $d(x, y) = \|x - y\|_1$
🌐 Minkowski Distance: $d(x, y) = \left(\sum_{i=1}^n \left| x_i - y_i \right|^p \right)^{\frac{1}{p}}$
🌐 Hamming Distance: $d(x, y) = \sum_{i=1}^n \mathbf{1}(x_i \neq y_i)$
🌐 Jaccard Similarity: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
🌐 Dice Similarity: $D(A, B) = \frac{2 |A \cap B|}{|A| + |B|}$
🌐 Tanimoto Similarity: $T(A, B) = \frac{A \cdot B}{\|A\|^2 + \|B\|^2 - A \cdot B}$
🌐 Wasserstein Distance: $W(p, q) = \inf_{\gamma \in \Pi(p, q)} \int_{X \times X} c(x, y) d\gamma(x, y)$
🌐 Sinkhorn Distance: $\text{Sinkhorn}(C, \lambda) = \min_{P \in \text{Birk}(p, q)} \langle P, C \rangle - \lambda H(P)$
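A minimal NumPy sketch of several of the distance and similarity measures above; the function names are illustrative and the Jaccard helper treats its inputs as sets.

```python
import numpy as np

def euclidean(x, y):
    return float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))

def manhattan(x, y):
    return float(np.sum(np.abs(np.asarray(x, float) - np.asarray(y, float))))

def minkowski(x, y, p):
    return float(np.sum(np.abs(np.asarray(x, float) - np.asarray(y, float)) ** p) ** (1.0 / p))

def cosine_similarity(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0
```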