CheatsheetGPT

I prompted GPT-4 to help me create a cheatsheet spanning AI, Computer Science, Finance, Operations Research, and Mathematics. I then asked it to grade the importance of each relationship for different audiences and, in some cases, to explain its answers.

Alpha version. Last update: 06/24/2023.

🌐 Linear Regression: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p + \epsilon$

🌐 Logistic Regression: $P(y=1|x) = \sigma(w^Tx + b)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$

🌐 Support Vector Machine: $\min_{w,b} \frac{1}{2} ||w||^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i(w^Tx_i + b))$

🌐 Perceptron: $y = \mathbf{1}(w^Tx + b > 0)$

🌐 Adaline: $y = \phi(w^Tx + b)$, where $\phi$ is a linear activation function.

Naive Bayes

🌐 Naive Bayes Classifier: $p(y|x) \propto p(y)\prod_{i=1}^{n}p(x_i|y)$

🌐 Gaussian Naive Bayes: $p(x_i|y) = \mathcal{N}(x_i|\mu_{iy},\sigma_{iy}^2)$

🌐 Multinomial Naive Bayes: $p(x|y) = \frac{(\sum_{j=1}^{n} x_j)!}{\prod_{j=1}^{n} x_j!} \prod_{j=1}^{n} \theta_{yj}^{x_j}$, where $x_j$ is the count of feature $j$

🌐 Bernoulli Naive Bayes: $p(x_i|y) = \theta_{iy}^{x_i} (1-\theta_{iy})^{1-x_i}$

Tree-Based Models

🌐 Decision Tree: $h(x) = \sum_{i=1}^{m} c_i \mathbf{1}(x \in R_i)$, where $R_i$ are the leaf regions and $c_i$ is the prediction in region $R_i$

🌐 Random Forest: $h(x) = \frac{1}{T} \sum_{t=1}^{T} h_t(x)$

Clustering

🌐 k-Means Clustering: $J = \sum_{i=1}^{n} \min_{j=1}^{k} ||x_i - \mu_j||^2$, where $\mu_j$ is the mean of the $j$-th cluster.

🌐 Hierarchical Clustering: $D_{ij} = ||x_i - x_j||$, with clusters merged using single ($\min$), complete ($\max$), or average linkage

🌐 Gaussian Mixture Model (GMM): $p(x) = \sum_{k=1}^{K} \pi_k \mathcal{N}(x|\mu_k, \Sigma_k)$

🌐 DBSCAN: $\text{Cluster}(x) = \text{Density}(x, \epsilon, minPts)$

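🌐 Isolation Forest (anomaly detection): $s(x, n) = 2^{-E[h(x)]/c(n)}$, where $h(x)$ is the number of random splits needed to isolate $x$ and $c(n)$ is the average path length for $n$ points; scores close to $1$ indicate anomalies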
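Here is a minimal plain-Python sketch of Lloyd's algorithm, which minimises the k-means objective $J$ listed above. The function name, the initialisation (random data points chosen with a fixed seed), and the stopping rule are illustrative assumptions. They are not part of the original cheatsheet.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new == centroids:  # assignments can no longer change
            break
        centroids = new
    # Objective J: sum of squared distances from each point to its nearest centroid.
    J = sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids) for p in points)
    return centroids, J
```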

Neural Networks

🌐 Artificial Neural Network: $a^{(1)}=x,a^{(l)}=\sigma(z^{(l)}),z^{(l)}=w^{(l)} a^{(l-1)} + b^{(l)}$

🌐 Convolutional Neural Network: $y_{i,j} = \sigma(\sum_{m=1}^{M} \sum_{k=1}^{K} w_{m,k} x_{i+k-1,j+m-1} + b)$

🌐 Recurrent Neural Network (RNN): $h_t = f(h_{t-1}, x_t)$, where $f$ is a recurrent function.

🌐 Long Short-Term Memory (LSTM) RNN: $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$, $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$, $o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$, $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)$, $h_t = o_t \odot \tanh(c_t)$

🌐 Autoencoder: $\mathrm{encoder}: h = f(x;\theta), \mathrm{decoder}: r = g(h;\theta')$

🌐 Artificial Neural Network Backpropagation: $\delta^{(L)} = \nabla_a J \odot \sigma'(z^{(L)})$, $\delta^{(l)} = ((w^{(l+1)})^T \delta^{(l+1)}) \odot \sigma'(z^{(l)})$
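The LSTM cell above can be sketched in plain Python. This sketch assumes a scalar input and a scalar hidden state, so every weight matrix collapses to a single number. It uses $\tanh$ for both the candidate cell state and the output nonlinearity, as in the standard LSTM. The parameter layout `p` (gate name mapped to `(W, U, b)`) and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps 'f', 'i', 'o', 'c' to (W, U, b)."""
    def pre(gate):
        W, U, b = p[gate]
        return W * x + U * h_prev + b   # pre-activation for one gate
    f = sigmoid(pre("f"))               # forget gate
    i = sigmoid(pre("i"))               # input gate
    o = sigmoid(pre("o"))               # output gate
    g = math.tanh(pre("c"))             # candidate cell state
    c = f * c_prev + i * g              # new cell state
    h = o * math.tanh(c)                # new hidden state
    return h, c
```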

Gradient Boosting and Ensemble

🌐 Gradient Boosting: $F(x) = \sum_{m=1}^{M} \gamma_m f_m(x)$

🌐 Gradient Boosting Decision Tree: $y = \sum_{m=1}^{M} \gamma_m f_m(x)$, where $f_m$ is a decision tree and $\gamma_m$ is the learning rate for the $m$-th tree.

🌐 XGBoost: $\text{obj}(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t)}) + \sum_{i=1}^{t} \Omega(f_i)$

🌐 LightGBM: $\text{obj}(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{j=1}^{T} \Omega(f_j)$

🌐 AdaBoost: $H(x) = \text{sign}\left(\sum_{t=1}^T \alpha_t h_t(x)\right)$,

🌐 Bagging: $h(x) = \frac{1}{T} \sum_{t=1}^T h_t(x)$,

🌐 Stacking: $\text{Meta-model}(M_1(x), M_2(x), ..., M_n(x))$

Dimensionality Reduction

🌐 Principal Component Analysis (PCA): $z = U^T(x-\mu)$, where the columns of $U$ are the leading eigenvectors of the data covariance matrix.

🌐 Independent Component Analysis (ICA): $x = As$, where $A$ is the mixing matrix and $s$ is the vector of independent source signals.

Classification Metrics

🌐 Receiver Operating Characteristic (ROC) Curve: $\text{TPR} = \frac{TP}{TP+FN}$, $\text{FPR} = \frac{FP}{FP+TN}$

🌐 Precision-Recall Curve: $Precision = \frac{TP}{TP+FP}$, $Recall = \frac{TP}{TP+FN}$

🌐 Confusion Matrix: $\begin{pmatrix}TN & FP\\FN & TP\end{pmatrix}$

🌐 F1 Score: $F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$

🌐 Area Under Curve (AUC): $\text{AUC} = \int_0^1 \text{TPR} \, d(\text{FPR})$, the area under the ROC curve
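The classification metrics above can be computed directly from labels and scores. This sketch assumes binary labels in $\{0, 1\}$ and reports 0 when a denominator is empty. `roc_auc` sweeps the threshold over the distinct scores and integrates TPR against FPR with the trapezoidal rule. All function names are hypothetical.

```python
def confusion(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred):
    tp, fp, fn, _ = confusion(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """Trapezoidal area under the ROC curve, thresholding at each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= t else 0 for s in scores]
        tp, fp, _, _ = confusion(y_true, y_pred)
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    # The lowest threshold predicts everything positive, so the curve ends at (1, 1).
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))
```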

Regression Metrics

🌐 Mean Absolute Error (MAE): $\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$

🌐 Mean Squared Error (MSE): $\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

🌐 Root Mean Squared Error (RMSE): $\text{RMSE} = \sqrt{\text{MSE}}$

🌐 R2 Score: $R^2 = 1 - \frac{\text{MSE}}{\text{Var}(y)}$

Regularization Techniques

🌐 L1 Regularization (Lasso): $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} L(y^{(i)}, \hat{y}^{(i)}) + \lambda \sum_{j=1}^{n} |\theta_j|$

🌐 L2 Regularization (Ridge): $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} L(y^{(i)}, \hat{y}^{(i)}) + \frac{\lambda}{2} \sum_{j=1}^{n} \theta_j^2$

🌐 Elastic Net Regularization: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} L(y^{(i)}, \hat{y}^{(i)}) + \lambda_1 \sum_{j=1}^{n} |\theta_j| + \frac{\lambda_2}{2} \sum_{j=1}^{n} \theta_j^2$

🌐 Dropout: $y = \frac{1}{1-p} x \odot m$, where $m$ is a binary mask whose entries are $0$ (dropped) with probability $p$ and $1$ otherwise; dropout is applied only during training.
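A sketch of inverted dropout as in the formula above. It treats `p` as the drop probability and rescales the surviving units by $\frac{1}{1-p}$, so the expected activation is unchanged. The function name, the list-based interface, and the explicit `rng` argument are illustrative assumptions.

```python
import random

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(x)  # identity at inference time
    keep = 1.0 - p
    # rng.random() >= p happens with probability 1 - p, i.e. the unit is kept.
    return [xi / keep if rng.random() >= p else 0.0 for xi in x]
```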

Overfitting Control

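🌐 Early Stopping: Stop training when the validation loss stops decreasing (typically after a fixed number of epochs without improvement) and keep the best-performing parameters.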

Kernel Methods

🌐 Radial Basis Function (RBF): $K(x, x') = \exp \left(-\frac{\|x - x'\|^2}{2\sigma^2} \right)$,

🌐 Polynomial Kernel: $K(x, x') = (x^T x' + c)^d$,

Regression Models

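🌐 Locally Weighted Linear Regression (LWLR): $\min_{\theta \in \mathbb{R}^d} \sum_{i=1}^N w_i(x) (y_i - \theta^T x_i)^2$, with weights centered on the query point $x$, such as $w_i(x) = \exp\left(-\frac{\|x_i - x\|^2}{2\tau^2}\right)$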

Classification Models

🌐 K-Nearest Neighbors: $y = \mathrm{mode}(\{y_i : x_i \in \mathcal{N}_k(x)\})$

🌐 Softmax Regression: $P(y|x) = \frac{\exp(w_y^T x + b_y)}{\sum_{j=1}^k \exp(w_j^T x + b_j)}$,

🌐 Sigmoid Kernel: $K(x, x') = \tanh (\kappa x^T x' + c)$,

🌐 Kernel SVM: $f(x) = \sum_{i=1}^N \alpha_i y_i K(x, x_i) + b$,

Unsupervised Learning Models

🌐 Gaussian Process: $p(f(x)|X, y, x) = \mathcal{N}(m(x), \sigma^2(x))$,

🌐 Latent Dirichlet Allocation (LDA): $p(w|\theta, \beta) = \sum_{z=1}^K p(w|z, \beta)p(z|\theta)$,

🌐 t-Distributed Stochastic Neighbor Embedding (t-SNE): $p_{j|i} = \frac{\exp(-||x_i - x_j||^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-||x_i - x_k||^2 / 2\sigma_i^2)}$,

🌐 Spectral Clustering: $L = D - W$, where $D$ is the degree matrix and $W$ is the adjacency matrix,

🌐 Collaborative Filtering: $R \approx U^TV$, where $R$ is the rating matrix, $U$ is the user matrix, and $V$ is the item matrix,

🌐 Matrix Factorization: $R_{ij} \approx \sum_{k=1}^K u_{ik}v_{kj}$

Collaborative Filtering

Time Series Analysis

Recommendation Systems

Generative Models

Natural Language Processing

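🌐 Word2Vec (skip-gram): $\max \frac{1}{T}\sum_{t=1}^{T} \sum_{-c \le j \le c, j \ne 0} \log p(w_{t+j} | w_t)$, with $p(w_O | w_I) = \frac{\exp(v'^T_{w_O} v_{w_I})}{\sum_{w} \exp(v'^T_{w} v_{w_I})}$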

🌐 GloVe: $J = \sum_{i,j} f(X_{ij}) \left(w_i^T \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2$, where $X_{ij}$ is the co-occurrence count and $f$ is a weighting function

🌐 FastText: $f(w) = \sum_{g \in ngrams(w)} \text{Embedding}(g)$,

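🌐 ELMo: $\text{ELMo}_k = \gamma \sum_{j=0}^L s_j h_{k,j}$, a scaled ($\gamma$), softmax-weighted ($s_j$) combination of the biLM layer representations $h_{k,j}$ of token $k$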

🌐 Attention Mechanism: $\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^T \exp(e_{ik})}$,

🌐 Self-Attention: $e_{ij} = a(s_i, s_j)$,

🌐 BERT: $\text{MaskedLM}(\text{Transformer}(x))$,

🌐 GPT: $p(x) = \prod_{t=1}^T p(x_t | x_{<t})$

Cluster Evaluation

🌐 Silhouette Score: $s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$,

Feature Engineering

🌐 One-hot Encoding: $x_i = \begin{cases} 1 & \text{if } c = c_i, \\ 0 & \text{otherwise} \end{cases}$,

🌐 Feature Standardization: $x' = \frac{x - \mu}{\sigma}$,

🌐 Label Encoding: Convert categorical variables into integer labels.

Model Evaluation

🌐 Cross-Validation: $\text{CV} = \frac{1}{k} \sum_{i=1}^k L_{test}^{(i)}$,

🌐 K-Fold Cross-Validation: $\text{CV} = \frac{1}{k} \sum_{i=1}^k L_{test}^{(i)}$

🌐 Stratified K-Fold Cross-Validation: $\text{CV}_{\text{stratified}} = \frac{1}{k} \sum_{i=1}^k L_{test}^{(i)}(\text{same class distribution})$

🌐 Time Series Cross-Validation: $\text{CV}_{\text{time}} = \frac{1}{k} \sum_{i=1}^k L_{test}^{(i)}(\text{time order})$

Activation Functions

🌐 Leaky ReLU: $f(x) = \begin{cases} x & \text{if } x > 0, \\ \alpha x & \text{otherwise} \end{cases}$,

🌐 Parametric ReLU (PReLU): $f(x) = \begin{cases} x & \text{if } x > 0, \\ \alpha_i x & \text{otherwise} \end{cases}$,

Feature Importance

🌐 Permutation Importance: $\text{importance} = \frac{\text{permuted error} - \text{error}}{\text{error}}$, the relative increase in error after randomly shuffling the feature's values

Sequence Models

🌐 Hidden Markov Model (HMM): $\alpha_t(j) = p(o_1,o_2,...,o_t, q_t = j|\lambda)$, $\beta_t(j) = p(o_{t+1}, o_{t+2}, ..., o_T |q_t = j, \lambda)$, $\gamma_t(j) = p(q_t = j|O,\lambda)$, $\epsilon_t(i,j) = p(q_t = i,q_{t+1} = j|O,\lambda)$

Loss Functions

🌐 Mean Squared Logarithmic Error (MSLE): $L(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^N (\log(1 + y_i) - \log(1 + \hat{y}_i))^2$,

🌐 Huber Loss: $L(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & \text{for } |y - \hat{y}| \le \delta, \\ \delta (|y - \hat{y}| - \frac{1}{2}\delta) & \text{otherwise} \end{cases}$

🌐 Hinge Loss: $L(y, \hat{y}) = \max(0, 1 - y \cdot \hat{y})$,

🌐 KL Divergence: $D_{KL}(P || Q) = \sum_{i=1}^N P(i) \log \frac{P(i)}{Q(i)}$,

🌐 Categorical Cross-Entropy Loss: $L(y, \hat{y}) = -\sum_{i=1}^C y_i \log \hat{y}_i$,

🌐 Binary Cross-Entropy Loss: $L(y, \hat{y}) = -\sum_{i=1}^N [y_i \log \hat{y}_i + (1 - y_i)\log (1 - \hat{y}_i)]$,

Graph Embeddings

🌐 DeepWalk: $f(v) = \text{SkipGram}(v, v_1, \dots, v_{2k})$,

🌐 Node2Vec: $f(v) = \text{SkipGram}(v, v_1, \dots, v_{2k})$ with biased random walks,

🌐 GraphSAGE: $f(v) = \text{SAGE}(v, N(v))$,

Common ML Optimizers

🌐 Stochastic Gradient Descent with Momentum: $v_t = \gamma v_{t-1} + \eta \nabla f(x_t)$, $x_{t+1} = x_t - v_t$,

🌐 AdaGrad: $g_t = g_{t-1} + (\nabla f(x_t))^2$, $x_{t+1} = x_t - \frac{\eta}{\sqrt{g_t + \epsilon}} \nabla f(x_t)$,

🌐 Adam: $m_t = \beta_1 m_{t-1} + (1 - \beta_1) \nabla f(x_t)$, $v_t = \beta_2 v_{t-1} + (1 - \beta_2) (\nabla f(x_t))^2$, $\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$, $\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$, $x_{t+1} = x_t - \frac{\eta \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$,

🌐 AdaMax: $m_t = \beta_1 m_{t-1} + (1 - \beta_1) \nabla f(x_t)$, $v_t = \max(\beta_2 v_{t-1}, |\nabla f(x_t)|)$, $x_{t+1} = x_t - \frac{\eta}{1 - \beta_1^t} \frac{m_t}{v_t}$

🌐 Nadam: $m_t = \beta_1 m_{t-1} + (1 - \beta_1) \nabla f(x_t)$, $v_t = \beta_2 v_{t-1} + (1 - \beta_2) (\nabla f(x_t))^2$, $\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$, $\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$, $x_{t+1} = x_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\left(\beta_1 \hat{m}_t + \frac{(1 - \beta_1) \nabla f(x_t)}{1 - \beta_1^t}\right)$
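A minimal sketch of the Adam update listed above, for a list of parameters and a caller-supplied gradient function. The defaults $\beta_1 = 0.9$, $\beta_2 = 0.999$ and $\epsilon = 10^{-8}$ are commonly used values, but they are assumptions here. The fixed step count and the function name are illustrative.

```python
import math

def adam(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimise a function given its gradient `grad`, using the Adam update."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment estimate
    v = [0.0] * len(x)  # second-moment estimate
    for t in range(1, steps + 1):
        g = grad(x)
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - beta1 ** t) for mi in m]  # bias correction
        v_hat = [vi / (1 - beta2 ** t) for vi in v]
        x = [xi - lr * mh / (math.sqrt(vh) + eps) for xi, mh, vh in zip(x, m_hat, v_hat)]
    return x
```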

Evaluation Metrics

🌐 ROUGE-N (recall): $R = \frac{\sum_{s \in \text{references}} \sum_{g \in \text{n-grams}(s)} \min \left(c_g(s), c_g(\text{candidate}) \right)}{\sum_{s \in \text{references}} \sum_{g \in \text{n-grams}(s)} c_g(s)}$, where $c_g(\cdot)$ counts occurrences of n-gram $g$

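🌐 BLEU: $\text{BLEU} = BP \cdot \exp \left(\sum_{n=1}^N w_n \log p_n \right)$, where $p_n$ is the modified n-gram precision, $w_n$ are weights (typically $1/N$), and $BP = \min(1, e^{1 - r/c})$ is the brevity penalty for reference length $r$ and candidate length $c$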

Normalization

🌐 Batch Normalization: $\hat{x}_i = \frac{x_i - E[x_i]}{\sqrt{Var[x_i] + \epsilon}}$, $y_i = \gamma \hat{x}_i + \beta$

Data Augmentation

Hyperparameter Tuning

Others

🌐 EM Algorithm: $Q(\theta, \theta^{(t)}) = E_{Z|X,\theta^{(t)}}[\log p(X, Z|\theta)]$,

🌐 Bias-Variance Tradeoff: $\text{Error} = \text{Bias}^2 + \text{Variance} + \text{Noise}$,

🌐 Transfer Learning: $\text{Performance}(\text{new}) = \text{Pretrained Model}(\text{similar}) + \Delta \text{Performance}$

🌐 Stratified Sampling: $\text{Sampled Data} = \text{Sample}(\text{Class}_1, \dots, \text{Class}_n)$

Value Functions and Bellman Equations

🌐 Markov Decision Process: $\mathcal{M} = (S, A, P, R, \gamma)$

🌐 State transition function: $P(s'|s, a)$

🌐 Reward function: $R(s, a, s')$ or $R(s, a)$,

🌐 State-value function: $V^\pi(s) = E[\sum_{t=0}^\infty \gamma^t R_t | S_0 = s, \pi]$,

🌐 Action-value function: $Q^\pi(s, a) = E[\sum_{t=0}^\infty \gamma^t R_t | S_0 = s, A_0 = a, \pi]$,

🌐 Bellman equation for $V^\pi$: $V^\pi(s) = \sum_a \pi(a|s) \sum_{s'} P(s'|s, a) \left[ R(s, a, s') + \gamma V^\pi(s') \right]$,

🌐 Bellman equation for $Q^\pi$: $Q^\pi(s, a) = \sum_{s'} P(s'|s, a) \left[ R(s, a, s') + \gamma \sum_{a'} \pi(a'|s') Q^\pi(s', a') \right]$,

🌐 Optimal state-value function: $V^\star(s) = \max_a Q^\star(s, a)$

🌐 Optimal action-value function: $Q^\star(s, a) = \sum_{s'} P(s'|s, a) \left[ R(s, a, s') + \gamma \max_{a'} Q^\star(s', a') \right]$

Policy Improvement

🌐 Policy improvement: $\pi'(s) = \arg\max_a Q^\pi(s, a)$,

🌐 Policy iteration: $(1) \text{Policy Evaluation} \rightarrow (2) \text{Policy Improvement} \rightarrow (3) \text{Repeat until convergence}$,

Value-based Algorithms

🌐 Value iteration: $V_{t+1}(s) = \max_a \sum_{s'} P(s'|s, a) \left[ R(s, a, s') + \gamma V_t(s') \right]$,

🌐 Temporal Difference (TD) learning: $V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right]$,

🌐 Q-learning: $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t) \right]$,

🌐 SARSA: $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]$,
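The Q-learning update above can be exercised on a tiny, hypothetical environment. The environment is a chain of states in which action 1 moves right and action 0 moves left, and reaching the rightmost state pays reward 1 and ends the episode. The environment, the hyperparameters, and the ε-greedy behaviour policy are all assumptions chosen for illustration.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a toy chain; returns Q as a list of [Q(s,left), Q(s,right)]."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy behaviour policy, breaking ties at random
            if rng.random() < eps or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0
            # Off-policy target: bootstrap from the greedy value; the terminal state has none.
            target = r if s_next == goal else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q
```

After training, the greedy action (`argmax` over each row) should be "right" in every non-terminal state.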

Deep RL Algorithms

🌐 Deep Q-Network (DQN) loss: $\min_\theta \sum_{(s, a, r, s', d) \in \mathcal{D}} \left[ r + (1 - d)\gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right]^2$,

🌐 Experience replay buffer: $\mathcal{D}$, a collection of tuples $(s, a, r, s', d)$ used to train the DQN,

🌐 Target network: $Q(s, a; \theta^-)$, a separate network with parameters $\theta^-$ that are periodically updated from the main network,

🌐 Double DQN (DDQN) loss: $\min_\theta \sum_{(s, a, r, s', d) \in \mathcal{D}} \left[ r + (1 - d)\gamma Q(s', \arg\max_{a'} Q(s', a'; \theta); \theta^-) - Q(s, a; \theta) \right]^2$,

🌐 Distributed DQN (DDQN): $Q(S_t, A_t; \theta) \leftarrow Q(S_t, A_t; \theta) + \alpha \left[ R_{t+1} + \gamma \max_a Q(S_{t+1}, a; \theta') - Q(S_t, A_t; \theta) \right]$,

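🌐 Dueling DQN: $Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + A(s, a; \theta, \alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta, \alpha)$, where $\theta$ are the shared layers and $\alpha$, $\beta$ parameterize the advantage and value streams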

🌐 Prioritized Experience Replay: $p_i = \left| r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right| + \epsilon$,

Policy Gradient Algorithms

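🌐 Actor-Critic policy gradient: $\nabla_\theta J(\theta) = E_{\tau \sim \pi_{\theta}}\left[\sum_{t=0}^T \gamma^t \delta_t \nabla_\theta \log \pi_{\theta}(a_t|s_t)\right]$, with the critic's TD error $\delta_t = r(s_t, a_t) + \gamma V_{\phi}(s_{t+1}) - V_{\phi}(s_t)$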

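🌐 Advantage Actor-Critic (A2C): $L(\theta) = -E_{\tau \sim \pi_{\theta}}\left[\sum_{t=0}^T \gamma^t A_{\phi}(s_t, a_t) \log \pi_{\theta}(a_t|s_t)\right]$, where the critic's advantage estimate $A_\phi$ is treated as a constant when differentiating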

🌐 Proximal Policy Optimization (PPO): $L^{\text{CLIP}}(\theta) = E_t \left[ \min \left( \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_\text{old}}(a_t|s_t)} A^\text{adv}_t, \text{clip} \left( \frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_\text{old}}(a_t|s_t)}, 1 - \epsilon, 1 + \epsilon \right) A^\text{adv}_t \right) \right]$

🌐 Soft Actor-Critic (SAC): $J(\theta, \phi) = E_{\tau \sim \pi_{\theta}} \left[ \sum_{t=0}^T \gamma^t \left( r(s_t, a_t) - \alpha \log \pi_{\theta}(a_t|s_t) \right) \right]$,

🌐 Trust Region Policy Optimization (TRPO): maximize $E_t\left[\frac{\pi_\theta(a_t|s_t)}{\pi_{\theta_\text{old}}(a_t|s_t)} A_t\right]$ subject to $E_t\left[D_{KL}(\pi_{\theta_\text{old}}(\cdot|s_t) \,\|\, \pi_\theta(\cdot|s_t))\right] \leq \delta$

🌐 Monte Carlo Policy Gradient: $\nabla J(\theta) = E_{\tau \sim \pi_{\theta}}[\sum_{t=0}^T \nabla_\theta \log \pi_{\theta}(a_t|s_t) R_t]$,

🌐 REINFORCE: $\theta \leftarrow \theta + \alpha \sum_{t=0}^T \nabla_\theta \log \pi_{\theta}(a_t|s_t) (R_t - b)$,

🌐 Natural Policy Gradient: $\tilde{\nabla}_{\theta} J(\theta) = F^{-1}(\theta) \nabla_{\theta} J(\theta)$, where $F(\theta)$ is the Fisher information matrix of the policy

Deterministic Policy Gradient

Reinforcement Learning For Games

🌐 TD-Gammon: $\delta_t = r_{t+1} + \gamma V(s_{t+1}; \theta) - V(s_t; \theta)$,

Exploration Strategies

🌐 ε-greedy exploration: $\pi(a|s) = \begin{cases} 1 - \epsilon + \frac{\epsilon}{|A|} & \text{if } a = \arg\max_{a'} Q(s, a') \\ \frac{\epsilon}{|A|} & \text{otherwise} \end{cases}$

🌐 Boltzmann exploration: $\pi(a|s) = \frac{\exp(Q(s, a) / \tau)}{\sum_{a'} \exp(Q(s, a') / \tau)}$,

Bandit Algorithms

🌐 Multi-Armed Bandit: $A_t = \arg\max_{a \in A} \left( Q_t(a) + c \sqrt{\frac{\log t}{N_t(a)}} \right)$,

🌐 Upper Confidence Bound (UCB): $A_t = \arg\max_{a \in A} \left( \hat{\mu}_a + \sqrt{\frac{2 \log t}{n_a}} \right)$

🌐 Thompson Sampling: $A_t = \arg\max_a \theta^\star_a$, $\theta^\star_a \sim \text{Beta}(\alpha_a, \beta_a)$,

🌐 Contextual Bandit: $A_t = \arg\max_{a \in A} \left( \theta^T x_{t, a} \right)$,

🌐 Linear UCB: $A_t = \arg\max_{a \in A} \left( \theta^T x_{t, a} + \alpha \sqrt{x_{t, a}^T V^{-1} x_{t, a}} \right)$,

🌐 LinUCB: $A_t = \arg\max_{a \in A} \left( \theta^T x_{t, a} + \alpha \sqrt{x_{t, a}^T A_a^{-1} x_{t, a}} \right)$,

🌐 EXP3: $\pi_t(a) = (1 - \gamma) \frac{\hat{w}_{t-1}(a)}{\sum_{a'=1}^K \hat{w}_{t-1}(a')} + \frac{\gamma}{K}$
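A sketch of the UCB rule above (UCB1 with bonus $\sqrt{2 \log t / n_a}$) on simulated Bernoulli arms. Each arm is played once first so that every $n_a > 0$. The arm probabilities, the horizon, and the function name are made-up inputs for illustration.

```python
import math
import random

def ucb1(arm_probs, horizon=5000, seed=0):
    """Run UCB1 on Bernoulli arms; return pull counts and empirical means per arm."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # initial round-robin so every arm has n_a > 0
        else:
            a = max(range(k), key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_probs[a] else 0.0
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]  # incremental mean update
    return counts, means
```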

Advanced RL Algorithms

🌐 Entropy-regularized objective: $J(\theta) = E_{\tau \sim \pi_{\theta}} \left[ \sum_{t=0}^T \gamma^t \left( r(s_t, a_t) + \alpha H(\pi_{\theta}(\cdot|s_t)) \right) \right]$,

🌐 DDPG: $\nabla_{\theta^\mu} J \approx \mathbb{E}_{s_t \sim \mathcal{D}} \left[\nabla_{a} Q(s, a|\theta^Q)\big|_{s=s_t, a=\mu(s_t)} \nabla_{\theta^\mu} \mu(s|\theta^\mu)\big|_{s=s_t}\right]$

🌐 Monte Carlo Tree Search (MCTS): $Q(s, a) = \frac{\sum_{i=1}^n R_i(s, a)}{N(s, a)}$,

🌐 Upper Confidence Bound for Trees (UCT): $a^* = \arg\max_a \left( Q(s, a) + c \sqrt{\frac{\log N(s)}{N(s, a)}} \right)$,

Temporal Difference Variants

🌐 Expected SARSA: $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma E_{\pi}[Q(S_{t+1}, A_{t+1})] - Q(S_t, A_t) \right]$,

🌐 Dyna-Q: $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t) \right]$,

🌐 Model learning: $P(s'|s, a) \leftarrow P(s'|s, a) + \alpha \left[ 1 - P(s'|s, a) \right]$,

🌐 R-learning (average-reward, undiscounted): $\rho \leftarrow \rho + \beta \left[ R_{t+1} - \rho + \max_a Q(S_{t+1}, a) - Q(S_t, A_t) \right]$

🌐 Average Reward SARSA: $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} - \bar{R}_t + Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]$

Other

Optimization Problems

🌐 Optimization problem: $\min_{x \in \mathcal{X}} f(x)$

🌐 Feasible set: $\mathcal{X} = \{x \in \mathbb{R}^n : g_i(x) \le 0, i = 1, \ldots, m\}$

🌐 Linear Programming: $\min_{x \in \mathbb{R}^n} c^T x$ subject to $Ax \le b$

🌐 Quadratic Programming: $\min_{x \in \mathbb{R}^n} \frac{1}{2} x^T Q x + c^T x$ subject to $Ax \le b$

🌐 Constrained Optimization: $\min_{x \in X} f(x)$ subject to $g(x) \le 0$ and $h(x) = 0$

Lagrangian and duality

🌐 Lagrangian: $L(x, \lambda, \nu) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \sum_{j=1}^p \nu_j h_j(x)$

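🌐 Lagrange multipliers (sensitivity): $\lambda^\star_i = -\frac{\partial p^\star}{\partial u_i}(0)$, where $p^\star(u)$ is the optimal value when constraint $i$ is relaxed to $g_i(x) \le u_i$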

Convexity

🌐 Convex function: $f(\alpha x + (1-\alpha)y) \le \alpha f(x) + (1-\alpha) f(y)$ for all $x, y$ and $\alpha \in [0, 1]$

🌐 Concave function: $f(\alpha x + (1-\alpha)y) \ge \alpha f(x) + (1-\alpha) f(y)$ for all $x, y$ and $\alpha \in [0, 1]$

Optimality Conditions

🌐 First-order condition: $\nabla f(x^\star) = 0$

🌐 Second-order sufficient condition: $\nabla f(x^\star) = 0$ and $H(x^\star) \succ 0$ imply that $x^\star$ is a strict local minimum

🌐 KKT conditions: $\begin{cases} \nabla f(x^\star) + \sum_{i=1}^m \lambda_i^\star \nabla g_i(x^\star) = 0 \\ g_i(x^\star) \le 0, \lambda_i^\star \ge 0, \lambda_i^\star g_i(x^\star) = 0, i=1,\ldots,m \\ x^\star \in \mathcal{X} \end{cases}$

Gradient and Hessian

🌐 Gradient: $\nabla f(x) = \left(\frac{\partial f}{\partial x_1}(x), \ldots, \frac{\partial f}{\partial x_n}(x)\right)$, Hessian matrix: $H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$

Optimization Algorithms

🌐 Newton's method: $x^{(k+1)} = x^{(k)} - H^{-1}(x^{(k)})\nabla f(x^{(k)})$,

🌐 Quasi-Newton method: $x^{(k+1)} = x^{(k)} - B_k^{-1}\nabla f(x^{(k)})$,

🌐 Conjugate gradient: $x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}$,

Descent and search methods

🌐 Steepest descent: $p^{(k)} = -\nabla f(x^{(k)})$,

🌐 Line search: $\alpha_k = \arg\min_{\alpha > 0} f(x^{(k)} + \alpha p^{(k)})$,

🌐 Armijo rule: $f(x^{(k)} + \alpha p^{(k)}) \le f(x^{(k)}) + c\alpha \nabla f(x^{(k)})^T p^{(k)}$,

🌐 Wolfe conditions: $\begin{cases} f(x^{(k)} + \alpha p^{(k)}) \le f(x^{(k)}) + c_1\alpha \nabla f(x^{(k)})^T p^{(k)} \\ \nabla f(x^{(k)} + \alpha p^{(k)})^T p^{(k)} \ge c_2 \nabla f(x^{(k)})^T p^{(k)} \end{cases}$

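🌐 Golden section search: interior points $c_n = b_n - \frac{b_n - a_n}{\phi}$ and $d_n = a_n + \frac{b_n - a_n}{\phi}$; each iteration shrinks the bracket by $\frac{b_{n+1} - a_{n+1}}{b_n - a_n} = \frac{1}{\phi} \approx 0.618$, where $\phi = \frac{1 + \sqrt{5}}{2}$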

🌐 Bisection method: $x^{(k+1)} = \frac{a^{(k)} + b^{(k)}}{2}$,
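A sketch of golden section search for a unimodal function on $[a, b]$. Placing the interior points at a fraction $1/\phi$ of the bracket lets each iteration reuse one of the previous function evaluations. The tolerance and the function name are illustrative choices.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618

def golden_section(f, a, b, tol=1e-8):
    """Minimise a unimodal f on [a, b]; each step shrinks the bracket by 1/phi."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2
```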

Root-finding and fixed-point methods

🌐 Secant method: $x^{(k+1)} = x^{(k)} - \frac{f(x^{(k)}) (x^{(k)} - x^{(k-1)})}{f(x^{(k)}) - f(x^{(k-1)})}$

🌐 Fixed-point iteration: $x^{(k+1)} = g(x^{(k)})$

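🌐 Banach fixed-point theorem: if $||g(x) - g(y)|| \le L||x - y||$ for all $x, y$ with $L < 1$ (a contraction on a complete space), then $g$ has a unique fixed point and fixed-point iteration converges to it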

🌐 Lipschitz constant: $L = \sup_{x \neq y} \frac{||g(x) - g(y)||}{||x - y||}$

Iterative methods and approximations

🌐 Successive approximations: $x^{(k)} = x^{(0)} + \sum_{i=1}^k \Delta x^{(i)}$

🌐 Perturbed optimization: $\min_{x \in \mathcal{X}} f(x) + \epsilon g(x)$

Homotopy and Saddle Point methods

🌐 Homotopy method: $\min_{x \in \mathcal{X}} (1 - \lambda)f(x) + \lambda g(x)$

🌐 Saddle point: $\nabla_x L(x^\star, \lambda^\star) = 0, \nabla_\lambda L(x^\star, \lambda^\star) = 0$

🌐 Primal-dual method: $x^{(k+1)} = \arg\min_{x \in \mathcal{X}} L(x, \lambda^{(k)})$

Penalty and Barrier Methods

🌐 Penalty function: $P(x) = f(x) + \sum_{i=1}^m \phi(g_i(x))$

🌐 Barrier function: $B(x) = f(x) - \mu \sum_{i=1}^m \log(-g_i(x))$

ADMM

🌐 ADMM update: $x^{(k+1)} = \arg\min_x \mathcal{L}_\rho(x, z^{(k)}, \lambda^{(k)})$, $z^{(k+1)} = \arg\min_z \mathcal{L}_\rho(x^{(k+1)}, z, \lambda^{(k)})$, $\lambda^{(k+1)} = \lambda^{(k)} + \rho(g(x^{(k+1)}) - z^{(k+1)})$

Proximal Gradient Methods

🌐 Proximal gradient: $x^{(k+1)} = \text{prox}_{\alpha h}(x^{(k)} - \alpha \nabla f(x^{(k)}))$,

🌐 Proximal operator: $\text{prox}_h(x) = \arg\min_y \left(h(y) + \frac{1}{2} ||y-x||^2\right)$,

🌐 FISTA update: $x^{(k)} = \text{prox}_{\alpha h}(y^{(k)} - \alpha \nabla f(y^{(k)}))$, $t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2}$, $y^{(k+1)} = x^{(k)} + \frac{t_k - 1}{t_{k+1}}(x^{(k)} - x^{(k-1)})$, starting from $t_1 = 1$

🌐 ISTA update: $x^{(k+1)} = \text{prox}_{\alpha h}(x^{(k)} - \alpha \nabla f(x^{(k)}))$,

🌐 Nesterov acceleration: $x^{(k+1)} = \text{prox}_{\alpha h}(y^{(k)} - \alpha \nabla f(y^{(k)}))$,

🌐 Nesterov momentum: $y^{(k+1)} = x^{(k+1)} + \frac{k}{k+3}(x^{(k+1)} - x^{(k)})$,
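As an illustration of the ISTA update and the proximal operator, here is a sketch for the lasso problem $\min_x \frac{1}{2}\|Ax - b\|^2 + \lambda \|x\|_1$. For this problem the proximal operator of $\alpha\lambda\|\cdot\|_1$ is elementwise soft-thresholding. The step size $\alpha = 1/\|A\|_F^2$ is a conservative assumption, since $\|A\|_F^2$ upper-bounds the Lipschitz constant of the gradient. The function names are hypothetical.

```python
def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, applied elementwise."""
    return [max(abs(vi) - t, 0.0) * (1 if vi > 0 else -1) for vi in v]

def ista_lasso(A, b, lam, iters=500):
    """ISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1, with A given as a list of rows."""
    n = len(A[0])
    # Step size 1/L, using L = ||A||_F^2 >= ||A||_2^2 as a safe Lipschitz bound.
    alpha = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]  # Ax - b
        grad = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]        # A^T(Ax - b)
        x = soft_threshold([xj - alpha * gj for xj, gj in zip(x, grad)], alpha * lam)
    return x
```

With $A = I$, the lasso solution is simply `soft_threshold(b, lam)`, which makes a convenient sanity check.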

Linear Minimization and Frank-Wolfe Method

🌐 Frank-Wolfe method: $s^{(k)} = \arg\min_{s \in \mathcal{X}} \nabla f(x^{(k)})^T s$,

🌐 Frank-Wolfe update: $x^{(k+1)} = x^{(k)} + \gamma_k(s^{(k)} - x^{(k)})$,

Convergence and subdifferential calculus

🌐 Convergence rate (linear): $\frac{f(x^{(k)}) - f(x^\star)}{f(x^{(0)}) - f(x^\star)} \le \rho^k$ with $0 < \rho < 1$

🌐 Subdifferential: $\partial f(x) = \{v \in \mathbb{R}^n : f(y) \ge f(x) + v^T(y-x), \forall y \in \mathbb{R}^n\}$,

🌐 Subgradient method: $x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}$,

🌐 Projected subgradient: $x^{(k+1)} = \mathcal{P}_{\mathcal{X}}(x^{(k)} - \alpha_k g^{(k)})$,

🌐 Concave-convex procedure: for minimizing $f(x) + h(x)$ with $f$ concave and $h$ convex, $x^{(k+1)} = \arg\min_x \left(\nabla f(x^{(k)})^T(x - x^{(k)}) + h(x)\right)$

🌐 Smoothing approximation: $f_\epsilon(x) = \inf_{y \in \mathbb{R}^n} \left(f(y) + \frac{1}{2\epsilon} ||x - y||^2\right)$,

🌐 Augmented Lagrangian: $\mathcal{L}_\rho(x, \lambda) = f(x) + \sum_{i=1}^m \lambda_i g_i(x) + \frac{\rho}{2} \sum_{i=1}^m g_i(x)^2$,

🌐 Projected gradient: $x^{(k+1)} = \mathcal{P}_{\mathcal{X}}(x^{(k)} - \alpha_k \nabla f(x^{(k)}))$,

Time Value

🌐 Compound interest: $A = P(1 + \frac{r}{n})^{nt}$

🌐 Continuous compounding: $A = Pe^{rt}$

🌐 Present value: $PV = \frac{FV}{(1 + r)^n}$

🌐 Future value: $FV = PV(1 + r)^n$

🌐 Simple interest: $I = Prt$

🌐 Annuity formula: $PV = \frac{PMT}{r}(1 - (1 + r)^{-n})$

🌐 Perpetuity formula: $PV = \frac{PMT}{r}$

Options

🌐 Black model: $C(F, K, T, r, \sigma) = e^{-rT}[FN(d_1) - KN(d_2)]$

🌐 Real options valuation: $\text{ROV} = f(S, K, T, r, \sigma, q)$

🌐 Black-Scholes-Merton: $C(S, K, T, r, \sigma, q) = Se^{-qT}N(d_1) - Ke^{-rT}N(d_2)$

🌐 Black-Scholes-Merton put: $P(S, K, T, r, \sigma, q) = Ke^{-rT}N(-d_2) - Se^{-qT}N(-d_1)$

🌐 Black-Scholes formula: $C(S, t) = SN(d_1) - Ke^{-r(T-t)}N(d_2)$

🌐 Black-Scholes: $C(S, K, T, r, \sigma) = SN(d_1) - Ke^{-rT}N(d_2)$

🌐 Black-Scholes put: $P(S, K, T, r, \sigma) = Ke^{-rT}N(-d_2) - SN(-d_1)$

🌐 $d_1$: $d_1 = \frac{1}{\sigma \sqrt{T}}\left[\ln\left(\frac{S}{K}\right) + \left(r + \frac{\sigma^2}{2}\right)T\right]$

🌐 $d_2$: $d_2 = d_1 - \sigma \sqrt{T}$
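A sketch of the Black-Scholes-Merton call and put formulas above, using only the standard library, with $N(\cdot)$ computed from `math.erf`. It includes the dividend yield $q$ in $d_1$; with $q = 0$ it reduces to the $d_1$ above. The function names are illustrative.

```python
import math

def norm_cdf(x):
    """Standard normal CDF N(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes(S, K, T, r, sigma, q=0.0):
    """European call and put prices under Black-Scholes-Merton with dividend yield q."""
    d1 = (math.log(S / K) + (r - q + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    call = S * math.exp(-q * T) * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    put = K * math.exp(-r * T) * norm_cdf(-d2) - S * math.exp(-q * T) * norm_cdf(-d1)
    return call, put
```

For example, `black_scholes(100, 100, 1.0, 0.05, 0.2)` gives a call of about 10.45 and a put of about 5.57. Their difference matches put-call parity, $C - P = S - Ke^{-rT}$.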

Greeks

🌐 Call rho: $\rho_C = KTe^{-rT}N(d_2)$

🌐 Put rho: $\rho_P = -KTe^{-rT}N(-d_2)$

🌐 Put-call parity: $C - P = S - Ke^{-rT}$

🌐 Binomial option pricing: $C_n = \frac{1}{(1+r)^n} \sum_{i=0}^n \binom{n}{i} p^i (1-p)^{n-i} \max(S u^i d^{n-i} - K, 0)$, where $u$ and $d$ are the up and down factors and $r$ is the per-step rate

🌐 Risk-neutral probability: $p = \frac{1+r-d}{u-d}$

🌐 Implied volatility: $\sigma_{\text{implied}} = \text{IV}(C, S, K, T, r)$

🌐 Option elasticity: $\text{Elasticity} = \frac{\Delta_C \cdot S}{C}$

🌐 Put-call ratio: $\text{PCR} = \frac{\text{Volume}_\text{Put}}{\text{Volume}_\text{Call}}$

🌐 Moneyness: $\text{Moneyness} = \frac{S}{K}$

🌐 Greeks neutralization: $\text{Neutralize} = -\sum \text{Greeks}$

🌐 Call delta: $\Delta_C = N(d_1)$

🌐 Put delta: $\Delta_P = -N(-d_1)$,

🌐 Call gamma: $\Gamma_C = \frac{N'(d_1)}{S\sigma\sqrt{T}}$,

🌐 Put gamma: $\Gamma_P = \Gamma_C$,

🌐 Call theta: $\Theta_C = -\frac{S\sigma N'(d_1)}{2\sqrt{T}} - rKe^{-rT}N(d_2)$,

🌐 Put theta: $\Theta_P = -\frac{S\sigma N'(-d_1)}{2\sqrt{T}} + rKe^{-rT}N(-d_2)$,

🌐 Call vega: $V_C = S\sqrt{T} N'(d_1)$,

🌐 Put vega: $V_P = V_C$,
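The binomial pricing formula and the risk-neutral probability $p = \frac{1+r-d}{u-d}$ listed above can be evaluated directly. This sketch treats $u$ and $d$ as multiplicative up and down factors with a per-step simple rate $r$. The function name is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_call(S, K, n, r, u, d):
    """European call on an n-step recombining tree (per-step rate r, factors u and d)."""
    p = (1 + r - d) / (u - d)  # risk-neutral up probability
    if not 0 < p < 1:
        raise ValueError("no-arbitrage requires d < 1 + r < u")
    expected_payoff = sum(
        math.comb(n, i) * p ** i * (1 - p) ** (n - i) * max(S * u ** i * d ** (n - i) - K, 0.0)
        for i in range(n + 1)
    )
    return expected_payoff / (1 + r) ** n  # discount back n steps
```

The following comparison with Black-Scholes uses an assumed Cox-Ross-Rubinstein parameterisation: $u = e^{\sigma\sqrt{\Delta t}}$, $d = 1/u$, and a per-step rate $e^{r \Delta t} - 1$. With $S = K = 100$, $\sigma = 0.2$, a $5\%$ continuously compounded rate and $T = 1$, a 500-step tree should land close to the Black-Scholes value of about $10.45$.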

Risk Measures

🌐 GARCH model: $\sigma_t^2 = \omega + \alpha \varepsilon_{t-1}^2 + \beta \sigma_{t-1}^2$,

🌐 Risk-adjusted return: $\text{RAR} = \frac{\text{Return} - r_f}{\text{Volatility}}$,

🌐 Sharpe ratio: $\text{Sharpe} = \frac{\text{Return} - r_f}{\text{StandardDeviation}}$,

🌐 Sortino ratio: $\text{Sortino} = \frac{\text{Return} - r_f}{\text{DownsideDeviation}}$,

🌐 Treynor ratio: $\text{Treynor} = \frac{\text{Return} - r_f}{\text{Beta}}$,

🌐 Information ratio: $\text{IR} = \frac{\text{Return} - \text{Benchmark}}{\text{TrackingError}}$,

🌐 Jensen's alpha: $\text{Alpha} = \text{Return} - [r_f + \beta (\text{MarketReturn} - r_f)]$,

Portfolio Management

🌐 CAPM formula: $E(R_i) = R_f + \beta_i(E(R_m) - R_f)$,

🌐 Sharpe ratio: $\frac{E(R_p) - R_f}{\sigma_p}$,

🌐 Beta: $\beta_i = \frac{\text{Cov}(R_i, R_m)}{\text{Var}(R_m)}$

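🌐 Efficient frontier (two assets): $\sigma_p = \sqrt{w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2w_1w_2\rho_{12}\sigma_1\sigma_2}$ with $w_1 + w_2 = 1$; varying $w_1$ traces the risk-return curve whose upper branch is the efficient frontier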

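🌐 Markowitz tangency (maximum Sharpe) portfolio: $w_i = \frac{\sum_{j=1}^n C_{ij}^{-1}(R_j - R_f)}{\sum_{j=1}^n \sum_{k=1}^n C_{jk}^{-1}(R_k - R_f)}$, i.e. $w = \frac{C^{-1}(R - R_f \mathbf{1})}{\mathbf{1}^T C^{-1}(R - R_f \mathbf{1})}$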

🌐 Covariance matrix: $\Sigma = \begin{bmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$,

🌐 Gordon growth model: $P_0 = \frac{D_1}{r - g}$,
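A sketch of the tangency-portfolio weights $w \propto C^{-1}(R - R_f\mathbf{1})$, normalised to sum to one. This is the standard closed form of the Markowitz maximum-Sharpe portfolio. The linear system is solved with a small hand-written Gaussian elimination, so no external libraries are needed. The helper names are hypothetical, and the inputs are assumed to be expected returns `mu`, a covariance matrix `cov`, and a risk-free rate `rf`.

```python
def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(v)
    A = [row[:] + [vi] for row, vi in zip(M, v)]  # augmented matrix [M | v]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def tangency_weights(mu, cov, rf):
    """Max-Sharpe weights: C^{-1}(mu - rf), rescaled so the weights sum to 1."""
    z = solve(cov, [m - rf for m in mu])
    total = sum(z)
    return [zi / total for zi in z]
```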

Corporate Finance

Other

🌐 Dividend discount model: $P_0 = \sum_{t=1}^{\infty} \frac{D_t}{(1 + r)^t}$,

🌐 IRR formula: $NPV = \sum_{t=0}^n \frac{CF_t}{(1 + IRR)^t} = 0$,

🌐 NPV formula: $NPV = \sum_{t=0}^n \frac{CF_t}{(1 + r)^t}$,

🌐 WACC formula: $WACC = \frac{E}{V}R_e + \frac{D}{V}(1 - T_c)R_d$,

🌐 Dividend payout ratio: $\text{Payout Ratio} = \frac{\text{Dividends}}{\text{Net Income}}$,

🌐 Retention ratio: $\text{Retention Ratio} = 1 - \text{Payout Ratio}$,

🌐 Merton's model: $\text{DistanceToDefault} = \frac{\ln(\frac{V}{D}) + (r - \frac{\sigma_V^2}{2})T}{\sigma_V \sqrt{T}}$, so the risk-neutral default probability is $N(-\text{DistanceToDefault})$

🌐 Risk-neutral density: $f^*(S_T) = e^{rT} \frac{\partial^2 C}{\partial K^2}\Big|_{K = S_T}$, where $C(K)$ is the call price as a function of strike

Bond Pricing and Duration

🌐 Effective duration: $\text{Duration} = \frac{V_- - V_+}{2V \Delta y}$, where $V_-$ and $V_+$ are the prices after the yield falls and rises by $\Delta y$

🌐 Effective convexity: $\text{Convexity} = \frac{V_+ + V_- - 2V}{V (\Delta y)^2}$,

🌐 Modified duration: $\text{ModDuration} = \frac{1}{1 + \frac{YTM}{m}} \cdot \text{Duration}$,

🌐 Macaulay duration: $\text{MacDuration} = \frac{\sum^n_{i=1} t_i PVCF_i}{V}$,

🌐 DV01: $\text{DV01} = -\frac{\Delta V}{\Delta y} \times 0.0001$, the price change for a one-basis-point move in yield ($\Delta y$ in decimal)

🌐 Convexity adjustment: $\text{ConvexityAdj} = \frac{1}{2} \cdot \text{Convexity} \cdot (\Delta y)^2$,

🌐 Yield to maturity: the $YTM$ that solves $V = \sum_{t} \frac{CF_t}{(1 + YTM)^t}$
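A sketch of the bond formulas in this section for annual cash flows. It computes the price from `(t, CF)` pairs, solves for yield to maturity by bisection, and then gives Macaulay and modified duration. Bisection works here because the price falls as the yield rises when all cash flows are positive. Annual compounding ($m = 1$), the search bracket, and the function names are assumptions.

```python
def bond_price(cashflows, y):
    """Price of (t, CF) cash flows at an annually compounded yield y."""
    return sum(cf / (1 + y) ** t for t, cf in cashflows)

def ytm(cashflows, price, lo=-0.99, hi=1.0, tol=1e-12):
    """Solve bond_price(cashflows, y) = price for y by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if bond_price(cashflows, mid) > price:
            lo = mid  # model price too high, so the yield must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2

def macaulay_duration(cashflows, y):
    """Present-value-weighted average time of the cash flows."""
    V = bond_price(cashflows, y)
    return sum(t * cf / (1 + y) ** t for t, cf in cashflows) / V

def modified_duration(cashflows, y, m=1):
    return macaulay_duration(cashflows, y) / (1 + y / m)
```

For a 3-year bond with a 5% annual coupon priced at par, `ytm` returns about 0.05 and the Macaulay duration is about 2.86 years.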

Fixed Income

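🌐 Option adjusted spread: $OAS = Z - \text{option cost}$, where the option cost is the part of the spread attributable to the embedded option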

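🌐 Z-spread: the constant spread $Z$ that solves $V = \sum_{t} \frac{CF_t}{(1 + s_t + Z)^t}$, where $s_t$ is the spot (zero) rate for maturity $t$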

🌐 Forward rate (simple compounding): $f(t_1, t_2) = \frac{1}{t_2 - t_1} \left(\frac{P(t_1)}{P(t_2)} - 1\right)$, where $P(t)$ is the discount factor to time $t$

🌐 Swap rate: $S_t = \frac{\sum_{i=1}^n P(t_i) L(t_i) \Delta t}{\sum_{i=1}^n P(t_i) \Delta t}$, where $L(t_i)$ is the forward rate for the period ending at $t_i$

🌐 Futures price: $F_t = Se^{(r - q)(T - t)}$,

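🌐 Caplet pricing (Black's model, unit notional): $C_{\text{caplet}} = \tau B(0, T_2)\left[F N(d_1) - K N(d_2)\right]$, where $F$ is the forward rate for $[T_1, T_2]$, $\tau = T_2 - T_1$, and $d_{1,2} = \frac{\ln(F/K) \pm \sigma^2 T_1/2}{\sigma\sqrt{T_1}}$

🌐 Floorlet pricing (Black's model, unit notional): $C_{\text{floorlet}} = \tau B(0, T_2)\left[K N(-d_2) - F N(-d_1)\right]$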

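🌐 Swaption pricing (Black's model, payer): $C_{\text{swaption}} = A(0)\left[S N(d_1) - K N(d_2)\right]$, where $S$ is the forward swap rate and $A(0) = \sum_i \Delta t_i P(t_i)$ is the annuity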

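🌐 CDS par spread: $\text{CDS} = \frac{\text{PV}_\text{Protection}}{\text{PV}_\text{Premium}}$, where $\text{PV}_\text{Premium}$ is the present value of the premium leg per unit of spread (the risky annuity)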

Derivative Pricing Models

🌐 Cox-Ingersoll-Ross model: $dr_t = a(b - r_t)dt + \sigma\sqrt{r_t}dW_t$,

🌐 Vasicek model: $dr_t = a(b - r_t)dt + \sigma dW_t$,

🌐 Hull-White model: $dr_t = a(b(t) - r_t)dt + \sigma dW_t$,

🌐 Constant elasticity of variance: $dS_t = \mu S_t dt + \sigma S_t^\gamma dW_t$,

🌐 Heston model: $dS_t = \mu S_t dt + \sqrt{\nu_t} S_t dW_t^1$, $d\nu_t = \kappa(\theta - \nu_t) dt + \xi \sqrt{\nu_t} dW_t^2$,

🌐 Bachelier model: $C(S, K, T, \sigma) = (S - K)N(d) + \sigma \sqrt{T}N'(d)$, where $d = \frac{S - K}{\sigma\sqrt{T}}$ (zero interest rate, normal volatility $\sigma$)

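🌐 Chen model (three-factor short rate): $dr_t = \kappa(\theta_t - r_t)dt + \sqrt{r_t}\sqrt{\sigma_t}dW_t^1$, $d\theta_t = \nu(\zeta - \theta_t)dt + \alpha\sqrt{\theta_t}dW_t^2$, $d\sigma_t = \mu(\beta - \sigma_t)dt + \eta\sqrt{\sigma_t}dW_t^3$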

🌐 Girsanov theorem: $\frac{d\mathbb{Q}^*}{d\mathbb{Q}} = \exp\left(-\int_0^T \theta_t dW_t - \frac{1}{2}\int_0^T \theta_t^2 dt\right)$,

🌐 Martingale pricing: $V_t = \mathbb{E}^\mathbb{Q}\left[e^{-r(T-t)}V_T | \mathcal{F}_t\right]$,

🌐 Breeden-Litzenberger: $\frac{\partial^2 C}{\partial K^2} = e^{-rT}f^*(K)$

Basic Counting Principles

🌐 Factorial: $n! = n(n-1)(n-2)\dots1$,

🌐 Permutations: $_nP_r = \frac{n!}{(n-r)!}$,

🌐 Combinations: $_nC_r = \frac{n!}{r!(n-r)!}$,

🌐 Binomial theorem: $(a+b)^n = \sum_{k=0}^n {n \choose k} a^{n-k}b^k$,

🌐 Pascal's triangle: ${n \choose k} = {n-1 \choose k-1} + {n-1 \choose k}$,

🌐 Vandermonde's identity: $\sum_{k=0}^r {m \choose k} {n \choose r-k} = {m+n \choose r}$.

Advanced Counting

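🌐 Inclusion-exclusion principle: $|A_1 \cup A_2 \cup \dots \cup A_n| = \sum_{i} |A_i| - \sum_{i<j} |A_i \cap A_j| + \sum_{i<j<k} |A_i \cap A_j \cap A_k| - \dots + (-1)^{n+1} |A_1 \cap \dots \cap A_n|$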

🌐 Product rule: $|A \times B| = |A| \cdot |B|$

🌐 Permutations with repetition: $\frac{n!}{n_1!n_2!\dots n_k!}$, where $n_i$ is the multiplicity of the $i$-th distinct element and $n_1 + \dots + n_k = n$

🌐 Derangement formula: $D_n = n!(1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \dots + (-1)^n \frac{1}{n!})$,

🌐 Necklace counting: $\frac{1}{n} \sum_{d|n} \phi(d) a^{n/d}$, where $a$ is the number of colors,

Generating Functions

🌐 Generating functions: $G(x) = \sum_{n=0}^{\infty} a_n x^n$,

🌐 Exponential generating functions: $E(x) = \sum_{n=0}^{\infty} a_n \frac{x^n}{n!}$

🌐 Ordinary generating functions: $F(x) = \sum_{n=0}^{\infty} a_n x^n$

🌐 Generating function for partitions: $\sum_{n=0}^{\infty} p(n) x^n = \prod_{k=1}^{\infty} \frac{1}{1 - x^k}$, where $p(n)$ is the number of partitions of $n$

Special Numbers

🌐 Catalan numbers: $C_n = \frac{1}{n+1}{2n \choose n}$,

🌐 Stirling numbers of the second kind: $S(n,k) = S(n-1,k-1) + kS(n-1,k)$

🌐 Stirling numbers of the first kind (signed): $s(n,k) = s(n-1,k-1) - (n-1)s(n-1,k)$, with $s(n,k) = (-1)^{n-k}c(n,k)$ where $c(n,k)$ counts permutations of $n$ elements with $k$ cycles

🌐 Bell numbers: $B_n = \sum_{k=0}^n S(n,k)$,

🌐 Euler's totient function: $\phi(n) = n\prod_{p|n}(1 - \frac{1}{p})$,

🌐 Möbius function: $\mu(n) = \begin{cases} 1 & \text{if } n = 1, \\ 0 & \text{if } p^2 | n \text{ for some prime } p, \\ (-1)^r & \text{if } n = p_1 p_2 \dots p_r \text{ for distinct primes } p_i \end{cases}$
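The recurrences in this section translate directly into memoised functions. This sketch uses the standard signed recurrence $s(n,k) = s(n-1,k-1) - (n-1)\,s(n-1,k)$ for the first kind, with base cases $S(0,0) = s(0,0) = 1$. The function names are illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling numbers of the second kind: S(n,k) = S(n-1,k-1) + k*S(n-1,k)."""
    if n == k:
        return 1  # includes S(0,0) = 1
    if n == 0 or k == 0:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

@lru_cache(maxsize=None)
def stirling1(n, k):
    """Signed Stirling numbers of the first kind: s(n,k) = s(n-1,k-1) - (n-1)*s(n-1,k)."""
    if n == k:
        return 1
    if n == 0 or k == 0:
        return 0
    return stirling1(n - 1, k - 1) - (n - 1) * stirling1(n - 1, k)

def bell(n):
    """Bell numbers: B_n = sum over k of S(n,k)."""
    return sum(stirling2(n, k) for k in range(n + 1))
```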

Number Theory

🌐 Möbius inversion formula: If $g(n) = \sum_{d|n} f(d)$, then $f(n) = \sum_{d|n} \mu(d) g(\frac{n}{d})$

🌐 Eulerian numbers: $A(n, k) = (n-k)A(n-1, k-1) + (k+1)A(n-1, k)$,

Combinatorial Enumeration

🌐 Burnside's lemma (the basis of Pólya's enumeration theorem): If a finite group $G$ acts on a set $X$, then $|X/G| = \frac{1}{|G|} \sum_{g \in G} |\operatorname{fix}(g)|$.

🌐 Pigeonhole principle: If $n$ items are placed into $m$ containers with $n > m$, then at least one container has more than one item,

Graph Combinatorics

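🌐 Matching principle (Motzkin numbers, which count non-crossing partial matchings of $n$ points on a circle): $(n+2)M_n = (2n+1)M_{n-1} + 3(n-1)M_{n-2}$, with $M_0 = M_1 = 1$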

🌐 Chromatic polynomial: $P(G, k) = \sum_{S \subseteq E(G)} (-1)^{|S|} k^{c(S)}$, where $c(S)$ is the number of connected components of the spanning subgraph $(V(G), S)$

🌐 Kirchhoff's theorem: $\tau(G) = \det(L^{(i)})$, where $\tau(G)$ is the number of spanning trees of $G$ and $L^{(i)}$ is the Laplacian matrix of $G$ with row and column $i$ removed

🌐 König's theorem: $\operatorname{minvertexcover}(G) = \operatorname{maxmatching}(G)$,

🌐 Hall's marriage theorem: If $|A| = |B|$, then there is a perfect matching in bipartite graph $G = (A \cup B, E)$ if and only if $|N(S)| \ge |S|$ for all $S \subseteq A$,

🌐 Dilworth's theorem: In a finite poset, the minimum number of chains in a chain partition equals the size of the largest antichain

🌐 Sperner's theorem: The largest antichain in the power set of an $n$-element set has size ${n \choose \lfloor \frac{n}{2} \rfloor}$,

Continuous Distributions

🌐 Normal probability density function: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

🌐 Cumulative distribution function: $F(x) = \int_{-\infty}^x f(t) dt$,

🌐 Exponential distribution: $f(x) = \lambda e^{-\lambda x}$,

🌐 Gamma distribution: $f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}$,

🌐 Beta distribution: $f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$,

🌐 Chi-squared distribution: $f(x) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{(k/2)-1}e^{-x/2}$,

🌐 F-distribution: $f(x) = \frac{\Gamma(\frac{v_1+v_2}{2})}{\Gamma(\frac{v_1}{2})\Gamma(\frac{v_2}{2})}(\frac{v_1}{v_2})^{\frac{v_1}{2}}\frac{x^{\frac{v_1}{2}-1}}{(1+\frac{v_1x}{v_2})^{\frac{v_1+v_2}{2}}}$,

Discrete Distributions

🌐 Probability mass function: $p_X(k) = P(X=k)$, with $p_X(k) \ge 0$ and $\sum_k p_X(k) = 1$

🌐 Binomial distribution: $P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$,

🌐 Poisson distribution: $P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}$,

🌐 Geometric distribution: $P(X=k) = (1-p)^{k-1}p$,

🌐 Negative binomial distribution: $P(X=k) = \binom{k-1}{r-1}p^r(1-p)^{k-r}$,

Probability Concepts

🌐 Conditional probability: $P(A|B) = \frac{P(A \cap B)}{P(B)}$

🌐 Bayes' theorem: $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$,

🌐 Expectation of random variable: $E(X) = \sum_{i=1}^n x_iP(x_i)$,

🌐 Variance of random variable: $\text{Var}(X) = E[(X - \mu)^2]$,

🌐 Standard deviation: $\sigma = \sqrt{\text{Var}(X)}$,

🌐 Covariance: $\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$,

🌐 Correlation coefficient: $\rho_{XY} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$,

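🌐 Normal approximation (z-score of a sample mean): $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

🌐 Central limit theorem: $\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0,1)$ as $n \to \infty$, for i.i.d. samples with finite variance $\sigma^2$

🌐 Wiener process: $dW_t = \epsilon \sqrt{dt}$, where $\epsilon \sim N(0,1)$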

🌐 Brownian motion: $W_t \sim N(0, t)$,

🌐 Geometric Brownian motion: $dS_t = \mu S_t dt + \sigma S_t dW_t$,
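A minimal Euler-Maruyama sketch for geometric Brownian motion, using only the standard library. The step count, path count, seed, and function name are illustrative assumptions; for GBM the sample mean of $S_T$ should come out close to $S_0 e^{\mu T}$.

```python
import math
import random


def simulate_gbm(s0, mu, sigma, T, n_steps, n_paths, seed=0):
    """Euler-Maruyama paths of dS = mu S dt + sigma S dW; returns terminal values."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    terminal = []
    for _ in range(n_paths):
        s = s0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, 1.0) * sqrt_dt  # dW ~ N(0, dt)
            s += mu * s * dt + sigma * s * dw
        terminal.append(s)
    return terminal


paths = simulate_gbm(s0=1.0, mu=0.05, sigma=0.2, T=1.0, n_steps=50, n_paths=10_000)
print(sum(paths) / len(paths), math.exp(0.05))  # sample mean vs S_0 e^{mu T}
```

The same loop handles the Ornstein-Uhlenbeck process by replacing the update with `s += theta * (mu - s) * dt + sigma * dw`.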

🌐 Ornstein-Uhlenbeck process: $dX_t = \theta(\mu - X_t) dt + \sigma dW_t$,

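🌐 Itô's lemma: for $dX_t = \mu\,dt + \sigma\,dW_t$, $df(t, X_t) = \left(\frac{\partial f}{\partial t} + \mu \frac{\partial f}{\partial x} + \frac{1}{2}\sigma^2 \frac{\partial^2 f}{\partial x^2}\right) dt + \sigma \frac{\partial f}{\partial x} dW_t$

🌐 Martingale property: $\mathbb{E}[X_{t+1} \mid \mathcal{F}_t] = X_t$, where $\mathcal{F}_t$ is the information available at time $t$

🌐 Optional stopping theorem: If $\tau$ is a stopping time with $\mathbb{E}[\tau] < \infty$ and the martingale has bounded increments, $|X_{t+1} - X_t| \le c$, then $\mathbb{E}[X_\tau] = \mathbb{E}[X_0]$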

🌐 Doob's martingale inequality: $\mathbb{P}(\max_{0 \le t \le T} |X_t| \ge a) \le \frac{\mathbb{E}[X_T^2]}{a^2}$,

🌐 Doob-Meyer decomposition: Any right-continuous submartingale $X_t$ of class (D) has a unique decomposition $X_t = M_t + A_t$, where $M_t$ is a martingale and $A_t$ is a predictable, increasing process with $A_0 = 0$

🌐 Radon-Nikodym derivative: $\frac{d\mathbb{Q}}{d\mathbb{P}} = \frac{Z_T}{\mathbb{E}[Z_T]}$, where $Z_t = e^{-\int_0^t \theta_u dW_u - \frac{1}{2} \int_0^t \theta_u^2 du}$,

🌐 Girsanov's theorem: Under $\mathbb{Q}$, the process $\tilde{W}_t = W_t + \int_0^t \theta_u du$ is a Brownian motion,

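🌐 Feynman-Kac formula: $u(t,x) = \mathbb{E}\left[e^{-\int_t^T r(s) ds} g(X_T) \mid X_t = x\right]$, where $X_t$ solves the SDE $dX_t = b(t,X_t)dt + \sigma(t,X_t)dW_t$; then $u$ solves $\frac{\partial u}{\partial t} + b\frac{\partial u}{\partial x} + \frac{1}{2}\sigma^2\frac{\partial^2 u}{\partial x^2} - r(t)u = 0$ with $u(T,x) = g(x)$.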

🌐 Markov property: $\mathbb{P}(X_{t+1} | X_t, X_{t-1}, \dots, X_0) = \mathbb{P}(X_{t+1} | X_t)$,

🌐 Poisson process: $\mathbb{P}(N(t+dt) - N(t) = 1) = \lambda dt + o(dt)$,

🌐 Exponential inter-arrival times: $f_T(t) = \lambda e^{-\lambda t}$,

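🌐 Merton's jump diffusion: $dS_t = (\mu - \lambda k)S_t dt + \sigma S_t dW_t + (Y - 1)S_t dN_t$, where $N_t$ is a Poisson process with intensity $\lambda$, $Y$ is the jump multiplier, and $k = \mathbb{E}[Y - 1]$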

🌐 Schwartz model: $dS_t = \mu S_t dt + \sigma S_t dW_t$, $d\mu = \alpha(\theta - \mu) dt + \gamma dZ_t$,

🌐 Stratonovich integral: $\int_0^t H_s \circ dW_s = \lim_{\Delta t \to 0} \sum_{i=0}^{n-1} \frac{H_{t_i} + H_{t_{i+1}}}{2}(W_{t_{i+1}} - W_{t_i})$ (the left-endpoint sum $\sum H_{t_i}(W_{t_{i+1}} - W_{t_i})$ gives the Itô integral instead)

🌐 Doob's maximal inequality: for a non-negative submartingale $X_t$, $\mathbb{P}(\max_{0 \le t \le T} X_t \ge a) \le \frac{\mathbb{E}[X_T]}{a}$

🌐 Kolmogorov's inequality: for $S_k = X_1 + \dots + X_k$ with independent, zero-mean, finite-variance $X_i$, $\mathbb{P}(\max_{1 \le k \le n} |S_k| \ge a) \le \frac{\mathbb{E}[S_n^2]}{a^2}$

🌐 Azuma's inequality: If $(X_t)$ is a martingale and $|X_t - X_{t-1}| \le c_t$, then $\mathbb{P}(|X_n - X_0| \ge a) \le 2e^{-\frac{a^2}{2\sum_{t=1}^n c_t^2}}$

🌐 Borell-TIS inequality: If $(X_t)$ is a centered Gaussian process, almost surely bounded on $[0, T]$, and $\sigma_T^2 = \sup_{0 \le t \le T} \text{Var}(X_t)$, then $\mathbb{P}\left(\max_{0 \le t \le T} X_t - \mathbb{E}\left[\max_{0 \le t \le T} X_t\right] \ge a\right) \le e^{-\frac{a^2}{2\sigma_T^2}}$
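The Itô and Stratonovich integrals differ only in where the integrand is evaluated on each step. For $\int_0^T W\,dW$ the Itô (left-endpoint) sum approaches $\frac{W_T^2 - T}{2}$, while the sum that averages the two endpoint values of $W$ telescopes to exactly $\frac{W_T^2}{2}$. This sketch on one simulated path assumes $T = 1$; the step count and seed are arbitrary choices.

```python
import math
import random


def ito_and_stratonovich_wdw(T=1.0, n=100_000, seed=1):
    """Approximate int_0^T W dW with left-point (Ito) and averaged-endpoint (Stratonovich) sums."""
    rng = random.Random(seed)
    dt = T / n
    w, ito, strat = 0.0, 0.0, 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))
        w_next = w + dw
        ito += w * dw                     # integrand at the left endpoint
        strat += 0.5 * (w + w_next) * dw  # integrand averaged over the step
        w = w_next
    return w, ito, strat


w_T, ito, strat = ito_and_stratonovich_wdw()
print(ito, (w_T**2 - 1.0) / 2)  # Ito: close to (W_T^2 - T) / 2
print(strat, w_T**2 / 2)        # Stratonovich: W_T^2 / 2 up to rounding
```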

Statistical Inference

🌐 Confidence interval: $\bar{x} \pm z_{\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$

🌐 Hypothesis testing: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

🌐 Student's t-test: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

Analysis of Variance

🌐 ANOVA F-test: $F = \frac{\text{MSB}}{\text{MSW}}$

Regression Analysis

🌐 Coefficient of determination: $R^2 = 1 - \frac{\text{SSE}}{\text{SST}}$

🌐 Residual sum of squares: $\text{SSE} = \sum_{i=1}^n (y_i - \hat{y_i})^2$

🌐 Total sum of squares: $\text{SST} = \sum_{i=1}^n (y_i - \bar{y})^2$

Descriptive Statistics

🌐 Mean: $\bar{x} = \frac{\sum_{i=1}^n x_i}{n}$

🌐 Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{n-1}}$

Ordinary Differential Equations

🌐 First-order ODE: $\frac{dy}{dt} = f(t, y)$

🌐 Second-order ODE: $\frac{d^2y}{dt^2} = f(t, y, \frac{dy}{dt})$

🌐 Linear ODE: $\sum_{i=0}^n a_i(t)\frac{d^iy}{dt^i} = g(t)$

🌐 Homogeneous ODE: $\sum_{i=0}^n a_i(t)\frac{d^iy}{dt^i} = 0$

Partial Differential Equations

🌐 Laplace's equation: $\nabla^2 u = 0$

🌐 Poisson's equation: $\nabla^2 u = f$

🌐 Heat equation: $\frac{\partial u}{\partial t} = \alpha \nabla^2 u$

🌐 Wave equation: $\frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u$

🌐 Transport equation: $\frac{\partial u}{\partial t} + c \frac{\partial u}{\partial x} = 0$

🌐 Schrödinger equation: $i\hbar\frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2 \Psi + V\Psi$

🌐 Klein-Gordon equation: $\left(\frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2 + \frac{m^2 c^2}{\hbar^2} \right) \phi = 0$

🌐 Navier-Stokes equation: $\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v} \cdot \nabla) \mathbf{v} = -\frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{v} + \mathbf{f}$

🌐 Continuity equation: $\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{v}) = 0$

🌐 Black-Scholes PDE: $\frac{\partial V}{\partial t} + rS\frac{\partial V}{\partial S} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} - rV = 0$

🌐 Fokker-Planck equation: $\frac{\partial p}{\partial t} = -\nabla \cdot (\mathbf{J}p) + \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j} (D_{ij}p)$

🌐 Burgers' equation: $\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2}$

🌐 Korteweg–de Vries equation: $\frac{\partial u}{\partial t} + 6u\frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = 0$

Stochastic Partial Differential Equations

🌐 SPDE general form: $\frac{\partial u}{\partial t} = L[u] + f(u, t, x) + g(u, t, x) \dot{W}(t, x)$

🌐 Stochastic heat equation: $\frac{\partial u}{\partial t} = \alpha \nabla^2 u + \sigma u \dot{W}(t, x)$

🌐 Stochastic wave equation: $\frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u + \sigma u \dot{W}(t, x)$

🌐 Stochastic Burgers' equation: $\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial x} = \nu \frac{\partial^2 u}{\partial x^2} + \sigma u \dot{W}(t, x)$

🌐 Stochastic Navier-Stokes equation: $\frac{\partial \mathbf{v}}{\partial t} + (\mathbf{v} \cdot \nabla) \mathbf{v} = -\frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{v} + \sigma \mathbf{v} \dot{W}(t, x) + \mathbf{f}$,

🌐 Stochastic Schrödinger equation: $i\hbar\frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m}\nabla^2 \Psi + V\Psi + \sigma \Psi \dot{W}(t, x)$,

🌐 Stochastic reaction-diffusion: $\frac{\partial u}{\partial t} = \alpha \nabla^2 u + f(u) + g(u) \dot{W}(t, x)$,

🌐 Stochastic Korteweg–de Vries equation: $\frac{\partial u}{\partial t} + 6u\frac{\partial u}{\partial x} + \frac{\partial^3 u}{\partial x^3} = \sigma u \dot{W}(t, x)$,

🌐 Stochastic Fokker-Planck equation: $\frac{\partial p}{\partial t} = -\nabla \cdot (\mathbf{J}p) + \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j} (D_{ij}p) + \sigma p \dot{W}(t, x)$,

🌐 Stochastic Allen-Cahn equation: $\frac{\partial u}{\partial t} = \alpha \nabla^2 u - f(u) + \sigma u \dot{W}(t, x)$,

🌐 Stochastic Ginzburg-Landau equation: $\frac{\partial u}{\partial t} = \alpha \nabla^2 u - f(u) + \sigma \nabla \cdot (u \dot{W}(t, x))$,

🌐 Stochastic Cahn-Hilliard equation: $\frac{\partial u}{\partial t} = \alpha \nabla^2 \mu + \sigma u \dot{W}(t, x)$, $\mu = -\nabla^2 u + f'(u)$

🌐 Stochastic Kuramoto-Sivashinsky equation: $\frac{\partial u}{\partial t} = -\alpha \nabla^2 u - \beta \nabla^4 u - \gamma u \frac{\partial u}{\partial x} + \sigma u \dot{W}(t, x)$,

🌐 Stochastic Fisher-KPP equation: $\frac{\partial u}{\partial t} = \alpha \nabla^2 u + \beta u(1 - u) + \sigma u \dot{W}(t, x)$,

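🌐 Stochastic Benjamin-Ono equation: $\frac{\partial u}{\partial t} + \mathcal{H}\left[\frac{\partial^2 u}{\partial x^2}\right] + u\frac{\partial u}{\partial x} = \sigma u \dot{W}(t, x)$, where $\mathcal{H}$ is the Hilbert transform in $x$

🌐 Bisection method: $c = \frac{a+b}{2}$; keep the half-interval $[a, c]$ or $[c, b]$ on which $f$ changes sign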

🌐 Newton's method: $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$,

🌐 Secant method: $x_{n+1} = x_n - f(x_n)\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}$,

🌐 Fixed-point iteration: $x_{n+1} = g(x_n)$
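Bisection and Newton's method can be sketched in a few lines. The tolerances, iteration caps, and function names below are illustrative assumptions; bisection needs a sign change on $[a, b]$, and Newton's method needs $f'$ and a reasonable starting point.

```python
def bisection(f, a, b, tol=1e-12, max_iter=200):
    """Halve [a, b] while keeping a sign change; assumes f(a) * f(b) < 0."""
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = (a + b) / 2
        fc = f(c)
        if fc == 0 or (b - a) / 2 < tol:
            return c
        if fa * fc < 0:
            b = c  # root lies in [a, c]
        else:
            a, fa = c, fc  # root lies in [c, b]
    return (a + b) / 2


def newton(f, df, x0, tol=1e-12, max_iter=50):
    """x_{n+1} = x_n - f(x_n) / f'(x_n); may diverge from a poor starting point."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")


def f(x):
    return x * x - 2


def df(x):
    return 2 * x


print(bisection(f, 0.0, 2.0), newton(f, df, 1.0))  # both approximate sqrt(2)
```

Newton converges quadratically near a simple root, while bisection gains one bit per step but cannot diverge.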

🌐 Lagrange interpolation: $L(x) = \sum_{i=0}^n f(x_i)\prod_{\substack{j=0 \\ j \neq i}}^n \frac{x-x_j}{x_i-x_j}$

🌐 Divided differences: $f[x_0, x_1, \dots, x_k] = \frac{f[x_1, \dots, x_k] - f[x_0, \dots, x_{k-1}]}{x_k - x_0}$

🌐 Newton's interpolation: $N(x) = f[x_0] + (x-x_0)f[x_0, x_1] + \dots + (x-x_0)\dots(x-x_{n-1})f[x_0, \dots, x_n]$

🌐 Simpson's rule: $\int_a^b f(x) dx \approx \frac{b-a}{6}(f(a) + 4f(\frac{a+b}{2}) + f(b))$

🌐 Trapezoidal rule: $\int_a^b f(x) dx \approx \frac{b-a}{2}(f(a) + f(b))$

🌐 Romberg integration: $R_{i,j} = R_{i,j-1} + \frac{1}{4^j-1}(R_{i,j-1} - R_{i-1,j-1})$, where $R_{i,0}$ is the trapezoidal estimate with $2^i$ subintervals

🌐 Gauss quadrature: $\int_a^b f(x) dx \approx \sum_{i=1}^n w_i f(x_i)$
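Composite Simpson's rule and the Romberg table are short to implement. In this sketch (names and the test integrand are illustrative), each Romberg row halves the trapezoid step, reusing the previous row's estimate, and then applies Richardson extrapolation with $R_{i,j} = R_{i,j-1} + \frac{R_{i,j-1} - R_{i-1,j-1}}{4^j - 1}$.

```python
import math


def composite_simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals of width h = (b - a) / n."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    total += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return total * h / 3


def romberg(f, a, b, levels):
    """Romberg table R[i][j]; returns the most extrapolated entry R[levels][levels]."""
    R = [[0.0] * (levels + 1) for _ in range(levels + 1)]
    R[0][0] = (b - a) / 2 * (f(a) + f(b))
    for i in range(1, levels + 1):
        h = (b - a) / 2**i
        # trapezoid rule on 2^i panels: only the new midpoints need evaluating
        new_points = sum(f(a + (2 * k - 1) * h) for k in range(1, 2 ** (i - 1) + 1))
        R[i][0] = R[i - 1][0] / 2 + h * new_points
        for j in range(1, i + 1):
            R[i][j] = R[i][j - 1] + (R[i][j - 1] - R[i - 1][j - 1]) / (4**j - 1)
    return R[levels][levels]


# the exact value of the integral of sin over [0, pi] is 2
print(composite_simpson(math.sin, 0.0, math.pi, 100), romberg(math.sin, 0.0, math.pi, 6))
```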

🌐 Jacobi iteration: $x_i^{(k+1)} = \frac{1}{a_{ii}}(b_i - \sum_{\substack{j=1 \\ j \neq i}}^n a_{ij}x_j^{(k)})$

🌐 Gauss-Seidel iteration: $x_i^{(k+1)} = \frac{1}{a_{ii}}(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^n a_{ij}x_j^{(k)})$

🌐 SOR method: $x_i^{(k+1)} = (1-\omega)x_i^{(k)} + \frac{\omega}{a_{ii}}(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k+1)} - \sum_{j=i+1}^n a_{ij}x_j^{(k)})$
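The three iterations above share one loop. In this sketch (the function name and stopping rule are illustrative), updating `x` in place means entries $j < i$ already hold iteration $k+1$ values, which gives Gauss-Seidel at $\omega = 1$ and SOR otherwise; Jacobi would instead compute every entry from a frozen copy of the previous iterate.

```python
def sor(A, b, omega=1.0, tol=1e-10, max_iter=10_000):
    """Successive over-relaxation; omega = 1 reduces to Gauss-Seidel.

    Convergence is guaranteed, e.g., for strictly diagonally dominant A with
    omega = 1, or for symmetric positive definite A with 0 < omega < 2.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            # x[j] for j < i were already updated in this sweep
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i][i]
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            return x
    raise RuntimeError("SOR did not converge")


print(sor([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0]))  # exact solution is [0.1, 0.6]
```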

🌐 Euler's method: $y_{n+1} = y_n + hf(t_n, y_n)$

🌐 Midpoint method: $y_{n+1} = y_n + h f(t_n + \frac{h}{2}, y_n + \frac{h}{2}f(t_n, y_n))$

🌐 Runge-Kutta method (4th order): $y_{n+1} = y_n + \frac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4)$, where $k_1 = hf(t_n, y_n)$, $k_2 = hf(t_n + \frac{h}{2}, y_n + \frac{k_1}{2})$, $k_3 = hf(t_n + \frac{h}{2}, y_n + \frac{k_2}{2})$, $k_4 = hf(t_n + h, y_n + k_3)$
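Euler's method and classical RK4 can be compared on $y' = y$, $y(0) = 1$, whose exact value at $t = 1$ is $e$. This is a minimal sketch with an illustrative step size $h = 0.01$: Euler's global error shrinks like $h$, RK4's like $h^4$.

```python
import math


def euler(f, t0, y0, h, n_steps):
    """Forward Euler: y_{n+1} = y_n + h f(t_n, y_n)."""
    t, y = t0, y0
    for _ in range(n_steps):
        y += h * f(t, y)
        t += h
    return y


def rk4(f, t0, y0, h, n_steps):
    """Classical fourth-order Runge-Kutta with the k_1..k_4 stages above."""
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = h * f(t, y)
        k2 = h * f(t + h / 2, y + k1 / 2)
        k3 = h * f(t + h / 2, y + k2 / 2)
        k4 = h * f(t + h, y + k3)
        y += (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def growth(t, y):
    return y  # y' = y


err_euler = abs(euler(growth, 0.0, 1.0, 0.01, 100) - math.e)
err_rk4 = abs(rk4(growth, 0.0, 1.0, 0.01, 100) - math.e)
print(err_euler, err_rk4)
```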

🌐 Adams-Bashforth method: $y_{n+1} = y_n + \frac{h}{2}(3f(t_n, y_n) - f(t_{n-1}, y_{n-1}))$

🌐 Central difference (first derivative): $\frac{\partial u}{\partial x}(x_i) \approx \frac{u(x_{i+1}) - u(x_{i-1})}{2\Delta x}$

🌐 Central difference (second derivative): $\frac{\partial^2 u}{\partial x^2}(x_i) \approx \frac{u(x_{i+1}) - 2u(x_i) + u(x_{i-1})}{\Delta x^2}$

🌐 Thomas algorithm: $c_i' = \frac{c_i}{b_i - a_i c_{i-1}'}$, $d_i' = \frac{d_i - a_i d_{i-1}'}{b_i - a_i c_{i-1}'}$, $x_n = d_n'$, $x_i = d_i' - c_i' x_{i+1}$,
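The forward sweep and back substitution of the Thomas algorithm translate directly into code. This sketch assumes 0-based lists where row $i$ reads $a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i$ (so `a[0]` and `c[-1]` are unused); it does no pivoting, which is safe for diagonally dominant systems such as discretised heat equations.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system in O(n) time without pivoting."""
    n = len(d)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = c[0] / b[0]
    d_prime[0] = d[0] / b[0]
    for i in range(1, n):  # forward sweep: eliminate the sub-diagonal
        denom = b[i] - a[i] * c_prime[i - 1]
        c_prime[i] = c[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (d[i] - a[i] * d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# the 1-D Laplacian [2, -1] stencil; the exact solution is [1, 1, 1]
print(thomas([0, -1, -1], [2, 2, 2], [-1, -1, 0], [1, 0, 1]))
```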

🌐 QR factorization: $A = QR$, where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix,

🌐 Singular value decomposition: $A = U\Sigma V^T$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix,

🌐 Power method: $x^{(k+1)} = \frac{Ax^{(k)}}{||Ax^{(k)}||}$,

🌐 Rayleigh quotient: $R(x) = \frac{x^TAx}{x^Tx}$,
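The power method and the Rayleigh quotient combine naturally: normalise $Ax^{(k)}$ each step and read the eigenvalue estimate off $R(x)$. This pure-Python sketch assumes a unique eigenvalue of largest magnitude and a start vector not orthogonal to its eigenvector; the helper names and stopping rule are illustrative.

```python
import math


def power_method(A, x0, tol=1e-12, max_iter=1000):
    """Power iteration; returns (Rayleigh-quotient eigenvalue estimate, unit vector)."""

    def matvec(M, v):
        return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

    def rayleigh(v):
        Av = matvec(A, v)
        return sum(p * q for p, q in zip(v, Av)) / sum(p * p for p in v)

    x = x0[:]
    lam = rayleigh(x)
    for _ in range(max_iter):
        y = matvec(A, x)
        norm = math.sqrt(sum(t * t for t in y))
        x = [t / norm for t in y]  # x^{(k+1)} = A x^{(k)} / ||A x^{(k)}||
        new_lam = rayleigh(x)
        if abs(new_lam - lam) < tol:
            return new_lam, x
        lam = new_lam
    return lam, x


eigenvalue, vector = power_method([[2.0, 1.0], [1.0, 2.0]], [1.0, 0.0])
print(eigenvalue, vector)  # dominant eigenvalue 3, eigenvector proportional to [1, 1]
```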

🌐 Gram-Schmidt process: $v_i = u_i - \sum_{j=1}^{i-1} \frac{\langle u_i, v_j \rangle}{\langle v_j, v_j \rangle} v_j$,

🌐 LU factorization: $A = LU$, where $L$ is a lower triangular matrix and $U$ is an upper triangular matrix,

🌐 Cholesky factorization: $A = LL^T$, where $A$ is a symmetric positive definite matrix and $L$ is a lower triangular matrix.

Algebra and Analysis Inequalities

🌐 Triangle inequality: $||x+y|| \le ||x|| + ||y||$

🌐 Cauchy-Schwarz: $|x^T y| \le ||x|| ||y||$

🌐 Arithmetic-Geometric mean: for $x_i \ge 0$, $\frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1x_2\cdots x_n}$

🌐 Jensen's inequality: for convex $f$ and weights $\alpha_i \ge 0$ (reversed for concave $f$), $f(\frac{\sum_{i=1}^n \alpha_ix_i}{\sum_{i=1}^n \alpha_i}) \le \frac{\sum_{i=1}^n \alpha_if(x_i)}{\sum_{i=1}^n \alpha_i}$

🌐 Hölder's inequality: for $p, q > 1$ with $\frac{1}{p} + \frac{1}{q} = 1$, $\sum_{i=1}^n |x_iy_i| \le \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}}\left(\sum_{i=1}^n |y_i|^q\right)^{\frac{1}{q}}$

🌐 Minkowski inequality: for $p \ge 1$, $\left(\sum_{i=1}^n |x_i + y_i|^p\right)^{\frac{1}{p}} \le \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}} + \left(\sum_{i=1}^n |y_i|^p\right)^{\frac{1}{p}}$

🌐 Young's inequality: for $a, b \ge 0$ and $p, q > 1$ with $\frac{1}{p} + \frac{1}{q} = 1$, $ab \le \frac{a^p}{p} + \frac{b^q}{q}$

Concentration Inequalities

🌐 Chebyshev's inequality: $Pr(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$,

🌐 Markov's inequality: for non-negative $X$ and $a > 0$, $Pr(X \ge a) \le \frac{E[X]}{a}$

🌐 Bernstein's inequality: for independent $X_i$ with $|X_i - E[X_i]| \le M$, $Pr\left(\left|\sum_{i=1}^n (X_i - E[X_i])\right| \ge t\right) \le 2\exp\left(-\frac{t^2}{2(\sum_{i=1}^n \text{Var}(X_i) + \frac{Mt}{3})}\right)$

🌐 Hoeffding's inequality: for independent $X_i \in [a_i, b_i]$, $Pr\left(\left|\sum_{i=1}^n (X_i - E[X_i])\right| \ge t\right) \le 2\exp\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right)$

🌐 Azuma's inequality: for a martingale $(X_k)$ with $|X_k - X_{k-1}| \le c_k$, $Pr\left(|X_n - X_0| \ge t\right) \le 2\exp\left(-\frac{t^2}{2\sum_{k=1}^n c_k^2}\right)$
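Hoeffding's bound can be compared with a simulated tail. The sketch below assumes $n = 100$ fair coins (each $X_i \in [0, 1]$, so $\sum (b_i - a_i)^2 = n$) and a deviation $t = 15$; all parameter values and the function name are illustrative. The bound is valid but typically loose.

```python
import math
import random


def hoeffding_check(n=100, t=15, trials=20_000, seed=2):
    """Empirical P(|S - n/2| >= t) for S a sum of n fair 0/1 coins, vs Hoeffding's bound."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = bin(rng.getrandbits(n)).count("1")  # n independent Bernoulli(1/2) bits
        if abs(s - n / 2) >= t:
            hits += 1
    empirical = hits / trials
    bound = 2 * math.exp(-2 * t**2 / n)
    return empirical, bound


empirical, bound = hoeffding_check()
print(empirical, bound)
```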

Optimization Inequalities and Functional Analysis

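🌐 Schur's inequality: for $a, b, c \ge 0$ and $t > 0$, $a^t(a-b)(a-c) + b^t(b-a)(b-c) + c^t(c-a)(c-b) \ge 0$

🌐 Rearrangement inequality: if $a_1 \le \dots \le a_n$ and $b_1 \le \dots \le b_n$, then for every permutation $\sigma$, $\sum_{i=1}^n a_ib_{n+1-i} \le \sum_{i=1}^n a_ib_{\sigma(i)} \le \sum_{i=1}^n a_ib_i$

🌐 Cauchy's interlacing: if $B$ is an $(n-1) \times (n-1)$ principal submatrix of a symmetric $n \times n$ matrix $A$, with eigenvalues in decreasing order, then $\lambda_1(A) \ge \lambda_1(B) \ge \lambda_2(A) \ge \lambda_2(B) \ge \cdots \ge \lambda_{n-1}(B) \ge \lambda_n(A)$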

🌐 Schwarz inequality: $\sum_{i=1}^n a_ib_i \le \sqrt{\sum_{i=1}^n a_i^2} \sqrt{\sum_{i=1}^n b_i^2}$,

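🌐 Poincaré inequality: for $u \in H_0^1(\Omega)$ on a bounded domain $\Omega$ (zero boundary values), $\int_\Omega |u|^2 \le C \int_\Omega |\nabla u|^2$

🌐 Sobolev inequality: for $u \in W_0^{1,p}(\Omega)$, $\Omega \subseteq \mathbb{R}^n$, with $1 \le p < n$ and $p^* = \frac{np}{n-p}$, $\|u\|_{L^{p^*}(\Omega)} \le C \|\nabla u\|_{L^p(\Omega)}$

🌐 Cramér-Rao inequality: for an unbiased estimator $\hat{\theta}$, $\text{Var}(\hat{\theta}) \ge \frac{1}{I(\theta)}$, where $I(\theta)$ is the Fisher information

🌐 Paley-Zygmund inequality: for $Z \ge 0$ with finite variance and $0 \le \theta \le 1$, $\Pr(Z > \theta E[Z]) \ge (1-\theta)^2 \frac{E[Z]^2}{E[Z^2]}$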

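🌐 Rademacher-Menchov inequality: for orthogonal random variables $X_1, \dots, X_n$ ($E[X_iX_j] = 0$ for $i \ne j$), $E\left[\max_{k \le n} \left|\sum_{i=1}^k X_i\right|^2\right] \le C (\log_2 (2n))^2 \sum_{i=1}^n E[X_i^2]$ for an absolute constant $C$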

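🌐 Rayleigh quotient: for symmetric $A$ and $x \ne 0$, $\lambda_{\min}(A) \le \frac{x^TAx}{x^Tx} \le \lambda_{\max}(A)$

🌐 Courant-Fisher: for symmetric $A$ with eigenvalues $\lambda_1 \le \dots \le \lambda_n$, $\lambda_k(A) = \min_{\dim U = k} \max_{x \in U, x \neq 0} \frac{x^TAx}{x^Tx}$

🌐 Gershgorin's theorem: every eigenvalue $\lambda$ of $A$ satisfies $\lambda \in \bigcup_{i=1}^n B(a_{ii}, R_i)$, where $B(a_{ii}, R_i)$ is the closed disc of radius $R_i = \sum_{j \ne i} |a_{ij}|$ centred at $a_{ii}$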

🌐 Hadwiger-Finsler inequality: for a triangle with sides $a, b, c$ and area $T$, $a^2 + b^2 + c^2 \ge 4\sqrt{3}\,T + (a-b)^2 + (b-c)^2 + (c-a)^2$

🌐 Bonferroni's inequality: $\Pr\left(\bigcup_{i=1}^n A_i\right) \ge \sum_{i=1}^n \Pr(A_i) - \sum_{1 \le i < j \le n} \Pr(A_i \cap A_j)$

🌐 Boole's inequality: $\Pr\left(\bigcup_{i=1}^n A_i\right) \le \sum_{i=1}^n \Pr(A_i)$

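🌐 Nash inequality: for $u$ on $\mathbb{R}^n$, $\|u\|_{L^2}^{1 + \frac{2}{n}} \le C \|\nabla u\|_{L^2} \|u\|_{L^1}^{\frac{2}{n}}$

🌐 Hardy-Littlewood-Sobolev: $\|I_\alpha f\|_{L^q(\mathbb{R}^n)} \le C \|f\|_{L^p(\mathbb{R}^n)}$, where $I_\alpha f(x) = \int_{\mathbb{R}^n} \frac{f(y)}{|x-y|^{n-\alpha}} dy$, $0 < \alpha < n$, $1 < p < q < \infty$, and $\frac{1}{q} = \frac{1}{p} - \frac{\alpha}{n}$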

Approximation Algorithms

🌐 Greedy set cover: $\text{Approx}(\text{SC}) = O(\log n)$,

🌐 Metric TSP: $\text{Approx}(\text{TSP}) \le 1.5 \cdot \text{OPT}$,

🌐 Minimum vertex cover: $\text{Approx}(\text{MVC}) \le 2 \cdot \text{OPT}$,

🌐 Max-cut: $\text{Approx}(\text{MC}) \ge \frac{1}{2} \cdot \text{OPT}$,

🌐 Max-k-cut: $\text{Approx}(\text{Max k-cut}) \ge \frac{k-1}{k} \cdot \text{OPT}$

🌐 Knapsack: $\text{Approx}(\text{KP}) \ge (1 - \varepsilon) \cdot \text{OPT}$,

🌐 Bin packing: $\text{Approx}(\text{BP}) \le \frac{11}{9} \cdot \text{OPT} + 4$,

🌐 Luby's MIS: finds a maximal independent set in $O(\log n)$ rounds with high probability

🌐 FPTAS: $\text{Approx}(\text{FPTAS}) \ge (1 - \varepsilon) \cdot \text{OPT}$,

🌐 Christofides algorithm: $\text{Approx}(\text{TSP-Metric}) \le 1.5 \cdot \text{OPT}$,

🌐 Facility location: $\text{Approx}(\text{FL}) \le 2 \cdot \text{OPT}$,

🌐 Local search (Max-Cut, single-vertex moves): $\text{Approx}(\text{LS}) \ge \frac{1}{2} \cdot \text{OPT}$

🌐 Fractional knapsack: $\text{Approx}(\text{FK}) = \text{OPT}$,

🌐 Greedy matching: $\text{Approx}(\text{GM}) \ge \frac{1}{2} \cdot \text{OPT}$,

🌐 Randomized rounding (set cover LP): $\text{Approx}(\text{RR}) = O(\log n) \cdot \text{OPT}$ with high probability

🌐 Scheduling with rejection: $\text{Approx}(\text{SchedulingRej}) \ge (1 - \varepsilon) \cdot \text{OPT}$,

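🌐 Balanced allocation ($n$ balls into $n$ bins, each ball placed in the less loaded of two random bins): $\max \text{Load} = \frac{\ln \ln n}{\ln 2} + O(1)$ with high probability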

🌐 Multiway cut: $\text{Approx}(\text{MWC}) \le 2 \cdot \text{OPT}$,

Randomized Algorithms

🌐 Karger's min-cut: $\text{Prob}(\text{min-cut}) \ge \frac{1}{\binom{n}{2}}$,
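Because one contraction run succeeds with probability at least $1/\binom{n}{2}$, Karger's algorithm is repeated many times and the smallest cut found is kept. This sketch assumes a connected multigraph on vertices $0, \dots, n-1$; contracting edges in a uniformly random order (skipping edges whose ends are already merged) until two super-vertices remain is equivalent to picking a uniformly random remaining edge at each step. Function names and the trial count are illustrative.

```python
import random


def _find(parent, u):
    # union-find root lookup with path halving
    while parent[u] != u:
        parent[u] = parent[parent[u]]
        u = parent[u]
    return u


def karger_min_cut(n, edges, trials, seed=0):
    """Smallest cut found over repeated random contractions."""
    rng = random.Random(seed)
    best = len(edges)
    for _ in range(trials):
        parent = list(range(n))
        order = edges[:]
        rng.shuffle(order)
        components = n
        for u, v in order:
            if components == 2:
                break
            ru, rv = _find(parent, u), _find(parent, v)
            if ru != rv:  # contract the edge by merging its endpoints
                parent[ru] = rv
                components -= 1
        cut = sum(1 for u, v in edges if _find(parent, u) != _find(parent, v))
        best = min(best, cut)
    return best


# two 4-cliques joined by a single bridge: the true minimum cut is 1
left = [(i, j) for i in range(4) for j in range(i + 1, 4)]
right = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
print(karger_min_cut(8, left + right + [(3, 4)], trials=500))
```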

🌐 Fermat primality test: $\Pr(\text{pass}|\text{composite}) \le \frac{1}{2^k}$ for composites that are not Carmichael numbers

🌐 Miller-Rabin primality: $\Pr(\text{pass}|\text{composite}) \le \frac{1}{4^k}$

🌐 Rabin-Karp: $\text{Prob}(\text{collision}) \le \frac{1}{M}$,

🌐 Metropolis-Hastings: $\text{Prob}(\text{stationary}) = \pi(x)$,

🌐 Gibbs sampling: $\text{Prob}(\text{sample}) \propto \pi(x)$,

🌐 Coupon collector: $\text{E}[T] = n \cdot \sum_{i=1}^n \frac{1}{i}$,

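🌐 PageRank: $\mathbf{r} = \alpha P^T \mathbf{r} + (1-\alpha) \frac{1}{n}\mathbf{1}$, where $P$ is the row-stochastic link matrix; $\mathbf{r}$ is the stationary distribution of a random surfer who teleports with probability $1-\alpha$

🌐 Simulated annealing: with a logarithmic cooling schedule $T_k = \frac{c}{\log(1+k)}$ and $c$ large enough, $\Pr(X_k \in \text{global optima}) \to 1$ as $k \to \infty$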

🌐 Randomized quicksort: $\text{E}[T(n)] = O(n \log n)$,

🌐 Monte Carlo π: $\text{E}[\hat{\pi}] = \pi$,

🌐 Las Vegas algorithms: $\Pr(\text{correct}) = 1$,

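🌐 Power of two choices: $\max \text{Load} = \frac{\ln \ln n}{\ln 2} + O(1)$ with high probability, versus $\Theta\left(\frac{\log n}{\log \log n}\right)$ with one random choice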

Graph Algorithms

🌐 Johnson's algorithm (all-pairs shortest paths with negative edge weights, no negative cycles): $O(|V||E| + |V|^2 \log |V|)$ time with a Fibonacci-heap Dijkstra

🌐 Minimum spanning tree: $\text{Approx}(\text{MST}) = \text{OPT}$,

🌐 Prim's algorithm: $\text{Approx}(\text{MST-Prim}) = \text{OPT}$,

🌐 Kruskal's algorithm: $\text{Approx}(\text{MST-Kruskal}) = \text{OPT}$,

🌐 Boruvka's algorithm: $\text{Approx}(\text{MST-Boruvka}) = \text{OPT}$,
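Kruskal's algorithm is the easiest of the three MST algorithms to sketch: sort edges by weight and keep each edge that joins two different components, tracked with union-find. The edge format `(weight, u, v)` and the function name are illustrative assumptions.

```python
def kruskal_mst(n, weighted_edges):
    """Minimum spanning tree (forest) of an undirected graph on vertices 0..n-1."""
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree, total = [], 0
    for w, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge connects two components, so it is safe to add
            parent[ru] = rv
            tree.append((u, v, w))
            total += w
    return total, tree


edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
total, tree = kruskal_mst(4, edges)
print(total)  # → 6
```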

🌐 Max flow-min cut: $\text{MaxFlow}(G) = \text{MinCut}(G)$,

🌐 Edmonds-Karp algorithm: $\text{MaxFlow}(\text{EK}) = \text{OPT}$,

🌐 Ford-Fulkerson algorithm: $\text{MaxFlow}(\text{FF}) = \text{OPT}$,
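Edmonds-Karp is Ford-Fulkerson with breadth-first search for augmenting paths, which bounds the running time by $O(|V||E|^2)$. This sketch assumes capacities given as a dict of dicts (`capacity[u][v]`) and keeps a residual graph with reverse edges; the input format, function name, and example network are illustrative.

```python
from collections import deque


def edmonds_karp(capacity, source, sink):
    """Maximum flow value from source to sink."""
    residual = {}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)  # reverse edge starts with 0 capacity
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:  # push flow and update residual capacities
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


network = {
    "s": {"v1": 16, "v2": 13},
    "v1": {"v3": 12},
    "v2": {"v1": 4, "v4": 14},
    "v3": {"v2": 9, "t": 20},
    "v4": {"v3": 7, "t": 4},
}
print(edmonds_karp(network, "s", "t"))  # → 23
```

The returned value equals the capacity of the minimum cut $\{s, v_1, v_2, v_4\} \mid \{v_3, t\}$, namely $12 + 7 + 4$, as the max flow-min cut theorem requires.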

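🌐 Spectral clustering (Cheeger's inequality): $\frac{\lambda_2}{2} \le \phi(G) \le \sqrt{2\lambda_2}$, where $\lambda_2$ is the second-smallest eigenvalue of the normalized Laplacian and $\phi(G)$ the conductance; a sweep cut over the second eigenvector attains conductance at most $\sqrt{2\lambda_2}$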

🌐 Traveling salesman LP: $\text{Approx}(\text{TSP-LP}) \le 2 \cdot \text{OPT}$,

Numerical Algorithms

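🌐 SVD image compression (Eckart-Young theorem): the truncated SVD $A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$ satisfies $\|A - A_k\|_F = \min_{\operatorname{rank}(B) \le k} \|A - B\|_F = \sqrt{\sum_{i > k} \sigma_i^2}$

🌐 AKS primality: deterministic and always correct, in time polynomial in $\log n$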

🌐 Load balancing: $\text{Approx}(\text{LB}) \le 2 \cdot \text{OPT}$,

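🌐 Linear programming rounding (minimization): $\text{LP}^* \le \text{OPT} \le \text{Approx}(\text{LP})$, so the ratio is bounded by comparing the rounded cost with the LP optimum $\text{LP}^*$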

Data Structures

🌐 Bloom filter: $\text{Prob}(\text{FP}) \approx (1 - e^{-kn/m})^k$, for $m$ bits, $k$ hash functions, and $n$ inserted items
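A toy Bloom filter makes the trade-off concrete: membership tests never give false negatives, and the false-positive rate follows $(1 - e^{-kn/m})^k$. The hashing scheme here (indices $h_1 + i h_2 \bmod m$ derived from one SHA-256 digest, known as double hashing) and the class name are illustrative assumptions, not a specific library's design.

```python
import hashlib
import math


class BloomFilter:
    """Bit-array Bloom filter with k indices derived by double hashing."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for clarity over compactness

    def _indices(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never 0 when m is even
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for idx in self._indices(item):
            self.bits[idx] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[idx] for idx in self._indices(item))


def approx_false_positive_rate(m: int, n: int, k: int) -> float:
    # (1 - e^{-kn/m})^k after n insertions
    return (1 - math.exp(-k * n / m)) ** k


bf = BloomFilter(m=1000, k=7)
members = [f"item-{i}" for i in range(100)]
for word in members:
    bf.add(word)
others = [f"other-{i}" for i in range(10_000)]
empirical_fp = sum(word in bf for word in others) / len(others)
print(empirical_fp, approx_false_positive_rate(1000, 100, 7))
```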

🌐 Randomized treaps: $\text{E}[\text{Height}] = O(\log n)$,

Online Algorithms

🌐 Ski-rental problem: $\text{CompetitiveRatio}(\text{SR}) \le 2$,

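🌐 Online paging (LRU or FIFO with cache size $k$): $\text{CompetitiveRatio}(\text{Paging}) = k$

🌐 Multiplicative weight update (losses in $[0, 1]$, $n$ experts, $0 < \varepsilon \le \frac{1}{2}$): $\text{Loss}(\text{MWU}) \le (1 + \varepsilon) \cdot \text{Loss}(\text{best expert}) + \frac{\ln n}{\varepsilon}$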

Other

🌐 Set cover LP: $\text{Approx}(\text{SC-LP}) = O(\log n)$,

🌐 Max-SAT: $\text{Approx}(\text{Max-SAT}) \ge \frac{1}{2} \cdot \text{OPT}$,

🌐 Max k-CNF-SAT (every clause has exactly $k$ distinct literals): a uniformly random assignment gives $\text{Approx}(\text{Max k-CNF-SAT}) \ge (1 - 2^{-k}) \cdot \text{OPT}$ in expectation

🌐 Vertex cover LP: $\text{Approx}(\text{VC-LP}) \le 2 \cdot \text{OPT}$,

🌐 Dilworth's theorem: $\text{ChainPartition}(P) = \text{Width}(P)$,

🌐 Graham's LPT rule (longest processing time first): $\text{Approx}(\text{Makespan}) \le \left(\frac{4}{3} - \frac{1}{3m}\right) \cdot \text{OPT}$

🌐 List scheduling: $\text{Approx}(\text{LS}) \le \left(2 - \frac{1}{m}\right) \cdot \text{OPT}$

🌐 Dinic's algorithm: $\text{MaxFlow}(\text{Dinic}) = \text{OPT}$,

🌐 Push-relabel algorithm: $\text{MaxFlow}(\text{PR}) = \text{OPT}$,

🌐 Randomized incremental construction: $\text{E}[\text{Complexity}] = O(n \log n)$,

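🌐 Yao's minimax principle: for any input distribution $\mathcal{D}$ and randomized algorithm $R$, $\max_x \mathbb{E}[\text{cost}(R, x)] \ge \min_{A \text{ deterministic}} \mathbb{E}_{x \sim \mathcal{D}}[\text{cost}(A, x)]$

🌐 Nash equilibrium: a strategy profile $s^*$ with $u_i(s_i^*, s_{-i}^*) \ge u_i(s_i, s_{-i}^*)$ for every player $i$ and every strategy $s_i$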

Centrality Measures

🌐 Degree centrality: $C_D(v) = \frac{\deg(v)}{n-1}$,

🌐 Closeness centrality: $C_C(v) = \frac{1}{\sum_{u \neq v} d(u, v)}$,

Ranking Algorithms and Coefficients

🌐 Betweenness centrality: $C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$,

🌐 Eigenvector centrality: $C_E(v) = \frac{1}{\lambda} \sum_{t \in N} a_{vt} C_E(t)$,

🌐 Katz centrality: $C_K(v) = \sum_{t=1}^\infty \sum_{j=1}^n \alpha^t (A^t)_{vj}$,

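🌐 PageRank: $PR(v) = \frac{1-\alpha}{n} + \alpha \sum_{u \to v} \frac{PR(u)}{L(u)}$, where the sum runs over pages $u$ linking to $v$ and $L(u)$ is the number of out-links of $u$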
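The PageRank recurrence with teleport term $\frac{1-\alpha}{n}$ can be solved by power iteration. This sketch assumes the graph is a dict mapping every node to its list of out-links, and treats dangling nodes (no out-links) as linking to every node, which is one common convention among several; the function name and tolerance are illustrative.

```python
def pagerank(out_links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PR(v) = (1 - alpha)/n + alpha * sum_{u -> v} PR(u)/L(u)."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # mass held by dangling nodes is spread uniformly over all nodes
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {v: (1 - alpha) / n + alpha * dangling / n for v in nodes}
        for u in nodes:
            for v in out_links[u]:
                new_rank[v] += alpha * rank[u] / len(out_links[u])
        if sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol:
            return new_rank
        rank = new_rank
    return rank


ranks = pagerank({0: [3], 1: [3], 2: [3], 3: [0]})
print(max(ranks, key=ranks.get), sum(ranks.values()))  # node 3 ranks highest; ranks sum to 1
```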

Graph Properties and Metrics

🌐 Clustering coefficient: $C(v) = \frac{2|E_N(v)|}{k_v(k_v-1)}$,

🌐 Global clustering coefficient: $C_G = \frac{3 \times \text{number of triangles}}{\text{number of connected triples}}$,

🌐 Transitivity: $T = \frac{\text{number of closed triplets}}{\text{number of triplets}}$,

🌐 Average shortest path length: $L = \frac{1}{n(n-1)} \sum_{s \neq t \in V} d(s, t)$,

Graph properties

🌐 Diameter: $D = \max_{s,t \in V} d(s, t)$

🌐 Eccentricity: $\varepsilon(v) = \max_{t \in V} d(v, t)$

🌐 Radius: $r = \min_{v \in V} \varepsilon(v)$

Community structure

🌐 Modularity: $Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_ik_j}{2m} \right] \delta(c_i, c_j)$
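For an undirected simple graph, the modularity sum can be regrouped per community as $Q = \sum_c \left[\frac{L_c}{m} - \left(\frac{d_c}{2m}\right)^2\right]$, where $L_c$ counts edges inside community $c$ and $d_c$ is its total degree. The sketch below uses that equivalent form; the edge-list and dict inputs and the function name are illustrative.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected simple graph."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1  # each endpoint adds 1 to its community's degree
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())


# two triangles joined by the edge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))  # 5/14 ≈ 0.357 for this partition
```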

🌐 Assortativity: $r = \frac{\sum_{ij} ij(e_{ij} - q_i q_j)}{\sigma^2_q}$

Subgraphs

🌐 K-core: the maximal subgraph $G'$ in which all vertices have degree $k$ or greater

Graph measures

🌐 Edge density: $\rho = \frac{2m}{n(n-1)}$

🌐 Small-world coefficient: $\sigma = \frac{C}{C_r} / \frac{L}{L_r}$

Similarity measures

🌐 Jaccard similarity: $J(u, v) = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$

🌐 Cosine similarity: $\cos(u, v) = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)| |N(v)|}}$

Index-based measures

🌐 Adamic-Adar index: $AA(u, v) = \sum_{z \in N(u) \cap N(v)} \frac{1}{\log{\deg(z)}}$

🌐 Resource allocation index: $RA(u, v) = \sum_{z \in N(u) \cap N(v)} \frac{1}{\deg(z)}$

Node degree measures

🌐 Preferential attachment: $PA(u, v) = |N(u)| \cdot |N(v)|$

Path-based measures

Shortest Path and Connected Components

🌐 Shortest path: $d(u, v) = \min_{p \in \text{paths}(u,v)} \text{length}(p)$

🌐 Connected components: Subgraphs in which any two nodes are connected by a path,

Network Models

🌐 Erdős-Rényi model: $G(n, p)$

🌐 Watts-Strogatz model: $G(n, k, p)$

🌐 Barabási-Albert model: $G(n, m)$

Community Detection Algorithms

🌐 Girvan-Newman algorithm: $\text{Community}(G, \text{Betweenness})$

🌐 Louvain method: $\text{Community}(G, \text{Modularity})$

🌐 Label propagation algorithm: $\text{Community}(G, \text{Propagation})$

Cliques and Core-Periphery Structures

🌐 Clique: $K_n$

🌐 $k$-clique community: $C_k$

🌐 Core-periphery structure: $\text{CorePeriphery}(G)$

Rich-Club Coefficient

🌐 Rich-club coefficient: $\phi(k) = \frac{2E_k}{N_k(N_k - 1)}$

Graph Properties and Parameters

🌐 Handshaking lemma: $\sum_{v \in V} \text{deg}(v) = 2|E|$,

🌐 Euler's formula: for a connected planar graph, $|V| - |E| + |F| = 2$

🌐 Planar graph: for a simple planar graph with $|V| \ge 3$, $|E| \le 3|V| - 6$

Graph Degree Characteristics

🌐 Maximum degree: $\Delta(G) = \max_{v \in V} \text{deg}(v)$,

Graph Properties and Parameters

🌐 Minimum degree: $δ(G) = \min_{v \in V} \text{deg}(v)$

🌐 Chromatic number: $χ(G) ≥ \frac{|V|}{α(G)}$

🌐 Chromatic index: $χ'(G) ≥ Δ(G)$

🌐 Vertex connectivity: $κ(G) = \min\{|S| : S ⊆ V,\ G - S \text{ is disconnected or has one vertex}\}$

🌐 Edge connectivity: $λ(G) = \min\{|F| : F ⊆ E,\ G - F \text{ is disconnected}\}$

Graph Cycles and Distances

🌐 Girth: $g(G) = \min\{|C| : C \text{ is a cycle in } G\}$

🌐 Clique number: $ω(G) = \max\{|K| : K ⊆ V \text{ is a clique}\}$

🌐 Independence number: $α(G) = \max\{|I| : I ⊆ V \text{ is an independent set}\}$

🌐 Degree sum formula: $\sum_{v \in V} \text{deg}(v) = 2m$

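🌐 Eulerian circuit: a connected graph has one if and only if $deg(v)$ is even for every $v \in V$

🌐 Hamiltonian cycle (necessary condition): if $G$ has a Hamiltonian cycle, then $c(G - S) \le |S|$ for every nonempty $S ⊆ V$, where $c(\cdot)$ counts connected components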

🌐 Graph diameter: $diam(G) = \max_{u,v \in V} dist(u,v)$

🌐 Graph radius: $rad(G) = \min_{v \in V} ecc(v)$

🌐 Eccentricity: $ecc(v) = \max_{u \in V} dist(v, u)$

🌐 Wiener index: $W(G) = \frac{1}{2}\sum_{u, v \in V} dist(u, v)$

Graph Matrices and Theorems

🌐 Laplacian matrix: $L(G) = D(G) - A(G)$

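🌐 Kirchhoff's matrix theorem: $\tau(G) = det(L')$, where $\tau(G)$ is the number of spanning trees and $L'$ is $L(G)$ with any one row and the corresponding column deleted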

🌐 Graph density: $D(G) = \frac{2|E|}{|V|(|V|-1)}$

🌐 Bipartite graph: $G$ is bipartite $\Leftrightarrow χ(G) \le 2 \Leftrightarrow G$ has no odd cycle

🌐 Petersen's theorem: every bridgeless 3-regular (cubic) graph has a perfect matching

🌐 Ramsey's theorem: $R(s, t) \le R(s-1, t) + R(s, t-1)$

🌐 Erdős-Stone theorem: $\lim_{n \to \infty} \frac{\text{ex}(n, H)}{\binom{n}{2}} = 1 - \frac{1}{χ(H) - 1}$

🌐 Dirac's theorem: $|V| \ge 3$ and $δ(G) ≥ \frac{|V|}{2} \Rightarrow \text{Hamiltonian cycle}$

🌐 Ore's theorem: $|V| \ge 3$ and $deg(u) + deg(v) ≥ |V|$ for all non-adjacent $u, v$ $\Rightarrow \text{Hamiltonian cycle}$

🌐 Graph complement: $\overline{G} = (V, \overline{E})$

🌐 Turán's theorem: $ex(n, K_r) \le \left(1-\frac{1}{r-1}\right)\frac{n^2}{2}$, with equality when $r-1$ divides $n$

Graph Minors, Planarity, and Coloring

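🌐 Graph minors: $H \prec G$ ($H$ is a minor of $G$) if $H$ can be obtained from $G$ by deleting vertices and edges and contracting edges

🌐 Kuratowski's theorem: $G$ is planar if and only if it contains no subdivision of $K_5$ or $K_{3,3}$; equivalently (Wagner's theorem), $G \text{ planar} \Leftrightarrow K_5, K_{3,3} \nprec G$

🌐 Vizing's theorem: for a simple graph, $\Delta(G) \le \chi'(G) \le \Delta(G) + 1$

🌐 Brooks' theorem: $\chi(G) \le \Delta(G)$ for a connected graph $G$ that is neither a complete graph nor an odd cycle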

Matching and Connectivity

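🌐 Hall's marriage theorem: a bipartite graph $G = (A \cup B, E)$ has a matching saturating $A$ $\Leftrightarrow |N(S)| \ge |S|$ for all $S \subseteq A$

🌐 Menger's theorem (edge form): the minimum number of edges whose removal disconnects $u$ from $v$ equals the maximum number of edge-disjoint paths between $u$ and $v$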

Probabilistic Methods and Algorithms

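🌐 Lovász Local Lemma: if each event $A_i$ has $\Pr(A_i) \le p$ and is mutually independent of all but at most $d$ other events, and $e\,p\,(d+1) \le 1$, then $\Pr\left(\bigcap_{i=1}^n \overline{A_i}\right) > 0$

🌐 Graph isomorphism: $G \cong H \Leftrightarrow A(H) = P\,A(G)\,P^T$ for some permutation matrix $P$

🌐 Havel-Hakimi algorithm: a non-increasing sequence $d = (d_1, \dots, d_n)$ is graphical $\Leftrightarrow$ the sequence $d'$ obtained by deleting $d_1$ and subtracting 1 from each of the next $d_1$ entries is graphical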
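The Havel-Hakimi reduction gives a direct test for whether a degree sequence can be realised by a simple graph. A minimal sketch (the function name is illustrative) re-sorts after each step, which is simple rather than fast:

```python
def is_graphical(degrees):
    """Havel-Hakimi: remove the largest degree d and subtract 1 from the next d
    largest degrees; the sequence is graphical iff this ends in all zeros
    without ever producing a negative entry."""
    seq = sorted(degrees, reverse=True)
    while seq and seq[0] > 0:
        d = seq.pop(0)
        if d > len(seq):  # not enough other vertices to connect to
            return False
        for i in range(d):
            seq[i] -= 1
            if seq[i] < 0:
                return False
        seq.sort(reverse=True)
    return all(x == 0 for x in seq)


print(is_graphical([3, 3, 3, 3]), is_graphical([3, 3, 1, 1]))  # → True False
```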

Topological and Spectral Properties

🌐 Euler's characteristic: $\chi(G) = |V| - |E| + |F|$,

🌐 Spectral radius: $\rho(G) = \max\{\lambda_i : \lambda_i \in \Lambda(G)\}$,

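🌐 Chromatic polynomial: $P(G, k) = \sum_{i=0}^{n} a_i k^{n-i}$, where $n = |V|$, $a_0 = 1$, and $a_1 = -|E|$

🌐 Tutte's theorem: $G$ has a perfect matching $\Leftrightarrow \text{odd}(G-S) \le |S|$ for all $S \subseteq V$, where $\text{odd}(\cdot)$ counts components with an odd number of vertices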

Small Angle Approximations

🌐 Small angle approximation: $\sin x \approx x$

🌐 Small angle cosine: $\cos x \approx 1 - \frac{x^2}{2}$

🌐 Small angle tangent: $\tan x \approx x$

Higher Order Approximations

🌐 Sine higher order: $\sin x \approx x - \frac{x^3}{3!} + \frac{x^5}{5!}$

🌐 Cosine higher order: $\cos x \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!}$

🌐 Exponential approximation: $e^x \approx 1 + x + \frac{x^2}{2}$

🌐 Exponential higher order: $e^x \approx \sum_{n=0}^{N} \frac{x^n}{n!}$ (exact as $N \to \infty$)

🌐 Natural logarithm approximation: $\ln(1+x) \approx x - \frac{x^2}{2} + \frac{x^3}{3}$

🌐 Natural logarithm higher order: $\ln(1+x) \approx \sum_{n=1}^{N} \frac{(-1)^{n+1}x^n}{n}$ (the full series converges for $-1 < x \le 1$)

🌐 Binomial approximation: $(1+x)^n \approx 1 + nx$

🌐 Stirling's approximation: $n! \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n$

🌐 Central limit theorem: $\frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \approx N(0,1)$

🌐 Poisson approximation: for large $n$ and small $p$, $\binom{n}{k}p^k(1-p)^{n-k} \approx e^{-\lambda} \frac{\lambda^k}{k!}$ with $\lambda = np$

🌐 Sigmoid approximation: $\frac{1}{1+e^{-x}} \approx \frac{1}{2} + \frac{x}{4} - \frac{x^3}{48} + \frac{x^5}{480}$

🌐 Arc length approximation: $L \approx \sum_{i=1}^n \sqrt{\Delta x_i^2 + \Delta y_i^2}$

🌐 Area under curve approximation: $A \approx \sum_{i=1}^n f(x_i)\Delta x$

🌐 Euler's method: $y_{n+1} \approx y_n + h \cdot f(x_n, y_n)$

🌐 Trapezoidal rule: $\int_a^b f(x) dx \approx \frac{1}{2}h\left[f(x_0) + 2f(x_1) + 2f(x_2) + \dots + f(x_n)\right]$

🌐 Simpson's rule: $\int_a^b f(x) dx \approx \frac{h}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + \dots + f(x_n)\right]$

🌐 Boole's rule (often cited as Bode's rule): $\int_a^b f(x) dx \approx \frac{2h}{45}\left[7f(x_0) + 32f(x_1) + 12f(x_2) + 32f(x_3) + 7f(x_4)\right]$

🌐 Gaussian quadrature: $\int_a^b f(x) dx \approx \sum_{i=1}^n w_i f(x_i)$

🌐 Newton-Cotes formulas: $\int_a^b f(x) dx \approx \sum_{i=0}^n \alpha_i f(x_i)$

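🌐 Laplace's method: for large $M$, $\int e^{-Mf(x)} dx \approx \sqrt{\frac{2\pi}{Mf''(x_0)}}e^{-Mf(x_0)}$, where $x_0$ is the interior minimum of $f$ and $f''(x_0) > 0$

🌐 L'Hôpital's rule: if $\frac{f(x)}{g(x)}$ has the form $\frac{0}{0}$ or $\frac{\infty}{\infty}$ at $a$ and the right-hand limit exists, then $\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}$

🌐 Bernoulli's inequality: $(1+x)^n \ge 1+nx$ for integer $n \ge 0$ and $x \ge -1$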

🌐 Taylor series: $f(x) \approx f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \dots + \frac{f^{(n)}(a)}{n!}(x-a)^n$

🌐 Maclaurin series: $f(x) \approx f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \dots + \frac{f^{(n)}(0)}{n!}x^n$

🌐 Power series: $f(x) = \sum_{n=0}^\infty a_n (x-c)^n$ inside the radius of convergence

🌐 Riemann sum: $\int_a^b f(x) dx \approx \sum_{i=1}^n f(x_i) \Delta x$

🌐 Linear interpolation: $y \approx y_1 + \frac{y_2 - y_1}{x_2 - x_1} (x - x_1)$

Interpolation and Approximation

🌐 Cubic spline interpolation: $S(x) = a_i + b_i(x-x_i) + c_i(x-x_i)^2 + d_i(x-x_i)^3$

🌐 Pade approximant: $R(x) = \frac{P_n(x)}{Q_m(x)}$

🌐 Continued fraction: $x \approx a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \ddots}}}$

Hyperbolic Functions and Rules of Thumb

🌐 Hyperbolic approximation: $\sinh x \approx x + \frac{x^3}{3!} + \frac{x^5}{5!} + \dots$

🌐 Hyperbolic cosine: $\cosh x \approx 1 + \frac{x^2}{2!} + \frac{x^4}{4!} + \dots$

🌐 Hyperbolic tangent: $\tanh x \approx x - \frac{x^3}{3} + \frac{2x^5}{15} - \frac{17x^7}{315}$

🌐 Rule of 72: $\text{Doubling Time} \approx \frac{72}{r}$, where $r$ is the interest rate in percent per period
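The rule of 72 approximates the exact doubling time $T = \frac{\ln 2}{\ln(1 + r/100)}$ for a rate of $r$ percent compounded once per period. The sketch below compares the two at a few rates; annual (per-period) compounding and the function names are assumptions for illustration.

```python
import math


def doubling_time_exact(rate_percent: float) -> float:
    # solve (1 + r)^T = 2 for T, with r the per-period rate as a fraction
    return math.log(2) / math.log(1 + rate_percent / 100)


def doubling_time_rule_of_72(rate_percent: float) -> float:
    return 72 / rate_percent


for r in (2, 8, 20):
    print(r, doubling_time_exact(r), doubling_time_rule_of_72(r))
```

The rule is most accurate near 8%; at very low or high rates the error grows, which is why variants such as the rule of 70 or 69.3 are sometimes preferred.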

Geometry and Statistics Relationships

🌐 Heron's formula: $A = \sqrt{s(s-a)(s-b)(s-c)}$, where $s = \frac{a+b+c}{2}$

Geometry and Trigonometry

🌐 Pythagorean theorem: $a^2 + b^2 = c^2$

🌐 Slope-intercept form: $y = mx + b$

🌐 Sine function: $\sin{\theta} = \frac{opposite}{hypotenuse}$

🌐 Cosine function: $\cos{\theta} = \frac{adjacent}{hypotenuse}$

🌐 Tangent function: $\tan{\theta} = \frac{\sin{\theta}}{\cos{\theta}}$

🌐 Cosecant function: $\csc{\theta} = \frac{1}{\sin{\theta}}$

🌐 Secant function: $\sec{\theta} = \frac{1}{\cos{\theta}}$

🌐 Cotangent function: $\cot{\theta} = \frac{1}{\tan{\theta}}$

🌐 Law of sines: $\frac{\sin{A}}{a} = \frac{\sin{B}}{b} = \frac{\sin{C}}{c}$

🌐 Law of cosines: $c^2 = a^2 + b^2 - 2ab \cos{C}$

Calculus and Analysis

🌐 Euler's formula: $e^{ix} = \cos{x} + i\sin{x}$

Geometry and Measurement

🌐 Area of circle: $A = \pi r^2$

🌐 Circumference of circle: $C = 2\pi r$

🌐 Volume of sphere: $V = \frac{4}{3}\pi r^3$

🌐 Surface area of sphere: $A = 4\pi r^2$

Distance Metrics

🌐 Euclidean Distance: $d(x, y) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}$

Distance Metrics and Similarity Measures

🌐 Minkowski Distance: $d(x, y) = \left(\sum_{i=1}^n |x_i - y_i|^p\right)^{1/p}$

🌐 Cosine Similarity: $\text{cos}(\theta) = \frac{x \cdot y}{\|x\| \|y\|}$

🌐 Jaccard Similarity: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$

🌐 Dot Product: $x \cdot y = \sum_{i=1}^n x_i y_i$

🌐 Cross Product: $x \times y = \begin{pmatrix} x_2 y_3 - x_3 y_2 \\ x_3 y_1 - x_1 y_3 \\ x_1 y_2 - x_2 y_1 \end{pmatrix}$

🌐 Angle Between Vectors: $\theta = \arccos{\frac{x \cdot y}{\|x\| \|y\|}}$

🌐 Orthogonal Vectors: $x \cdot y = 0$

🌐 Parallel Vectors: $x = ky$ for some scalar $k$

🌐 Cosine Similarity: $\text{sim}(x, y) = \frac{x^T y}{\|x\|_2 \|y\|_2}$

🌐 Euclidean Distance: $d(x, y) = \|x - y\|_2$

🌐 Manhattan Distance: $d(x, y) = \|x - y\|_1$

🌐 Minkowski Distance: $d(x, y) = \left(\sum_{i=1}^n \left| x_i - y_i \right|^p \right)^{\frac{1}{p}}$

🌐 Hamming Distance: $d(x, y) = \sum_{i=1}^n \mathbf{1}(x_i \neq y_i)$

🌐 Dice Similarity: $D(A, B) = \frac{2 |A \cap B|}{|A| + |B|}$

🌐 Tanimoto Similarity: $T(A, B) = \frac{A \cdot B}{\|A\|^2 + \|B\|^2 - A \cdot B}$

🌐 Wasserstein Distance: $W(p, q) = \inf_{\gamma \in \Pi(p, q)} \int_{X \times X} c(x, y) d\gamma(x, y)$

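🌐 Sinkhorn Distance: $\text{Sinkhorn}(C, \lambda) = \min_{P \in U(p, q)} \langle P, C \rangle - \lambda H(P)$, where $U(p, q) = \{P \ge 0 : P\mathbf{1} = p,\ P^T\mathbf{1} = q\}$ and $H(P) = -\sum_{ij} P_{ij} \log P_{ij}$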
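The entropy-regularised problem has a solution of the form $P = \operatorname{diag}(u)\,K\,\operatorname{diag}(v)$ with $K = e^{-C/\lambda}$, and Sinkhorn-Knopp iterations find $u, v$ by alternately matching the row marginals $p$ and column marginals $q$. This pure-Python sketch uses a fixed iteration count and illustrative names; very small $\lambda$ can underflow `exp(-C / lam)`, which log-domain variants avoid.

```python
import math


def sinkhorn(C, p, q, lam, n_iter=1000):
    """Entropy-regularised optimal transport plan and its transport cost <P, C>."""
    n, m = len(p), len(q)
    K = [[math.exp(-C[i][j] / lam) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # rescale rows to match p, then columns to match q
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    P = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost = sum(P[i][j] * C[i][j] for i in range(n) for j in range(m))
    return P, cost


P, cost = sinkhorn([[0.0, 1.0], [1.0, 0.0]], p=[0.7, 0.3], q=[0.4, 0.6], lam=0.5)
print(P, cost)  # cost is at least 0.3, the unregularised optimum for this example
```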