Skip to main content
  1. Others/

CheatsheetGPT

·106 mins·

I prompted GPT4 to help me create a cheatsheet for areas spanning AI, Computer Science, Finance, Operations Research, and Mathematics. Then, asked to grade the importance of each relationship for different people, and in some cases, to explain its answers.

Each Relationship might contain some explanation Click to expand
From a content's perspective, the goal is for this to serve as an expanding brainstorming hub and a point of reference. From a concept perspective, this post will hopefully convince you that there is incredible brainstorming potential by AI, and this is just a first taste. Click to expand
Due to hallucination issues, there are some errors. Goal is to improve the page by making it more comprehensive, correcting errors, and explaining all grades as well as explaining all relationships in a tooltip. Furthermore, ideally, a user can grade with an arbitrary query all equations, however this requires an API call to ChatGPT, which, the author, at the time of the writing did not have access to. Other issues include some inconsistency in grading and some redundancy or non ideal classification. Finally in terms of performance, this page can be optimized significantly. Contact me if you wish to contribute or would like access to the list of equations (c[lastname] at gmail). Click to expand

Loading might take some time. Alpha Version. Last Update: 06/24/2023.

Select an Option:

Grade Cutoff :





•••
Linear regression models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data, minimizing the sum of squared residuals. Example: predicting house prices based on the number of rooms and area. Click to expand
Logistic regression is a generalized linear model used for binary classification, estimating the probability of an event occurring based on input features. Example: predicting whether a customer will make a purchase based on their browsing history. Click to expand
Support Vector Machine (SVM) is a supervised learning method used for classification and regression by finding the optimal hyperplane that maximizes the margin between two classes. Example: classifying handwritten digits. Click to expand
Perceptron is a simple binary classification algorithm that learns weights for input features to make a linear decision boundary. Example: classifying whether a fruit is an apple or an orange based on its color and shape. Click to expand
Click to expand

Naive Bayes

Click to expand
Click to expand
Click to expand
Click to expand

Tree-Based Models

Click to expand
Click to expand

Clustering

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Neural Networks

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Gradient Boosting and Ensemble

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Dimensionality Reduction

Click to expand
Click to expand

Classification Metrics

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Regression Metrics

Click to expand
Click to expand
Click to expand
Click to expand

Regularization Techniques

Click to expand
Click to expand
Click to expand
Click to expand

Overfitting Control

Click to expand
Kernel methods are a family of techniques that use kernel functions to transform data in order to improve the performance of machine learning algorithms. The kernel trick is a technique that allows for efficient computation of high-dimensional feature spaces without explicitly mapping the data points. Radial Basis Function (RBF) is a popular kernel used in SVM and other algorithms. Kernel methods are important because they enable learning complex decision boundaries in high-dimensional spaces, allowing for better performance in tasks such as image classification, text analysis, and bioinformatics.
Kernel Methods Click to expand
Click to expand
Click to expand

Regression Models

Click to expand

Classification Models

Click to expand
Click to expand
Click to expand
Click to expand

Unsupervised Learning Models

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Collaborative Filtering

Time Series Analysis

Recommendation Systems

Generative Models

Natural Language Processing

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Cluster Evaluation

Click to expand
Feature engineering is the process of creating new features or transforming existing features to improve the performance of machine learning models. This is important because it can help uncover hidden patterns in the data, reduce dimensionality, and improve model interpretability. Feature engineering can involve techniques such as normalization, scaling, one-hot encoding, and feature selection. Examples include creating interaction terms in linear regression or using word embeddings in natural language processing tasks.
Feature Engineering Click to expand
Click to expand
Click to expand
Click to expand
Model evaluation is the process of measuring the performance of a machine learning model to determine its effectiveness and generalization capabilities. Techniques such as cross-validation, hold-out validation, and bootstrapping help in estimating the model's performance on unseen data. Model evaluation is crucial for selecting the best model, understanding its limitations, and fine-tuning hyperparameters. Metrics like accuracy, F1-score, AUC-ROC, and mean squared error are used to quantify model performance.
Model Evaluation Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Activation Functions

Click to expand
Click to expand

Feature Importance

Click to expand

Sequence Models

Click to expand
Loss functions are used to measure the difference between a machine learning model's predictions and the actual values. They are essential for evaluating and optimizing models during the training process. Different loss functions, such as mean squared error, mean squared logarithmic error, Kullback-Leibler divergence, and categorical cross-entropy, are used depending on the problem type (e.g., regression, classification, or generative tasks). Selecting an appropriate loss function is crucial for achieving good model performance and convergence during training.
Loss Functions Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Graph Embeddings

Click to expand
Click to expand
Click to expand
Common ML optimizers are algorithms used to update model parameters during the training process, with the goal of minimizing the loss function. Examples include AdaGrad, Adam, and RMSProp. Optimizers are essential for efficient training of machine learning models, as they determine the convergence rate and can help avoid local minima. Choosing the right optimizer can significantly affect the performance of a model, and researchers often experiment with different optimizers to achieve the best results.
Common ML Optimizers Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Evaluation Metrics

Click to expand
Click to expand

Normalization

Click to expand
Data augmentation is the process of creating new training samples by applying various transformations to the original dataset. This can include techniques such as rotation, scaling, flipping, or adding noise. Data augmentation is important because it helps improve the performance and generalization capabilities of machine learning models, particularly in cases with limited training data. It is widely used in image classification, speech recognition, and natural language processing tasks to increase the diversity of the training set and reduce overfitting.
Data Augmentation Click to expand
Hyperparameter tuning is the process of optimizing the configuration parameters of a machine learning model to improve its performance. Hyperparameters control aspects such as the learning rate, regularization strength, and model architecture. Hyperparameter tuning is important because it can significantly affect the performance of a model and help prevent overfitting or underfitting. Techniques for hyperparameter tuning include grid search, random search, and Bayesian optimization. By carefully tuning hyperparameters, researchers can achieve better model performance and generalization capabilities on unseen data.
Hyperparameter Tuning Click to expand

Others

Click to expand
Click to expand
Click to expand
Click to expand
explanation here
Value Functions and Bellman Equations Click to expand
where $S$ is the state space, $A$ is the action space, $P$ is the state transition function, $R$ is the reward function, and $\gamma$ is the discount factor Click to expand
The state transition function, represented by $P(s'|s, a)$, describes the probability of transitioning from the current state $s$ to the next state $s'$ given the action $a$ is taken. Click to expand
The reward function, $R(s, a, s')$ or $R(s, a)$, is used to define the immediate reward received when transitioning from state $s$ to state $s'$ after taking action $a$. It can be state-action or state-action-next state dependent. Click to expand
The state-value function, $V^\pi(s)$, represents the expected cumulative reward starting from state $s$ and following policy $\pi$. The expectation is taken over the random rewards $R_t$ and initial state $S_0 = s$. Click to expand
The action-value function, $Q^\pi(s, a)$, represents the expected cumulative reward starting from state $s$, taking action $a$, and then following policy $\pi$. The expectation is taken over the random rewards $R_t$, initial state $S_0 = s$, and initial action $A_0 = a$. Click to expand
The Bellman equation for $V^\pi$ expresses the state-value function recursively in terms of the expected immediate reward and the expected future value. It emphasizes the relationship between the value of a state and the values of its successor states under policy $\pi$. Click to expand
The Bellman equation for $Q^\pi$ expresses the action-value function recursively in terms of the expected immediate reward and the expected future value. It emphasizes the relationship between the value of taking an action in a state and the values of the successor states under policy $\pi$. Click to expand
Optimal state-value function: In Reinforcement Learning (RL), the optimal state-value function, denoted as V*(s), represents the highest expected cumulative reward an agent can achieve from a given state s, considering all possible actions. This relation demonstrates that the optimal value of a state is equal to the maximum value over all possible actions of the optimal action-value function, Q*(s, a). It is important because it helps in finding the best policy, which guides an agent in maximizing the expected cumulative reward. Click to expand
Optimal action-value function: In RL, the optimal action-value function, denoted as Q*(s, a), represents the highest expected cumulative reward an agent can achieve from a given state s, taking a particular action a and following the best policy thereafter. This relation demonstrates how to compute Q*(s, a) by considering the probability of transitioning to a new state s', the immediate reward R(s, a, s'), and the discounted future rewards. It is crucial in finding the best policy to guide the agent's decisions. Click to expand

Policy Improvement

Policy improvement: π'(s) = argmax_a Q^π(s, a) is the process of finding the best action for each state according to the current value function Q^π(s, a). It is important as it helps to improve the policy by selecting actions that lead to the highest expected return. For example, in a maze-solving task, policy improvement would choose the action that maximizes the probability of reaching the goal in the shortest number of steps. Click to expand
Policy iteration: (1) Policy Evaluation → (2) Policy Improvement → (3) Repeat until convergence is a two-step process in reinforcement learning used to find the optimal policy. It alternates between evaluating the current policy and improving it. This process is important because it guarantees convergence to the optimal policy. For example, in a chess game, policy iteration can be used to train an agent that learns to play better over time by refining its policy. Click to expand

Value-based Algorithms

Value iteration calculates the optimal state-value function by iteratively updating the values of all states. It's important because it converges to the optimal policy when given a perfect model of the environment. Max function ensures the agent selects the best action for each state. Click to expand
Temporal Difference (TD) learning updates state-value estimates using the difference between actual and expected rewards. It combines ideas of Monte Carlo and Dynamic Programming, making it suitable for learning online and without a complete model of the environment. Click to expand
Q-learning is an off-policy TD learning method for estimating action-value functions. It converges to the optimal policy even when following an exploratory policy. It's widely used in RL since it doesn't require a model of the environment and learns online. Click to expand
SARSA is an on-policy TD learning method for estimating action-value functions. It updates the Q-values using the next state-action pair, making it sensitive to the policy being followed. It's useful when the agent needs to learn an optimal policy while acting in the environment. Click to expand

Deep RL Algorithms

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Policy Gradient Algorithms

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Deterministic Policy Gradient

Reinforcement Learning For Games

Click to expand

Exploration Strategies

Click to expand
Click to expand

Bandit Algorithms

In the Multi-Armed Bandit problem, $Q_t(a)$ is the estimated value of action $a$, $c$ is the exploration parameter, $t$ is the current time step, and $N_t(a)$ is the number of times action $a$ has been selected up to time $t$. The algorithm chooses the action that maximizes the sum of the estimated value and the exploration bonus. Click to expand
Upper Confidence Bound (UCB) is an algorithm for the Multi-Armed Bandit problem that chooses the action with the highest upper confidence bound on the mean reward. $\hat{\mu}a$ is the empirical mean reward of action $a$, $t$ is the current time step, and $n_a$ is the number of times action $a$ has been selected up to time $t$. Click to expand
Thompson Sampling is a probabilistic algorithm for the Multi-Armed Bandit problem that selects actions based on samples from their posterior distributions. $\theta^\star_a$ is a sample from the Beta distribution with parameters $\alpha_a$ and $\beta_a$ for action $a$. Click to expand
In the Contextual Bandit problem, the algorithm chooses the action that maximizes the inner product of the context vector $x_{t, a}$ and the weight vector $\theta$. This allows for better action selection by considering contextual information. Click to expand
Linear UCB is an algorithm for Contextual Bandit problem that selects actions based on a linear combination of the context vector $x_{t, a}$ and weight vector $\theta$. The algorithm also considers exploration by incorporating a confidence term that depends on the context vector and a matrix $V$. Click to expand
LinUCB is similar to Linear UCB but uses an individual matrix $A_a$ for each action $a$ instead of a shared matrix $V$. This allows for better exploration by considering the uncertainty associated with each action separately. Click to expand
Click to expand

Advanced RL Algorithms

Click to expand
Click to expand
Click to expand
Click to expand

Temporal Difference Variants

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Other

Optimization Problems

Optimization problem: Minimizing a function f(x) over a set X. Click to expand
Feasible set: Represents the set of all possible solutions that satisfy the constraints of an optimization problem. Click to expand
Linear Programming: Optimization of a linear objective function subject to linear constraints. Click to expand
Quadratic Programming: Optimization of a quadratic objective function subject to linear constraints. Click to expand
Constrained Optimization: Minimizing a function subject to inequality and equality constraints. Click to expand
Lagrangian and duality: Methods for solving constrained optimization problems
Lagrangian and duality Click to expand
Lagrangian: Combines the objective function and constraints into a single function. Click to expand
Lagrange multipliers: Provide a method for finding extrema of a function subject to constraints. Click to expand
Convexity: A property of functions and optimization problems that guarantees a unique global minimum
Convexity Click to expand
Convex function: A function whose line segment between any two points lies above the graph of the function. Click to expand
Concave function: A function whose line segment between any two points lies below the graph of the function. Click to expand
Optimality Conditions: Conditions that characterize the solution of an optimization problem
Optimality Conditions Click to expand
First-order condition: The gradient of the function at the optimal point is zero. Click to expand
Second-order condition: The Hessian matrix at the optimal point is positive definite. Click to expand
KKT conditions: Necessary and sufficient conditions for a constrained optimization problem to have an optimal solution. Click to expand
Gradient and Hessian
Gradient and Hessian Click to expand
Gradient represents the first-order partial derivatives of a function, while Hessian is the matrix of second-order partial derivatives. They are useful in optimization problems. Click to expand
Optimization Algorithms
Optimization Algorithms Click to expand
Newton's method uses the inverse Hessian to update the solution iteratively, which can converge quickly for smooth functions. Click to expand
Quasi-Newton methods approximate the inverse Hessian using a matrix B_k, reducing computational complexity compared to Newton's method. Click to expand
Conjugate gradient is an iterative optimization algorithm that uses conjugate search directions to minimize a quadratic function efficiently. Click to expand
Descent and search methods
Descent and search methods Click to expand
Steepest descent is an optimization algorithm that uses the negative gradient as the search direction, which is the direction of the fastest decrease in function value. Click to expand
Line search aims to find an optimal step size that minimizes the function value along a given search direction. Click to expand
Armijo rule is a line search condition that ensures sufficient decrease in the function value along the search direction. Click to expand
Wolfe conditions combine the Armijo rule with a curvature condition to ensure both sufficient decrease and sufficient curvature in the function value along the search direction. Click to expand
Golden section search is an optimization method for unimodal functions that uses the golden ratio to divide the search interval. Click to expand
Bisection method is an optimization algorithm that iteratively divides the search interval in half to find the minimum of a unimodal function. Click to expand
Root-finding and fixed-point methods
Root-finding and fixed-point methods Click to expand
Secant method iteratively refines the root approximation by using previous iterations' function values and approximations. Click to expand
Fixed-point iteration converges to a solution when the function g has a fixed point for the given initial value. Click to expand
Banach fixed-point theorem provides conditions for the existence and uniqueness of fixed points for contraction mappings. Click to expand
Lipschitz constant measures the amount by which a function can stretch or compress distances between points. Click to expand
Iterative methods and approximations
Iterative methods and approximations Click to expand
Successive approximations method is an iterative way to refine an initial guess to find the solution of an equation. Click to expand
Perturbed optimization adds a small perturbation to the objective function to smoothen the optimization landscape. Click to expand
Homotopy and Saddle Point methods
Homotopy and Saddle Point methods Click to expand
Homotopy method is a technique that deforms one optimization problem into another, making it easier to solve. Click to expand
Saddle point is a point where the gradient of the Lagrangian function with respect to primal and dual variables is zero. Click to expand
Primal-dual method is an optimization algorithm that updates primal and dual variables iteratively to find the saddle point of the Lagrangian. Click to expand
Penalty and Barrier Methods
Penalty and Barrier Methods Click to expand
Penalty function adds a penalty term to the objective function for constraint violations, making it easier to solve. Click to expand
Barrier function adds a barrier term to the objective function, preventing the solution from crossing constraint boundaries. Click to expand
ADMM
ADMM Click to expand
ADMM is an optimization algorithm that combines splitting and dual ascent methods to solve large-scale convex optimization problems. Click to expand
Proximal Gradient Methods
Proximal Gradient Methods Click to expand
Proximal gradient is an iterative optimization algorithm that finds the minimum of a function f(x) plus a regularization term h(x). Proximal operator finds the point y that minimizes the sum of h(y) and the squared distance to x. Click to expand
Proximal operator is used to find a point y that minimizes the sum of a function h(y) and the squared distance between y and x. Click to expand
FISTA update is part of the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), used for accelerated gradient descent with proximal operators. Click to expand
FISTA update is part of the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), used for accelerated gradient descent with proximal operators. Click to expand
ISTA update is part of the Iterative Shrinkage-Thresholding Algorithm (ISTA), used for solving optimization problems with non-differentiable functions using proximal operators. Click to expand
Nesterov acceleration improves the convergence rate of gradient-based optimization algorithms using a momentum term. Click to expand
Nesterov momentum is a technique to improve the convergence rate of gradient-based optimization algorithms by incorporating a momentum term. Click to expand
Linear Minimization and Frank-Wolfe Method
Linear Minimization and Frank-Wolfe Method Click to expand
Frank-Wolfe method is an iterative algorithm for solving constrained convex optimization problems by minimizing a linear approximation of the objective function at each step. Click to expand
Frank-Wolfe update is the step in the Frank-Wolfe method that updates the solution by combining the current solution and a step towards the linear minimizer. Click to expand
Convergence and subdifferential calculus
Convergence and subdifferential calculus Click to expand
Convergence rate measures how fast an iterative optimization algorithm approaches the optimal solution. Click to expand
Subdifferential is a generalization of gradient for non-differentiable functions, containing all subgradients at a given point. Click to expand
Subgradient method is an iterative optimization algorithm for non-differentiable convex functions using subgradients instead of gradients. Click to expand
Projected subgradient is a variant of the subgradient method that projects the updated solution back onto the feasible set at each iteration. Click to expand
Concave-convex procedure is an iterative algorithm for solving non-convex optimization problems by linearizing the concave part of the objective function. Click to expand
Smoothing approximation is a technique to approximate a non-differentiable function with a smooth function by minimizing the sum of the original function and a squared distance term. Click to expand
Augmented Lagrangian is a technique for solving constrained optimization problems by adding a penalty term to the objective function that includes both Lagrange multipliers and constraint violation penalties. Click to expand
Projected gradient is an iterative algorithm for solving constrained optimization problems by projecting the updated solution back onto the feasible set at each iteration after applying a gradient step. Click to expand
The Time Value of money acknowledges that a dollar received today is worth more than a dollar received in the future due to its potential to earn interest. This concept helps evaluate the profitability of projects, and quantitatively assess the benefits and risks of different investment opportunities, leading to more informed financial decisions.
Time Value Click to expand
Compound interest formula calculates the total amount after a given time period, considering both principal and interest. Click to expand
Continuous compounding formula calculates the total amount after a given time period, assuming continuous interest compounding. Click to expand
Present value formula calculates the current worth of a future sum of money or cash flow, discounted by an interest rate. Click to expand
Future value formula calculates the value of an investment at a specified date in the future, considering an interest rate. Click to expand
Simple interest formula calculates the interest earned on a principal amount over a specified time period at a given rate. Click to expand
Annuity formula calculates the present value of a series of equal payments made at regular intervals, discounted by an interest rate. Click to expand
Perpetuity formula calculates the present value of an infinite series of equal payments made at regular intervals, discounted by an interest rate. Click to expand
Options pricing models and formulas used to estimate the fair value of options contracts.
Options Click to expand
Black model is an option pricing model for futures options that uses a log-normal distribution assumption. Click to expand
Real options valuation is a method for valuing investments with embedded real options by adapting option pricing models to account for uncertainty and flexibility. Click to expand
Black-Scholes-Merton model is an option pricing model for European options on stocks that pays dividends and uses a log-normal distribution assumption. Click to expand
Black-Scholes-Merton put formula calculates the fair value of a European put option on a stock that pays dividends using the Black-Scholes-Merton model. Click to expand
Black-Scholes formula calculates the fair value of a European call option on a stock using the Black-Scholes model, assuming no dividends. Click to expand
Black-Scholes model calculates the fair value of a European call option on a stock using the Black-Scholes model, assuming no dividends. Click to expand
Black-Scholes put formula calculates the fair value of a European put option on a stock using the Black-Scholes model, assuming no dividends. Click to expand
d_1 is a component of the Black-Scholes option pricing model, which represents the expected return on the option. Click to expand
d_2 is another component of the Black-Scholes option pricing model, representing the risk-adjusted probability that the option will be exercised. Click to expand

Greeks

Call rho measures the sensitivity of a call option's price to changes in interest rates. Click to expand
Put rho measures the sensitivity of a put option's price to changes in interest rates. Click to expand
Put-call parity is a fundamental principle relating the prices of European call options and put options with the same strike price and expiration date. Click to expand
Binomial option pricing model calculates the fair value of an option using a discrete-time binomial tree approach, modeling the possible future stock prices over time. Click to expand
Risk-neutral probability is the probability measure used in the binomial option pricing model to value options under the assumption of risk neutrality. Click to expand
Implied volatility is the volatility implied by the market price of an option, calculated using an option pricing model such as Black-Scholes. Click to expand
Option elasticity measures the sensitivity of an option's price to changes in the underlying stock price, also known as the option's leverage. Click to expand
Put-call ratio is a market sentiment indicator that compares the trading volume of put options to call options. Click to expand
Moneyness is a measure of how far an option is in or out of the money, calculated as the ratio of the underlying stock price to the option's strike price. Click to expand
Greeks neutralization is a risk management technique that involves offsetting the risk exposure of an options position by adjusting the position's Greeks to zero or near-zero values. Click to expand
Call delta measures the sensitivity of a call option's price to changes in the underlying stock price. Click to expand
Put delta represents the rate of change of a put option's price with respect to the underlying asset's price. It is useful for hedging and managing risk in options trading. Click to expand
Call gamma measures the rate of change of an option's delta with respect to the underlying asset's price. It helps traders manage risk and adjust their positions accordingly. Click to expand
Put gamma is equal to call gamma and represents the sensitivity of an option's delta to changes in the underlying asset's price. It is essential for managing risk in options trading. Click to expand
Call theta measures the rate of change of a call option's price with respect to time. It is important for understanding time decay in options trading and managing risk. Click to expand
Put theta measures the rate of change of a put option's price with respect to time. It is important for understanding time decay in options trading and managing risk. Click to expand
Call vega represents the sensitivity of an option's price to changes in the underlying asset's volatility. It is important for managing risk and adjusting positions in options trading. Click to expand
Put vega is equal to call vega and represents the sensitivity of an option's price to changes in the underlying asset's volatility. It is important for managing risk and adjusting positions in options trading. Click to expand
Risk Measures help identify and quantify the potential upsides and downsides of investments. By understanding the balance between risk and reward, investors can make informed decisions on which investments suit their risk tolerance and goals. These measures allow for better asset allocation and risk management, ultimately improving the overall performance of a portfolio.
Risk Measures Click to expand
The GARCH model estimates the volatility of financial assets and is useful for risk management, portfolio optimization, and option pricing. Click to expand
The risk-adjusted return measures the excess return per unit of risk taken by an investor. It is useful for comparing investment opportunities with different levels of risk. Click to expand
The Sharpe ratio measures the excess return per unit of risk (standard deviation) taken by an investor. It is widely used for comparing investment opportunities and evaluating portfolio performance. Click to expand
The Sortino ratio measures the excess return per unit of downside risk (downside deviation). It is useful for comparing investment opportunities with different levels of downside risk. Click to expand
The Treynor ratio measures the excess return per unit of systematic risk (beta) taken by an investor. It is useful for comparing investment opportunities with different levels of systematic risk. Click to expand
The Information ratio measures the excess return of a portfolio relative to its benchmark, adjusted for tracking error. It helps evaluate the skill of active managers and compare investment strategies. Click to expand
Jensen's alpha measures the abnormal return of a portfolio or security relative to its expected return based on CAPM. It helps evaluate the skill of active managers and identify outperforming investments. Click to expand
Portfolio Management concepts allow investors to understand how diversification reduces risk, and how different assets interact within a portfolio. By applying these concepts, investors can optimize their portfolios to achieve the best possible return for a given level of risk, leading to more efficient capital allocation and improved long-term financial performance.
Portfolio Management Click to expand
The CAPM formula estimates the expected return of an asset based on its beta and the expected market return. It is widely used for portfolio management, asset pricing, and risk management. Click to expand
The Sharpe ratio measures the excess return per unit of risk (standard deviation) taken by an investor. It is widely used for comparing investment opportunities and evaluating portfolio performance. Click to expand
The covariance of assets formula calculates the beta of an asset, which measures its sensitivity to market movements. It is important for understanding systematic risk and asset pricing. Click to expand
The efficient frontier represents the set of portfolios with the highest expected return for a given level of risk. It is essential for portfolio optimization and asset allocation decisions. Click to expand
The Markowitz portfolio formula calculates the optimal weights for assets in a portfolio to maximize expected return for a given level of risk. It is fundamental for modern portfolio theory and asset allocation. Click to expand
The covariance matrix represents the variances and covariances between assets in a portfolio. It is important for understanding the risk and diversification effects within a portfolio. Click to expand
The Gordon growth model estimates the value of a stock based on its expected future dividends, assuming a constant growth rate. It is useful for stock valuation and investment decisions. Click to expand

Corporate Finance

Other

The dividend discount model estimates the value of a stock based on the present value of its expected future dividends. It is useful for stock valuation and investment decisions. Click to expand
IRR computes the discount rate at which NPV equals zero. Click to expand
NPV calculates the present value of cash flows discounted at a given rate r. Click to expand
WACC calculates the weighted average cost of capital for a firm's financing sources. Click to expand
Dividend payout ratio measures the proportion of net income paid out as dividends. Click to expand
Retention ratio calculates the proportion of net income retained by a company for reinvestment. Click to expand
Merton's model computes the distance to default for a firm using its asset value V, debt D, volatility σ_V, and time T. Click to expand
Risk-neutral density represents an asset's probability distribution under risk-neutral pricing. Click to expand
Bond Pricing and Duration
Bond Pricing and Duration Click to expand
Effective duration measures bond price sensitivity to changes in yield, using the bond's present value (V) and yield change (∆y). Click to expand
Effective convexity measures the curvature of the bond price-yield relationship, using the bond's present value (V) and yield change (∆y). Click to expand
Modified duration adjusts Macaulay duration for changes in yield to maturity (YTM) and compounding frequency (m). Click to expand
Macaulay duration calculates the weighted average time to receive a bond's cash flows, using the present value of cash flows (PVCF) and bond value (V). Click to expand
DV01 measures a bond's price change for a 1 basis point change in yield. Click to expand
Convexity adjustment accounts for the curvature of the bond price-yield relationship when estimating price changes due to yield changes. Click to expand
Bond pricing calculates the present value of a bond's cash flows (CF) discounted at its yield to maturity (YTM). Click to expand
Fixed Income
Fixed Income Click to expand
Option adjusted spread (OAS) measures the spread between a security's yield and the risk-free rate, adjusted for embedded options. Click to expand
Z-spread measures the constant spread added to each point on the risk-free yield curve to match a bond's present value with its market price. Click to expand
Forward rate calculates the interest rate for a future period based on current spot rates (P(t1), P(t2)). Click to expand
Swap rate calculates the fixed rate exchanged for a floating rate in an interest rate swap, using the present value of cash flows (L) and discount factors (P). Click to expand
Futures price calculates the price of a futures contract based on the underlying asset's price (S), risk-free rate (r), dividend yield (q), and time to maturity (T-t). Click to expand
Caplet pricing calculates the value of an interest rate caplet using the zero-coupon bond prices (B), strike rate (K), and standard normal cumulative distribution function (N). Click to expand
Floorlet pricing calculates the value of an interest rate floorlet using the zero-coupon bond prices (B), strike rate (K), and standard normal cumulative distribution function (N). Click to expand
Swaption pricing calculates the value of a swaption using the swap rate (S), strike rate (K), and standard normal cumulative distribution function (N). Click to expand
CDS pricing calculates the ratio of present value of protection payments to present value of premium payments in a credit default swap. Click to expand
Derivative Pricing Models
Derivative Pricing Models Click to expand
Cox-Ingersoll-Ross model is a mean-reverting short rate model with stochastic volatility. Click to expand
Vasicek model is a mean-reverting short rate model with constant volatility. Click to expand
Hull-White model is a mean-reverting short rate model that allows time-varying mean reversion level. Click to expand
Constant elasticity of variance (CEV) model is a stochastic volatility model where volatility is proportional to the asset price raised to a power γ. Click to expand
Heston model is a stochastic volatility model that incorporates both asset price dynamics and volatility dynamics using two correlated Wiener processes (dWt^1, dWt^2). Click to expand
Bachelier model is an option pricing model that assumes the underlying asset follows a normal distribution, with a standard normal cumulative distribution function (N). Click to expand
Chen model is a stochastic volatility model where volatility is proportional to the asset price raised to a power γ. Click to expand
Girsanov theorem is used to change the probability measure of a stochastic process, allowing for risk-neutral pricing of derivatives. Click to expand
Martingale pricing calculates the present value of a derivative under the risk-neutral measure (𝔼^𝕼) using the filtration 𝔽t. Click to expand
Breeden-Litzenberger formula relates the second derivative of an option price with respect to the strike price to the risk-neutral density of the underlying asset. Click to expand
Basic counting principles are fundamental techniques in combinatorics for calculating the number of possible arrangements, selections, or combinations of objects. These principles include factorials, permutations, combinations, the binomial theorem, Pascal's triangle, and Vandermonde's identity. Basic counting principles have numerous applications across various fields, including computer science, cryptography, and statistical analysis. In machine learning, they are used for tasks such as feature selection, encoding categorical variables, and sampling methods. In finance, basic counting principles are applied in portfolio optimization, risk management, and the analysis of combinations of financial instruments.
Basic Counting Principles Click to expand
Factorial represents the product of all positive integers up to n. It is used in combinatorics, probability, and many mathematical formulas. For example, calculating the number of ways to arrange 5 books on a shelf: 5! = 5 × 4 × 3 × 2 × 1 = 120. Click to expand
Permutations represent the number of ways to arrange r elements out of n. It is used to calculate order-sensitive arrangements. For example, finding the number of ways to arrange 3 letters out of 5: $_5P_3 = \frac{5!}{(5-3)!} = 60$. Click to expand
Combinations represent the number of ways to choose r elements from n without considering the order. It is used in combinatorics and probability theory. For example, finding the number of ways to choose 3 people from a group of 5: $_5C_3 = \frac{5!}{3!(5-3)!} = 10$. Click to expand
The Binomial theorem expresses the expansion of powers of a binomial. It is used in algebra, combinatorics, and probability theory. For example, expanding $(a+b)^3 = {3 \choose 0}a^3b^0 + {3 \choose 1}a^2b^1 + {3 \choose 2}a^1b^2 + {3 \choose 3}a^0b^3$. Click to expand
Pascal's triangle is a triangular array of binomial coefficients. Each number is the sum of the two numbers directly above it. It is used in combinatorics, algebra, and probability theory. For example, the 4th row of Pascal's triangle is: 1, 4, 6, 4, 1. Click to expand
Vandermonde's identity is a combinatorial identity relating binomial coefficients. It is used in combinatorics, algebra, and probability theory. For example, Vandermonde's identity for $m=3, n=2, r=3$ is: $\sum_{k=0}^3 {3 \choose k} {2 \choose 3-k} = {5 \choose 3}$. Click to expand

Advanced Counting

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Generating Functions

Click to expand
Click to expand
Click to expand
Click to expand

Special Numbers

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Number Theory

Click to expand
Click to expand

Combinatorial Enumeration

Click to expand
Click to expand

Graph Combinatorics

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand

Continuous Distributions

PDF's integral over the entire domain equals 1, indicating that the total probability of all possible outcomes is 100%. Click to expand
Gives the probability that a random variable is less than or equal to a certain value. They are always non-decreasing and have a range between 0 and 1, inclusive. Click to expand
Models the time between events in a Poisson process. An interesting fact about the exponential distribution is that it has the memoryless property, meaning the remaining time until the next event is independent of the time elapsed since the last event. Click to expand
Represents the Gamma distribution, which is a flexible continuous probability distribution used to model waiting times between events. An interesting property of the Gamma distribution is that it's a conjugate prior for the exponential distribution's rate parameter. Click to expand
It is a continuous probability distribution on the interval [0, 1]. The Beta distribution is the conjugate prior for the Bernoulli and binomial distributions' success probability parameter. Click to expand
Chi-squared distribution: The equation $f(x) = \frac{1}{2^{k/2}\Gamma(k/2)}x^{(k/2)-1}e^{-x/2}$ defines the Chi-squared distribution, which is widely used in hypothesis testing and the construction of confidence intervals. An interesting property of the Chi-squared distribution is that it's a special case of the Gamma distribution. Click to expand
F-distribution: The equation $f(x) = \frac{\Gamma(\frac{v_1+v_2}{2})}{\Gamma(\frac{v_1}{2})\Gamma(\frac{v_2}{2})}(\frac{v_1}{v_2})^{\frac{v_1}{2}}\frac{x^{\frac{v_1}{2}-1}}{(1+\frac{v_1x}{v_2})^{\frac{v_1+v_2}{2}}}$ represents the F-distribution, which is a continuous probability distribution used in hypothesis testing, particularly when comparing the variances of two populations. It is derived from the ratio of two independent Chi-squared distributions. Click to expand
Discrete distributions are probability distributions defined on a countable set of values, such as integers or discrete outcomes. Common discrete distributions include the Bernoulli, Binomial, Poisson, and Geometric distributions. Discrete distributions are essential for modeling random processes and events in various fields, including computer science, statistics, and operations research. In machine learning, discrete distributions are used in tasks such as classification, natural language processing, and reinforcement learning. In finance, they are employed to model discrete events, such as stock price jumps, option exercise, or credit defaults.
Discrete Distributions Click to expand
Click to expand
Models the number of successes in a fixed number of Bernoulli trials. It is often used in finance to model the number of successful outcomes over a given time period. Click to expand
Models the number of events occurring in a fixed interval of time or space. It is often used in finance to model rare events, such as jumps in asset prices or the number of defaults in a portfolio. Click to expand
Models the number of Bernoulli trials required for the first success. It is often used in finance to model the time until the occurrence of an event, such as a stock price reaching a certain level. Click to expand
Models the number of Bernoulli trials required for the r-th success. It generalizes the Geometric distribution and is often used in finance to model the time until the occurrence of multiple events, such as a certain number of defaults in a credit portfolio. Click to expand

Probability Concepts

Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
The Wiener process, also known as standard Brownian motion, is a continuous-time stochastic process with independent increments. It is used in finance for modeling stock prices and interest rates. For example, simulating a stock price path using a Wiener process and Euler-Maruyama method. Click to expand
Brownian motion is a continuous-time stochastic process that models the random motion of particles. It is a fundamental concept in finance for option pricing and risk management. For example, simulating stock prices using geometric Brownian motion, a form of Brownian motion. Click to expand
Geometric Brownian motion is a continuous-time stochastic process used to model asset prices, such as stock prices. It incorporates drift and volatility. For example, the Black-Scholes option pricing model is based on geometric Brownian motion. Click to expand
The Ornstein-Uhlenbeck process is a continuous-time stochastic process that models mean-reverting processes. It is used in finance for interest rate modeling and volatility modeling. For example, the Vasicek model for interest rates is based on the Ornstein-Uhlenbeck process. Click to expand
Ito's lemma is a fundamental result in stochastic calculus that extends the chain rule to stochastic processes. It is used in finance for option pricing, risk management, and stochastic modeling. For example, Ito's lemma is used to derive the Black-Scholes PDE for option pricing. Click to expand
Martingales are processes where future values are equal to current values on average, often used in probability, finance, and statistical modeling. Click to expand
The Optional Stopping Theorem relates the expected value of a martingale at a stopping time to its initial value, and is used in various problems such as gambling, option pricing, and risk management. Click to expand
Doob's Martingale Inequality provides an upper bound on the probability of a martingale exceeding a given level, useful in establishing convergence and understanding tail probabilities. Click to expand
Doob-Meyer Decomposition is a unique decomposition of a submartingale into a martingale and a predictable, increasing process, and is used to study various properties of stochastic processes. Click to expand
Radon-Nikodym derivative: an importance sampling technique in probability theory that expresses one probability measure in terms of another. It is used to change measures in stochastic processes, e.g., to price derivative securities under risk-neutral probabilities. The Girsanov theorem is a consequence of the Radon-Nikodym derivative. Click to expand
Girsanov's theorem: a result in stochastic calculus that allows the transformation of a probability measure into another under which the Brownian motion is altered by a drift term. It is widely used in mathematical finance to switch between the real-world and risk-neutral probabilities, enabling the pricing of derivative securities. Click to expand
Feynman-Kac formula: a result connecting solutions to parabolic partial differential equations with expected values of functionals of stochastic processes. It is often used to solve pricing problems in finance, such as the Black-Scholes equation for option pricing, by transforming them into expectations of discounted payoffs under a specific probability measure. Click to expand
Markov property: In the given relationship, $\mathbb{P}(X_{t+1} | X_t, X_{t-1}, \dots, X_0) = \mathbb{P}(X_{t+1} | X_t)$, the Markov property states that the probability of the future state $X_{t+1}$, given all past states, depends only on the current state $X_t$. This property simplifies the analysis of stochastic processes and is widely used in various applications, including finance, AI, and more. Examples include Markov chains and Markov decision processes. Click to expand
Poisson process: In the relationship $\mathbb{P}(N(t+dt) - N(t) = 1) = \lambda dt + o(dt)$, a Poisson process is a counting process that models the number of events occurring in a fixed interval of time or space. The process has a constant rate $\lambda$, and events occur independently. Examples include the number of phone calls arriving at a call center or the number of particles emitted by a radioactive source. Click to expand
Exponential inter-arrival times: In the formula $f_T(t) = \lambda e^{-\lambda t}$, $f_T(t)$ represents the probability density function of the time between consecutive events in a Poisson process, with $\lambda$ being the arrival rate of events. Exponential inter-arrival times are memoryless, meaning the time until the next event is independent of past events. Examples include the time between customer arrivals at a store or the time between machine failures. Click to expand
Exponential inter-arrival times: In the formula $f_T(t) = \lambda e^{-\lambda t}$, $f_T(t)$ represents the probability density function of the time between consecutive events in a Poisson process, with $\lambda$ being the arrival rate of events. Exponential inter-arrival times are memoryless, meaning the time until the next event is independent of past events. Examples include the time between customer arrivals at a store or the time between machine failures. Click to expand
Merton's jump diffusion: In the equation $dS_t = (\mu - \lambda k)S_t dt + \sigma S_t dW_t + (Y - 1)S_t dN_t$, Merton's jump diffusion model is used to describe the price dynamics of a financial asset, incorporating both continuous Brownian motion ($dW_t$) and discrete jumps ($dN_t$). The model accounts for sudden changes in asset prices, such as those caused by market events or news. Examples include option pricing and credit risk modeling. Click to expand
Schwartz model: In the equations $dS_t = \mu S_t dt + \sigma S_t dW_t$ and $d\mu = \alpha(\theta - \mu) dt + \gamma dZ_t$, the Schwartz model is a two-factor model for commodity prices, with $S_t$ representing the spot price and $\mu$ being the stochastic convenience yield. The model captures both the mean-reverting behavior of the convenience yield and the uncertainty in the spot price. Examples include pricing commodity futures and options, as well as valuing natural resource investments. Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Click to expand
Statistical Inference
Statistical Inference Click to expand
Confidence interval estimates a population parameter with a specified level of confidence. Click to expand
Hypothesis testing compares a statistic with a null hypothesis. Click to expand
Student's t-test is used when the population standard deviation is unknown. Click to expand
Analysis of Variance
Analysis of Variance Click to expand
ANOVA F-test compares variances of multiple groups. Click to expand
Regression Analysis
Regression Analysis Click to expand
Coefficient of determination measures the proportion of variance in the response variable explained by the model. Click to expand
Residual sum of squares measures the difference between observed and predicted values. Click to expand
Total sum of squares measures the total variation in the response variable. Click to expand
Descriptive Statistics
Descriptive Statistics Click to expand
Mean value represents the average of a dataset. Click to expand
Standard deviation measures the dispersion of a dataset. Click to expand
Ordinary Differential Equations (ODEs) are equations involving a function and its derivatives, and they arise in various fields, including physics, engineering, and economics. ODEs describe the dynamics of systems with a finite number of variables, and they play a crucial role in understanding the behavior of these systems over time. In machine learning, ODEs are used in areas such as neural ODEs, dynamical systems modeling, and reinforcement learning. In finance, ODEs are employed to model the evolution of interest rates, option pricing, and other time-dependent phenomena.
Ordinary Differential Equations Click to expand
First-order ODE relates the derivative of a function to the function itself and an independent variable. Click to expand
Second-order ODE relates the second derivative of a function to the function, its first derivative, and an independent variable. Click to expand
Linear ODE is an ODE where the function and its derivatives appear linearly with coefficients depending on the independent variable. Click to expand
Homogeneous ODE is a linear ODE where the right-hand side is zero, representing a system with no external input. Click to expand
Partial Differential Equations (PDEs) are equations involving partial derivatives of a multivariate function. PDEs describe the behavior of systems with multiple interacting variables and are widely used in physics, engineering, and other fields. In machine learning, PDEs are applied in areas such as image processing, fluid dynamics simulations, and generative adversarial networks. In finance, PDEs are used to model the Black-Scholes equation for option pricing, portfolio optimization, and risk management.
Partial Differential Equations Click to expand
Laplace's equation describes steady-state phenomena, where the second derivatives sum to zero. Click to expand
Poisson's equation is a generalization of Laplace's equation, where the second derivatives sum to a source term f. Click to expand
Heat equation models heat diffusion, with time derivative proportional to the Laplacian of the function. Click to expand
Wave equation models wave propagation, with second time derivative proportional to the Laplacian of the function. Click to expand
Transport equation models the advection of a quantity, with time derivative proportional to the spatial derivative. Click to expand
Schrödinger equation is a fundamental equation in quantum mechanics, with complex time derivative proportional to the Laplacian and potential energy. Click to expand
Klein-Gordon equation is a relativistic wave equation, with second time derivative related to the Laplacian and mass-energy term. Click to expand
Navier-Stokes equation models fluid dynamics, with time derivative of velocity related to pressure, viscosity, and external forces. Click to expand
Continuity equation expresses the conservation of mass, with time derivative of density related to the divergence of the mass flux. Click to expand
Black-Scholes PDE is a fundamental equation in option pricing, relating the option value to the underlying asset price, time, and volatility. Click to expand
Fokker-Planck equation describes the time evolution of a probability distribution, considering drift and diffusion processes. Click to expand
Burgers' equation is a fundamental equation in fluid dynamics, combining advection and diffusion processes. Click to expand
Korteweg–de Vries equation models nonlinear wave propagation, combining advection, dispersion, and nonlinearity. Click to expand
Stochastic Differential Equations (SDEs) are differential equations that incorporate random processes, making them suitable for modeling systems with uncertainty or noise. SDEs are used in fields such as physics, finance, and biology to describe the dynamics of systems influenced by random factors. In machine learning, SDEs are employed in stochastic optimization, Bayesian filtering, and probabilistic modeling. In finance, SDEs are used to model asset prices, interest rates, and other financial instruments subject to random fluctuations, and they play a central role in quantitative finance and risk management.
Stochastic Differential Equations Click to expand
SPDE general form represents stochastic partial differential equations, combining deterministic PDEs with stochastic processes. Click to expand
Stochastic heat equation combines the heat equation with a stochastic process, modeling random diffusion. Click to expand
Stochastic wave equation combines the wave equation with a stochastic process, modeling random wave propagation. Click to expand
Stochastic Burgers' equation combines Burgers' equation with a stochastic process, modeling random advection and diffusion. Click to expand
Stochastic Navier-Stokes equation models fluid dynamics with random fluctuations, accounting for turbulence and uncertainty in fluid flow. Click to expand
Stochastic Schrödinger equation describes the time evolution of a quantum system with random fluctuations, useful for modeling quantum systems with noise. Click to expand
Stochastic reaction-diffusion equation models the spatiotemporal behavior of reacting and diffusing species with random fluctuations, useful for understanding pattern formation and biological processes. Click to expand
Stochastic Korteweg–de Vries equation models nonlinear wave propagation with random fluctuations, useful for understanding solitons and wave interactions in various physical systems. Click to expand
Stochastic Fokker-Planck equation describes the time evolution of probability density functions with random fluctuations, useful for understanding stochastic processes and their applications in various fields. Click to expand
Stochastic Allen-Cahn equation models phase separation and interface dynamics with random fluctuations, useful for understanding pattern formation and material science applications. Click to expand
Stochastic Ginzburg-Landau equation models phase transitions and critical phenomena with random fluctuations, useful for understanding superconductivity and other condensed matter systems. Click to expand
Stochastic Cahn-Hilliard equation models phase separation and coarsening dynamics with random fluctuations, useful for understanding material science and pattern formation. Click to expand
Stochastic Kuramoto-Sivashinsky equation models pattern formation and spatiotemporal chaos with random fluctuations, useful for understanding fluid dynamics and combustion processes. Click to expand
Stochastic Fisher-KPP equation models population dynamics and pattern formation with random fluctuations, useful for understanding biological processes and ecological systems. Click to expand
Stochastic Benjamin-Ono equation models nonlinear wave propagation and soliton interactions with random fluctuations, useful for understanding water waves and other physical systems. Click to expand
In the Bisection method, c represents the midpoint of the interval [a, b]. This method is useful for finding the root of a continuous function in a given interval by iteratively narrowing down the interval until the root is found. Click to expand
In Newton's method, x_n is the current approximation of the root, and x_{n+1} is the next approximation. This method is useful for finding the root of a differentiable function by iteratively improving the approximation using the function's derivative. Click to expand
In the Secant method, x_n and x_{n-1} are the current and previous approximations of the root, respectively. This method is useful for finding the root of a continuous function by iteratively updating the approximation using a secant line between the current and previous approximations. Click to expand
Fixed-point iteration: In this equation, g(x) is a function that maps x to another value, and the iteration continues until it converges to a fixed point. This method is useful for finding the roots of nonlinear equations and solving systems of nonlinear equations. Click to expand
Lagrange interpolation: In this equation, L(x) is the interpolating polynomial, f(x_i) are the function values at the given points, and the product term calculates the weights for each point. Lagrange interpolation is useful for approximating a function given a set of points. Click to expand
Divided differences: In this equation, f[x_0, x_1, ..., x_k] represents the divided difference of the function f at the given points. Divided differences are useful for constructing Newton's interpolation polynomial and for evaluating higher-order derivatives. Click to expand
Newton's interpolation: In this equation, N(x) is the interpolating polynomial, f[x_0], f[x_0, x_1], ... are the divided differences, and the product term calculates the weights for each point. Newton's interpolation is useful for approximating a function given a set of points and is more efficient than Lagrange interpolation when adding new points. Click to expand
Simpson's rule: In this equation, f(a), f((a+b)/2), and f(b) are the function values at the endpoints and midpoint of the interval [a, b]. Simpson's rule is a numerical integration method that provides a more accurate approximation than the trapezoidal rule by using a quadratic function to approximate the integrand. Click to expand
Trapezoidal rule: In this equation, f(a) and f(b) are the function values at the endpoints of the interval [a, b]. The trapezoidal rule is a numerical integration method that approximates the integral by using a linear function to approximate the integrand. Click to expand
Romberg integration: In this equation, R_{i,j} represents the Romberg integration approximation. Romberg integration is a numerical integration method that combines the trapezoidal rule with Richardson extrapolation to improve the accuracy of the approximation. Click to expand
Gauss quadrature: In this equation, w_i are the weights and x_i are the nodes of the quadrature. Gauss quadrature is a numerical integration method that provides more accurate approximations than other methods by choosing the nodes and weights optimally. Click to expand
Jacobi iteration: In this equation, a_{ij} are the coefficients of the linear system, b_i are the constant terms, and x_i^{(k)} are the iterates. Jacobi iteration is an iterative method for solving systems of linear equations. Click to expand
In Gauss-Seidel iteration, the equation is used to iteratively solve a system of linear equations. It converges faster than the Jacobi method and is useful for solving large sparse systems. Click to expand
In the SOR method, the equation is an extension of the Gauss-Seidel method with a relaxation factor (ω) to improve convergence. It is useful for solving large sparse systems more efficiently than Gauss-Seidel. Click to expand
In Euler's method, the equation is a simple numerical method for solving ordinary differential equations (ODEs) with a given initial value. It is useful for its simplicity and ease of implementation. Click to expand
In the Midpoint method, the equation is a numerical method for solving ordinary differential equations (ODEs) with a given initial value. It is an improvement over Euler's method in terms of accuracy. Click to expand
In the Runge-Kutta method (4th order), the equation is a widely used numerical method for solving ordinary differential equations (ODEs) with a given initial value. It provides better accuracy than Euler's and Midpoint methods. Click to expand
In the Adams-Bashforth method, the equation is a numerical method for solving ordinary differential equations (ODEs) with a given initial value. It is an explicit multistep method that uses previous steps to improve accuracy. Click to expand
In the Finite difference method, the equation is a numerical method for approximating the derivative of a function using discrete data points. It is widely used in various fields, including finance and engineering. Click to expand
In the Central difference method, the equation is a numerical method for approximating the second derivative of a function using discrete data points. It is widely used in various fields, including finance and engineering. Click to expand
In the Thomas algorithm, $c_i'$, $d_i'$, $x_n$, and $x_i$ are intermediate variables used to solve a tridiagonal system of linear equations. This algorithm is useful for efficiently solving such systems, which often arise in numerical methods for solving partial differential equations. Click to expand
In the QR factorization, $A$ is a given matrix, and $Q$ and $R$ are the orthogonal and upper triangular matrices, respectively, that satisfy the factorization. This decomposition is useful for solving linear systems, eigenvalue problems, and least squares problems. Click to expand
In the singular value decomposition (SVD), $A$ is a given matrix, and $U$, $\\Sigma$, and $V^T$ are the orthogonal, diagonal, and transpose of an orthogonal matrix, respectively, that satisfy the decomposition. SVD is useful for dimensionality reduction, data compression, and noise reduction. Click to expand
In the power method, $x^{(k+1)}$ is the next approximation of the eigenvector corresponding to the largest eigenvalue of matrix $A$, and $x^{(k)}$ is the current approximation. The power method is useful for finding the largest eigenvalue and its corresponding eigenvector of a matrix. Click to expand
In the Rayleigh quotient, $R(x)$ is a scalar value that approximates the eigenvalue of matrix $A$ corresponding to the eigenvector $x$. The Rayleigh quotient is useful for estimating eigenvalues and can be used in iterative methods, such as the Rayleigh quotient iteration. Click to expand
In the Gram-Schmidt process, $v_i$ is the orthogonalized vector obtained from the input vector $u_i$. This process is useful for constructing an orthonormal basis for a vector space, which is important in various linear algebra applications. Click to expand
In the LU factorization, $A$ is a given matrix, and $L$ and $U$ are the lower and upper triangular matrices, respectively, that satisfy the factorization. This decomposition is useful for solving linear systems, computing determinants, and inverting matrices. Click to expand
In the Cholesky factorization, $A$ is a symmetric positive definite matrix, and $L$ is a lower triangular matrix that satisfies the factorization. This decomposition is useful for solving linear systems, computing determinants, and inverting matrices when the matrix is symmetric positive definite. Click to expand
Algebra and Analysis Inequalities, such as the Triangle inequality, Cauchy-Schwarz, and Minkowski inequality, are indispensable in various quantitative fields, including machine learning, finance, and trading. These inequalities provide bounds on relationships between variables, which help in understanding their interactions, analyzing the stability of algorithms, and validating mathematical models, especially in higher-dimensional or multivariate cases where visualization is challenging. They also aid in simplifying complex expressions and proving essential theorems. For example, Cauchy-Schwarz is widely used in portfolio optimization, where it is crucial to quantify the correlation between asset returns. In machine learning, these inequalities help in understanding and analyzing the convergence of optimization algorithms and can improve the generalization of models. Using these inequalities allows practitioners to build more robust and efficient solutions compared to alternative approaches.
Algebra and Analysis Inequalities Click to expand
The Triangle inequality is used to prove that the shortest path between two points in a metric space is a straight line. For example, consider points A, B, and C in a plane. To prove that the shortest path between A and C is the straight line AC, apply the triangle inequality: $||A-C|| \le ||A-B|| + ||B-C||$. It shows that the straight line AC is shorter than or equal to any other path through an intermediate point B. Click to expand
The Cauchy-Schwarz inequality is used to prove the AM-GM inequality. Given $x_i \ge 0$ for $1 \le i \le n$, the Cauchy-Schwarz inequality can be applied with $x_i = \sqrt{x_i}$ and $y_i = \sqrt{\frac{1}{n}}$: $|\sum_{i=1}^n x_iy_i|^2 \le \sum_{i=1}^n x_i^2 \sum_{i=1}^n y_i^2$. This simplifies to the AM-GM inequality $\frac{x_1+x_2+\cdots+x_n}{n} \ge \sqrt[n]{x_1x_2\cdots x_n}$. Click to expand
The Arithmetic-Geometric mean inequality states that the arithmetic mean of a set of non-negative numbers is always greater than or equal to the geometric mean. Some interesting implications include:<br><ul><li>Estimating the average of a set of positive numbers: Since the geometric mean is always less than or equal to the arithmetic mean, it provides a lower bound for the average: $\sqrt[n]{x_1x_2\cdots x_n} \le \frac{x_1 + x_2 + \cdots + x_n}{n}$.</li><li>Proving the Cauchy-Schwarz inequality: $\left(\sum_{i=1}^n x_iy_i\right)^2 \le \left(\sum_{i=1}^n x_i^2\right)\left(\sum_{i=1}^n y_i^2\right)$ using AM-GM.</li><li>Optimizing mathematical expressions: For example, finding the minimum of $a^2 + \frac{1}{a^2}$ for a positive number $a$. Using AM-GM, we have $\frac{a^2 + \frac{1}{a^2}}{2} \ge \sqrt{a^2 \cdot \frac{1}{a^2}}$, which implies $a^2 + \frac{1}{a^2} \ge 2$.</li><li>Providing an upper bound for the harmonic mean of a set of positive numbers: $\frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} \le \sqrt[n]{x_1x_2\cdots x_n}$.</li></ul> Click to expand
Jensen's inequality states that for a convex function $f$, the function evaluated at the average of a set of points is less than or equal to the average of the function evaluated at each point. Some interesting implications include:<br><ul><li>For any probability distribution, the expected value of the logarithm of a random variable is less than or equal to the logarithm of the expected value: $E(\log X) \le \log(E(X))$ (by applying Jensen's inequality to the logarithm function).</li><li>The geometric mean of a set of positive numbers is always less than or equal to the arithmetic mean: $\sqrt[n]{x_1x_2\cdots x_n} \le \frac{x_1 + x_2 + \cdots + x_n}{n}$ (by applying Jensen's inequality to the logarithm function).</li><li>The exponential of the average of a set of numbers is less than or equal to the average of the exponential of each number: $e^{\frac{x_1 + x_2 + \cdots + x_n}{n}} \le \frac{e^{x_1} + e^{x_2} + \cdots + e^{x_n}}{n}$ (by applying Jensen's inequality to the exponential function).</li></ul> Click to expand
Hölder's inequality can be used to prove the Cauchy-Schwarz inequality. Given vectors $x$ and $y$, let $p=2$ and $q=2$, and apply Hölder's inequality: $\sum_{i=1}^n |x_iy_i| \le \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}}\left(\sum_{i=1}^n |y_i|^q\right)^{\frac{1}{q}}$. This simplifies to the Cauchy-Schwarz inequality $|x^T y| \le ||x|| ||y||$. Click to expand
Minkowski inequality is used to prove that the $L^p$ spaces are normed vector spaces. Given sequences $x$ and $y$ and $1 \le p < \infty$, apply the Minkowski inequality: $\left(\sum_{i=1}^n |x_i + y_i|^p\right)^{\frac{1}{p}} \le \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}} + \left(\sum_{i=1}^n |y_i|^p\right)^{\frac{1}{p}}$. This shows that the $L^p$ norm satisfies the triangle inequality, a necessary condition for a normed vector space. Click to expand
Young's inequality can be used to prove Hölder's inequality. Given positive numbers $a$ and $b$, let $p>1$ and $q>1$ be conjugate exponents ($\frac{1}{p}+\frac{1}{q}=1$). Apply Young's inequality: $ab \le \frac{a^p}{p} + \frac{b^q}{q}$. This can be used as an intermediate step to prove Hölder's inequality. By applying Young's inequality to each term in the sum of the products of $x_i$ and $y_i$, we can show that $\sum_{i=1}^n |x_iy_i| \le \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}}\left(\sum_{i=1}^n |y_i|^q\right)^{\frac{1}{q}}$. Click to expand
Concentration Inequalities, such as Chebyshev's, Bernstein's, Hoeffding's, and Azuma's inequalities, are invaluable in many quantitative fields like finance, trading, and machine learning. These inequalities provide bounds on the probability of a random variable deviating significantly from its mean, allowing practitioners to quantify the likelihood of extreme events and evaluate the risk associated with their models. Unlike some alternative methods that assume asymptotic regimes, concentration inequalities offer more accurate estimations in finite sample scenarios. In finance and trading, concentration inequalities are used to estimate the probability of large losses or gains, which informs risk management and helps in making better-informed decisions. In machine learning, these inequalities are used to assess model stability and to derive generalization bounds, leading to improved performance and robustness. Concentration inequalities offer a powerful approach to understanding uncertainty and managing risk, which can be more informative and actionable than alternative methods.
Concentration Inequalities Click to expand
In Chebyshev's inequality, $X$ is a random variable with mean $\\mu$ and standard deviation $\\sigma$. The inequality bounds the probability that $X$ deviates from its mean by at least $k\\sigma$. This inequality is useful for understanding the concentration of probability around the mean of a random variable. Click to expand
In Markov's inequality, $X$ is a non-negative random variable with expected value $E[X]$. The inequality bounds the probability that $X$ is at least $a$ times its expected value. This inequality is useful for understanding the tail behavior of random variables. Click to expand
In Bernstein's inequality, $X_i$ are independent random variables with mean $E[X_i]$ and variance $\\text{Var}(X_i)$. The inequality bounds the probability that the sum of the centered random variables deviates from its expected value by at least $t$. This inequality is useful for understanding the concentration of probability around the mean of a sum of random variables. Click to expand
In Hoeffding's inequality, $X_i$ are independent random variables with mean $E[X_i]$ and bounded range $[a_i, b_i]$. The inequality bounds the probability that the sum of the centered random variables deviates from its expected value by at least $t$. This inequality is useful for understanding the concentration of probability around the mean of a sum of bounded random variables. Click to expand
In Azuma's inequality, $X_i$ are independent random variables with mean $E[X_i]$ and bounded differences $|X_i - X_{i-1}| \le c_i$. The inequality bounds the probability that the sum of the centered random variables deviates from its expected value by at least $t$. This inequality is useful for understanding the concentration of probability around the mean of a sum of random variables with bounded differences. Click to expand

Optimization Inequalities and Functional Analysis

In Schur's inequality, $a, b, c$ are non-negative real numbers and $p, q, r$ are positive real numbers such that $p+q=r$. The inequality states that the sum of the products of the numbers raised to their respective powers is greater than or equal to the product of the numbers raised to the difference of their powers. Schur's inequality is useful in various fields of mathematics, including algebra and number theory. Click to expand
In the Rearrangement inequality, the sequences a and b are sorted in non-increasing order, while the sequence c is sorted in non-decreasing order. This inequality is useful for bounding sums of products in various applications, such as optimization problems. Click to expand
In Cauchy's interlacing, A and B are Hermitian matrices with B being a principal submatrix of A. The $\lambda_i$ values represent the eigenvalues of the matrices. This theorem is useful for understanding the relationship between eigenvalues of a matrix and its submatrices, which has applications in various fields, including signal processing and linear algebra. Click to expand
In the Schwarz inequality, a and b are sequences of real or complex numbers. This inequality is a fundamental result in linear algebra and analysis, with applications in various fields, such as optimization and probability theory. Click to expand
In the Poincaré inequality, u is a function in a Sobolev space, and $\Omega$ is a bounded domain. This inequality is useful for bounding the $L^2$ norm of a function by the $L^2$ norm of its gradient, which has applications in partial differential equations and functional analysis. Click to expand
In the Sobolev inequality, u is a function in a Sobolev space, and $\Omega$ is a bounded domain. The $L^p$ and $L^{p^*}$ norms are Lebesgue space norms. This inequality is useful for bounding the norm of a function in one Lebesgue space by the norm of its gradient in another Lebesgue space, which has applications in partial differential equations and functional analysis. Click to expand
In the Cramér-Rao inequality, $\hat{\theta}$ is an unbiased estimator of the parameter $\theta$, and $I(\theta)$ is the Fisher information of the parameter. This inequality provides a lower bound on the variance of an unbiased estimator, which is useful in statistics and estimation theory. Click to expand
In the Paley-Zygmund inequality, X is a random variable, and $\alpha$ is a positive constant. This inequality provides a lower bound on the probability that the absolute difference between X and its expected value is at least $\alpha$ times the expected absolute difference. It is useful in probability theory and has applications in various fields, such as signal processing and information theory. Click to expand
In the Rademacher-Menchov inequality, $\varepsilon_i$ are independent Rademacher random variables (taking values +1 or -1 with equal probability), and $a_i$ are real numbers. This inequality provides an upper bound on the variance of a sum of Rademacher random variables, which is useful in probability theory and has applications in various fields, such as concentration inequalities and learning theory. Click to expand
In the Rayleigh quotient, the expression represents the range of eigenvalues of a symmetric matrix A, where λ_min and λ_max are the minimum and maximum eigenvalues, respectively. This relationship is useful for understanding the properties of eigenvalues and eigenvectors in linear algebra and optimization problems. Click to expand
Courant-Fisher theorem states the k-th eigenvalue of a symmetric matrix A can be found by minimizing the Rayleigh quotient over all k-dimensional subspaces U_k. This theorem is useful for finding eigenvalues and solving optimization problems in various fields, including finance and engineering. Click to expand
Gershgorin's theorem states that the eigenvalues of a matrix A lie within the union of disks centered at the diagonal elements a_ii with radii R_i, which is the sum of the absolute values of the off-diagonal elements in the i-th row. This theorem is useful for estimating the location of eigenvalues and understanding the properties of matrices. Click to expand
Hadwiger's inequality provides an upper bound on the volume of a convex body K in n-dimensional space, given the radii r_i of its n inradii. This inequality is useful for understanding the geometric properties of convex bodies and has applications in computational geometry and optimization. Click to expand
Bonferroni's inequality provides an upper bound on the probability of the union of events A_i, taking into account the pairwise intersections of the events. This inequality is useful for understanding the properties of probabilities and has applications in statistics and probability theory. Click to expand
Boole's inequality provides an upper bound on the probability of the union of events A_i, without considering the intersections of the events. This inequality is useful for understanding the properties of probabilities and has applications in statistics and probability theory. Click to expand
Nash inequality relates the L^2 norm, L^1 norm, and the gradient of a function u in a domain Ω, providing a bound on the L^2 norm in terms of the gradient and L^1 norm. This inequality is useful for understanding the properties of functions and has applications in partial differential equations and mathematical analysis. Click to expand
Hardy-Littlewood-Sobolev inequality provides a bound on the L^q norm of a function u in a domain Ω in terms of its gradient's L^p norm. This inequality is useful for understanding the properties of functions and has applications in partial differential equations and mathematical analysis. Click to expand

Approximation Algorithms

Greedy set cover uses an approximation algorithm with a logarithmic factor of the input size, n, as an upper bound on its solution quality. Click to expand
Metric TSP algorithm provides a 1.5 approximation factor of the optimal solution. Click to expand
Minimum vertex cover approximation algorithm provides a solution no more than twice the optimal value. Click to expand
Max-cut approximation algorithm guarantees a solution at least half of the optimal value. Click to expand
Max-k-cut algorithm guarantees an approximation factor of k/(k-1) of the optimal solution. Click to expand
Knapsack approximation algorithm guarantees a solution within (1 - ε) of the optimal value. Click to expand
Bin packing approximation algorithm provides a solution within 11/9 of the optimal value plus 4. Click to expand
Luby's MIS algorithm computes a maximal independent set in O(log n) time. Click to expand
FPTAS guarantees a solution within (1 - ε) of the optimal value for optimization problems. Click to expand
Christofides algorithm provides a 1.5-approximation for metric TSP problems. Click to expand
Facility location approximation algorithm provides a solution within twice the optimal value. Click to expand
Local search provides an approximation algorithm with a solution at least as good as the optimal value. Click to expand
Fractional knapsack problem has an optimal solution using a greedy approach. Click to expand
Greedy matching algorithm guarantees a solution at least half the optimal value. Click to expand
Randomized rounding guarantees a solution that is at least as good as the optimal fractional solution. Click to expand
Scheduling with rejection guarantees a solution within (1 - ε) of the optimal value. Click to expand
Balanced allocation algorithm ensures an expected load of O(log log n) for a set of items. Click to expand
Multiway cut approximation algorithm guarantees a solution within a factor of 2 of the optimal value. Click to expand

Randomized Algorithms

Karger's min-cut algorithm guarantees a success probability of at least 1 / n choose 2. Click to expand
Fermat primality test gives an upper bound on the probability of false positives. Click to expand
Miller-Rabin primality test has a smaller upper bound on the probability of false positives than Fermat's test. Click to expand
Rabin-Karp algorithm has an upper bound on the probability of hash collisions. Click to expand
Metropolis-Hastings algorithm converges to a stationary probability distribution π(x). Click to expand
Gibbs sampling produces samples from a distribution proportional to the target distribution π(x). Click to expand
Coupon collector problem gives the expected number of trials to collect n distinct coupons. Click to expand
PageRank algorithm calculates the probability of a user visiting a page as a linear combination of the expected number of inbound links and the uniform distribution. Click to expand
Simulated annealing algorithm guarantees a near-optimal solution with probability at least 1 - exp(-t). Click to expand
Randomized quicksort has an expected time complexity of O(n log n). Click to expand
Monte Carlo method for approximating π has an expected value equal to π. Click to expand
Las Vegas algorithms always produce a correct result with a probability of 1. Click to expand
Power of two choices paradigm reduces the expected load in a load balancing problem to O(log log n). Click to expand

Graph Algorithms

Multiway cut problem aims to find a cut with minimum cost in a graph, and the approximation algorithm guarantees a solution with cost at most twice the optimal solution. This problem appears in clustering, network design, and image segmentation. Click to expand
Johnson's algorithm is an approximation algorithm for the scheduling problem, where the goal is to minimize the maximum completion time. The algorithm guarantees a schedule with maximum completion time at most 2 - 1/k times the optimal. Click to expand
Minimum spanning tree (MST) algorithms aim to find an optimal tree connecting all vertices in a graph with minimum cost. MST is a fundamental problem in graph theory, with applications in network design, clustering, and transportation. Click to expand
Prim's algorithm is a greedy algorithm for finding the minimum spanning tree of a graph, guaranteeing an optimal solution. It is widely used in network design, clustering, and transportation optimization problems. Click to expand
Kruskal's algorithm is a greedy algorithm for finding the minimum spanning tree of a graph, guaranteeing an optimal solution. It is widely used in network design, clustering, and transportation optimization problems. Click to expand
Boruvka's algorithm, also known as Sollin's algorithm, is an algorithm for finding the minimum spanning tree of a graph, guaranteeing an optimal solution. It is widely used in network design, clustering, and transportation optimization problems. Click to expand
Max flow-min cut theorem states that the maximum flow in a network is equal to the capacity of the minimum cut. This theorem is fundamental in network flow problems and combinatorial optimization. Click to expand
Edmonds-Karp algorithm is an implementation of the Ford-Fulkerson method for computing the maximum flow in a flow network, guaranteeing an optimal solution. Click to expand
Ford-Fulkerson algorithm is a greedy algorithm for computing the maximum flow in a flow network, guaranteeing an optimal solution. Click to expand
Spectral clustering is a technique for partitioning data into clusters, where the approximation is bounded by the square root of twice the optimal solution. This method is particularly useful for graph partitioning and image segmentation. Click to expand
Traveling salesman LP is a linear programming relaxation of the TSP problem, where the approximation is bounded by twice the optimal solution. This method provides a lower bound for the TSP problem and is used for solving TSP heuristics. Click to expand

Numerical Algorithms

SVD image compression is a technique that leverages Singular Value Decomposition (SVD) to approximate an image matrix, keeping only the most significant singular values. This results in a compressed image with minimal information loss. Click to expand
AKS primality is an algorithm for determining if a given number is a prime. The algorithm is deterministic, polynomial-time, and guarantees that the probability of a number being prime is 1 when the algorithm concludes it is prime. Click to expand
Load balancing is the problem of distributing tasks or resources among servers or processors to minimize the maximum load. The approximation factor for many load balancing algorithms is at most twice the optimal solution. Click to expand
Linear programming rounding is a technique used to find an integer solution to a linear programming problem by rounding the optimal fractional solution. The approximation factor of this technique is equal to or greater than the optimal solution. Click to expand

Data Structures

Bloom filters are probabilistic data structures used to test whether an element is a member of a set. They have a false positive probability determined by the number of hash functions (k), the number of items (n), and the size of the bit array (m). Click to expand
Randomized treaps are a type of binary search tree with randomized priorities assigned to each node. The expected height of a randomized treap with n elements is O(log n), providing good balance and efficient search, insert, and delete operations. Click to expand

Online Algorithms

The ski-rental problem is an online algorithm problem in which an individual must decide whether to rent or buy skis. The competitive ratio for this problem is at most 2, meaning the online algorithm's cost is at most twice the offline optimal solution. Click to expand
Online paging is an online algorithm problem in which a limited number of pages must be stored in cache memory while minimizing page faults. The competitive ratio for this problem is at most k+1, where k is the cache size. Click to expand
The multiplicative weight update algorithm is an iterative optimization method that maintains weights for a set of variables. The approximation factor of this algorithm is at least 1/ε times the optimal solution, where ε is the desired precision. Click to expand

Other

The set cover linear programming relaxation is an approximation algorithm for the set cover problem, a classical NP-hard problem. The approximation factor for this algorithm is O(log n), where n is the number of elements in the set. Click to expand
Max-SAT is an optimization problem that seeks to maximize the number of satisfied clauses in a Boolean formula. The approximation algorithm guarantees at least 1/2 times the optimal solution. Click to expand
Max k-CNF-SAT is a generalization of Max-SAT, where each clause has exactly k literals. The approximation algorithm guarantees at least (k-1)/k times the optimal solution. Click to expand
Vertex cover linear programming relaxation is an approximation algorithm for the vertex cover problem, another classical NP-hard problem. The approximation factor for this algorithm is 2, meaning that the solution is at most twice the optimal solution. Click to expand
Dilworth's theorem states that in a partially ordered set, the size of the largest antichain equals the minimum number of chains needed to cover the entire set. This theorem is used to analyze the structure of partially ordered sets and their partitions. Click to expand
Graham's algorithm is an approximation algorithm for the makespan minimization problem on parallel identical machines. The algorithm guarantees a makespan at most 4/3 times the optimal solution. Click to expand
List scheduling is an approximation algorithm for the makespan minimization problem on m parallel identical machines. The algorithm guarantees a makespan at most 2 - 1/m times the optimal solution. Click to expand
Dinic's algorithm is an efficient algorithm for computing the maximum flow in a flow network. It guarantees the correct maximum flow value in the given network. Click to expand
The push-relabel algorithm is another efficient algorithm for computing the maximum flow in a flow network. It also guarantees the correct maximum flow value in the given network. Click to expand
Randomized incremental construction is a general technique for constructing geometric and combinatorial objects. It uses randomization to simplify the analysis of the algorithm's complexity, which is typically O(n log n) for most problems. Click to expand
Yao's minimax principle is a method for proving lower bounds on the performance of algorithms, especially in the context of online algorithms and decision-making under uncertainty. It relates the worst-case performance of a randomized algorithm to the best-case performance of a deterministic algorithm. Click to expand
A Nash equilibrium is a solution concept in game theory where each player's strategy is optimal given the strategies of all other players. In a Nash equilibrium, the payoff of each player is at least as good as the optimal payoff. Click to expand
Centrality Measures in Network Science
Centrality Measures Click to expand
Degree centrality measures the number of edges connected to a node, normalized by the maximum possible degree (n-1). It is useful for identifying influential nodes in a network. Click to expand
Closeness centrality quantifies how close a node is to all other nodes in the network. It's useful for identifying well-connected nodes that can quickly disseminate information through the network. Click to expand
Ranking Algorithms and Coefficients in Network Science
Ranking Algorithms and Coefficients Click to expand
Betweenness centrality measures the extent to which a node acts as a bridge along the shortest paths between other nodes. It's useful for identifying nodes that control information flow within a network. Click to expand
Eigenvector centrality assigns relative scores to nodes based on their connections' importance. It's useful for identifying influential nodes considering not only their degree but also the importance of their neighbors. Click to expand
Katz centrality measures the relative influence of a node within a network by considering the total number of walks between nodes, exponentially weighted by their length. It's useful for identifying influential nodes that may not be directly connected to others. Click to expand
PageRank is an algorithm that assigns importance scores to nodes in a network based on their incoming links' importance. It's widely known for its use in Google's search engine and is useful for ranking web pages based on their relevance. Click to expand
Graph Properties and Metrics in Network Science
Graph Properties and Metrics Click to expand
Clustering coefficient measures the proportion of a node's neighbors that are also connected to each other. It's useful for understanding the local structure of a network and identifying tightly-knit communities within it. Click to expand
Global clustering coefficient measures the overall level of clustering in a network by considering all triangles formed by connected nodes. It's useful for understanding the overall tendency for nodes to cluster together in a network. Click to expand
Transitivity is a measure of the likelihood that two nodes with a common neighbor are also connected to each other. It's useful for understanding how well-connected the nodes within a network are and can indicate the presence of densely connected communities. Click to expand
Average shortest path length measures the average distance between all pairs of nodes in a network. It's useful for understanding how efficiently information can be disseminated through a network and indicates how small or large the world phenomenon is present in it. Click to expand
Network Science: Graph properties
Graph properties Click to expand
Diameter is the longest shortest path between any two nodes in a graph, useful for understanding the spread of information or influence. Click to expand
Eccentricity measures how far a node is from the farthest node in the graph, useful for identifying central and peripheral nodes. Click to expand
Radius represents the smallest eccentricity among all vertices, indicating how close the central nodes are to peripheral ones. Click to expand
Network Science: Community structure
Community structure Click to expand
Modularity quantifies the strength of community structures in a network, useful for identifying groups of nodes with strong internal connections. Click to expand
Assortativity measures the correlation between degrees of connected nodes, indicating whether high-degree nodes tend to connect with similar or dissimilar nodes. Click to expand
Network Science: Subgraphs
Subgraphs Click to expand
K-core refers to a subgraph where all vertices have at least k-degree, useful for identifying dense regions within graphs. Click to expand
Network Science: Graph measures
Graph measures Click to expand
Edge density quantifies how many edges are present compared to the maximum possible number, indicating network connectivity. Click to expand
Small-world coefficient compares clustering and path length between a given network and its random counterpart, useful for identifying small-world properties. Click to expand
Network Science: Similarity measures
Similarity measures Click to expand
Jaccard similarity measures the overlap between two nodes' neighborhoods, useful for predicting links or community membership. Click to expand
Cosine similarity calculates the angle between two nodes' neighborhood vectors, useful for comparing node connections and predicting links. Click to expand
Network Science: Index-based measures
Index-based measures Click to expand
Adamic-Adar index weighs common neighbors by their degree, emphasizing rare/common features when predicting links. Click to expand
Resource allocation index quantifies the likelihood of a link between nodes based on their shared neighbors' degrees, useful for predicting links or detecting anomalies. Click to expand
Network Science: Node degree measures
Node degree measures Click to expand
Preferential attachment assumes that nodes with higher degrees are more likely to form new connections, useful for modeling network growth and predicting links. Click to expand
Network Science: Path-based measures
Path-based measures Click to expand
Network Science: Shortest Path and Connected Components
Shortest Path and Connected Components Click to expand
d(u,v) is the shortest path between nodes u and v. It's useful for measuring distances in networks. Click to expand
Click to expand
Network Science: Network Models
Network Models Click to expand
Click to expand
Click to expand
Click to expand
Network Science: Community Detection Algorithms
Community Detection Algorithms Click to expand
Click to expand
Click to expand
Click to expand
Network Science: Cliques and Core-Periphery Structures
Cliques and Core-Periphery Structures Click to expand
Click to expand
Click to expand
Click to expand
Network Science: Rich-Club Coefficient
Rich-Club Coefficient Click to expand
$\phi(k)$ measures the tendency of high-degree nodes to form tightly interconnected subgraphs. It's useful for identifying influential actors within financial networks or market segments. Click to expand
Graph Properties and Parameters
Graph Properties and Parameters Click to expand
Handshaking lemma states that the sum of degrees of all vertices equals twice the number of edges. It's useful for analyzing graph connectivity. Click to expand
Euler's formula relates vertices, edges, and faces of a planar graph. It is important for studying planar graphs and their properties. Click to expand
The inequality states an upper bound on the number of edges in a planar graph. It helps identify non-planarity and analyze dense graphs' properties. Click to expand
Graph Degree Characteristics
Graph Degree Characteristics Click to expand
Maximum degree represents the highest vertex degree in a graph. It's useful for analyzing vertex connectivity and centrality. Click to expand
Graph Theory Properties and Parameters
Graph Properties and Parameters Click to expand
Minimum degree of a graph G, denoted as δ(G), is the smallest degree among all vertices in the graph. It helps to identify properties and constraints of the graph. Click to expand
Chromatic number χ(G) is the smallest number of colors needed to color the vertices of G such that no two adjacent vertices share the same color. The inequality relates chromatic number, vertex set size |V|, and independence number α(G). Click to expand
Chromatic index χ'(G) is the smallest number of colors needed to color the edges of G such that no two adjacent edges share the same color. The inequality relates chromatic index and maximum vertex degree Δ(G). Click to expand
Vertex connectivity κ(G) is the minimum number of vertices that must be removed to disconnect G or make it trivial. It helps to measure the robustness of a graph. Click to expand
Edge connectivity λ(G) is the minimum number of edges that must be removed to disconnect G. It helps to measure the robustness of a graph from an edge perspective. Click to expand
Graph Theory Cycles and Distances
Graph Cycles and Distances Click to expand
Girth g(G) is the length of the shortest cycle in G. It helps to identify the presence of small cycles in a graph. Click to expand
Clique number ω(G) is the largest size of a complete subgraph (clique) in G. It helps to identify highly connected subgraphs within a larger graph. Click to expand
Independence number α(G) is the largest size of an independent set (no two vertices are adjacent) in G. It helps to identify groups of non-interacting elements within a graph. Click to expand
Degree sum formula states that the sum of vertex degrees in G equals twice the number of edges (2m). It's a fundamental property of graphs and helps to relate vertex degrees and edge count. Click to expand
Eulerian circuit condition states that a graph G has an Eulerian circuit if and only if the degree of each vertex is even. It helps to identify graphs that can be traversed by visiting each edge exactly once. Click to expand
Hamiltonian cycle condition states that a graph G has a Hamiltonian cycle if the neighborhood of every subset S of vertices satisfies |N(S)| ≥ |S|. It helps to identify graphs that can be traversed by visiting each vertex exactly once. Click to expand
Graph diameter diam(G) is the longest shortest path between any two vertices in G. It helps to measure the largest distance between any pair of vertices in a graph. Click to expand
Graph radius rad(G) is the smallest eccentricity among all vertices in G. It helps to measure the centralization of a graph by finding the minimum distance from a vertex to all other vertices. Click to expand
Eccentricity ecc(v) is the largest distance from vertex v to any other vertex in G. It helps to measure how far a vertex is from the farthest vertex within a graph. Click to expand
Wiener index W(G) is the sum of all pairwise distances between vertices in G, divided by 2. It helps to measure the overall distance between vertices within a graph and can be related to graph centralization. Click to expand
Graph Theory Matrices and Theorems
Graph Matrices and Theorems Click to expand
Laplacian matrix L(G) is the difference between the degree matrix D(G) and adjacency matrix A(G). It is used to analyze various properties of graphs such as connectivity, spectral clustering, and random walks. Click to expand
Kirchhoff's matrix theorem states that the determinant of the Laplacian matrix L with a row and column removed, multiplied by n, equals the determinant of L itself. It is used to compute the number of spanning trees in a graph. Click to expand
Graph density D(G) is the ratio of the number of edges |E| to the maximum possible number of edges for a given vertex set |V|. It measures how densely connected a graph is. Click to expand
Bipartite graph condition states that a graph G is bipartite if and only if its chromatic number χ(G) is 2. It helps to identify graphs that can be partitioned into two disjoint sets of vertices such that all edges connect vertices from different sets. Click to expand
Petersen's theorem states that a k-regular graph is k-edge-colorable. It helps to identify edge-colorable graphs based on their regularity. Click to expand
Ramsey's theorem states that R(s, t) is the smallest integer such that any graph with R(s, t) vertices contains either a clique of size s or an independent set of size t. It helps to find the minimum size of a graph that guarantees the existence of certain substructures. Click to expand
Erdős-Stone theorem relates the extremal number ex(n, H) of a graph H to its chromatic index χ'(H). It helps to find the maximum number of edges a graph can have without containing a given subgraph H. Click to expand
Dirac's theorem states that if the minimum degree δ(G) of a graph G is at least |V|/2, then G contains a Hamiltonian cycle. It helps to identify graphs with Hamiltonian cycles based on their minimum degree. Click to expand
Ore's theorem states that if the sum of degrees of any two non-adjacent vertices u and v is at least |V|, then G contains a Hamiltonian cycle. It helps to identify graphs with Hamiltonian cycles based on the sum of degrees of non-adjacent vertices. Click to expand
Graph complement $\overline{G}$ is a graph formed by the same vertex set V but with the edge set $\overline{E}$ consisting of all edges not present in G. It helps to analyze properties and structures of graphs by studying their complements. Click to expand
Turán's theorem states that the maximum number of edges ex(n, K_r) in an n-vertex graph without a complete subgraph K_r is given by the formula. It helps to find edge density constraints for graphs without specific complete subgraphs. Click to expand
Graph Theory: Minors, Planarity, and Coloring
Graph Minors, Planarity, and Coloring Click to expand
A graph minor G is a subgraph of H formed by deleting vertices/edges and contracting edges. Graph minors help identify if a graph contains another as a minor. Click to expand
Kuratowski's theorem states that a graph is planar if it doesn't contain K5 or K3,3 as a minor. Planar graphs are important due to their simpler structure and applications in network design. Click to expand
Vizing's theorem gives an upper and lower bound on the edge chromatic number of a graph. It helps with scheduling problems and resource allocation. Click to expand
Brooks' theorem provides an upper bound on the chromatic number of a graph. It helps with scheduling and resource allocation problems. Click to expand
Graph Theory: Matching and Connectivity
Matching and Connectivity Click to expand
Hall's marriage theorem gives a necessary and sufficient condition for the existence of perfect matchings. It has applications in combinatorial optimization and matching markets. Click to expand
Menger's theorem relates the minimum vertex cut of a graph to its connectivity. It helps analyze network robustness and resilience. Click to expand
Graph Theory: Probabilistic Methods and Algorithms
Probabilistic Methods and Algorithms Click to expand
Lovász Local Lemma gives a sufficient condition for the existence of an object avoiding certain events. It has applications in combinatorial optimization and probabilistic methods. Click to expand
Graph isomorphism checks if two graphs have the same structure. It has applications in pattern recognition and data analysis. Click to expand
Havel-Hakimi algorithm checks if a degree sequence is realizable as a simple graph. It helps with graph reconstruction and network analysis. Click to expand
Graph Theory: Topological and Spectral Properties
Topological and Spectral Properties Click to expand
Euler's characteristic relates the number of vertices, edges, and faces of a planar graph. It's important for understanding the topology of graphs. Click to expand
Spectral radius measures the largest eigenvalue of a graph's adjacency matrix. It helps analyze graph connectivity and spectral properties. Click to expand
Chromatic polynomial counts the number of proper colorings of a graph with k colors. It helps with scheduling problems and combinatorial optimization. Click to expand
Tutte's theorem gives a necessary and sufficient condition for the existence of perfect matchings. It has applications in combinatorial optimization and matching markets. Click to expand
Small Angle Approximations
Small Angle Approximations Click to expand
These approximations are useful when angles are small, simplifying calculations for trigonometric functions. Click to expand
Useful for simplifying calculations involving small angles in cosine functions. Click to expand
Useful for simplifying calculations involving small angles in tangent functions. Click to expand
Higher Order Approximations
Higher Order Approximations Click to expand
Higher order sine approximation for more accurate results in certain mathematical computations. Click to expand
Higher order cosine approximation for more accurate results in certain mathematical computations. Click to expand
Useful for approximating the exponential function with a few terms of the Taylor series. Click to expand
Higher order exponential approximation for more accurate results in mathematical computations involving exponentials. Click to expand
Approximates the natural logarithm for small values of x, useful for quick calculations. Click to expand
Higher order natural logarithm approximation for more accurate results in mathematical computations involving logarithms. Click to expand
Approximates the binomial expansion for small values of x, useful for quick calculations. Click to expand
Approximates the factorial function for large values of n, useful for simplifying calculations. Click to expand
Approximates the distribution of the sample mean, useful in statistics and probability. Click to expand
Approximates the probability mass function of a Poisson distribution, useful for modeling rare events. Click to expand
Approximates the sigmoid function for small values of x, useful for simplifying calculations. Click to expand
Approximates the arc length of a curve using a finite sum of line segments, useful for geometric calculations. Click to expand
Approximates the area under a curve using a finite sum, useful for numerical integration. Click to expand
Euler's method is a first-order numerical procedure for solving ordinary differential equations, useful for approximating solutions to initial value problems. Click to expand
Trapezoidal rule is a numerical integration technique that approximates the definite integral of a function using trapezoids, useful for quick calculations. Click to expand
Simpson's rule is a numerical integration technique that approximates the definite integral of a function using parabolas, providing more accurate results than the trapezoidal rule. Click to expand
Bode's rule is a numerical integration technique that approximates the definite integral of a function using a specific weighted combination of function values, providing more accurate results for certain problems. Click to expand
Gaussian quadrature is a numerical integration technique that approximates the definite integral of a function using a weighted sum of function values at specific points, providing more accurate results for certain problems. Click to expand
Newton-Cotes formulas are a family of numerical integration techniques that approximate the definite integral of a function using a weighted sum of function values at specific points, including the trapezoidal and Simpson's rule. Click to expand
Laplace's method is an asymptotic technique for approximating definite integrals of exponential functions, useful for certain problems with large exponents. Click to expand
L'Hôpital's rule is a technique for evaluating limits involving indeterminate forms, useful for simplifying calculations in calculus. Click to expand
Bernoulli's inequality is a useful inequality for approximating the power of a sum, applicable in various mathematical problems. Click to expand
Taylor series is a representation of a function as an infinite sum of terms, useful for approximating functions with polynomials. Click to expand
Maclaurin series approximates a function as a polynomial centered at 0, useful for approximating functions near 0 Click to expand
Power series generalizes Maclaurin series, centering the polynomial at any point c Click to expand
Riemann sum approximates the definite integral of a function using a finite sum of rectangles Click to expand
Linear interpolation estimates a value between two known points using a straight line Click to expand
Interpolation and approximation methods
Interpolation and Approximation Click to expand
Cubic spline interpolation estimates values using a piecewise-defined cubic polynomial Click to expand
Pade approximant rational function approximation of a series, can approximate functions with singularities Click to expand
Continued fraction representation of real numbers, can provide good approximations with few terms Click to expand
Hyperbolic functions and rules of thumb
Hyperbolic Functions and Rules of Thumb Click to expand
Hyperbolic sine approximation using a Maclaurin series Click to expand
Hyperbolic cosine approximation using a Maclaurin series Click to expand
Hyperbolic tangent approximation, useful for approximating activation functions Click to expand
Rule of 72 estimates doubling time for investments, useful for quick mental calculations Click to expand
Geometry and statistics relationships
Geometry and Statistics Relationships Click to expand
Heron's formula calculates the area of a triangle given its side lengths, useful in geometry problems Click to expand
Geometry and Trigonometry
Geometry and Trigonometry Click to expand
Pythagorean theorem relates the sides of a right-angled triangle. It's fundamental in geometry. Click to expand
Slope-intercept form represents a linear equation with slope m and y-intercept b. Click to expand
Sine function gives the ratio of the opposite side to the hypotenuse in a right triangle. Click to expand
Cosine function gives the ratio of the adjacent side to the hypotenuse in a right triangle. Click to expand
Tangent function is the ratio of sine to cosine functions. Click to expand
Cosecant function is the reciprocal of the sine function. Click to expand
Secant function is the reciprocal of the cosine function. Click to expand
Cotangent function is the reciprocal of the tangent function. Click to expand
Law of sines relates the ratios of sides and angles in any triangle. Click to expand
Law of cosines generalizes Pythagorean theorem for any triangle. Click to expand
Calculus and Analysis
Calculus and Analysis Click to expand
Euler's formula connects complex exponentials with trigonometric functions. It's fundamental in complex analysis. Click to expand
Geometry and Measurement
Geometry and Measurement Click to expand
Area of circle relates the area to the square of its radius. It's fundamental in geometry. Click to expand
Circumference of circle relates the circumference to its radius. It's fundamental in geometry. Click to expand
Volume of sphere relates the volume to the cube of its radius. It's fundamental in geometry. Click to expand
Surface area of sphere relates the surface area to the square of its radius. It's fundamental in geometry. Click to expand
Distance Metrics
Distance Metrics Click to expand
Euclidean Distance is the straight-line distance between two points in Euclidean space. Click to expand
Distance Metrics and Similarity Measures
Distance Metrics and Similarity Measures Click to expand
Minkowski Distance generalizes Euclidean and Manhattan distances. It measures the distance between two points in a vector space. Click to expand
Cosine Similarity measures the cosine of the angle between two vectors, indicating their similarity or dissimilarity. Click to expand
Jaccard Similarity measures the similarity between two sets by dividing the size of their intersection by the size of their union. Click to expand
Dot Product calculates the sum of the products of corresponding entries in two vectors. Click to expand
Cross Product calculates a vector perpendicular to two input vectors in 3D space. Click to expand
Angle Between Vectors calculates the angle between two vectors using their dot product and magnitudes. Click to expand
Orthogonal Vectors have a dot product of zero, indicating they are perpendicular to each other. Click to expand
Parallel Vectors have the same direction and can be expressed as a scalar multiple of each other. Click to expand
Cosine Similarity Click to expand
Euclidean Distance measures the straight line distance between two points in a vector space. Click to expand
Manhattan Distance measures the sum of the absolute differences between corresponding entries of two vectors. Click to expand
Minkowski Distance Click to expand
Hamming Distance counts the number of positions at which two strings of equal length are different. Click to expand
Jaccard Similarity Click to expand
Dice Similarity measures the similarity between two sets by dividing twice the size of their intersection by the sum of their sizes. Click to expand
Tanimoto Similarity measures the similarity between two vectors by dividing their dot product by the sum of their squared magnitudes minus their dot product. Click to expand
Wasserstein Distance measures the optimal transport cost between two probability distributions. Click to expand
Sinkhorn Distance is a regularized version of the Wasserstein distance that is computationally more efficient. Click to expand

Related

Mice Make Art!
··2 mins
The diffusion of discovery
·5 mins
Modeling stock prices with stochastic processes
·1 min