## Survey Papers / Repos

• Top 10 algorithms in data mining. [ICDM'06]

## Resources

### Supervised

• Linear Regression
$y=ax+b\\ L(y,\hat{y}) = (y-\hat{y})^2$
• Logistic Regression
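$y=\frac{1}{1+e^{-(ax+b)}} \\ L(y,\hat{y}) = -\hat{y}\log y - (1 - \hat{y}) \log (1-y)$, where $y \in (0, 1)$ is the predicted probability and $\hat{y} \in \{0, 1\}$ is the ground-truth label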
• Naive Bayes
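$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$; the "naive" part is assuming the features are conditionally independent given the class $C$, so $P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)$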
• Support Vector Machine (SVM)
• Training process: Lagrangian -> dual problem -> SMO (Sequential Minimal Optimization)
$\min \frac{1}{2} ||w||^2 \\ \text{s.t.}~y^{(i)}(w^{T}x^{(i)}+b) \geq 1, i=1,...,m$
• K Nearest Neighbor (kNN)
• Expectation-Maximization (EM)
• Linear Discriminant Analysis (LDA)
• Decision Tree
• Random Forest
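
For the linear regression entry, minimizing the squared loss over a 1-D dataset has a closed-form answer: slope = covariance(x, target) / variance(x), intercept = mean(target) - slope * mean(x). A minimal sketch, assuming plain Python lists; `fit_linear` and `predict_linear` are illustrative names. Following the notation above, `targets` plays the role of $\hat{y}$ and the returned line gives $y$.

```python
def fit_linear(xs, targets):
    """Least-squares fit of y = a*x + b for 1-D inputs (closed form)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_t = sum(targets) / n
    # Centered cross- and self-products; the 1/n factors cancel in the ratio.
    sxy = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, targets))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    a = sxy / sxx
    b = mean_t - a * mean_x
    return a, b


def predict_linear(a, b, x):
    return a * x + b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    targets = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1
    a, b = fit_linear(xs, targets)
    print(a, b)  # 2.0 1.0
```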
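
The logistic regression entry can be fit by gradient descent on its cross-entropy loss: with $y = \sigma(ax+b)$, the derivatives are $\partial L/\partial a = (y-\hat{y})x$ and $\partial L/\partial b = y-\hat{y}$. A minimal sketch of batch gradient descent on 1-D inputs; the learning rate, epoch count and function names are illustrative assumptions, not tuned values.

```python
import math


def sigmoid(z):
    # Numerically stable 1 / (1 + e^{-z}): avoids overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean cross-entropy of y = sigmoid(a*x + b).

    labels are the 0/1 ground-truth values (y_hat in the notation above).
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, t in zip(xs, labels):
            err = sigmoid(a * x + b) - t  # predicted probability minus label
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def predict_proba(a, b, x):
    return sigmoid(a * x + b)


if __name__ == "__main__":
    a, b = fit_logistic([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
    print(predict_proba(a, b, 1.5))  # probability of class 1 for x = 1.5
```

On perfectly separable data like this toy set, `a` keeps growing slowly with more epochs, which is why regularization is usually added in practice.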
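
The Naive Bayes entry picks the class maximizing $P(C)\prod_i P(x_i \mid C)$; $P(B)$ in Bayes' rule is the same for every class and can be dropped. A minimal sketch for discrete (categorical) features, computed in log space; Laplace (add-alpha) smoothing and the function names are assumptions added so unseen feature values do not zero out a class.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Count class and per-feature value frequencies for categorical Naive Bayes."""
    class_counts = Counter(labels)
    # feature_counts[c][i][v]: how often feature i takes value v within class c
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = [set() for _ in range(len(rows[0]))]
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[c][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values, alpha


def predict_naive_bayes(model, row):
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, count in class_counts.items():
        score = math.log(count / total)  # log P(C)
        for i, v in enumerate(row):
            # Smoothed log P(x_i | C)
            num = feature_counts[c][i][v] + alpha
            den = count + alpha * len(values[i])
            score += math.log(num / den)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


if __name__ == "__main__":
    rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
            ("rainy", "cool"), ("sunny", "cool")]
    labels = ["no", "no", "yes", "yes", "yes"]
    model = fit_naive_bayes(rows, labels)
    print(predict_naive_bayes(model, ("sunny", "hot")))  # no
```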
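
The SVM entry trains through the Lagrangian dual with SMO, which is too long for a short example. The sketch below swaps in a different, simpler technique: batch subgradient descent on the soft-margin primal $\frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_i \max(0, 1 - y^{(i)}(w^T x^{(i)} + b))$, which relaxes the hard constraints above into a hinge penalty. Labels must be $\pm 1$; `lam`, `lr`, `epochs` and the function names are illustrative assumptions.

```python
def fit_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Subgradient descent on the soft-margin (hinge-loss) primal.

    NOT the Lagrange -> dual -> SMO route; a simpler stand-in with the same
    max-margin goal. X: list of feature lists, y: labels in {-1, +1}.
    """
    m, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the (lam/2)||w||^2 term
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # constraint violated: hinge term is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / m
                gb -= yi / m
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict_svm(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
    y = [1, 1, 1, -1, -1, -1]
    w, b = fit_linear_svm(X, y)
    print(w, b, predict_svm(w, b, [4.0, 1.0]))
```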

### Unsupervised

• Clustering
• K-means
• Mean-shift
• DBSCAN
• Principal Component Analysis (PCA)
• Latent Dirichlet Allocation (LDA) for topic modeling
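
K-means is usually run with Lloyd's algorithm: alternately assign each point to its nearest centroid and move each centroid to the mean of its assigned points, until the assignments stop changing. A minimal sketch; initializing from the first k points is a simplifying assumption (k-means++ or random restarts are more common), and the function name is illustrative.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm on a list of equal-length coordinate tuples."""
    if k <= 0 or k > len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
    return centroids, assignment


if __name__ == "__main__":
    pts = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels = kmeans(pts, 2)
    print(labels)  # [0, 1, 0, 0, 1, 1]
```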

## Others

### Ensemble

• K-Fold Cross Validation
• Bagging
• Boosting
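
In K-fold cross validation the data is split into k folds; each fold serves once as the validation set while the model trains on the other k-1, and the k scores are averaged. A minimal sketch of the index splitting only; contiguous folds without shuffling are an assumption made to keep it deterministic, and `k_fold_indices` is an illustrative name.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds.

    Fold sizes differ by at most one; the first n % k folds get the extra item.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


if __name__ == "__main__":
    for train, val in k_fold_indices(10, 3):
        print(len(train), val)
```

To estimate performance, fit the model on `train`, score it on `val`, and average the k scores.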

### Metrics

| | True Samples | False Samples |
| --- | --- | --- |
| Predict True | True Positive (TP) | False Positive (FP) [Type I Error] |
| Predict False | False Negative (FN) [Type II Error] | True Negative (TN) |

• Precision and Recall
• $\text{Precision} = \frac{\text{TP}}{\text{TP} +\text{FP}}$
• $\text{Recall} = \frac{\text{TP}}{\text{TP}+\text{FN}}$
• F1 Score
• $\text{F1 score} = 2 \cdot\frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} +\text{Recall}}$
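• ROC Curve (plots TPR against FPR as the classification threshold varies; TPR is the same quantity as recall)
• $\text{TPR} = \frac{\text{TP}}{\text{TP}+\text{FN}}$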
• $\text{FPR} = \frac{\text{FP}}{\text{FP}+\text{TN}}$
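
The metrics above all come from the four confusion-matrix counts. A minimal sketch, assuming binary labels where `1`/`True` is the positive class; returning 0.0 when a denominator is zero is an assumed convention (libraries differ on this case), and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels; truthy values are positive."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1  # Type I error
        elif not p and t:
            fn += 1  # Type II error
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    # Assumed convention: 0.0 instead of ZeroDivisionError.
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # identical to TPR
    f1 = safe_div(2 * precision * recall, precision + recall)
    fpr = safe_div(fp, fp + tn)
    return {"precision": precision, "recall": recall, "f1": f1,
            "tpr": recall, "fpr": fpr}


if __name__ == "__main__":
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 1, 0, 0, 0, 0]))
```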