
Commit 2cfc1a1

Add decision tree notes
Signed-off-by: Adil Aliyev <adilaliev@gmail.com>
1 parent 21e6ab5 commit 2cfc1a1

3 files changed: +116 −0 lines changed


2-k_nearest_neighbors/1-K-Nearest Neighbours.ipynb

+11
@@ -45,6 +45,17 @@
 "$$D(x_1, x_2) = \\sqrt{\\sum_{i=0}^{n}\\mid x_{1i}-x_{2i} \\mid^2} $$"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Evaluation\n",
+"\n",
+"To evaluate the accuracy of the \"k-nearest neighbours\" model, the Jaccard index can be used.\n",
+"\n",
+"$$ J(A,B) = \\frac{\\mid A \\cap B \\mid}{\\mid A \\cup B \\mid} = \\frac{\\mid A \\cap B \\mid}{\\mid A \\mid + \\mid B \\mid - \\mid A \\cap B \\mid} $$"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
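The Jaccard index in the markdown cell above can be computed in a few lines of plain Python. A minimal sketch, assuming the predictions and ground-truth labels are compared as sets; the helper name `jaccard_index` is hypothetical, not from the notebook:

```python
def jaccard_index(a, b):
    """Jaccard index of two sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    # Convention: two empty sets are considered identical (index 1.0).
    return len(a & b) / len(union) if union else 1.0

print(jaccard_index({1, 2, 3}, {2, 3, 4}))  # |{2,3}| / |{1,2,3,4}| = 0.5
```

For hard label vectors, scikit-learn's `sklearn.metrics.jaccard_score` implements the same measure per class.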
+57
@@ -0,0 +1,57 @@
+{
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Decision Trees"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"The idea of building a Decision Tree is to divide the data into groups with lower entropy.\n",
+"\n",
+"**Entropy** is a measure of randomness or uncertainty. The lower the entropy, the less uniform the distribution and the purer the node.\n",
+"\n",
+"Entropy:\n",
+"$$E = -p(A) \\times log(p(A)) - p(B) \\times log(p(B))$$\n",
+"\n",
+"**Information gain** is the reduction in entropy (the increase in certainty) obtained by splitting a node.\n",
+"\n",
+"Information Gain:\n",
+"\n",
+"$$IG = E_{before} - WE_{after} = E_{before} - (W_1\\times E_{node1} + W_2\\times E_{node2})$$"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
+}
+],
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3",
+"language": "python",
+"name": "python3"
+},
+"language_info": {
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"file_extension": ".py",
+"mimetype": "text/x-python",
+"name": "python",
+"nbconvert_exporter": "python",
+"pygments_lexer": "ipython3",
+"version": "3.6.5"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 2
+}
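The entropy and information-gain formulas in the decision-tree notes can be checked with a short sketch. A minimal illustration, assuming a binary split with base-2 logarithms; the helper names `entropy` and `information_gain` are hypothetical, not from the notebook:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["A"] * 4 + ["B"] * 4        # 50/50 mix: entropy = 1 bit
left, right = ["A"] * 4, ["B"] * 4    # pure children: entropy = 0
print(information_gain(parent, left, right))  # 1.0, a perfect split
```

A split that leaves both children as mixed as the parent would give an information gain of 0, so a tree learner picks the split maximizing this quantity.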
@@ -0,0 +1,48 @@
+{
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Logistic Regression\n",
+"\n",
+"**Logistic regression** is a classification algorithm for categorical target variables. It can be used for both binary and multi-class classification.\n",
+"\n",
+"In the case of binary classification we can use the following notation:\n",
+"\n",
+"$$ X \\in \\mathbb{R}^{m \\times n} $$\n",
+"$$ y \\in \\{0,1\\} $$\n",
+"$$\\hat{y} = P(y=1|x)$$\n",
+"$$P(y=0|x) = 1-\\hat{y}$$"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
+}
+],
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3",
+"language": "python",
+"name": "python3"
+},
+"language_info": {
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"file_extension": ".py",
+"mimetype": "text/x-python",
+"name": "python",
+"nbconvert_exporter": "python",
+"pygments_lexer": "ipython3",
+"version": "3.6.5"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 2
+}
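The notation $\hat{y} = P(y=1|x)$ with $P(y=0|x) = 1-\hat{y}$ can be made concrete with the logistic (sigmoid) function applied to a linear score. A minimal sketch; the weights `w`, bias `b`, and helper names are illustrative assumptions, not from the notebook:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """P(y=1|x) for the linear score w·x + b (w and b are assumed, not learned here)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

y_hat = predict_proba([1.0, 2.0], w=[0.5, -0.25], b=0.0)  # score z = 0.0
print(y_hat, 1.0 - y_hat)  # P(y=1|x) and P(y=0|x) always sum to 1
```

Training (fitting `w` and `b`) would minimize the cross-entropy loss; scikit-learn's `LogisticRegression` handles both the binary and multi-class cases.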
