
Commit 2cfc1a1

Add decision tree notes
Signed-off-by: Adil Aliyev <adilaliev@gmail.com>
1 parent 21e6ab5 commit 2cfc1a1

3 files changed: +116 −0 lines changed


2-k_nearest_neighbors/1-K-Nearest Neighbours.ipynb

+11
@@ -45,6 +45,17 @@
 "$$D(x_1, x_2) = \\sqrt{\\sum_{i=0}^{n}\\mid x_{1i}-x_{2i} \\mid^2} $$"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Evaluation\n",
+"\n",
+"To evaluate the accuracy of the \"k-nearest neighbours\" model, the Jaccard index can be used.\n",
+"\n",
+"$$ J(A,B) = \\frac{\\mid A \\cap B \\mid}{\\mid A \\cup B \\mid} = \\frac{\\mid A \\cap B \\mid}{\\mid A \\mid + \\mid B \\mid - \\mid A \\cap B \\mid} $$"
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
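The Jaccard index in the markdown cell above can be computed in a few lines of plain Python. A minimal sketch, assuming the predictions and ground-truth labels are compared as sets; the helper name `jaccard_index` is hypothetical, not from the notebook:

```python
def jaccard_index(a, b):
    """Jaccard index of two sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    # Convention: two empty sets are considered identical (index 1.0).
    return len(a & b) / len(union) if union else 1.0

print(jaccard_index({1, 2, 3}, {2, 3, 4}))  # |{2,3}| / |{1,2,3,4}| = 0.5
```

For hard label vectors, scikit-learn's `sklearn.metrics.jaccard_score` implements the same measure per class.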
+57
@@ -0,0 +1,57 @@
+{
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Decision Trees"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"The idea of building a Decision Tree is to divide the data into groups with lower entropy.\n",
+"\n",
+"**Entropy** is a measure of randomness or uncertainty. The lower the entropy, the less uniform the distribution and the purer the node.\n",
+"\n",
+"Entropy:\n",
+"$$E = -p(A) \\times log(p(A)) - p(B) \\times log(p(B))$$\n",
+"\n",
+"**Information gain** is the reduction in entropy (the increase in certainty) obtained by splitting a node.\n",
+"\n",
+"Information Gain:\n",
+"\n",
+"$$IG = E_{before} - WE_{after} = E_{before} - (W_1\\times E_{node1} + W_2\\times E_{node2})$$"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
+}
+],
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3",
+"language": "python",
+"name": "python3"
+},
+"language_info": {
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"file_extension": ".py",
+"mimetype": "text/x-python",
+"name": "python",
+"nbconvert_exporter": "python",
+"pygments_lexer": "ipython3",
+"version": "3.6.5"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 2
+}
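The entropy and information-gain formulas in the decision-tree notes can be checked with a short sketch. A minimal illustration, assuming a binary split with base-2 logarithms; the helper names `entropy` and `information_gain` are hypothetical, not from the notebook:

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["A"] * 4 + ["B"] * 4        # 50/50 mix: entropy = 1 bit
left, right = ["A"] * 4, ["B"] * 4    # pure children: entropy = 0
print(information_gain(parent, left, right))  # 1.0, a perfect split
```

A split that leaves both children as mixed as the parent would give an information gain of 0, so a tree learner picks the split maximizing this quantity.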
@@ -0,0 +1,48 @@
+{
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Logistic Regression\n",
+"\n",
+"**Logistic regression** is a classification algorithm for categorical target variables. It can be used for both binary and multi-class classification.\n",
+"\n",
+"In the case of binary classification we can use the following notation:\n",
+"\n",
+"$$ X \\in \\mathbb{R}^{m \\times n} $$\n",
+"$$ y \\in \\{0,1\\} $$\n",
+"$$\\hat{y} = P(y=1|x)$$\n",
+"$$P(y=0|x) = 1-\\hat{y}$$"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": []
+}
+],
+"metadata": {
+"kernelspec": {
+"display_name": "Python 3",
+"language": "python",
+"name": "python3"
+},
+"language_info": {
+"codemirror_mode": {
+"name": "ipython",
+"version": 3
+},
+"file_extension": ".py",
+"mimetype": "text/x-python",
+"name": "python",
+"nbconvert_exporter": "python",
+"pygments_lexer": "ipython3",
+"version": "3.6.5"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 2
+}
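The notation $\hat{y} = P(y=1|x)$ with $P(y=0|x) = 1-\hat{y}$ can be made concrete with the logistic (sigmoid) function applied to a linear score. A minimal sketch; the weights `w`, bias `b`, and helper names are illustrative assumptions, not from the notebook:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, w, b):
    """P(y=1|x) for the linear score w·x + b (w and b are assumed, not learned here)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

y_hat = predict_proba([1.0, 2.0], w=[0.5, -0.25], b=0.0)  # score z = 0.0
print(y_hat, 1.0 - y_hat)  # P(y=1|x) and P(y=0|x) always sum to 1
```

Training (fitting `w` and `b`) would minimize the cross-entropy loss; scikit-learn's `LogisticRegression` handles both the binary and multi-class cases.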
