105 | 105 |   "cell_type": "markdown",
106 | 106 |   "metadata": {},
107 | 107 |   "source": [
108 |     | - "We do just minimal preprocessing: convert obviously contiuous *Age* and *Fare* variables to floats,\n",
    | 108 | + "We do just minimal preprocessing: convert obviously continuous *Age* and *Fare* variables to floats,\n",
109 | 109 |   "and *SibSp*, *Parch* to integers. Missing *Age* values are removed."
110 | 110 |   ]
111 | 111 |   },
|
170 | 170 |   "cell_type": "markdown",
171 | 171 |   "metadata": {},
172 | 172 |   "source": [
173 |     | - "There is one tricky bit about the code above: one may be templed to just pass ``dense=True`` to ``DictVectorizer``: after all, in this case the matrixes are small. But this is not a great solution, because we will loose the ability to distinguish features that are missing and features that have zero value.\n",
    | 173 | + "There is one tricky bit about the code above: one may be tempted to just pass ``dense=True`` to ``DictVectorizer``: after all, in this case the matrixes are small. But this is not a great solution, because we will lose the ability to distinguish features that are missing and features that have zero value.\n",
174 | 174 |   "\n",
175 | 175 |   "\n",
176 | 176 |   "## 3. Explaining weights\n",
177 | 177 |   "\n",
178 |     | - "In order to calculate a prediction, XGBoost sums predictions of all its trees.\n",
    | 178 | + "To calculate a prediction, XGBoost sums predictions of all its trees.\n",
179 | 179 |   "The number of trees is controlled by ``n_estimators`` argument and is 100 by default.\n",
180 |     | - "Each tree is not a great predictor on it's own, but by summing across all trees,\n",
    | 180 | + "Each tree is not a great predictor on its own, but by summing across all trees,\n",
181 | 181 |   "XGBoost is able to provide a robust estimate in many cases. Here is one of the trees:"
182 | 182 |   ]
183 | 183 |   },
|
|
1151 | 1151 | "source": [
|
1152 | 1152 | "## 5. Adding text features\n",
|
1153 | 1153 | "\n",
|
1154 |
| - "Right now we treat *Name* field as categorical, like other text features.\n", |
1155 |
| - "But in this dataset each name is unique, so XGBoost does not use this feature at all, because it's\n", |
| 1154 | + "Now we treat *Name* field as categorical, like other text features,\n", |
| 1155 | + "but in this dataset, each name is unique, so XGBoost does not use this feature at all, because it's\n", |
1156 | 1156 | "such a poor discriminator: it's absent from the weights table in section 3.\n",
|
1157 | 1157 | "\n",
|
1158 | 1158 | "But *Name* still might contain some useful information. We don't want to guess how to best pre-process it\n",