Description
Describe the bug
When using sklearn's GridSearchCV with SequentialFeatureSelector, the configured hyperparameter values are not propagated to the classifier that is actually used for fitting and predicting. I put together a minimal working example (MWE) below, based on Example 8 in the docs; the only major change is the custom classifier.
In the output listed in the docs, you can see that the score does not change with the k parameter of the KNN classifier, which is very strange.
While searching for similar issues, I found that this has already been mentioned in several other issues, e.g. #456 and #511. Below you can see the unexpected behavior when following the suggested approach.
Steps/Code to Reproduce
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
import mlxtend
import sklearn.base
import numpy as np


class DebugClassifier(sklearn.base.BaseEstimator):
    """Dummy classifier that logs the max_depth value it actually uses."""

    def __init__(self, max_depth=10):
        self.max_depth = max_depth

    def fit(self, X, y, groups=None):
        print("Fitting with max_depth =", self.max_depth)
        return self  # scikit-learn convention: fit returns the estimator

    def predict(self, X, **kwargs):
        print("Predicting with max_depth =", self.max_depth)
        return np.zeros(len(X))

    def set_params(self, **kwargs):
        print("Setting params:", kwargs)
        super().set_params(**kwargs)
        print("max_depth after setparams:", self.max_depth)
        return self  # scikit-learn convention: set_params returns the estimator


iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123)

clf = DebugClassifier(max_depth=10)

sfs1 = SFS(estimator=clf,
           k_features=3,
           forward=True,
           floating=False,
           scoring='accuracy',
           cv=5)

pipe = Pipeline([('sfs', sfs1),
                 ('clf', clf)])

param_grid = [
    {#'sfs__k_features': [1, 4],
     'sfs__estimator__max_depth': [1, 5]}
]

gs = GridSearchCV(estimator=pipe,
                  param_grid=param_grid,
                  scoring='accuracy',
                  n_jobs=1,
                  cv=5,
                  #iid=True,
                  refit=False)

# run grid search
gs = gs.fit(X_train, y_train)
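For context on what the grid key sfs__estimator__max_depth is expected to do: scikit-learn's BaseEstimator.set_params splits each key at the first double underscore and forwards the remainder to the named sub-estimator, which is updated in place. The following sketch mimics that routing with a hypothetical Node class (not a scikit-learn or MLxtend class) so the path pipeline -> sfs -> estimator -> max_depth can be traced without either library.

```python
class Node:
    """Hypothetical stand-in for an estimator that can hold sub-estimators."""

    def __init__(self, **params):
        for name, value in params.items():
            setattr(self, name, value)

    def set_params(self, **params):
        for key, value in params.items():
            # Split off the first level: "sfs__estimator__max_depth"
            # -> head="sfs", rest="estimator__max_depth"
            head, sep, rest = key.partition("__")
            if sep:
                # Delegate the remainder to the sub-object stored under `head`
                getattr(self, head).set_params(**{rest: value})
            else:
                setattr(self, head, value)
        return self


clf = Node(max_depth=10)
sfs = Node(estimator=clf)  # plays the role of SFS(estimator=clf)
pipe = Node(sfs=sfs)       # plays the role of Pipeline([('sfs', sfs)])

pipe.set_params(sfs__estimator__max_depth=1)
print(clf.max_depth)  # 1: the object referenced by sfs.estimator is mutated in place
```

The relevant point is the last hop: the new value lands on whatever object is stored as the selector's estimator attribute at that moment. If the selector fits some other object, the value never reaches it.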
Expected Results
Setting params: {'max_depth': 1}
max_depth after setparams: 1
Fitting with max_depth = 1
Predicting with max_depth = 1
Fitting with max_depth = 1
Predicting with max_depth = 1
...
Actual Results
Setting params: {'max_depth': 1}
max_depth after setparams: 1
Fitting with max_depth = 10
Predicting with max_depth = 10
Fitting with max_depth = 10
Predicting with max_depth = 10
...
As you can see, the value 1 for the hyperparameter max_depth is correctly set on some classifier instance. During fitting and predicting, however, a different classifier instance appears to be used, one that still has the initial value max_depth=10.
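One plausible mechanism for this symptom, offered as an assumption rather than a confirmed reading of the MLxtend 0.18.0 source, is that the selector makes its working copy of the estimator in __init__ instead of in fit. set_params then updates the estimator attribute, but fitting still uses the earlier copy. This sketch contrasts the two designs with hypothetical classes (EagerCopyWrapper, LazyCopyWrapper, Clf); copy.deepcopy stands in for sklearn.base.clone, and the direct attribute assignment stands in for a nested set_params(estimator__max_depth=1) call.

```python
import copy


class Clf:
    """Hypothetical classifier with a single hyperparameter."""

    def __init__(self, max_depth=10):
        self.max_depth = max_depth


class EagerCopyWrapper:
    """Hypothetical selector that copies its estimator once, in __init__."""

    def __init__(self, estimator):
        self.estimator = estimator
        self._work_copy = copy.deepcopy(estimator)  # snapshot taken too early

    def fit(self):
        # Uses the construction-time snapshot, so later updates are ignored
        self.used_max_depth_ = self._work_copy.max_depth
        return self


class LazyCopyWrapper:
    """Hypothetical selector that copies its estimator at fit time."""

    def __init__(self, estimator):
        self.estimator = estimator  # only store the argument

    def fit(self):
        # Copy taken now, so it reflects the current hyperparameters
        work_copy = copy.deepcopy(self.estimator)
        self.used_max_depth_ = work_copy.max_depth
        return self


eager = EagerCopyWrapper(Clf(max_depth=10))
lazy = LazyCopyWrapper(Clf(max_depth=10))

# Stand-in for what a nested set_params(estimator__max_depth=1) does
eager.estimator.max_depth = 1
lazy.estimator.max_depth = 1

print(eager.fit().used_max_depth_)  # 10: stale copy, same symptom as above
print(lazy.fit().used_max_depth_)   # 1: the updated value is used
```

Deferring the copy to fit matches scikit-learn's guidance for custom estimators, where __init__ should only store its arguments so that get_params, set_params and clone behave consistently.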
Versions
MLxtend 0.18.0
Linux-5.8.0-48-generic-x86_64-with-glibc2.29
Python 3.8.5 (default, Jan 27 2021, 15:41:15)
[GCC 9.3.0]
Scikit-learn 0.24.1
NumPy 1.20.1
SciPy 1.6.1