Custom generator for models in exhaustive feature selector #833

Open
@jonathan-taylor

Describe the workflow you want to enable

I'd like to make it easier to do best subsets with categorical features. For simplicity, let's start by assuming an additive model, so that each feature has a set of columns in the design matrix associated with it. When all features are continuous, each feature is associated with a single column; otherwise there is a feature grouping, which can be described as a sequence of length X.shape[1] assigning each column to a particular feature. More generally, this sequence assigning columns to features could also include interactions of both continuous and categorical variables.
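
To make the grouping concrete, here is a minimal sketch of such a sequence and of the column groups it implies. The layout (one continuous feature, one categorical feature spread over three dummy columns, one more continuous feature) and the helper name `columns_by_feature` are assumptions for illustration only, not part of any existing API.

```python
# Hypothetical design matrix with 5 columns:
#   column 0     -> feature 0 (continuous)
#   columns 1-3  -> feature 1 (categorical, three dummy columns)
#   column 4     -> feature 2 (continuous)
feature_of_column = [0, 1, 1, 1, 2]  # length equals X.shape[1]


def columns_by_feature(feature_of_column):
    """Map each feature label to the tuple of column indices assigned to it."""
    groups = {}
    for col, feat in enumerate(feature_of_column):
        groups.setdefault(feat, []).append(col)
    return {feat: tuple(cols) for feat, cols in groups.items()}


groups = columns_by_feature(feature_of_column)
print(groups)
```

When every feature is continuous, the sequence is simply `range(X.shape[1])` and every group holds a single column.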

Describe your proposed solution

It is (at least in some corners) common practice to include either all columns associated with a categorical feature or none of them. This could be encoded in the candidates list. If interactions are permitted, some conventions include an interaction only if both main effects are also included. While the logic of which candidates to generate may be user-specific, it seems that if we could supply a custom iterator for candidates, most of the code would not need to be modified. Instead of custom_names, each candidate could carry its own identifier, so one could specify whether the iterator produces simply indices or (indices, identifier) pairs.
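
A rough sketch of what such a custom iterator might look like follows. It yields (indices, identifier) pairs, always takes all columns of a feature or none, and filters out candidates that contain an interaction without its main effects. The names `grouped_candidates`, `respects_hierarchy`, and the `parents` mapping are hypothetical, as is the example column layout.

```python
from itertools import combinations


def grouped_candidates(groups):
    """Yield (column_indices, identifier) for every non-empty set of terms.

    groups: dict mapping a term name to the tuple of columns it owns.
    The identifier is the tuple of term names in the candidate.
    """
    names = list(groups)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            indices = tuple(sorted(c for name in subset for c in groups[name]))
            yield indices, subset


def respects_hierarchy(subset, parents):
    """True if every interaction in subset has all of its main effects present."""
    chosen = set(subset)
    return all(set(parents.get(name, ())) <= chosen for name in subset)


# Assumed layout: a continuous term, a categorical term, and their interaction.
groups = {"age": (0,), "region": (1, 2, 3), "age:region": (4, 5, 6)}
parents = {"age:region": ("age", "region")}

candidates = [
    (indices, ident)
    for indices, ident in grouped_candidates(groups)
    if respects_hierarchy(ident, parents)
]
for indices, ident in candidates:
    print(ident, indices)
```

Because the hierarchy rule lives in a separate predicate, a user with a different convention could swap in their own filter without touching the enumeration.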

This would remove the need for the min_features/max_features arguments, as these limits would be encoded in the iterator itself. So perhaps helper functions that produce at least a few common candidate iterators could be included: specifically, one that produces the default "all continuous" iterator, and one that can easily handle an additive model with possibly some categorical variables.
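
As a sketch of the two helpers described above, under the assumption that the grouping is passed as a sequence of feature labels (one per column), the "all continuous" iterator can be written as a special case of the additive one. The function names and their signatures are illustrative, not an existing interface; the size limits here count features, not columns.

```python
from itertools import combinations


def additive_candidates(feature_of_column, min_features=1, max_features=None):
    """Yield tuples of column indices, selecting whole features at a time."""
    feature_of_column = list(feature_of_column)
    # Distinct feature labels, in order of first appearance.
    features = list(dict.fromkeys(feature_of_column))
    if max_features is None:
        max_features = len(features)
    for k in range(min_features, max_features + 1):
        for chosen in combinations(features, k):
            keep = set(chosen)
            yield tuple(i for i, f in enumerate(feature_of_column) if f in keep)


def all_continuous_candidates(n_columns, min_features=1, max_features=None):
    """Default case: every column is its own feature."""
    return additive_candidates(range(n_columns), min_features, max_features)


default = list(all_continuous_candidates(3, min_features=1, max_features=2))
mixed = list(additive_candidates([0, 1, 1, 1, 2]))
print(default)
print(mixed)
```

With helpers like these, the size limits become arguments of the iterator factory rather than of the selector itself.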

Describe alternatives you've considered, if relevant

I've considered simply wrapping R functions like regsubsets, which easily handle categorical variables. I would prefer an sklearn-aware version that could do this as well.

Additional context
