Machine learning
ivodopyanov, 2016-12-14 10:31:58

How can you use the same parameters for different Pipeline steps in scikit-learn with grid search?

I am working on a text classification problem using convolutional networks.
The pipeline consists of two steps:
1) the MyPreprocessor preprocessor, which splits the text into words, builds a dictionary, and replaces each word in the text with its index in that dictionary
2) the MyClassifier classifier, which actually trains the network.
However, these two steps share a common set of parameters: the dictionary size max_features and the maximum allowed phrase length max_len. How can I make grid search change them in sync for both steps?
Illustrative code:

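from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# MyPreprocessor and MyClassifier are my own estimators (definitions omitted).
clf = Pipeline([('vect', MyPreprocessor()),
                ('clf', MyClassifier())])
# As written, the grid varies the four parameters independently, so
# mismatched values for 'vect' and 'clf' are tried as well.
params = {'vect__max_features': [5000, 10000],
          'vect__max_len': [64, 96, 128],
          'clf__max_features': [5000, 10000],
          'clf__max_len': [64, 96, 128]}
gs_clf = GridSearchCV(clf, params, n_jobs=-1)
gs_clf = gs_clf.fit(X_train, Y_train)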
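
One possible way to keep the values in sync, as a sketch rather than a definitive answer: GridSearchCV accepts a list of dicts as the parameter grid and searches each dict as a separate grid. Generating one single-point grid per (max_features, max_len) combination means both steps always receive the same values. The names max_features_values, max_len_values and param_grid are introduced here for illustration only; the step names 'vect' and 'clf' are taken from the code above.

```python
from itertools import product

# Candidate values, taken from the question.
max_features_values = [5000, 10000]
max_len_values = [64, 96, 128]

# One grid per combination; each grid holds a single value per key,
# so 'vect' and 'clf' can never get different settings.
param_grid = [
    {
        'vect__max_features': [mf],
        'vect__max_len': [ml],
        'clf__max_features': [mf],
        'clf__max_len': [ml],
    }
    for mf, ml in product(max_features_values, max_len_values)
]

# Assumed usage with the pipeline from the question:
# gs_clf = GridSearchCV(clf, param_grid, n_jobs=-1)
```

This gives 6 candidate settings instead of the 36 produced by the independent grid. An alternative would be a small wrapper estimator that owns max_features and max_len and passes them on to both steps, but that requires writing an extra class.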
