ValueError: Found input variables with inconsistent numbers of samples: [1, 1692] when training an NLP model?
I am writing a classifier that sorts news by topic. When I train the model, I get the following error: ValueError: Found input variables with inconsistent numbers of samples: [1, 1692].
Here is the source code:
from sklearn.datasets import fetch_20newsgroups
from pandas import DataFrame
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
twenty_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)
df = DataFrame(twenty_train['data'], columns=['text'])
df['target'] = twenty_train['target']
X_train, X_test, y_train, y_test = train_test_split(df.drop('target', axis=1), df['target'])
text_clf = Pipeline([('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf', SGDClassifier())])
model = text_clf.fit(X_train, y_train)
import numpy as np
twenty_test = fetch_20newsgroups(subset='test', categories=categories, shuffle=True, random_state=42)
docs_test = twenty_test.data
predicted = text_clf.predict(docs_test)
print(np.mean(predicted == twenty_test.target))
pred_y = model.predict(X_test)
print('accuracy - ', accuracy_score(y_test, pred_y))
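A likely cause, sketched under assumptions: df.drop('target', axis=1) returns a one-column DataFrame, and iterating over a DataFrame yields its column names rather than its rows, so CountVectorizer sees a single "document" (the string 'text') while y_train has 1692 labels. Passing the column as a Series, for example train_test_split(df['text'], df['target']), should give one document per row. The snippet below uses a small made-up corpus instead of the newsgroup download (the texts, the labels, and the names X_frame and X_series are hypothetical) to show the difference in sample counts.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Toy stand-in for the newsgroup data (hypothetical texts and labels).
df = pd.DataFrame({
    "text": [
        "the gpu renders the image quickly",
        "rendering polygons with a graphics card",
        "image files and pixel formats",
        "graphics drivers and 3d rendering",
        "the doctor prescribed a new medicine",
        "symptoms of the disease and treatment",
        "patients respond well to the medicine",
        "a clinical study of the treatment",
    ],
    "target": [0, 0, 0, 0, 1, 1, 1, 1],
})

X_frame = df.drop("target", axis=1)  # DataFrame with a single 'text' column
X_series = df["text"]                # Series: one string per row

# Iterating a DataFrame yields column names, so only 'text' gets vectorized.
n_docs_frame = CountVectorizer().fit_transform(X_frame).shape[0]
# Iterating a Series yields the documents themselves.
n_docs_series = CountVectorizer().fit_transform(X_series).shape[0]
print(n_docs_frame, n_docs_series)

# Same pipeline as in the question, fitted on the Series instead.
text_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", SGDClassifier(random_state=42)),
])
text_clf.fit(X_series, df["target"])
predicted = text_clf.predict(["a new medicine for the disease"])
```

With the split done on df['text'], X_test is also a Series, so model.predict(X_test) returns one prediction per test document and accuracy_score(y_test, pred_y) compares arrays of equal length.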