Why do I get ValueError: Found input variables with inconsistent numbers of samples: [1, 1692] when training an NLP model?

Artyom Rekalov, 2021-08-14 15:18:05

Python

I am writing a classifier that sorts news articles by topic. When I train the model, the following error is raised: ValueError: Found input variables with inconsistent numbers of samples: [1, 1692]
Here is the source code:

from sklearn.datasets import fetch_20newsgroups
from pandas import DataFrame
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
twenty_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)

df = DataFrame(twenty_train['data'], columns=['text'])
df['target'] = twenty_train['target']

X_train, X_test, y_train, y_test = train_test_split(df.drop('target', axis=1), df['target'])

text_clf = Pipeline([('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf', SGDClassifier())])
model = text_clf.fit(X_train, y_train)

import numpy as np

twenty_test = fetch_20newsgroups(subset='test',
                                 categories=categories, shuffle=True, random_state=42)
docs_test = twenty_test.data
predicted = text_clf.predict(docs_test)
print(np.mean(predicted == twenty_test.target))

pred_y = model.predict(X_test)
print('accuracy - ', accuracy_score(y_test, pred_y))
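A likely cause, offered as a hedged diagnosis rather than a confirmed answer: df.drop('target', axis=1) returns a one-column DataFrame, not a sequence of strings, and CountVectorizer simply iterates over whatever it receives. Iterating a pandas DataFrame yields its column labels, so the vectorizer sees a single "document" (the string 'text') while y_train holds 1692 labels, which matches the reported [1, 1692]. The sketch below reproduces the effect on a small frame; the three texts are made up for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical three-document frame with the same layout as df in the question
df = pd.DataFrame({
    "text": ["the cat sat", "a dog barked", "the dog sat"],
    "target": [0, 1, 1],
})

X_frame = df.drop("target", axis=1)  # still a DataFrame with one column, 'text'
X_series = df["text"]                # a 1-D Series: one string per document

# Iterating a DataFrame yields its column labels, not its rows
print(list(X_frame))  # ['text']

# CountVectorizer iterates over its input, so the DataFrame becomes ONE document
frame_rows = CountVectorizer().fit_transform(X_frame).shape[0]
series_rows = CountVectorizer().fit_transform(X_series).shape[0]
print(frame_rows, series_rows, len(df["target"]))  # 1 3 3
```

With the DataFrame the vectorizer produces 1 row against 3 labels, which is the same kind of mismatch the classifier step then rejects.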

I will be grateful for help
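Assuming the diagnosis above is right, the fix is to pass the text column as a 1-D Series (df['text']) to train_test_split. The sketch below keeps the question's pipeline but swaps fetch_20newsgroups for a small hypothetical corpus so it runs without a download; the texts and labels are invented for illustration only.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for fetch_20newsgroups (no download needed)
texts = [
    "god does not exist atheism argument", "atheism and the burden of proof",
    "religion without god debate atheism",
    "jesus christ church prayer", "the bible and christian faith", "church service prayer jesus",
    "render the image with opengl", "graphics card pixel shader", "jpeg image compression graphics",
    "doctor prescribed a drug", "patient symptoms and disease", "medical treatment for the patient",
]
targets = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
df = pd.DataFrame({"text": texts, "target": targets})

# The fix: select the 'text' column as a 1-D Series, not a one-column DataFrame
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["target"], test_size=0.25, random_state=42
)

text_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", SGDClassifier(random_state=42)),
])
model = text_clf.fit(X_train, y_train)  # fit returns the fitted pipeline itself

pred_y = model.predict(X_test)
accuracy = accuracy_score(y_test, pred_y)
print("accuracy - ", accuracy)
```

The evaluation on twenty_test.data in the original script should need no change, because that attribute is already a plain list of strings. If you specifically want to feed a DataFrame into the pipeline, scikit-learn's ColumnTransformer can route a single column to the vectorizer when the column is given as a scalar name such as 'text' rather than a list.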
