How to properly vectorize data for neural network training?

newPsevdonim, 2021-11-16 11:03:51

Machine learning

I am new to this topic, so I have a question: how do I correctly vectorize a dataset column that contains words? (It is not a categorical feature, so one-hot encoding will not work.) I vectorized it using a bag of words, but I'm not sure whether I did it right or whether a network will train correctly on such data. The dataset also has columns with categorical features, and I have already applied one-hot encoding to them. Please point out my mistakes and suggest how they can be corrected.

An example of rows from a column that I vectorized with a bag of words:
img price png
css font awesome min css
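
To make the bag-of-words step concrete, here is a minimal sketch that runs scikit-learn's CountVectorizer with default settings on just the two example rows above. The variable names rows, vocab and dense are only for illustration; the real column has many more rows and a much larger vocabulary.

```python
from sklearn.feature_extraction.text import CountVectorizer

# The two example rows from the question, used here as a toy corpus.
rows = ["img price png", "css font awesome min css"]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(rows)  # sparse matrix: one row per text, one column per word

# vocabulary_ maps each word to its column index; sort the words by that index.
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
dense = counts.toarray()  # dense copy only for printing; fine for a toy example

print(vocab)
print(dense)
```

Each column corresponds to one word of the vocabulary and each cell holds how many times that word occurs in the row, so a repeated word such as css in the second row gets a count of 2.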

The code I used for this:

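import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Load the dataset (semicolon-separated, first row is the header)
my_df = pd.read_csv('DICT_FOR_LEARN.csv', header=0, sep=';')

# Bag of words for the text column, converted to a dense array
vectorizer = CountVectorizer()
X1 = vectorizer.fit_transform(my_df['url_path']).toarray()

# One-hot encoding for each categorical column
X2 = pd.get_dummies(my_df['country'], sparse=True)
X3 = pd.get_dummies(my_df['continent'], sparse=True)
X4 = pd.get_dummies(my_df['timezone'], sparse=True)
X5 = pd.get_dummies(my_df['method'], sparse=True)
X6 = pd.get_dummies(my_df['http'], sparse=True)
X7 = pd.get_dummies(my_df['exit_system'], sparse=True)
X8 = pd.get_dummies(my_df['os'], sparse=True)
X9 = pd.get_dummies(my_df['browser'], sparse=True)
X10 = pd.get_dummies(my_df['device'], sparse=True)

# As written, x_train is a tuple of ten separate objects, not one feature matrix
x_train = X1, X2, X3, X4, X5, X6, X7, X8, X9, X10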
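
One thing that stands out in the code above is that x_train ends up as a tuple of ten separate objects, while a model usually expects a single two-dimensional feature matrix. Below is a minimal sketch of one possible way to join the blocks, assuming scikit-learn and SciPy are available. The small DataFrame is a made-up stand-in for DICT_FOR_LEARN.csv with only three of the columns, and OneHotEncoder is swapped in for pd.get_dummies because it returns a SciPy sparse matrix directly; this is an illustration, not the only valid approach.

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for DICT_FOR_LEARN.csv (values invented for illustration).
my_df = pd.DataFrame({
    "url_path": ["img price png", "css font awesome min css", "img logo png"],
    "country": ["RU", "US", "RU"],
    "method": ["GET", "GET", "POST"],
})

# Bag of words for the text column; kept sparse instead of calling .toarray().
vectorizer = CountVectorizer()
X_text = vectorizer.fit_transform(my_df["url_path"])

# One-hot encode all categorical columns in a single call.
# handle_unknown="ignore" encodes categories unseen during fit as all zeros.
encoder = OneHotEncoder(handle_unknown="ignore")
X_cat = encoder.fit_transform(my_df[["country", "method"]])

# Stack the blocks side by side into one matrix with one row per sample.
x_train = hstack([X_text, X_cat]).tocsr()
print(x_train.shape)  # (3, 12) for this toy data: 8 words + 2 countries + 2 methods
```

Keeping the fitted vectorizer and encoder around matters later: new data should be passed through their transform methods, not fit_transform, so that it gets the same columns in the same order as the training matrix.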