Embedding Keras?

Python

nasdi, 2019-10-18 18:20:05

I'm trying to connect Keras and word2vec. After calling get_keras_embedding, I don't understand what to feed the network for training. Words, vectors, word tokens, and the original sentences all don't work.

Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 4457 arrays:

I do not understand why an input of that shape is expected.
There are about 5,500 sentences in total, and the word2vec vocabulary has about 8,000 words.

text = []
for i in df['Message']:
    text.append(i.split())
model = Word2Vec(text, size=300, window=3, min_count=3, workers=16)
kmodel = Sequential()
kmodel.add(model.wv.get_keras_embedding(train_embeddings=False))
kmodel.add(Dropout(0.2))

kmodel.add(Conv1D(50,
                 3,
                 padding='valid',
                 activation='relu',
                 strides=1))
kmodel.add(GlobalMaxPooling1D())

kmodel.add(Dense(250))
kmodel.add(Dropout(0.2))
kmodel.add(Activation('relu'))

kmodel.add(Dense(1))
kmodel.add(Activation('sigmoid'))

kmodel.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
kmodel.fit(x_train, y_train,
          batch_size=32,
          epochs=5,
          validation_data=(x_test, y_test))

1 answer(s)

nasdi, 2019-10-20
@nasdi

Figured it out; posting the solution.
1) The Tokenizer size must equal the number of words in the vocabulary; you can't set it larger with a margin!

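from keras.preprocessing.text import Tokenizer

# Keep exactly as many words as the vocabulary holds
token = Tokenizer(num_words=7229)
token.fit_on_texts(df.Message)
# Each message becomes a list of integer word indices
text = token.texts_to_sequences(df.Message)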
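As far as I can tell, the reason for point 1 is that an embedding layer is a lookup table with one row per vocabulary entry, so every token index fed to it has to be smaller than the number of rows. A minimal sketch in plain Python, assuming a toy table (`table`, `lookup`, and `max_index` are made-up names for illustration, not Keras or gensim functions):

```python
# Hypothetical toy embedding table: one row per vocabulary entry.
vocab_size = 5
embedding_dim = 3
table = [[float(i)] * embedding_dim for i in range(vocab_size)]

def lookup(sequence):
    """Return one vector per token index, the way an embedding lookup does."""
    return [table[i] for i in sequence]

def max_index(sequences):
    """Largest token index found in a batch of integer sequences."""
    return max(i for seq in sequences for i in seq)

ok = [[1, 4, 2], [3, 0]]
too_big = [[1, 5]]          # 5 is out of range for a 5-row table

assert max_index(ok) < vocab_size
vectors = lookup(ok[0])     # works: every index has a row

try:
    lookup(too_big[0])
    failed = False
except IndexError:
    failed = True           # index 5 has no row in the table
```

Checking the largest index in the tokenized data against the embedding's vocabulary size before training is a cheap way to catch this.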

2) As is standard for Keras, pad the sentences with 0.
3) Split the original sentences into lists of words and pass them to Word2Vec.

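from gensim.models import Word2Vec

# One list of words per message
mes = []
for i in df['Message']:
    mes.append(i.split())
# min_count=1 keeps every word, so no token is dropped from the vocabulary
model = Word2Vec(mes, size=300, window=3, min_count=1, workers=16)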
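The padding from point 2 has no code above, so here is a rough sketch of what it does. In Keras this is normally done with `pad_sequences` from `keras.preprocessing.sequence`, which by default pads and truncates at the front; the pure-Python `pad_to_length` below is a made-up stand-in that only illustrates the idea. The point is that every row ends up the same length, so the data can become a single 2D array instead of a list of thousands of separate arrays, which is what the error in the question complains about.

```python
def pad_to_length(sequences, maxlen, value=0):
    """Left-pad (or left-truncate) each sequence to exactly maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > maxlen:
            seq = seq[len(seq) - maxlen:]      # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

# Hypothetical tokenized sentences of different lengths
text = [[4, 7], [1, 2, 3, 9, 5], [8]]
x = pad_to_length(text, maxlen=4)

# Every row now has the same length, so the batch is rectangular
row_lengths = {len(row) for row in x}
```

The `maxlen` of 4 is arbitrary here; in practice it would be chosen from the sentence lengths in the data.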

4) As the network's input data, pass the tokenized sentences padded with 0, converted to an np.array.
5) Create the embedding layer from gensim with wv.get_keras_embedding.

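from keras.models import Sequential
from keras.layers import Dropout, Conv1D, GlobalMaxPooling1D, Dense, Activation

kmodel = Sequential()
# Embedding layer initialised from the word2vec weights
kmodel.add(model.wv.get_keras_embedding(train_embeddings=True))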
kmodel.add(Dropout(0.2))

kmodel.add(Conv1D(50,
                 3,
                 padding='valid',
                 activation='relu',
                 strides=1))
kmodel.add(GlobalMaxPooling1D())

kmodel.add(Dense(250))
kmodel.add(Dropout(0.2))
kmodel.add(Activation('relu'))

kmodel.add(Dense(1))
kmodel.add(Activation('sigmoid'))

kmodel.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
kmodel.fit(x_train, y_train,
          batch_size=32,
          epochs=5,
          validation_data=(x_test, y_test))

Setting train_embeddings=True significantly increases accuracy, but also training time.
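One caveat worth checking: the Keras Tokenizer numbers words on its own (starting at 1, after its own lowercasing and punctuation filtering), and that numbering is not guaranteed to match the row order of the word2vec weights. If the two disagree, the pretrained vectors get attached to the wrong words, which might be part of why retraining the embeddings helps so much here. A sketch of encoding sentences with the word2vec vocabulary's own indices instead, assuming a plain list `index2word` as a stand-in for the model's vocabulary order (in gensim 3.x the same mapping should be available as `model.wv.vocab[word].index`); the words and the `encode` helper are made up for illustration:

```python
# Stand-in for the word2vec vocabulary order (hypothetical words).
index2word = ["free", "call", "now", "hello"]
word2index = {w: i for i, w in enumerate(index2word)}

def encode(sentence):
    """Map a sentence to embedding row indices, skipping unknown words."""
    return [word2index[w] for w in sentence.split() if w in word2index]

encoded = encode("call now for free prizes")
# Note: index 0 is a real word in this scheme, so padding with 0
# would alias it; this sketch does not reserve a padding row.
```

Sequences encoded this way line up with the embedding rows, so train_embeddings=False would then use the word2vec vectors as intended.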