Machine learning
maewyn, 2015-09-03 16:41:55

What is the training set in PLSA?

Hello. While studying the literature, I keep coming across different definitions of the term "training set". In probabilistic topic models there are topics, and terms and documents from the main document collection have to be assigned to them with some probability. The topics themselves, as far as I understand, are groups of those same documents, arranged by meaning. Then, among all the documents of one topic, the most frequently used words are identified, and anything unnecessary can be cut off. The question: is the set of topics the training set for PLSA and LDA? Or should "training set" be understood as something else? And while I'm at it, could you share links to Russian document corpora? Thank you.


1 answer(s)
maewyn, 2015-09-03

So, translated to my case: x1...xn are the terms in a document, and xi is the probability with which each term relates to the topics (numbered 1 to m). And the yi are already-known probabilities relating topics and terms? Roughly speaking, we have to tune the model parameters so that xi is as close as possible to yi, and then on a large collection of documents where yi is unknown the result will be more or less reasonable?
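
For comparison, here is a minimal sketch of how PLSA is usually fitted, assuming the standard formulation trained with EM. In that formulation there are no known yi: the only input is the document-term count matrix, and the topics (the distributions p(w|t) and p(t|d)) are what the algorithm estimates. The function name `plsa`, the toy `counts` matrix, and the `vocab` list are made up for illustration and are not taken from any library.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (n_docs, n_terms) matrix of term counts n(d, w)."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = counts.shape
    eps = 1e-12  # keeps the normalisations away from division by zero

    # Random starting point: p(w|t) and p(t|d), each row summing to 1.
    p_w_t = rng.random((n_topics, n_terms))
    p_w_t /= p_w_t.sum(axis=1, keepdims=True)
    p_t_d = rng.random((n_docs, n_topics))
    p_t_d /= p_t_d.sum(axis=1, keepdims=True)

    log_liks = []
    for _ in range(n_iter):
        # E-step: p(t|d,w) is proportional to p(t|d) * p(w|t).
        joint = p_t_d[:, :, None] * p_w_t[None, :, :]   # shape (d, t, w)
        p_w_d = joint.sum(axis=1)                       # p(w|d), shape (d, w)
        log_liks.append(float((counts * np.log(p_w_d)).sum()))
        post = joint / p_w_d[:, None, :]

        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, None, :] * post            # n(d,w) * p(t|d,w)
        p_w_t = weighted.sum(axis=0) + eps
        p_w_t /= p_w_t.sum(axis=1, keepdims=True)
        p_t_d = weighted.sum(axis=2) + eps
        p_t_d /= p_t_d.sum(axis=1, keepdims=True)
    return p_w_t, p_t_d, log_liks

# Hypothetical toy collection: 4 documents over a 6-word vocabulary.
vocab = ["goal", "match", "team", "stock", "market", "price"]
counts = np.array([
    [3, 2, 4, 0, 0, 0],
    [2, 3, 3, 0, 0, 1],
    [0, 0, 0, 4, 3, 2],
    [0, 1, 0, 3, 4, 3],
], dtype=float)

p_w_t, p_t_d, log_liks = plsa(counts, n_topics=2)

# Most probable words of each learned topic.
for t, row in enumerate(p_w_t):
    top = [vocab[i] for i in np.argsort(row)[::-1][:3]]
    print(f"topic {t}: {top}")
```

In this sketch the "training set" is just `counts`, i.e. the documents themselves; nothing about the topics is supplied in advance except how many of them there are.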
