Sergey Sokolov, 2018-10-06 09:59:22
Machine learning

How to identify the individual features of the texts of one author?

Asking out of general interest.
I have a corpus of texts by one author and a large body of texts by his contemporaries. I would like to identify what most distinguishes this author's texts from the rest, to bring out the individuality of his style, and then to generate text "in his style". Examples: the use of certain words in particular contexts, a preference for some word forms over others, sentence length, the relative amount of dialogue, and, going deeper, semantics.
Or an easier task: there is MNIST, and there are samples of characters written by one person. The goal is to find the most noticeable differences between this person's handwriting and the rest of the sample, so as to reliably answer the question “did this person write it, or someone else?”.
Then, by amplifying these differences, generate a given text as if he had written it, in a cartoonish, comical version where the differences and features are deliberately exaggerated.
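
For the text half of the question, a rough sketch of how the surface features mentioned above (sentence length, share of dialogue, characteristic words) could be measured. Everything here is an assumption for illustration: the function names are hypothetical, sentences are split naively on end punctuation, a "dialogue line" is simply a line that starts with a quote mark or a dash, and word distinctiveness is a smoothed log-ratio of relative frequencies, which is only one of several possible measures.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(text):
    """Lowercased runs of letters (digits and underscores are skipped)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def style_features(text):
    """Two surface features: mean sentence length and share of dialogue lines."""
    sentences = split_sentences(text)
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: dialogue lines open with a quote mark or a dash.
    markers = ('"', "\u201c", "\u00ab", "\u2014", "-")
    dialogue = [ln for ln in lines if ln.lstrip().startswith(markers)]
    return {
        "mean_sentence_len": sum(len(words(s)) for s in sentences) / max(len(sentences), 1),
        "dialogue_share": len(dialogue) / max(len(lines), 1),
    }


def distinctive_words(author_text, others_text, top=5):
    """Words the author over-uses relative to the others (add-one smoothing)."""
    a, o = Counter(words(author_text)), Counter(words(others_text))
    na, no = sum(a.values()), sum(o.values())
    vocab = set(a) | set(o)
    v = len(vocab)
    score = {
        w: math.log((a[w] + 1) / (na + v)) - math.log((o[w] + 1) / (no + v))
        for w in vocab
    }
    return sorted(score, key=score.get, reverse=True)[:top]
```

Computing `style_features` for the author and for each contemporary, and comparing the distributions, gives a first idea of which features actually separate them; `distinctive_words` does the same at the vocabulary level. Semantics and generation are out of reach of such counts.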


2 answers
ivodopyanov, 2018-10-08
@sergiks

This looks like a substantial research project. I would try digging in the direction of conditional GANs.

dollar, 2018-10-06
@dollar

Reliably? No way.
The most reliable approach is to use distinguishing features (if there are any, of course). For example, if a person puts the space before a comma rather than after it, that is a distinguishing feature, because very few people do this. Several such features together make up a fingerprint. It is certainly not perfectly unique, but would one in a million suit you? That is, there is always a chance that someone else has the same fingerprint; the only question is how probable that is. If the probability is small, it can be neglected. But in reality there is no 100% guarantee, and there cannot be. And if there are only a few features, the probability of a coincidence is too large to neglect.
In other words, it is incorrect to answer the question “did this person write it, or someone else?” without also stating the probability that the answer is wrong.
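
The arithmetic behind such a fingerprint can be sketched as follows. This is a minimal illustration under a strong assumption: the features are treated as independent, so their frequencies simply multiply. The habit names and frequencies are made up for the example, not measured, and the function names are hypothetical.

```python
import math


def match_probability(feature_frequencies):
    """Chance that a random writer shows every feature, assuming independence."""
    return math.prod(feature_frequencies)


def expected_false_matches(feature_frequencies, population):
    """Expected number of other writers with the same fingerprint."""
    return match_probability(feature_frequencies) * population


# Hypothetical frequencies: the share of writers who exhibit each habit.
habits = {
    "space before comma": 0.01,
    "never uses semicolons": 0.30,
    "double space after period": 0.05,
}

p = match_probability(habits.values())
others = expected_false_matches(habits.values(), 1_000_000)
print(f"match probability: {p:.2e}, expected look-alikes per million: {others:.0f}")
```

With only a few features, or with common ones, the product stays large and many other writers share the fingerprint; each additional rare feature shrinks it. Real habits are often correlated, so the independence assumption tends to make the fingerprint look rarer than it is.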
