Documentation
egens, 2013-01-28 16:24:02

Document classification software?

Problem statement: there is a file dump containing documents of different categories. It is assumed that a document's category can be determined from its content. The documents come in various text formats, mostly Microsoft Office.
Is there software to automatically classify documents into given categories?

3 answer(s)
astrobeglec, 2013-01-28
@astrobeglec

The only thing that comes to mind is the Linux command file, which determines a file's type from its contents (the extension is ignored):

$ file nnn.doc
nnn.doc: CDF V2 Document, Little Endian, Os: Windows, Version 1.0, Code page: -535, Revision Number: 9, Total Editing Time: 02:46:00, Last Printed: Sun Sep  2 03:44:00 2012, Create Time/Date: Thu Aug 30 09:26:00 2012, Last Saved Time/Date: Sun Sep  2 03:45:00 2012
$ file nnn.odt
nnn.odt: Zip archive data, at least v1.0 to extract
$ file nnn..xls
nnn..xls: CDF V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1251, Last Saved By: system, Last Printed: Thu Mar 29 10:00:04 2012, Create Time/Date: Thu Jan  1 02:59:59 1970, Last Saved Time/Date: Wed Feb 29 07:57:42 2012, Security: 0
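
As a rough illustration of why the extension does not matter here: the container can be recognised from the first few bytes of the file. The sketch below checks only two well-known signatures (the compound-file header used by legacy .doc/.xls and the ZIP local-file header used by .odt/.docx); the real file command consults a much larger database. The names sniff_bytes and sniff_file are made up for this example.

```python
# Leading byte signatures of two common document containers.
# This is a deliberately tiny subset, not a replacement for `file`.
SIGNATURES = {
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "CDF V2 Document",
    b"PK\x03\x04": "Zip archive data",
}


def sniff_bytes(head: bytes) -> str:
    """Return a coarse container name for the given leading bytes."""
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify its container."""
    with open(path, "rb") as fh:
        return sniff_bytes(fh.read(8))
```

Note that this only tells you the container format, not the category of the document; .odt and .docx both look like plain ZIP archives at this level.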

I don't know about Windows.
Do you need to determine the file type, or can the category only be recognized from the content? If it is by content, I can give some pointers on that too.
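
If it does come down to content, the simplest possible approach is keyword scoring. The sketch below assumes the text has already been extracted from the document (that step is not shown and depends on the format), and the categories and keywords are invented purely for illustration; a real setup would need its own lists or a trained classifier.

```python
import re
from collections import Counter

# Hypothetical categories and keywords; replace with your own.
CATEGORY_KEYWORDS = {
    "invoice": {"invoice", "payment", "total", "vat"},
    "contract": {"contract", "party", "agreement", "obligations"},
}


def classify(text: str, keywords=CATEGORY_KEYWORDS) -> str:
    """Pick the category whose keywords occur most often in the text."""
    # Count lowercase alphabetic words; a missing word counts as 0.
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        category: sum(words[w] for w in kws)
        for category, kws in keywords.items()
    }
    best = max(scores, key=scores.get)
    # If no keyword matched at all, refuse to guess.
    return best if scores[best] > 0 else "uncategorized"
```

For example, classify("Invoice no. 17: total payment due") picks "invoice", and a text with none of the keywords comes back as "uncategorized".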

iPirat, 2013-01-28
@iPirat

Do you need something like Automator on the Mac, only for Windows? I searched today and found an interesting free Java-based solution: app.jbbres.com/actions/

Nikolai Turnaviotov, 2013-01-30
@foxmuldercp

If this is on Windows, Windows Server 2012 introduced an excellent built-in classifier; access permissions can then be applied based on the classification it assigns.
