HTML
prox, 2010-11-18 00:44:31

How do I clean Microsoft Word (2000-2007) tags out of HTML code?

Please suggest a way to clean Word-generated tags out of HTML code. There is a
lot of code (~10 MB), so please don't suggest doing it by hand.
(The user didn't use the Word cleanup button in TinyMCE.)
I need to process the file/database for further use.
I have tried tidy, Word 2003's "Web Page, Filtered" export, and a couple of other tools, but none of them gave the expected result.

11 answer(s)
ipswitch, 2010-11-18
@prox

At one time the Word HTML cleanup command in Dreamweaver worked well for this, from Macromedia's Dreamweaver MX through to the Adobe Creative Suite versions. It detected the Word version on its own (97, 2000, XP/2003) and did a very good job.

Vitaly, 2010-11-18
@vitstr

Here is a macro for MS Office 2003 (I haven't checked whether it works in 2007-2010).
It's old, but it works.
www.businesssite.ru/content.php?id=5

Kirill Sirenko, 2010-11-20
@Chieftec

Have you looked at www.artlebedev.ru/tools/technogrette/etc/reformator/? Reformator has always handled this kind of thing brilliantly for me.

4NATIC, 2010-11-18
@4NATIC

For individual files, I have used this service in the past:
www.weare.ru/cgi-bin/clearhtml.cgi

Rodion Gashé, 2010-11-18
@zorba_buddha

jevix.ru/

Alexey, 2014-01-10
@Sh14

Here is a service that works well: www.sh14.ru/utils/avtomaticheskaya-ochistka-html-k...
How it works: it removes all the Word garbage and all disallowed attributes (attributes, not tags, so it works correctly with HTML5).
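
The attribute-filtering idea described here can be tried locally. Below is a minimal sketch in Python using only the standard library. It is not the code behind the linked service: the `ALLOWED_ATTRS` set, the `AttributeStripper` class, and the `strip_attributes` function are all hypothetical names chosen for illustration. Every tag is kept, and only attributes on the allow-list survive.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list; adjust it to whatever the site actually permits.
ALLOWED_ATTRS = {"href", "src", "alt", "title", "colspan", "rowspan"}


class AttributeStripper(HTMLParser):
    """Re-emits parsed markup, keeping every tag but dropping unlisted attributes."""

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; are passed through as-is.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_tag(self, tag, attrs, closing=""):
        # The parser hands over unescaped attribute values, so escape them again.
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in ALLOWED_ATTRS
        )
        self.out.append(f"<{tag}{kept}{closing}>")

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        # Comments are dropped, which also removes Word's conditional comments.
        pass


def strip_attributes(html):
    parser = AttributeStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `strip_attributes('<p class="MsoNormal">text</p>')` gives back `<p>text</p>`. Word-specific tags such as `<o:p>` pass through untouched, because this sketch filters attributes only.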

Magir, 2010-11-18
@Magir

Look at how the cleanup works in TinyMCE, rewrite it in PHP, and run all the data through it.
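
The "run all the data through it" half of this advice might look roughly like the sketch below. It is written in Python rather than PHP, and it assumes a hypothetical SQLite table `pages(id, body)`; the real schema and database engine are not given in the thread. `clean_html` is only a stand-in and does not reproduce TinyMCE's rules. Rows are read in batches by primary key so that a ~10 MB dataset is never loaded at once.

```python
import sqlite3


def clean_html(html):
    # Stand-in cleaner for illustration only; a real port of an editor's
    # paste-cleanup rules would go here.
    return html.replace("<o:p></o:p>", "")


def clean_table(conn, batch_size=500):
    """Clean every row of the hypothetical pages(id, body) table.

    Returns the number of rows whose body actually changed.
    """
    changed = 0
    last_id = 0
    while True:
        # Keyset pagination: fetch the next batch after the last id seen.
        rows = conn.execute(
            "SELECT id, body FROM pages WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        updates = []
        for row_id, body in rows:
            cleaned = clean_html(body)
            if cleaned != body:
                updates.append((cleaned, row_id))
        conn.executemany("UPDATE pages SET body = ? WHERE id = ?", updates)
        conn.commit()
        changed += len(updates)
        last_id = rows[-1][0]
    return changed
```

Running this against a copy of the database first is the safer choice, since the update overwrites the original markup in place.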

maashaa, 2014-05-14
@maashaa

Here is another service; it strips almost all tag attributes:
www.dataved.ru/2013/08/ms-word-document-filter.html

elky, 2010-11-19
@elky

There is a good solution for Django that we developed and actively use on our sites, so that clients can't litter the site with messy markup.

Stanislav Agarkov, 2010-11-19
@stas_agarkov

Strange that nobody has suggested it yet, but try regular expressions.
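
A regular-expression pass could look something like the sketch below, in Python. The pattern list is an assumption about what Word output usually contains (conditional comments, `o:`/`v:`/`w:`/`x:`/`st1:` namespace tags, `Mso*` classes, inline `style` attributes); it is illustrative, not exhaustive, and `clean_word_html` is a hypothetical name. Regexes do not understand HTML structure, so a `>` inside an attribute value or matching text outside a tag can trip them up.

```python
import re

# (pattern, replacement) pairs, applied in order. Illustrative, not exhaustive.
WORD_PATTERNS = [
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    (re.compile(r"<!--\[if.*?<!\[endif\]-->", re.S | re.I), ""),
    # Office namespace tags such as <o:p> and </o:p>; any text between them is kept.
    (re.compile(r"</?(?:o|v|w|x|st1):\w+[^>]*>", re.I), ""),
    # class attributes whose value starts with "Mso", e.g. class="MsoNormal"
    (re.compile(r"""\s+class\s*=\s*(?:"Mso[^"]*"|'Mso[^']*'|Mso\w+)""", re.I), ""),
    # All quoted inline style attributes (Word's mso-* rules live here).
    (re.compile(r"""\s+style\s*=\s*(?:"[^"]*"|'[^']*')""", re.I), ""),
]


def clean_word_html(html):
    """Apply each pattern in turn and return the cleaned markup."""
    for pattern, replacement in WORD_PATTERNS:
        html = pattern.sub(replacement, html)
    return html
```

Dropping every `style` attribute is a blunt choice; if some inline styles must survive, that pattern would need to be narrowed to `mso-` declarations only.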
