How can I clean Microsoft Word (2000-2007) tags out of HTML code?
Please suggest a way to strip Word-generated tags from HTML. There is a
lot of code (~10 MB), so please do not suggest doing it by hand.
(The user did not use the Word cleanup button in TinyMCE.)
The file/database needs to be processed for further use.
I have tried tidy, Word 2003's filtered web page export, and a couple of other tools, but none of them gave the expected result.
At one time the HTML cleanup function worked well: the MS Word cleanup command in Dreamweaver MX from Macromedia, which later became part of Adobe Creative Suite. It could detect the Word version by itself (97, 2000, XP/2003) and did a very good job.
Here is a macro for MS Office 2003 (I have not checked whether it works in 2007-2010).
It is old, but it works.
www.businesssite.ru/content.php?id=5
Have you looked at www.artlebedev.ru/tools/technogrette/etc/reformator/? Reformator has always handled this kind of thing very well for me.
For individual files, I used this service before:
www.weare.ru/cgi-bin/clearhtml.cgi
Here is a service that works fine: www.sh14.ru/utils/avtomaticheskaya-ochistka-html-k...
How it works: it removes all the Word garbage and all disallowed attributes (attributes, not tags, so it works correctly with HTML5).
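A rough idea of what "remove disallowed attributes, leave tags alone" could look like, sketched in Python with only the standard library. This is not the code behind the service above; the whitelist `ALLOWED_ATTRS` and the names `AttributeFilter` and `strip_attributes` are made up for illustration. Comments (including Word's conditional comments) are dropped because the parser's comment handler is left empty.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: only these attributes survive. Adjust to taste.
ALLOWED_ATTRS = {"href", "src", "alt", "title", "colspan", "rowspan"}


class AttributeFilter(HTMLParser):
    """Re-emits the markup, keeping every tag but only whitelisted attributes."""

    def __init__(self):
        # Keep entities such as &amp; as they are instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_open(self, tag, attrs, closing=""):
        kept = "".join(
            ' {}="{}"'.format(name, escape(value or "", quote=True))
            for name, value in attrs
            if name in ALLOWED_ATTRS
        )
        self.out.append("<{}{}{}>".format(tag, kept, closing))

    def handle_starttag(self, tag, attrs):
        self._emit_open(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_open(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&{};".format(name))

    def handle_charref(self, name):
        self.out.append("&#{};".format(name))


def strip_attributes(html):
    parser = AttributeFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    dirty = '<p class="MsoNormal" style="margin:0cm"><a href="http://example.com" target="_blank">link</a></p>'
    print(strip_attributes(dirty))
```

Because tags are never filtered, unknown or newer elements pass through untouched; that also means Word's namespaced tags such as `<o:p>` stay in the output and would need a separate pass.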
Look at how the cleanup works in TinyMCE, rewrite it in PHP, and process all the data.
Here is another service; it strips almost all tag attributes:
www.dataved.ru/2013/08/ms-word-document-filter.html
There is a good solution for Django, which we have developed and actively use on our sites so that clients do not “dirty” the site.
Strange that no one has mentioned it, but try regular expressions.
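For what it is worth, a minimal sketch of the regular-expression route in Python. The pattern list and the name `strip_word_markup` are assumptions for illustration; they cover a few common kinds of Word residue (conditional comments, namespaced Office tags, `Mso*` classes, inline styles) and are certainly not exhaustive. Regular expressions do not really parse HTML, so check the result on a sample before running it over the whole 10 MB.

```python
import re

# Illustrative patterns for typical Word residue; extend as needed.
WORD_PATTERNS = [
    # Hidden conditional blocks: <!--[if gte mso 9]> ... <![endif]-->
    re.compile(r"<!--\[if.*?<!\[endif\]-->", re.S | re.I),
    # Revealed conditional markers: <![if !supportLists]> and <![endif]>
    # (only the markers are removed, the content between them is kept)
    re.compile(r"<!\[(?:if|endif)[^>]*>", re.I),
    # Any remaining HTML comments
    re.compile(r"<!--.*?-->", re.S),
    # Namespaced Office tags such as <o:p>, </o:p>, <w:WordDocument>
    re.compile(r"</?\w+:[^>]*>"),
    # class="MsoNormal" and similar single-class attributes
    re.compile(r"""\s+class\s*=\s*(["']?)Mso\w*\1""", re.I),
    # Inline style attributes
    re.compile(r"""\s+style\s*=\s*(?:"[^"]*"|'[^']*')""", re.I),
]


def strip_word_markup(html):
    """Apply each pattern in order and delete whatever it matches."""
    for pattern in WORD_PATTERNS:
        html = pattern.sub("", html)
    return html


if __name__ == "__main__":
    dirty = (
        '<p class="MsoNormal" style="mso-margin-top-alt:auto">'
        "<!--[if gte mso 9]><xml><w:x/></xml><![endif]-->"
        "Hello<o:p></o:p></p>"
    )
    print(strip_word_markup(dirty))
```

The order matters: the conditional blocks are removed first so that their contents disappear together with the markers, and only then are the leftover tags and attributes cleaned.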