PHP
Sergey Toy, 2011-01-30 08:54:20

HTML Purifier: remove tabs, line breaks and spaces between blocks?

Is anyone here using HTML Purifier?
How do I remove all the junk whitespace that remains after AutoFormat.AutoParagraph is applied?
For example, given this input:

Paragraph 1
second line of the first paragraph

Paragraph 2

Paragraph 3

I want it to be processed like this:
<p>Paragraph 1<br>second line of the first paragraph</p><p>Paragraph 2</p><p>Paragraph 3</p>

Currently the output looks like this:
<p>Paragraph 1
second line of the first paragraph</p>

<p>Paragraph 2</p>

<p>Paragraph 3</p>

Purifier configuration:
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.DefinitionID', 'ace');
$config->set('HTML.Doctype', 'HTML 4.01 Transitional');
$config->set('Cache.DefinitionImpl', null);
$config->set('HTML.AllowedElements', array(
    'p', 'ul', 'ol', 'li', 'h4', 'h5', 'h6', 'img', 'a', 'b', 'i', 's', 'u',
    'blockquote', 'sup', 'sub', 'pre', 'br',
));
$config->set('HTML.AllowedAttributes', array('img.src', 'img.alt', 'img.title', 'a.href', 'a.title'));
$config->set('AutoFormat.AutoParagraph', true);
$config->set('AutoFormat.RemoveEmpty.RemoveNbsp', true);
$config->set('AutoFormat.RemoveEmpty', true);
$config->set('Core.EscapeInvalidTags', true);

2 answer(s)
Sergey Toy, 2011-01-30
@Toy

I asked on the official forum. They confirmed that there are no built-in tools for solving this problem, so you have to use regular expressions :-(
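
For anyone landing here later, a minimal sketch of what such regex post-processing might look like. The function name compactPurifiedHtml and the tag list are my own assumptions, not anything HTML Purifier provides. It runs on the string the purifier returns, leaves <pre> blocks alone, and assumes the output is valid UTF-8:

```php
<?php
/**
 * Hypothetical helper, not part of HTML Purifier: strips whitespace around
 * block tags and turns the remaining line breaks inside text into <br>.
 */
function compactPurifiedHtml($html)
{
    // Tags whose surrounding whitespace can be dropped (assumed list; adjust
    // it to match your HTML.AllowedElements setting).
    $tags = 'p|ul|ol|li|h[4-6]|blockquote|br';

    // Split out <pre> blocks so their whitespace stays untouched.
    $chunks = preg_split('~(<pre\b[^>]*>.*?</pre>)~isu', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($chunks === false) {
        return $html; // e.g. invalid UTF-8: return the input unchanged
    }

    foreach ($chunks as $i => $chunk) {
        if ($i % 2 === 1) {
            continue; // odd indexes hold the captured <pre> blocks
        }
        $chunk = trim($chunk);
        // Drop whitespace directly before and directly after the listed tags.
        $chunk = preg_replace('~\s+(?=</?(?:' . $tags . ')\b)~iu', '', $chunk);
        $chunk = preg_replace('~(</?(?:' . $tags . ')\b[^>]*>)\s+~iu', '$1', $chunk);
        // Each remaining run of line breaks inside text becomes one <br>.
        $chunk = preg_replace('~\s*\R\s*~u', '<br>', $chunk);
        $chunks[$i] = $chunk;
    }

    return implode('', $chunks);
}

$sample = "<p>Paragraph 1\nsecond line of the first paragraph</p>\n\n<p>Paragraph 2</p>\n\n<p>Paragraph 3</p>";
echo compactPurifiedHtml($sample);
// prints: <p>Paragraph 1<br>second line of the first paragraph</p><p>Paragraph 2</p><p>Paragraph 3</p>
```

In real use you would presumably call it as compactPurifiedHtml($purifier->purify($dirty)). Whitespace between inline tags such as </b> <i> is deliberately kept, since removing it would glue words together.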

clamaw, 2011-01-30
@clamaw

Generally speaking, the line "second line of the first paragraph" is already a second paragraph, because a line feed (also called a carriage return, \n, etc.) already marks the end of a paragraph. A double line feed is only used to emulate the spacing between paragraphs in plain-text documents. Accordingly, in HTML that spacing should be set through CSS, not with extra line breaks and other garbage.
