CMS
Lici, 2013-07-27 20:20:01

Drupal 7: how to automatically replace straight quotes in text with typographic ones?

On Habrahabr, if I type straight quotes like this:
"quotes"
they are automatically replaced in the text with typographic ones:
«quotes»
I want the same behavior on a Drupal 7 site, for aesthetic reasons. Note that quotes inside the HTML markup itself (for example, in the attributes of image tags) must not be replaced.
In theory, this could be done by adding a new filter to the text format in question, one that converts all quotes in the text, but I have not managed to do it myself. Please suggest the best way to do this. Thanks in advance.

3 answer(s)
demimurych, 2013-07-28
@demimurych

I don't know about Drupal 7, but Drupal 5 had a mechanism called filters. You could extend the existing filters or write your own.
See the documentation: drupal.org/node/213156
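
As far as I know, Drupal 7 registers text filters through hook_filter_info(), so the idea above could look roughly like the sketch below. The module name "smartquotes" and both function names are hypothetical, and the tag-splitting regular expression is a simplification chosen for brevity (it assumes no ">" inside attribute values); it is not something taken from the linked documentation.

```php
<?php
/**
 * Implements hook_filter_info().
 *
 * "smartquotes" is a hypothetical module name used only for this sketch.
 */
function smartquotes_filter_info()
{
  return array(
    'smartquotes' => array(
      // A real module would normally wrap user-facing strings in t()
      'title' => 'Replace straight quotes with typographic ones',
      'process callback' => '_smartquotes_process',
    ),
  );
}

/**
 * Filter process callback: replaces quotes in text and leaves tags untouched.
 *
 * The extra parameters mirror what Drupal 7 passes to a process callback;
 * they are optional here so the function can also be called on its own.
 */
function _smartquotes_process($text, $filter = NULL, $format = NULL, $langcode = NULL, $cache = NULL, $cache_id = NULL)
{
  // Split into text chunks (even indexes) and tags (odd indexes)
  $parts = preg_split('/(<[^>]*>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
  $isOpen = false; // Whether the next quote closes an already opened pair

  foreach ($parts as $index => $part) {
    if ($index % 2 === 0) {
      $parts[$index] = preg_replace_callback(
        '/"/',
        function ($match) use (&$isOpen) {
          $isOpen = !$isOpen;
          return $isOpen ? '«' : '»';
        },
        $part
      );
    }
  }

  return implode('', $parts);
}

print _smartquotes_process('<p>"a" <img alt="x" src="y.png"> "b"</p>');
```

After enabling such a module, the filter would still have to be switched on for the relevant text format in the admin interface. The closure requires PHP 5.3 or later. For markup that may be malformed, a DOM-based approach is likely more robust than splitting on a regular expression.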

m-haritonov, 2013-07-27
@m-haritonov

Drupal aside, HTML in PHP is best processed with the built-in DOMDocument class. If nothing else, passing HTML through its loadHTML method corrects markup errors (unclosed tags and so on). For your task, it also lets you access just the text nodes of the HTML tree.
A parsing example (without a correct quote-replacement algorithm):

<?php
$html = '"text in quotes" text "more text in quotes" <a href="http://ya.ru"><em>wrong element nesting order</a></em> text <div>unclosed element';

$domDocument = loadHtml($html);
$xpath = new DOMXpath($domDocument);

// Selects text nodes only
foreach ($xpath->query('/html/body//text()') as $textNode)
{
  // Quote replacement can be done here, taking into account the text of the previous $textNode items
  $textNode->data = str_replace('"', '«', $textNode->data);
}

print htmlspecialchars(saveHtml($domDocument));

/*
Outputs:
«text in quotes« text «more text in quotes« <a href="http://ya.ru"><em>wrong element nesting order</em></a> text <div>unclosed element</div>
*/
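
The example above turns every quote into «. A minimal sketch of pairing opening and closing marks could keep a flag that survives from one text node to the next, as hinted in the loop comment. The function name replaceQuotesInTextNodes() is hypothetical, and the simple alternation assumes quotes are balanced and not nested.

```php
<?php
/**
 * Replaces straight double quotes in text nodes with « and », alternating
 * opening and closing marks. Hypothetical helper, shown as a sketch only.
 */
function replaceQuotesInTextNodes(DOMDocument $domDocument)
{
  $xpath = new DOMXPath($domDocument);
  $isOpen = false; // State shared across all text nodes

  // Text nodes only, skipping the contents of <script> and <style>
  $query = '//body//text()[not(ancestor::script) and not(ancestor::style)]';

  foreach ($xpath->query($query) as $textNode) {
    $textNode->data = preg_replace_callback(
      '/"/',
      function ($match) use (&$isOpen) {
        $isOpen = !$isOpen;
        return $isOpen ? '«' : '»';
      },
      $textNode->data
    );
  }
}

$html = '<html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"></head>'
  . '<body><p>"one" and <a href="http://ya.ru">"two"</a></p></body></html>';

$domDocument = new DOMDocument();
$useErrorsOld = libxml_use_internal_errors(true);
$domDocument->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($useErrorsOld);

replaceQuotesInTextNodes($domDocument);

// Attribute values are not text nodes, so the href keeps its straight quotes
print $domDocument->saveHTML();
```

Because the flag lives outside the loop, a quote opened in one text node can be closed in a later one, for example when part of the quoted phrase is wrapped in a tag.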

The functions used in the code above (they wrap the corresponding methods of the DOMDocument class and add various workarounds):
<?php
/**
 * @return DOMDocument
 */
function loadHtml($html, $charset = 'utf-8')
{
  $domDocument = new DOMDocument();

  // Because DOMDocument::loadHTML converts "&nbsp;" entities into ordinary spaces,
  // these entities have to be escaped. Occurrences of "&amp;nbsp;" are escaped as well,
  // so that the reverse conversion does not mistakenly turn them into "&nbsp;"
  $html = str_replace('&amp;nbsp;', '&amp;amp;nbsp;', $html);
  $html = str_replace('&nbsp;', '&amp;nbsp;', $html);

  // Remove "\r" characters, because DOMDocument::loadHTML() converts them to "&#13;"
  $html = str_replace("\r", '', $html);

  $html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <html>
      <head>
        <meta http-equiv="content-type" content="text/html;charset=' . $charset . '"/>
        <title></title>
      </head>
        <body>' . $html . '</body>
    </html>
  ';
  
  // DOMDocument::loadHTML can emit error messages that we do not need
  // (for example, about an unclosed tag), since we use this function to
  // correct the HTML. The @ operator does not suppress errors from this function.
  $useErrorsOld = libxml_use_internal_errors(true);
  $domDocument->loadHTML($html);
  libxml_use_internal_errors($useErrorsOld);

  return $domDocument;
}

/**
 * @return string
 */
function saveHtml(DOMDocument $domDocument)
{
  // Export as XHTML
  $html = $domDocument->saveXml();

  $html = str_replace('&amp;nbsp;', '&nbsp;', $html);
  $html = str_replace('&amp;amp;nbsp;', '&amp;nbsp;', $html);

  $html = preg_replace('/^\s*\<\?xml\s*[^\>]*\>\s*/is', '', $html);

  // Remove the <![CDATA[]]> wrapper that is put around the contents of <script></script>
  // when exporting through DOMDocument::saveXml
  $html = preg_replace('/(\<script(\s*[^\>]*)?\>)\<\!\[CDATA\[/is', '$1', $html);
  $html = preg_replace('/\]\]\>(\<\/script\>)/is', '$1', $html);

  return preg_replace('/^.*?<body>(.*?)<\/body>\s*<\/html>$/is', '$1', $html);
}
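
To illustrate the error handling in loadHtml(): with libxml_use_internal_errors(true), libxml buffers parse errors instead of raising PHP warnings, and they can be inspected through libxml_get_errors(). The sketch below is a standalone illustration with made-up input, not part of the functions above; which messages are reported depends on the installed libxml version.

```php
<?php
$useErrorsOld = libxml_use_internal_errors(true);

$domDocument = new DOMDocument();
// Deliberately mis-nested markup
$domDocument->loadHTML('<html><body><div><em>text</div></em></body></html>');

// Buffered errors are returned as an array of LibXMLError objects
$errors = libxml_get_errors();
foreach ($errors as $error) {
  print trim($error->message) . ' (line ' . $error->line . ")\n";
}

// Empty the buffer so errors do not accumulate between calls,
// then restore the previous setting
libxml_clear_errors();
libxml_use_internal_errors($useErrorsOld);
```

Calling libxml_clear_errors() before restoring the old setting may be worth adding to loadHtml() as well if it runs many times in one request, since the buffer is otherwise never emptied.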

PS: on Habr, HTML entities inside the <source>...</source> tag are not escaped, so code is not displayed unchanged (which is rather awful). I escaped them manually, so note that the code may become incorrect if Habr's developers fix this bug without converting existing posts accordingly.

Max Akhmadinurov, 2013-09-26
@movemind

Your task is fully solved by the Typogrify module. It can also replace hyphens with dashes and much more. I highly recommend it.
