Maxim Vyaznikov, 2020-05-27 03:34:12
PHP

PHP: how do I filter unwanted UTF-8 characters out of a text string?

The essence of the problem: there is an arbitrary input text variable that can contain all sorts of junk (emoji, special characters, and so on), because of which further output in the browser may result in a mess.

I googled various solutions and tried every one I could find, for example from here: https://stackoverflow.com/questions/1176904/php-ho...

Options such as
preg_replace( '/[^[:print:]]/'
or
filter_var($input, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW|FILTER_FLAG_STRIP_HIGH);

They work and strip all the junk from the string, but they strip Cyrillic along with it. Only the characters of the English (Latin) layout remain.

Please suggest a suitable filtering option that removes all non-printable characters but leaves the characters of other keyboard layouts intact.
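For reference, the behavior described above can be reproduced with a minimal sketch. It assumes UTF-8 input and PHP's default "C" locale. The function names are hypothetical, and the replacement argument of preg_replace is assumed to be an empty string, since the call is cut off in the question. Without the "u" modifier PCRE matches byte by byte, so [:print:] covers only printable ASCII, and FILTER_FLAG_STRIP_HIGH removes every byte above 127, which includes every byte of a Cyrillic letter encoded in UTF-8.

```php
<?php
declare(strict_types=1);

// Hypothetical wrapper around the first option from the question.
// Assumption: the replacement is '' (the original call is truncated).
function stripWithPrintClass(string $input): string
{
    // No "u" modifier: the subject is treated as raw bytes, and under the
    // default "C" locale [:print:] matches only ASCII 0x20-0x7E.
    return (string) preg_replace('/[^[:print:]]/', '', $input);
}

// Hypothetical wrapper around the second option from the question.
function stripWithFilterVar(string $input): string
{
    // STRIP_LOW drops bytes below 32, STRIP_HIGH drops bytes above 127.
    return (string) filter_var(
        $input,
        FILTER_UNSAFE_RAW,
        FILTER_FLAG_STRIP_LOW | FILTER_FLAG_STRIP_HIGH
    );
}
```

With either function, an input such as "Hi Привет" comes back as "Hi " because each Cyrillic letter is two bytes in UTF-8 and both bytes are above 127.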

2 answer(s)
Northern Lights, 2020-05-27
@php666

"further output in the browser may result in a mess"
What does "a mess" mean?
What encoding is the data in the database stored in? What is the encoding of the server response?
There is an opinion that you just need to standardize the project's encoding on utf8mb4 and not bother with this nonsense.
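One way to read this advice, as a rough sketch only: keep the text in UTF-8 end to end and escape it on output instead of deleting characters. The helper names are hypothetical, and the DSN assumes MySQL accessed through PDO, which the thread does not state.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: builds a PDO DSN that asks MySQL for utf8mb4,
// the MySQL charset that can store 4-byte UTF-8 characters such as emoji.
function buildUtf8mb4Dsn(string $host, string $dbName): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbName);
}

// Hypothetical helper: escapes text for HTML output, treating it as UTF-8.
function escapeUtf8ForHtml(string $text): string
{
    // ENT_SUBSTITUTE replaces invalid byte sequences with U+FFFD
    // instead of making htmlspecialchars return an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}
```

For this to help, the tables and columns would also need to use utf8mb4, and the response would need to declare its encoding, for example with header('Content-Type: text/html; charset=utf-8') before any output.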

galaxy, 2020-05-27
@galaxy

Unicode is hard. The safest approach is to keep only ASCII and the alphabet you need (Russian, for example); character ranges should work ([a-z] or other options).
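A minimal sketch of the range approach, assuming UTF-8 input and that "the alphabet you need" is Russian. The function name is hypothetical. The "u" modifier makes PCRE treat the pattern and the subject as UTF-8, so the ranges refer to code points and not to bytes.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: keeps newlines, printable ASCII and Russian letters.
function keepAsciiAndRussian(string $input): string
{
    // \x20-\x7E        printable ASCII
    // \x{0410}-\x{044F} А-Я and а-я
    // \x{0401}, \x{0451} Ё and ё, which sit outside the main range
    $pattern = '/[^\n\x20-\x7E\x{0401}\x{0410}-\x{044F}\x{0451}]/u';

    // With the "u" modifier preg_replace returns null if the input is not
    // valid UTF-8; this sketch returns an empty string in that case.
    return preg_replace($pattern, '', $input) ?? '';
}
```

Called on "Hello, Привет! 😀", this keeps "Hello, Привет! " and drops the emoji. Any other alphabet has to be added to the character class explicitly.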
If you need arbitrary alphabets, filter by Unicode character properties.
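A minimal sketch of filtering by Unicode properties, assuming UTF-8 input. The function name and the particular set of allowed categories are illustrative choices, not something the answer prescribes.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: keeps text in any alphabet, drops everything else.
function keepTextCharacters(string $input): string
{
    // Allowed: letters (\p{L}), combining marks (\p{M}), numbers (\p{N}),
    // punctuation (\p{P}), space separators (\p{Zs}) and newlines.
    // Dropped: control characters (\p{C}) and all symbols (\p{S}),
    // which includes emoji but also characters such as "+", "$" and "<".
    $pattern = '/[^\p{L}\p{M}\p{N}\p{P}\p{Zs}\n]/u';

    // preg_replace returns null on invalid UTF-8 when "u" is set;
    // this sketch returns an empty string in that case.
    return preg_replace($pattern, '', $input) ?? '';
}
```

Called on "Hello, Привет, 你好! 😀", this keeps "Hello, Привет, 你好! " and removes the emoji. If mathematical or currency symbols are needed, \p{Sm} or \p{Sc} can be added to the class. Because combining marks are kept, an invisible variation selector that followed a removed emoji can survive in the result.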
