Java
Nikita Sklyuev, 2012-06-28 10:37:00

How to convert HTML entities and numeric character codes to text in Java?

Hello, dear habracitizens.
I'm building an app for Android. It parses a certain site and stores selected data in its own database.
I've run into a problem: the data contains HTML entities (example: &\lt;) and numeric character codes (example: &\#039;).
The question is how to convert this data into plain text.
Googling didn't get me anywhere, or maybe I was googling it wrong?!
Thanks a lot in advance!
UPD: I put a slash after the & (ampersand) in the examples because otherwise the browser interprets them!


5 answer(s)
abarmot, 2012-06-28
@trilodi

The site's link parser is dumb, so here is the full link: http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/StringEscapeUtils.html#unescapeHtml(java.io.Writer, java.lang.String)

Alexander, 2012-06-28
@xel

Wouldn't it be easier to simply display the data in a web browser component?
If you still want to solve the problem yourself, I recommend looking at how the html_entity_decode function works in phpjs. The key point is a lookup table for the many named entities. Those given as a number (decimal, as in your example, or hexadecimal) can be converted simply: extract the character code with a regular expression and substitute the character with that code in its place.
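To make the idea above concrete, here is a minimal sketch in plain Java. It is not the phpjs code and not a complete decoder: the class name EntityDecoder and its deliberately tiny entity table are assumptions made up for illustration, and a real table would have to cover the full list of HTML entity names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class EntityDecoder {
    // Deliberately tiny table; the real list of named entities is much longer.
    private static final Map<String, String> NAMED = new HashMap<String, String>();
    static {
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("amp", "&");
        NAMED.put("quot", "\"");
        NAMED.put("apos", "'");
        NAMED.put("nbsp", "\u00A0");
    }

    // Group 2: hex digits (&#x27;), group 3: decimal digits (&#039;), group 4: name (&lt;).
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX]([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = resolve(m);
            if (replacement == null) {
                replacement = m.group(); // unknown or invalid: leave the text as written
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String resolve(Matcher m) {
        try {
            if (m.group(2) != null) return fromCodePoint(Integer.parseInt(m.group(2), 16));
            if (m.group(3) != null) return fromCodePoint(Integer.parseInt(m.group(3), 10));
        } catch (NumberFormatException e) {
            return null; // more digits than fit in an int
        }
        return NAMED.get(m.group(4)); // null when the name is not in the table
    }

    private static String fromCodePoint(int codePoint) {
        return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : null;
    }
}
```

With this sketch, EntityDecoder.decode("It&#039;s &lt;b&gt;") returns It's <b>. The input is scanned once, so a double-escaped &amp;lt; becomes &lt; and is not decoded a second time.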

abarmot, 2012-06-28
@abarmot

Click here: commons.apache.org/lang/api-2.4/org/apache/commons/lang/StringEscapeUtils.html#unescapeHtml(java.io.Writer, java.lang.String)

Andrey Kuntsevich, 2012-06-28
@titulusdesiderio

"UPD: I put a slash after the & (ampersand) because the browser interprets them!"

Instead of slashes, escape the ampersand itself ;)
&amp;lt; = &lt;

vasart, 2012-06-30
@vasart

To solve this problem on Android, there is the class android.text.Html:
Html.fromHtml("&lt;").toString();
