How to convert HTML entities and numeric character codes to plain text in Java?
Hello, dear Habr users.
I'm writing an Android app that parses a particular website and stores some of its data in the app's database.
I ran into a problem: the text contains HTML entities (for example &lt;) and numeric character codes (for example &#039;).
How can I convert this data into plain text?
Googling didn't help, or maybe I was searching for the wrong thing.
Thanks a lot in advance!
Apache Commons Lang has a ready-made method for this, StringEscapeUtils.unescapeHtml: http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/StringEscapeUtils.html#unescapeHtml(java.io.Writer, java.lang.String)
Wouldn't it be easier to simply display the data in a web browser component?
If you still want to solve it yourself, I recommend looking at how the html_entity_decode function is implemented in phpjs. The key point is a lookup table for the many named entities. Numeric references, whether decimal (&#039;) or hexadecimal (&#x27;), can simply be converted by extracting the character code with a regular expression and substituting the character with that code.
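A minimal, standard-library-only sketch of that table-plus-regex approach, written in Java rather than taken from phpjs. The class name `HtmlEntityDecoder` is hypothetical, and the entity table is deliberately tiny (an assumption for illustration); a real decoder, like the phpjs table or Commons Lang, covers the full HTML entity list.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlEntityDecoder {

    // Hypothetical, deliberately small table of named entities (the real HTML list is much longer)
    private static final Map<String, String> NAMED = new HashMap<String, String>();
    static {
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("amp", "&");
        NAMED.put("quot", "\"");
        NAMED.put("apos", "'");
        NAMED.put("nbsp", "\u00A0");
    }

    // One pattern for all three forms: &#123; (decimal), &#x1F; (hex) and &name; (named).
    // Digit counts are capped so Integer.parseInt cannot overflow.
    private static final Pattern ENTITY = Pattern.compile(
            "&(?:#(\\d{1,7})|#[xX]([0-9a-fA-F]{1,6})|([a-zA-Z][a-zA-Z0-9]*));");

    public static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        // StringBuffer keeps this compatible with pre-Java-9 appendReplacement (and older Android)
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(1) != null) {
                replacement = fromCodePoint(Integer.parseInt(m.group(1)), m.group());
            } else if (m.group(2) != null) {
                replacement = fromCodePoint(Integer.parseInt(m.group(2), 16), m.group());
            } else {
                String named = NAMED.get(m.group(3));
                // Unknown names are left exactly as they appeared
                replacement = (named != null) ? named : m.group();
            }
            // quoteReplacement stops '$' and '\' in the replacement from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Converts a code point to a String, leaving the original text if the code point is invalid
    private static String fromCodePoint(int codePoint, String original) {
        return Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : original;
    }

    public static void main(String[] args) {
        System.out.println(decode("a &lt; b &amp;&amp; it&#039;s &#x27;ok&#x27;"));
        // prints: a < b && it's 'ok'
    }
}
```

Decoding in a single regex pass means `&amp;lt;` becomes the literal text `&lt;` rather than `<`, which matches how a browser would display it; running a chain of `String.replace` calls instead would double-decode such input.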