Parsing HTML on Android?

Android

andreevich, 2011-02-05 14:36:33

Good afternoon. I was going through this example, habrahabr.ru/blogs/android/91815/ , specifically the piece of code that parses the user profile to find the avatar. Everything there is done by working with a string, using nothing but substring().
The question arose: is it possible to walk the DOM of the resulting document by element classes and IDs, as you can with jQuery, for example?
Or perhaps you can suggest a more humane way of getting data out of the page.
Thank you!

6 answers

Khurshed Nurmatov, 2011-02-06
@Hoorsh

What exactly is the task? After all, you can use regular expressions.
For example, here is how I pulled a value out of an HTML page:

    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public String GetTemper(String urlsite) // loads the temperature from the page
    {
        String matchtemper = "";
        try
        {
            // download the page
            URL url = new URL(urlsite);
            URLConnection conn = url.openConnection();
            // no charset is given here, so the platform default is used
            InputStreamReader rd = new InputStreamReader(conn.getInputStream());
            StringBuilder allpage = new StringBuilder();
            char[] buffer = new char[40000];
            int n;
            try
            {
                // read() returns -1 at the end of the stream
                while ((n = rd.read(buffer, 0, buffer.length)) >= 0)
                {
                    allpage.append(buffer, 0, n);
                }
            }
            finally
            {
                rd.close();
            }
            // apply the regular expression
            final Pattern pattern = Pattern.compile
            ("<span style=\"color:#[a-zA-Z0-9]+\">[^-+0]+([-+0-9]+)[^<]+</span>[^(а-яА-ЯёЁa-zA-Z0-9)]+([а-яА-ЯёЁa-zA-Z ]+)");
            Matcher matcher = pattern.matcher(allpage.toString());
            if (matcher.find())
            {
                matchtemper = matcher.group(1);
            }
        }
        catch (Exception e)
        {
            // on any network or matching error, fall through and return ""
        }
        return matchtemper;
    }

eforce, 2011-02-05
@eforce

An interesting topic. I googled a little and found that parsing is usually done with an XML parser, although that approach is questionable, because HTML is not always valid XML. As for libraries for working with the DOM, as far as I understand, nothing of the kind exists today.
Links that might be helpful:
Android HTML Dom (link in answer)
Android parsing HTML entities using DOM parser for RSS feed

nocach, 2011-02-05
@nocach

For your purposes you can use, for example, htmlcleaner, a simple library for DOM parsing.
There is also Html Parser, which is rather cumbersome but supports CSS selectors.
In terms of speed, of course, the SAX XML parser would be the best choice.

leviathan, 2011-02-05
@leviathan

In my application I use a combination of TagSoup (which generates valid XHTML from almost any HTML) and a SAX parser for this. It works well.
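To illustrate the SAX half of that combination, here is a minimal sketch that uses only the standard javax.xml.parsers API. It assumes the page has already been turned into well-formed XHTML (the step TagSoup is described as doing above). The class name AvatarSaxSketch, the method findImageSrc, and the "avatar" class value are hypothetical names chosen for the example, not something taken from the linked article.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: scans well-formed XHTML with SAX and picks out an image URL.
class AvatarSaxSketch {

    /**
     * Returns the src attribute of the first <img> whose class attribute
     * equals cssClass exactly, or null if there is no such element.
     * Assumption: the input is already well-formed XHTML.
     */
    static String findImageSrc(String xhtml, final String cssClass) {
        final String[] result = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Depending on parser settings the tag name arrives in
                // localName or in qName, so accept either one.
                String name = localName.length() > 0 ? localName : qName;
                if (result[0] == null
                        && "img".equals(name)
                        && cssClass.equals(attrs.getValue("class"))) {
                    result[0] = attrs.getValue("src");
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            // Malformed markup or a parser configuration problem.
            throw new IllegalArgumentException("could not parse XHTML", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        String xhtml = "<html><body>"
                + "<img class=\"logo\" src=\"/logo.png\"/>"
                + "<img class=\"avatar\" src=\"/me.png\"/>"
                + "</body></html>";
        System.out.println(findImageSrc(xhtml, "avatar"));
    }
}
```

SAX never builds a tree, so the handler only remembers the one value it is looking for. The price is that anything resembling a selector (here, tag name plus class) has to be checked by hand in startElement.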

leviathan, 2011-02-05
@leviathan

I also forgot to mention: since version 2.2 (it seems) Android finally has XPath support, and in my opinion that is exactly the right tool for this job. But again, you need TagSoup or some other tool to get valid XHTML first.
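A minimal sketch of the XPath approach with the standard javax.xml.xpath API, again assuming the input is already well-formed XHTML and that its elements are not in an XML namespace (with a default xmlns the expression would need namespace handling). The class name AvatarXPathSketch, the method findImageSrc, and the "profile" id are hypothetical and only serve the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: selects an image URL from well-formed XHTML by element id.
class AvatarXPathSketch {

    /**
     * Returns the src of the first <img> found anywhere inside the <div>
     * with the given id, or an empty string if nothing matches.
     * Assumptions: well-formed XHTML without a default namespace, and an
     * elementId that contains no quote characters (it is pasted into the
     * expression as-is).
     */
    static String findImageSrc(String xhtml, String elementId) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "//div[@id='" + elementId + "']//img/@src";
        try {
            // evaluate() parses the source and returns the string value
            // of the first matching node in document order.
            return xpath.evaluate(expression,
                    new InputSource(new StringReader(xhtml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><div id=\"profile\">"
                + "<p><img src=\"/avatars/42.png\"/></p>"
                + "</div></body></html>";
        System.out.println(findImageSrc(xhtml, "profile"));
    }
}
```

Compared with walking SAX events by hand, the selection logic lives in one expression, which is closer to the jQuery-style lookup by id asked about in the question.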

deep_orange, 2015-05-21
@deep_orange

The most humane option is Jsoup. It is simple and fast. However, on Android 4.4 it is for some reason extremely slow (while the same code flies on 2.2). In general, check it on virtual machines.