Data structure for finding suitable CSS rules

Vitaly, 2013-08-05 16:53:25

css

As a hobby project, I have slowly started writing a page renderer in Python.
The question is not about the language, though.

I need to choose the right data structure so that the rules matching an element are easy to look up.

Example:

css:

a { display: block; font-size: 12pt; }
div p.links a { color: green; display: inline; }

Suppose I parse this fragment and get the necessary structure.

Now it should be possible to do this:

cssTable.get_styles( 'html > div#content > h3 > a' );

This will return: { display: block; font-size: 12pt }

If I request a path like this:

cssTable.get_styles( 'html > div#content > p.links > a' );

This should return the combined rule: { display: inline; color: green; font-size: 12pt;}

Any idea how to organize this structure properly?

4 answer(s)

Pavel Tyslyatsky, 2013-08-06
@tbicr

If you are not tied to the DOM, the set of all possible CSS rules can be treated as infinite (you can make it finite by limiting the number of elements or the length of a selector). Any style sheet, group of rules, or single rule can then be represented as a subset of all CSS rules. As I understand it, you want to build another subset from the style sheet with the same number of rules or fewer (otherwise there is no point). That is, for any two rules you want to end up with one rule at best and two at worst. Then:
a {...} a {...} can be converted to a {...}
a {...} div a {...} can be converted to a {...} only if the a rules have higher precedence, for example through !important; otherwise it cannot be done, because these rules describe different possible subsets.
I am inclined to think that reducing the number of rules is almost impossible, since they describe different subsets. Rules can be dropped only when they describe the same subset, or when the more general subset has higher priority.
Now the search. To determine which elements a rule applies to, you have to walk the entire tree. There is an interesting point here: some CSS rules explicitly describe elements that lie inside the elements matched by another rule. For example, given .base .child and .base .child .node, it is known that every .base .child .node element is inside a .base .child element.
So if you represent such rules as a tree, you can reduce the cost of finding elements inside a base element that has already been found.
Search variations. I see two main options:
1. Take a rule and iterate over the DOM; for each element, check whether the element's path in the DOM falls within the subset the rule describes and, if so, apply the rule; then move on to the next rule.
2. Iterate over the DOM; for each element, check its path against every defined rule and apply the ones that match; then move on to the next element.
The second option fits well with the rule tree idea above and generally seems more interesting, because it needs only one DOM traversal.
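The second option combined with a rule tree could look roughly like the sketch below. It is only an illustration under stated assumptions: Element, RuleNode, build_rule_tree and apply_rules are hypothetical names, selectors are limited to descendant steps of the form tag, .class or tag.class, and matched rules are applied in source order only (specificity and !important are ignored).

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    tag: str
    classes: frozenset = frozenset()
    children: list = field(default_factory=list)
    styles: dict = field(default_factory=dict)


@dataclass(eq=False)  # identity hashing, so nodes can be kept in sets
class RuleNode:
    """One selector step; child nodes extend it with a further descendant step."""
    children: dict = field(default_factory=dict)  # step text -> RuleNode
    rules: list = field(default_factory=list)     # (source order, declarations)


def step_matches(step, element):
    """Simplified matcher for 'tag', '.class' or 'tag.class' (an assumption)."""
    tag, _, cls = step.partition(".")
    return (not tag or tag == element.tag) and (not cls or cls in element.classes)


def build_rule_tree(rules):
    """Selectors sharing a prefix ('.base .child', '.base .child .node') share nodes."""
    root = RuleNode()
    for order, (selector, declarations) in enumerate(rules):
        node = root
        for step in selector.split():
            node = node.children.setdefault(step, RuleNode())
        node.rules.append((order, declarations))
    return root


def apply_rules(element, active):
    """Single depth-first walk; `active` is the set of rule-tree nodes
    whose selector prefix is already satisfied by the ancestors."""
    matched = {
        child
        for node in active
        for step, child in node.children.items()
        if step_matches(step, element)
    }
    hits = sorted(
        (rule for node in matched for rule in node.rules),
        key=lambda rule: rule[0],
    )
    for _, declarations in hits:
        element.styles.update(declarations)
    for child_element in element.children:
        apply_rules(child_element, active | matched)


# The style sheet and paths from the question.
tree = build_rule_tree([
    ("a", {"display": "block", "font-size": "12pt"}),
    ("div p.links a", {"color": "green", "display": "inline"}),
])
plain_link, nav_link = Element("a"), Element("a")
dom = Element("html", children=[
    Element("div", children=[
        Element("h3", children=[plain_link]),
        Element("p", frozenset({"links"}), children=[nav_link]),
    ]),
])
apply_rules(dom, {tree})
```

Because the active set only grows on the way down, a prefix such as .base .child is matched once and then reused for every rule that extends it.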

m-haritonov, 2013-08-05
@m-haritonov

I don't quite understand what the question is. If you parse CSS, you end up with an object structure of style sheets, their rules, and selectors (which you can work with programmatically). In the implementation of cssTable.get_styles you then need to write code that finds the style sheet rules corresponding to the selector passed in (for example, by matching that selector against each selector in the style sheet).
That is, when a function of this kind is used to find HTML elements, it follows the CSS selector syntax rules and finds the desired elements in the whole tree (for example, by matching each HTML element in turn against the selector passed in). In your case the style sheet plays the role of the HTML document: when searching, you match the selector passed to the function against each of the selectors in your style sheet.
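A minimal sketch of that idea, assuming the path always uses " > " between elements and the style sheet only uses descendant selectors made of tags, ids and classes. CssTable, Compound, parse_compound and the other helpers are hypothetical names chosen for illustration; matched rules are merged by specificity and then by source order.

```python
import re
from typing import NamedTuple


class Compound(NamedTuple):
    tag: str            # "" means "any tag"
    id: str             # "" means "no id required"
    classes: frozenset


def parse_compound(text):
    """Parse 'div#content' or 'p.links' into tag, id and classes."""
    tag = re.match(r"[A-Za-z][\w-]*", text)
    ident = re.search(r"#([\w-]+)", text)
    return Compound(
        tag.group(0) if tag else "",
        ident.group(1) if ident else "",
        frozenset(re.findall(r"\.([\w-]+)", text)),
    )


def compound_matches(sel, el):
    return (sel.tag in ("", el.tag) and sel.id in ("", el.id)
            and sel.classes <= el.classes)


def selector_matches(selector, path):
    """The last step must match the target; earlier steps must match
    ancestors in order (descendant combinator only)."""
    if not compound_matches(selector[-1], path[-1]):
        return False
    ancestors = iter(reversed(path[:-1]))  # shared iterator keeps the order
    return all(
        any(compound_matches(step, el) for el in ancestors)
        for step in reversed(selector[:-1])
    )


class CssTable:
    def __init__(self, css):
        self.rules = []  # (specificity, source order, selector, declarations)
        blocks = re.findall(r"([^{}]+)\{([^{}]*)\}", css)
        for order, (selector_text, body) in enumerate(blocks):
            selector = [parse_compound(part) for part in selector_text.split()]
            declarations = {
                name.strip(): value.strip()
                for name, value in (
                    item.split(":", 1) for item in body.split(";") if ":" in item
                )
            }
            specificity = (
                sum(bool(step.id) for step in selector),
                sum(len(step.classes) for step in selector),
                sum(bool(step.tag) for step in selector),
            )
            self.rules.append((specificity, order, selector, declarations))

    def get_styles(self, path):
        elements = [parse_compound(part.strip()) for part in path.split(">")]
        matching = [r for r in self.rules if selector_matches(r[2], elements)]
        styles = {}
        # Weaker rules first, so stronger ones overwrite them.
        for _, _, _, declarations in sorted(matching, key=lambda r: (r[0], r[1])):
            styles.update(declarations)
        return styles


css_table = CssTable("""
a { display: block; font-size: 12pt; }
div p.links a { color: green; display: inline; }
""")
print(css_table.get_styles("html > div#content > h3 > a"))
print(css_table.get_styles("html > div#content > p.links > a"))
```

This is a linear scan over all rules on every call; indexing the rules by the tag, id or class of their last step would be one way to narrow the candidates before matching.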

Daedmen, 2013-08-05
@Daedmen

See how browsers store the DOM

Sergey, 2013-08-05
@seriyPS

Writing your own renderer, and in Python at that, is hardcore of course. I wouldn't dare.
It seems to me that, rather than keeping CSS rules in some intermediate store, you should apply them sequentially straight onto the DOM tree. That is, build the DOM tree, then read the rules from your CSS one by one:

a { display: block; font-size: 12pt; }
div p.links a { color: green; display: inline; }

and apply each one to the elements its selector matches (overwriting conflicting declarations, taking !important into account, and so on). The downside is that whenever the tree is modified, all the rules have to be run again.
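As a rough sketch of this approach, assuming a hypothetical Node class with parent links and the same simplified selectors (descendant steps of the form tag, .class or tag.class): each rule is applied in source order, later declarations overwrite earlier ones, and a declaration marked !important is not overwritten by a normal one. Selector specificity is deliberately left out to keep the example short.

```python
class Node:
    def __init__(self, tag, classes=(), parent=None):
        self.tag = tag
        self.classes = set(classes)
        self.parent = parent
        self.children = []
        self.styles = {}        # property -> value
        self.important = set()  # properties that were set with !important
        if parent is not None:
            parent.children.append(self)


def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def step_matches(step, node):
    """Simplified matcher for 'tag', '.class' or 'tag.class' (an assumption)."""
    tag, _, cls = step.partition(".")
    return (not tag or tag == node.tag) and (not cls or cls in node.classes)


def selector_matches(steps, node):
    """Match right to left, climbing parent links for each descendant step."""
    if not step_matches(steps[-1], node):
        return False
    ancestor = node.parent
    for step in reversed(steps[:-1]):
        while ancestor is not None and not step_matches(step, ancestor):
            ancestor = ancestor.parent
        if ancestor is None:
            return False
        ancestor = ancestor.parent
    return True


def apply_stylesheet(root, rules):
    """Apply rules one by one, in source order, directly onto the tree."""
    for selector, declarations in rules:
        steps = selector.split()
        for node in walk(root):
            if not selector_matches(steps, node):
                continue
            for prop, value in declarations.items():
                is_important = value.endswith("!important")
                if prop in node.important and not is_important:
                    continue  # a normal declaration cannot beat !important
                node.styles[prop] = value.replace("!important", "").strip()
                if is_important:
                    node.important.add(prop)


html = Node("html")
div = Node("div", parent=html)
plain_link = Node("a", parent=Node("h3", parent=div))
nav_link = Node("a", parent=Node("p", ["links"], parent=div))

apply_stylesheet(html, [
    ("a", {"display": "block", "font-size": "12pt"}),
    ("div p.links a", {"color": "green", "display": "inline"}),
])
```

After the tree changes, the styles and important sets of the affected nodes would have to be cleared and the rules applied again, which is the cost mentioned above.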
You can try digging into the browser code here: github.com/WebKit/webkit/tree/master/Source/WebCore/css , but the code has no comments, so you are unlikely to make much sense of it =)