Automation
dymanoid, 2014-01-18 02:30:47

What are the principles for implementing an algorithm for automatic navigation through the GUI of an application?

There is an application with many different views (screens or windows) and menus, together with a full textual (XML) description of them: text strings, icons, and the screen regions they occupy. Using OCR, we analyze the current screenshot and obtain a textual description of the currently visible elements and their positions.
The task is as follows: given a goal as input (such-and-such menu, such-and-such item), our "navigator" must work out where we currently are and compute the path to the goal, that is, which menus and submenus in which windows have to be opened to get there.
Can you suggest a general approach, and what to read on the subject?
Is it true that we need to create a set of descriptive objects and compare the "current screen" against them to find the current position? What is the best way to do this search?
The search problem is exacerbated by the fact that the text returned by OCR is not always 100% correct.


2 answer(s)
rPman, 2014-01-18
@dymanoid

For each state, you need to define a set of attributes that identify it. These need not be text: a simple set of pixels and their colors may do, or even the average color of a region. This matters especially if the interface is three-dimensional with a changing appearance, for example a burning fire or a rotating world as the menu background, although in the general case the task may then turn out to be unsolvable.
Each state can also hold a list of previous states (from which it can be reached) and of next states (to which you can move from it). The analysis can then be restricted to the states in these lists instead of checking the attributes of every known state.
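To illustrate the two points above, here is a minimal sketch of state identification that tolerates OCR mistakes. It assumes each state is described only by the text labels it should show; the state names, the labels, the helper names and the 0.7 threshold are all made up for the example, and the similarity measure (Python's standard difflib) is just one possible choice.

```python
from difflib import SequenceMatcher

# Hypothetical screen descriptions: state name -> labels expected on that screen.
KNOWN_STATES = {
    "main_menu": ["File", "Edit", "View", "Help"],
    "file_menu": ["New", "Open", "Save", "Exit"],
    "settings": ["Language", "Display", "Network"],
}

def label_score(expected, seen_labels):
    """Best similarity (0..1) between one expected label and any OCR label."""
    return max(
        (SequenceMatcher(None, expected.lower(), s.lower()).ratio()
         for s in seen_labels),
        default=0.0,
    )

def state_score(expected_labels, seen_labels):
    """Average best-match similarity over all labels the state should show."""
    if not expected_labels:
        return 0.0
    total = sum(label_score(e, seen_labels) for e in expected_labels)
    return total / len(expected_labels)

def identify_state(seen_labels, candidates=None, threshold=0.7):
    """Return the best-matching state name, or None if nothing is convincing.

    `candidates` may be limited to the successors of the previous state,
    so that not every known state has to be compared.
    """
    names = candidates if candidates is not None else KNOWN_STATES.keys()
    best_name, best = None, 0.0
    for name in names:
        score = state_score(KNOWN_STATES[name], seen_labels)
        if score > best:
            best_name, best = name, score
    return best_name if best >= threshold else None
```

Calling `identify_state(["Fi1e", "Edlt", "Vlew", "He1p"])` still resolves to `"main_menu"` despite the misread characters, while a screen that resembles nothing known yields `None`, which the caller should treat as "lost" rather than guess.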
Be sure to check at each step whether the state has actually changed. Cover error checks and their handling as completely as possible, so that the autoclicker does not believe it is clicking "open file" while in reality it is confirming the file's deletion.
Avoid infinite loops: for example, if the transition from one state to another did not happen, retry a limited number of times and then stop.
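The checks and the retry limit described above could look roughly like this. It is only a sketch: `click` and `read_state` are hypothetical callables that the caller provides (one performs the action, the other returns the name of the currently recognized state), and `NavigationError` is a name invented for the example.

```python
import time

class NavigationError(RuntimeError):
    """Raised when the GUI does not react or ends up somewhere unexpected."""

def perform_transition(click, read_state, source, target,
                       max_attempts=3, delay=0.0):
    """Click and verify that the GUI moved from `source` to `target`.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        # Never click blindly: confirm we are where we think we are.
        current = read_state()
        if current != source:
            raise NavigationError(
                f"expected {source!r} before clicking, got {current!r}")
        click()
        time.sleep(delay)  # give the GUI time to redraw
        current = read_state()
        if current == target:
            return attempt
        if current != source:
            # We moved, but not to the intended state: stop immediately.
            raise NavigationError(
                f"expected {target!r} after clicking, got {current!r}")
        # State unchanged: the click was ignored, so try again.
    raise NavigationError(
        f"no transition {source!r} -> {target!r} after {max_attempts} attempts")
```

Stopping as soon as an unexpected state shows up, instead of retrying, is what keeps the clicker from pressing a button that happens to sit at the same coordinates in a different dialog.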

@ntkt, 2014-01-18

1) As for the algorithm, the colleague above has already outlined it: the main idea is to build a finite state machine (FSM). FSMs are widely used in automated software testing:
scholar.google.com/scholar?q=fsm+gui+automated+sof. ..
2) Why analyze the picture with OCR at all? If possible, it would be more robust to run an integration module on the same system as the target application and have it read both the text and the hierarchy of menus and other objects directly from the graphical environment. Whatever the platform, any graphical environment has an API for enumerating objects, and ready-made scripting solutions exist as well. Reconstructing the object tree from OCR results plus an XML description of unrelated objects will be harder.
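Coming back to point 1: once the views and menus are modeled as states, and the clicks between them as transitions, "calculating the path to the goal" becomes a shortest-path search in a graph. A minimal sketch using breadth-first search follows; the transition table is invented for the example, and in practice it would presumably be generated from the XML description.

```python
from collections import deque

# Hypothetical transition table: state -> {action label: resulting state}.
TRANSITIONS = {
    "main_window": {"File": "file_menu", "Tools": "tools_menu"},
    "file_menu": {"Open...": "open_dialog", "Esc": "main_window"},
    "tools_menu": {"Options": "options_dialog", "Esc": "main_window"},
    "open_dialog": {"Cancel": "main_window"},
    "options_dialog": {"Cancel": "main_window"},
}

def find_path(start, goal, transitions=TRANSITIONS):
    """Return the shortest list of (state, action) steps, or None if unreachable."""
    queue = deque([start])
    came_from = {start: None}  # state -> (previous state, action taken)
    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk back from the goal to the start, then reverse.
            steps = []
            while came_from[state] is not None:
                prev, action = came_from[state]
                steps.append((prev, action))
                state = prev
            return steps[::-1]
        for action, nxt in transitions.get(state, {}).items():
            if nxt not in came_from:
                came_from[nxt] = (state, action)
                queue.append(nxt)
    return None
```

For example, `find_path("open_dialog", "options_dialog")` gives the steps Cancel, then Tools, then Options. Breadth-first search is enough as long as all clicks cost the same; if some transitions are slow or risky, a weighted search such as Dijkstra's algorithm would be the natural replacement.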
