Tap

The Tap command triggers a tap on a UI element detected on the screen. Drizz uses Vision AI to identify tappable elements by text, iconography, structure, and position within the layout.

The command supports multiple targeting patterns, ambiguity resolution, and contextual hints.

1. Basic Tap

1.1 Tap using exact text

Exact text tapping is the most reliable and recommended method for targeting UI elements. It is used when the element displays a clear, readable, and unique label that appears unambiguously on the screen. By specifying the exact visible text in double quotes, Drizz can accurately identify and interact with the intended component without contextual or positional references.

Example:

Tap on "Login" CTA

Tap on "Continue" Button

1.2 Tap on Non-Textual UI Elements (Icons / Images)

Non-textual UI elements such as icons, glyphs and images require intent-based referencing because they do not contain visible labels. Drizz identifies these components using visual pattern recognition, allowing users to tap them through natural descriptive phrases. This method is especially useful for common UI controls (cart, profile, close, back, search) that appear consistently across apps but lack textual identifiers. Providing clear references or accessibility-like hints ensures accurate targeting and minimizes ambiguity across different screen contexts.

1.2.1 Icon with known reference name

Used when the icon has a universally recognized visual meaning. These commands rely on common UI metaphors (cart, profile, hamburger menu, settings), allowing Drizz to reliably identify the element based on its standard appearance.

Example:

Tap the cart icon

Tap the profile icon

Tap the hamburger icon

Tap the settings icon

1.2.2 Icon with accessibility-like hint

Used when the icon has no readable text but represents a clear, universally understood action (close, back, search). Naming the action, much as an accessibility label would, lets Drizz recognize the element from its visual semantics even without a label.

Example:

Tap the close icon

Tap the back arrow

Tap the search icon

1.3 Tap with positional context

Used when multiple identical labels or icons appear on the screen and the action must target a specific instance. Drizz resolves these cases by combining text recognition with spatial qualifiers such as first, last, upper, lower, left or right to accurately identify the intended UI element.

This ensures deterministic selection even in dense or repeated UI layouts (lists, product grids, repeated CTAs, menus, etc.).

Example:

Tap on the first Add beside product name

Tap on the last Remove CTA

Tap the cart icon at top right
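To make the idea of spatial qualifiers concrete, here is a minimal Python sketch of how repeated labels could be disambiguated by position. It is an illustration only and not Drizz's implementation: the `Element` class, the `pick` function, and the coordinates are hypothetical, and it assumes each detected element is reduced to a label plus a centre point.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical detected element: a label plus its centre point."""
    label: str
    x: float  # 0 is the left edge of the screen
    y: float  # 0 is the top edge of the screen


def pick(elements, label, qualifier):
    """Return the element with `label` that best matches a spatial qualifier."""
    matches = [e for e in elements if e.label == label]
    if not matches:
        raise LookupError(f"no element labelled {label!r}")
    # "first" and "last" follow reading order: top to bottom, then left to right.
    reading_order = sorted(matches, key=lambda e: (e.y, e.x))
    rules = {
        "first": lambda: reading_order[0],
        "last": lambda: reading_order[-1],
        "upper": lambda: min(matches, key=lambda e: e.y),
        "lower": lambda: max(matches, key=lambda e: e.y),
        "left": lambda: min(matches, key=lambda e: e.x),
        "right": lambda: max(matches, key=lambda e: e.x),
    }
    return rules[qualifier]()


# Three identical "Add" buttons in a vertical list, plus one "Remove".
screen = [
    Element("Add", 300, 120),
    Element("Add", 300, 260),
    Element("Add", 300, 400),
    Element("Remove", 80, 400),
]
first_add = pick(screen, "Add", "first")
last_add = pick(screen, "Add", "last")
```

In this sketch, "Tap on the first Add" resolves to the topmost button and "the last Add" to the bottommost one, which is why a qualifier removes the ambiguity that the label alone leaves.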

1.4 Tap using surrounding/container context

Used when the target element appears multiple times on screen but can be uniquely identified by a nearby label, section header, or the container it belongs to. Drizz pairs the actionable element (e.g., Add, Remove, Buy, Select) with its surrounding textual or structural context to execute the precise tap.

This method is ideal for list items, product cards, pricing plans, grouped UI layouts or any screen where the same action repeats across multiple containers.

Example:

Tap Add under Snacks header

Tap Remove beside Product Details

Tap Select next to Basic Plan option

Tap Buy on the first card
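The pairing of an action with its container can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Drizz works internally: the `Node` class and `tap_under_header` function are hypothetical, and the sketch assumes a single-column layout in which every element belongs to the closest header above it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical on-screen node: visible text and vertical position."""
    text: str
    y: float
    is_header: bool = False


def tap_under_header(nodes, action, header):
    """Return the first `action` element inside the section titled `header`."""
    current_section = None
    # Walk the screen top to bottom, tracking which section we are in.
    for node in sorted(nodes, key=lambda n: n.y):
        if node.is_header:
            current_section = node.text
        elif node.text == action and current_section == header:
            return node
    raise LookupError(f"no {action!r} found under {header!r}")


screen = [
    Node("Fruits", 100, is_header=True),
    Node("Add", 150),
    Node("Snacks", 300, is_header=True),
    Node("Add", 350),
    Node("Add", 420),
]
target = tap_under_header(screen, "Add", "Snacks")
```

Here "Tap Add under Snacks header" skips the Add button in the Fruits section and lands on the first Add that follows the Snacks header.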

1.5 Tap using color-based context

Used when an element can be uniquely identified by a distinct color cue or visual state instead of text. Color-based identification helps in scenarios where UI components indicate status, priority, or selection through color (e.g., green “Active”, red “Error”, grey “Disabled”).

Drizz uses Vision AI to match the target element based on dominant color patterns, gradients, or tinted indicators. This method is ideal for status badges, active tabs, selected filters, progress indicators, and contextual highlights.

Example:

  • Tap the green Active button

  • Tap the red Retry indicator

  • Tap the highlighted Selected tab

  • Tap the orange Offer badge
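As a rough illustration of color-based matching, the Python sketch below maps an element's dominant RGB color to a color word and then filters by that word. It is not Drizz's algorithm: the hue thresholds, the `color_name` and `find_by_color` functions, and the sample badge colors are all assumptions chosen for the example. Only the standard-library `colorsys` module is used.

```python
import colorsys


def color_name(rgb):
    """Map a dominant (r, g, b) colour in 0-255 to a coarse colour word."""
    r, g, b = (channel / 255 for channel in rgb)
    hue, saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    if saturation < 0.15:
        return "grey"  # too washed out to carry a hue
    degrees = hue * 360
    if degrees < 15 or degrees >= 345:
        return "red"
    if degrees < 45:
        return "orange"
    if degrees < 70:
        return "yellow"
    if degrees < 170:
        return "green"
    if degrees < 260:
        return "blue"
    return "purple"


def find_by_color(badges, color, label):
    """Return the labels that match both the text and the colour word."""
    return [text for text, rgb in badges if text == label and color_name(rgb) == color]


# Hypothetical badges: (visible text, dominant colour).
badges = [
    ("Active", (46, 160, 67)),
    ("Active", (128, 128, 128)),
    ("Retry", (220, 53, 69)),
    ("Offer", (255, 140, 0)),
]
green_active = find_by_color(badges, "green", "Active")
```

In this sketch, the two "Active" badges share the same text, and only the color word separates the enabled (green) one from the disabled (grey) one.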

1.6 Tap using mathematical / relative logic (Computed selection)

Used when the target is not determined by text or position, but by a computed value, such as ratings, price, popularity, ranking, or quantitative attributes visible on screen. Drizz evaluates numerical context (e.g., 4.8 ★, ₹199, 50% OFF) and selects the element that satisfies the specified mathematical condition.

This method is ideal for product listings, pricing selection, rating-based sorting, promotional cards, and numerical UI elements.

Example:

  • Tap the highest-rated product

  • Tap the lowest-priced item

  • Tap the plan with the maximum discount

  • Tap the restaurant with the most reviews

  • Tap the item with rating above 4.5
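The computed selections above amount to reading numbers off the screen and applying a maximum, minimum, or threshold. The Python sketch below shows that logic on sample data. It is an illustration, not Drizz's implementation: the `Card` class, the `number_in` helper, and the product values are hypothetical, and the number parsing is deliberately naive (it does not handle thousands separators).

```python
import re
from dataclasses import dataclass


def number_in(text):
    """Extract the first number from on-screen text such as '4.8 ★' or '₹199'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group())


@dataclass
class Card:
    """Hypothetical product card with its visible rating and price text."""
    name: str
    rating: str
    price: str


cards = [
    Card("Trail Mix", "4.8 ★", "₹199"),
    Card("Granola Bar", "4.2 ★", "₹99"),
    Card("Protein Chips", "4.6 ★", "₹249"),
]

# "Tap the highest-rated product"
highest_rated = max(cards, key=lambda c: number_in(c.rating))
# "Tap the lowest-priced item"
lowest_priced = min(cards, key=lambda c: number_in(c.price))
# "Tap the item with rating above 4.5" can match more than one card.
above_4_5 = [c.name for c in cards if number_in(c.rating) > 4.5]
```

A maximum or minimum picks a single card, while a threshold such as "rating above 4.5" matches two cards in this sample. When a condition can match several elements, adding a positional qualifier (for example, "the first item with rating above 4.5") keeps the target unambiguous.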
