invoke_browser

BedrockAgentCore.Client.invoke_browser(**kwargs)

Invokes an operating system-level action on a browser session in Amazon Bedrock AgentCore. This operation provides direct OS-level control over browser sessions, enabling mouse actions, keyboard input, and screenshots in situations that the WebSocket-based Chrome DevTools Protocol (CDP) cannot handle, such as interacting with print dialogs, context menus, and JavaScript alerts.

You send a request with exactly one action in the BrowserAction union, and receive a corresponding result in the BrowserActionResult union.

See also: AWS API Documentation

Request Syntax

response = client.invoke_browser(
    browserIdentifier='string',
    sessionId='string',
    action={
        'mouseClick': {
            'x': 123,
            'y': 123,
            'button': 'LEFT'|'RIGHT'|'MIDDLE',
            'clickCount': 123
        },
        'mouseMove': {
            'x': 123,
            'y': 123
        },
        'mouseDrag': {
            'endX': 123,
            'endY': 123,
            'startX': 123,
            'startY': 123,
            'button': 'LEFT'|'RIGHT'|'MIDDLE'
        },
        'mouseScroll': {
            'x': 123,
            'y': 123,
            'deltaX': 123,
            'deltaY': 123
        },
        'keyType': {
            'text': 'string'
        },
        'keyPress': {
            'key': 'string',
            'presses': 123
        },
        'keyShortcut': {
            'keys': [
                'string',
            ]
        },
        'screenshot': {
            'format': 'PNG'
        }
    }
)
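As a concrete instance of the syntax above, the following minimal sketch sends a single mouseClick action. The helper names make_client and click are hypothetical, and the "bedrock-agentcore" service name and the region are assumptions about your environment. The identifiers passed in must come from your own StartBrowserSession call.

```python
def make_client(region_name="us-west-2"):
    """Create the client (assumed service name and region).

    boto3 is imported lazily so that click() below can be exercised
    with any object that exposes an invoke_browser method.
    """
    import boto3  # needs a boto3 release that includes invoke_browser

    return boto3.client("bedrock-agentcore", region_name=region_name)


def click(client, browser_identifier, session_id, x, y,
          button="LEFT", click_count=1):
    """Send one mouseClick action and return the raw response dict."""
    return client.invoke_browser(
        browserIdentifier=browser_identifier,
        sessionId=session_id,
        # Exactly one top-level key is set in the action union.
        action={
            "mouseClick": {
                "x": x,
                "y": y,
                "button": button,
                "clickCount": click_count,
            }
        },
    )
```

A call would then look like click(make_client(), browser_id, session_id, 200, 150), where browser_id and session_id are placeholders for values returned when the session was started.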
Parameters:
  • browserIdentifier (string) –

    [REQUIRED]

    The unique identifier of the browser associated with the session. This must match the identifier used when creating the session with StartBrowserSession.

  • sessionId (string) –

    [REQUIRED]

    The unique identifier of the browser session on which to perform the action. This must be an active session created with StartBrowserSession.

  • action (dict) –

    [REQUIRED]

    The browser action to perform. Exactly one member of the BrowserAction union must be set per request.

    Note

    This is a Tagged Union structure. Only one of the following top level keys can be set: mouseClick, mouseMove, mouseDrag, mouseScroll, keyType, keyPress, keyShortcut, screenshot.

    • mouseClick (dict) –

      Click at the specified coordinates.

      • x (integer) – [REQUIRED]

        The X coordinate on screen where the click occurs.

      • y (integer) – [REQUIRED]

        The Y coordinate on screen where the click occurs.

      • button (string) –

        The mouse button to use. Defaults to LEFT.

      • clickCount (integer) –

        The number of clicks to perform. Valid range: 1–10. Defaults to 1.

    • mouseMove (dict) –

      Move the cursor to the specified coordinates.

      • x (integer) – [REQUIRED]

        The target X coordinate on screen.

      • y (integer) – [REQUIRED]

        The target Y coordinate on screen.

    • mouseDrag (dict) –

      Drag from a start position to an end position.

      • endX (integer) – [REQUIRED]

        The ending X coordinate for the drag.

      • endY (integer) – [REQUIRED]

        The ending Y coordinate for the drag.

      • startX (integer) – [REQUIRED]

        The starting X coordinate for the drag.

      • startY (integer) – [REQUIRED]

        The starting Y coordinate for the drag.

      • button (string) –

        The mouse button to use for the drag. Defaults to LEFT.

    • mouseScroll (dict) –

      Scroll at the specified position.

      • x (integer) – [REQUIRED]

        The X coordinate on screen where the scroll occurs.

      • y (integer) – [REQUIRED]

        The Y coordinate on screen where the scroll occurs.

      • deltaX (integer) –

        The horizontal scroll delta. Valid range: -1000 to 1000.

      • deltaY (integer) –

        The vertical scroll delta. Valid range: -1000 to 1000. Negative values scroll down.

    • keyType (dict) –

      Type a string of text.

      • text (string) – [REQUIRED]

        The text string to type. Maximum length: 10,000 characters.

    • keyPress (dict) –

      Press a key one or more times.

      • key (string) – [REQUIRED]

        The key name to press (for example, enter, tab, escape).

      • presses (integer) –

        The number of times to press the key. Valid range: 1–100. Defaults to 1.

    • keyShortcut (dict) –

      Press a key combination.

      • keys (list) – [REQUIRED]

        The key combination to press (for example, ["ctrl", "s"]). Maximum 5 keys.

        • (string) –

    • screenshot (dict) –

      Capture a full-screen screenshot.

      • format (string) –

        The image format for the screenshot. Defaults to PNG.
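The limits listed for the action members can be checked on the client side before a request is sent. The sketch below is one possible way to do that; every function name in it is hypothetical, and the service remains the authority on what it accepts. Each builder returns a dict with exactly one top-level key, suitable for the action parameter.

```python
BUTTONS = ("LEFT", "RIGHT", "MIDDLE")


def _in_range(name, value, low, high):
    """Raise ValueError unless low <= value <= high."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be in [{low}, {high}], got {value}")


def mouse_click(x, y, button="LEFT", click_count=1):
    if button not in BUTTONS:
        raise ValueError(f"button must be one of {BUTTONS}, got {button!r}")
    _in_range("clickCount", click_count, 1, 10)
    return {"mouseClick": {"x": x, "y": y, "button": button,
                           "clickCount": click_count}}


def mouse_scroll(x, y, delta_x=None, delta_y=None):
    body = {"x": x, "y": y}
    for name, value in (("deltaX", delta_x), ("deltaY", delta_y)):
        if value is not None:  # optional members are omitted when not given
            _in_range(name, value, -1000, 1000)
            body[name] = value
    return {"mouseScroll": body}


def key_type(text):
    if len(text) > 10_000:
        raise ValueError("text exceeds the documented 10,000 character limit")
    return {"keyType": {"text": text}}


def key_press(key, presses=1):
    _in_range("presses", presses, 1, 100)
    return {"keyPress": {"key": key, "presses": presses}}


def key_shortcut(*keys):
    # Assumption: at least one key is needed; only the maximum of 5 is documented.
    _in_range("number of keys", len(keys), 1, 5)
    return {"keyShortcut": {"keys": list(keys)}}
```

For example, key_shortcut("ctrl", "s") produces the value shown in the keys description above, wrapped in its keyShortcut member.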

Return type:

dict

Returns:

Response Syntax

{
    'result': {
        'mouseClick': {
            'status': 'SUCCESS'|'FAILED',
            'error': 'string'
        },
        'mouseMove': {
            'status': 'SUCCESS'|'FAILED',
            'error': 'string'
        },
        'mouseDrag': {
            'status': 'SUCCESS'|'FAILED',
            'error': 'string'
        },
        'mouseScroll': {
            'status': 'SUCCESS'|'FAILED',
            'error': 'string'
        },
        'keyType': {
            'status': 'SUCCESS'|'FAILED',
            'error': 'string'
        },
        'keyPress': {
            'status': 'SUCCESS'|'FAILED',
            'error': 'string'
        },
        'keyShortcut': {
            'status': 'SUCCESS'|'FAILED',
            'error': 'string'
        },
        'screenshot': {
            'status': 'SUCCESS'|'FAILED',
            'error': 'string',
            'data': b'bytes'
        }
    },
    'sessionId': 'string'
}
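Because only one member of result is set, a caller usually wants that single member and its status. The following sketch shows one way to extract it. BrowserActionError and unwrap_result are hypothetical names, and treating an SDK_UNKNOWN_MEMBER key as an error is a design choice made for this example, not documented behavior.

```python
class BrowserActionError(RuntimeError):
    """Hypothetical exception for a result whose status is not SUCCESS."""


def unwrap_result(response):
    """Return (member_name, payload) for the single member set in 'result'."""
    result = response["result"]
    if len(result) != 1:
        raise ValueError(f"expected exactly one result member, got {sorted(result)}")
    (name, payload), = result.items()
    if name == "SDK_UNKNOWN_MEMBER":
        # The client saw a member that its service model does not know about.
        raise ValueError(f"unrecognized result member: {payload.get('name')}")
    if payload.get("status") != "SUCCESS":
        raise BrowserActionError(
            f"{name} failed: {payload.get('error', 'no error message')}"
        )
    return name, payload
```

Raising on a failed status keeps the calling code linear; returning the payload unchanged and letting the caller inspect status would be an equally reasonable design.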

Response Structure

  • (dict) –

    Response for the InvokeBrowser operation.

    • result (dict) –

      The result of the browser action. The member set in the result corresponds to the action that was performed.

      Note

      This is a Tagged Union structure. Only one of the following top level keys will be set: mouseClick, mouseMove, mouseDrag, mouseScroll, keyType, keyPress, keyShortcut, screenshot. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:

      'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
      
      • mouseClick (dict) –

        The result of a mouse click action.

        • status (string) –

          The status of the action execution.

        • error (string) –

          The error message. Present only when the action failed.

      • mouseMove (dict) –

        The result of a mouse move action.

        • status (string) –

          The status of the action execution.

        • error (string) –

          The error message. Present only when the action failed.

      • mouseDrag (dict) –

        The result of a mouse drag action.

        • status (string) –

          The status of the action execution.

        • error (string) –

          The error message. Present only when the action failed.

      • mouseScroll (dict) –

        The result of a mouse scroll action.

        • status (string) –

          The status of the action execution.

        • error (string) –

          The error message. Present only when the action failed.

      • keyType (dict) –

        The result of a key type action.

        • status (string) –

          The status of the action execution.

        • error (string) –

          The error message. Present only when the action failed.

      • keyPress (dict) –

        The result of a key press action.

        • status (string) –

          The status of the action execution.

        • error (string) –

          The error message. Present only when the action failed.

      • keyShortcut (dict) –

        The result of a key shortcut action.

        • status (string) –

          The status of the action execution.

        • error (string) –

          The error message. Present only when the action failed.

      • screenshot (dict) –

        The result of a screenshot action.

        • status (string) –

          The status of the action execution.

        • error (string) –

          The error message. Present only when the action failed.

        • data (bytes) –

          The base64-encoded image data. Present only when the action succeeded.

    • sessionId (string) –

      The unique identifier of the browser session on which the action was performed.
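The screenshot data member is typed as bytes but described as base64-encoded, so it may not be obvious whether the value you receive still needs decoding. The sketch below sidesteps the question by checking for the PNG file signature first and decoding only when it is absent. screenshot_bytes is a hypothetical helper, and the assumption that the image is a PNG follows from PNG being the only format listed.

```python
import base64
import binascii

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def screenshot_bytes(response):
    """Return raw PNG bytes from a screenshot response."""
    shot = response["result"]["screenshot"]
    if shot.get("status") != "SUCCESS":
        raise RuntimeError(shot.get("error", "screenshot failed"))
    data = shot["data"]
    if data.startswith(PNG_SIGNATURE):
        return data  # already raw image bytes
    try:
        decoded = base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError("data is neither raw PNG nor valid base64") from exc
    if not decoded.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not look like a PNG")
    return decoded
```

The returned bytes can be written to disk with open(path, "wb"), where path is whatever location suits your application.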

Exceptions

  • BedrockAgentCore.Client.exceptions.ServiceQuotaExceededException

  • BedrockAgentCore.Client.exceptions.AccessDeniedException

  • BedrockAgentCore.Client.exceptions.ValidationException

  • BedrockAgentCore.Client.exceptions.ResourceNotFoundException

  • BedrockAgentCore.Client.exceptions.ThrottlingException

  • BedrockAgentCore.Client.exceptions.InternalServerException
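The exception classes above are reachable through client.exceptions, so they can be caught individually. The sketch below retries only on ThrottlingException with exponential backoff and lets every other exception propagate. invoke_with_retry and its max_attempts, base_delay, and sleep parameters are hypothetical, and botocore's own retry configuration may already cover throttling in your setup.

```python
import time


def invoke_with_retry(client, *, max_attempts=3, base_delay=0.5,
                      sleep=time.sleep, **request):
    """Call invoke_browser, retrying when the service reports throttling.

    The delay doubles after each throttled attempt: base_delay, then
    2 * base_delay, and so on. The last throttling error is re-raised.
    """
    throttled = client.exceptions.ThrottlingException
    for attempt in range(1, max_attempts + 1):
        try:
            return client.invoke_browser(**request)
        except throttled:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

The request keyword arguments are passed through unchanged, for example invoke_with_retry(client, browserIdentifier=browser_id, sessionId=session_id, action={"screenshot": {}}), with browser_id and session_id standing in for your own values.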