

# Connecting to Google Analytics 4
<a name="connecting-to-googleanalytics"></a>

 Google Analytics 4 is an analytics service that tracks and reports metrics about visitor interactions with your apps and websites. These metrics include page views, active users, and events. If you are a Google Analytics 4 user, you can connect AWS Glue to your Google Analytics 4 account. You can use Google Analytics 4 as a data source in your ETL jobs. Run these jobs to transfer data from Google Analytics 4 to AWS services or other supported applications. 

**Topics**
+ [AWS Glue support for Google Analytics 4](googleanalytics-support.md)
+ [Policies containing the API operations for creating and using connections](googleanalytics-configuring-iam-permissions.md)
+ [Configuring Google Analytics 4](googleanalytics-configuring.md)
+ [Configuring Google Analytics 4 connections](googleanalytics-configuring-connections.md)
+ [Reading from Google Analytics 4 entities](googleanalytics-reading-from-entities.md)
+ [Google Analytics 4 connection options](googleanalytics-connection-options.md)
+ [Creating a Google Analytics 4 account](googleanalytics-create-account.md)
+ [Steps to create a client app and OAuth 2.0 credentials](googleanalytics-client-app-oauth-credentials.md)
+ [Limitations and considerations](googleanalytics-connector-limitations.md)

# AWS Glue support for Google Analytics 4
<a name="googleanalytics-support"></a>

AWS Glue supports Google Analytics 4 as follows:

**Supported as a source?**  
Yes. You can use AWS Glue ETL jobs to query data from Google Analytics 4.

**Supported as a target?**  
No.

**Supported Google Analytics 4 API versions**  
 v1 Beta. 

# Policies containing the API operations for creating and using connections
<a name="googleanalytics-configuring-iam-permissions"></a>

 The following sample policy describes the required AWS permissions for creating and using connections. If you are creating a new role, create a policy that contains the following: 

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:ListConnectionTypes",
        "glue:DescribeConnectionType",
        "glue:RefreshOAuth2Tokens",
        "glue:ListEntities",
        "glue:DescribeEntity"
      ],
      "Resource": "*"
    }
  ]
}
```

------
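
If you manage IAM policies from code, the sample policy above can be assembled as a plain Python dictionary and serialized to JSON. This is a minimal sketch: the helper name `build_glue_connection_policy` is hypothetical, and only the action names come from the policy above.

```python
import json

# Actions copied from the sample policy above.
GLUE_CONNECTION_ACTIONS = [
    "glue:ListConnectionTypes",
    "glue:DescribeConnectionType",
    "glue:RefreshOAuth2Tokens",
    "glue:ListEntities",
    "glue:DescribeEntity",
]


def build_glue_connection_policy(resource="*"):
    """Return the sample policy as a dictionary (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(GLUE_CONNECTION_ACTIONS),
                "Resource": resource,
            }
        ],
    }


# Serialize the policy so that it can be used as a policy document.
policy_json = json.dumps(build_glue_connection_policy(), indent=2)
print(policy_json)
```

The resulting string can be supplied as the policy document when you create the policy, for example through the `PolicyDocument` parameter of the boto3 IAM client's `create_policy` call.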

You can also use the following managed IAM policies to allow access:
+  [ AWSGlueServiceRole ](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole) – Grants access to resources that various AWS Glue processes require to run on your behalf. These resources include AWS Glue, Amazon S3, IAM, CloudWatch Logs, and Amazon EC2. If you follow the naming convention for resources specified in this policy, AWS Glue processes have the required permissions. This policy is typically attached to roles specified when defining crawlers, jobs, and development endpoints. 
+  [ AWSGlueConsoleFullAccess ](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess) – Grants full access to AWS Glue resources when an identity that the policy is attached to uses the AWS Management Console. If you follow the naming convention for resources specified in this policy, users have full console capabilities. This policy is typically attached to users of the AWS Glue console. 

# Configuring Google Analytics 4
<a name="googleanalytics-configuring"></a>

Before you can use AWS Glue to transfer data from Google Analytics 4, you must meet these requirements:

## Minimum requirements
<a name="googleanalytics-configuring-min-requirements"></a>
+  You have a Google Analytics account with one or more data streams that collect the data that you want to transfer. 
+  You have a Google Cloud Platform account and a Google Cloud project. 
+  In your Google Cloud project, you've enabled the following APIs: 
  +  Google Analytics API 
  +  Google Analytics Admin API 
  +  Google Analytics Data API 
+  In your Google Cloud project, you've configured an OAuth consent screen for external users. For information about the OAuth consent screen, see [Setting up your OAuth consent screen](https://support.google.com/cloud/answer/10311615#) in the Google Cloud Platform Console Help. 
+  In your Google Cloud project, you've configured an OAuth 2.0 client ID. For more information, see [Setting up OAuth 2.0 ](https://support.google.com/cloud/answer/6158849?hl=en#zippy=). 

 If you meet these requirements, you’re ready to connect AWS Glue to your Google Analytics 4 account. 

# Configuring Google Analytics 4 connections
<a name="googleanalytics-configuring-connections"></a>

To configure a Google Analytics 4 connection:

1.  In AWS Secrets Manager, create a secret with the following details. You must create a secret for each connection in AWS Glue. 

   1.  For the `AUTHORIZATION_CODE` grant type: 
      +  For a customer managed connected app, the secret should contain the connected app's consumer secret with `USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET` as the key. 

1. In AWS Glue Studio, create a connection under **Data Connections** by following these steps: 

   1. When selecting a **Connection type**, select Google Analytics 4.

   1. Provide the `INSTANCE_URL` of the Google Analytics 4 instance you want to connect to.

   1.  Select an IAM role that AWS Glue can assume and that has permissions for the following actions: 

------
#### [ JSON ]


      ```
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Effect": "Allow",
            "Action": [
              "secretsmanager:DescribeSecret",
              "secretsmanager:GetSecretValue",
              "secretsmanager:PutSecretValue",
              "ec2:CreateNetworkInterface",
              "ec2:DescribeNetworkInterfaces",
              "ec2:DeleteNetworkInterface"
            ],
            "Resource": "*"
          }
        ]
      }
      ```

------

   1.  Select the `secretName` that you want to use for this connection. AWS Glue stores the tokens in this secret. 

   1.  Select the network options if you want to use your network. 

1.  Grant the IAM role associated with your AWS Glue job permission to read `secretName`. 
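
Step 1 above can also be scripted. The sketch below builds the secret value with the `USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET` key and, optionally, stores it by using the boto3 Secrets Manager client. The function names, the secret name argument, and the placeholder client secret are assumptions for illustration. boto3 is imported lazily so that the payload helper runs without it.

```python
import json

# Key that the connection setup steps above require in the secret.
SECRET_KEY = "USER_MANAGED_CLIENT_APPLICATION_CLIENT_SECRET"


def build_secret_string(client_secret):
    """Serialize the OAuth client secret under the required key."""
    if not client_secret:
        raise ValueError("client_secret must not be empty")
    return json.dumps({SECRET_KEY: client_secret})


def create_glue_connection_secret(secret_name, client_secret, region_name=None):
    """Store the secret in AWS Secrets Manager (needs boto3 and AWS credentials)."""
    import boto3  # imported here so build_secret_string has no dependencies

    client = boto3.client("secretsmanager", region_name=region_name)
    return client.create_secret(
        Name=secret_name,
        SecretString=build_secret_string(client_secret),
    )


# Placeholder value only; never hard-code a real client secret.
print(build_secret_string("example-client-secret"))
```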

 **`AUTHORIZATION_CODE` grant type** 

 This grant type is considered “three-legged” OAuth because it relies on redirecting users to a third-party authorization server to authenticate. It is used when you create connections through the AWS Glue console. The AWS Glue console redirects the user to Google Analytics 4, where the user must log in and grant AWS Glue the requested permissions to access their Google Analytics 4 instance. 

 Users may still opt to create their own connected app in Google Analytics 4 and provide their own client ID and client secret when creating connections through the AWS Glue console. In this scenario, they are still redirected to Google Analytics 4 to log in and authorize AWS Glue to access their resources. 

 This grant type results in a refresh token and an access token. The access token is short-lived and can be refreshed automatically, without user interaction, by using the refresh token. 

 For more information, see [Using OAuth 2.0 to Access Google APIs](https://developers.google.com/identity/protocols/oauth2). 
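
To make the three-legged flow concrete, the following sketch builds the kind of authorization URL that a user is redirected to in the first leg of the flow. The AWS Glue console performs this redirect for you, so the code is illustrative only. The client ID, redirect URI, and state value are placeholders, and the read-only Analytics scope is an assumption about what a connection would request.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Google's OAuth 2.0 authorization endpoint.
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
# Assumed scope: read-only access to Google Analytics data.
ANALYTICS_READONLY_SCOPE = "https://www.googleapis.com/auth/analytics.readonly"


def build_authorization_url(client_id, redirect_uri, state):
    """Build the URL where the user logs in and grants access."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",   # authorization code grant
        "scope": ANALYTICS_READONLY_SCOPE,
        "access_type": "offline",  # request a refresh token as well
        "state": state,            # opaque value echoed back to the caller
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


url = build_authorization_url(
    client_id="example-client-id.apps.googleusercontent.com",  # placeholder
    redirect_uri="https://example.com/oauth/callback",          # placeholder
    state="random-state-value",
)
print(url)
```

After the user approves the request, the authorization server redirects back with a `code` parameter, which is exchanged for the access token and refresh token described above.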

# Reading from Google Analytics 4 entities
<a name="googleanalytics-reading-from-entities"></a>

 **Prerequisites** 
+  A Google Analytics 4 object you would like to read from. Refer to the supported entities table below to check the available entities. 

 **Supported entities** 


| Entity | Can be Filtered | Supports Limit | Supports Order By | Supports Select \* | Supports Partitioning | 
| --- | --- | --- | --- | --- | --- | 
| Real-Time Report | Yes | Yes | Yes | Yes | No | 
| Core Report | Yes | Yes | Yes | Yes | Yes | 

 **Example** 

```
# Assumes glueContext is an existing awsglue.context.GlueContext instance.
googleAnalytics4_read = glueContext.create_dynamic_frame.from_options(
    connection_type="GoogleAnalytics4",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "entityName",
        "API_VERSION": "v1beta"
    }
)
```

 **Google Analytics 4 entity and field details** 


| Entity | Field | Data Type | Supported Operators | 
| --- | --- | --- | --- | 
| Core Report | Dynamic Fields |  |  | 
| Core Report | Dimension Fields | String | LIKE, = | 
| Core Report | Dimension Fields | Date | LIKE, = | 
| Core Report | Metric Fields | String | >, <, >=, <=, =, BETWEEN | 
| Core Report | Custom Dimension and Custom Metric Fields | String | NA | 
| Real-Time Report | appVersion | String | LIKE, = | 
| Real-Time Report | audienceId | String | LIKE, = | 
| Real-Time Report | audienceName | String | LIKE, = | 
| Real-Time Report | city | String | LIKE, = | 
| Real-Time Report | cityId | String | LIKE, = | 
| Real-Time Report | country | String | LIKE, = | 
| Real-Time Report | countryId | String | LIKE, = | 
| Real-Time Report | deviceCategory | String | LIKE, = | 
| Real-Time Report | eventName | String | LIKE, = | 
| Real-Time Report | minutesAgo | String | LIKE, = | 
| Real-Time Report | platform | String | LIKE, = | 
| Real-Time Report | streamId | String | LIKE, = | 
| Real-Time Report | streamName | String | LIKE, = | 
| Real-Time Report | unifiedScreenName | String | LIKE, = | 
| Real-Time Report | activeUsers | String | >, <, >=, <=, =, BETWEEN | 
| Real-Time Report | conversions | String | >, <, >=, <=, =, BETWEEN | 
| Real-Time Report | eventCount | String | >, <, >=, <=, =, BETWEEN | 
| Real-Time Report | screenPageViews | String | >, <, >=, <=, =, BETWEEN | 
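
The operators in the table can be combined into a filter predicate string. The helper below is a hypothetical convenience and is not part of AWS Glue: it checks an operator against the Real-Time Report rows above and formats a simple `field operator value` clause. Quoting dimension values and leaving metric values unquoted is an assumption.

```python
# Field names and operators taken from the Real-Time Report rows above.
REALTIME_DIMENSIONS = {
    "appVersion", "audienceId", "audienceName", "city", "cityId", "country",
    "countryId", "deviceCategory", "eventName", "minutesAgo", "platform",
    "streamId", "streamName", "unifiedScreenName",
}
REALTIME_METRICS = {"activeUsers", "conversions", "eventCount", "screenPageViews"}

DIMENSION_OPERATORS = {"LIKE", "="}
METRIC_OPERATORS = {">", "<", ">=", "<=", "=", "BETWEEN"}


def realtime_clause(field, operator, *values):
    """Format one filter clause after checking the operator for the field."""
    op = operator.upper()
    if field in REALTIME_DIMENSIONS:
        allowed, quote = DIMENSION_OPERATORS, True
    elif field in REALTIME_METRICS:
        allowed, quote = METRIC_OPERATORS, False
    else:
        raise ValueError(f"unknown Real-Time Report field: {field}")
    if op not in allowed:
        raise ValueError(f"operator {operator!r} is not supported for {field}")

    def render(value):
        # Assumption: dimension values are double-quoted, metric values are not.
        return f'"{value}"' if quote else str(value)

    if op == "BETWEEN":
        if len(values) != 2:
            raise ValueError("BETWEEN needs a lower and an upper value")
        return f"{field} BETWEEN {render(values[0])} AND {render(values[1])}"
    if len(values) != 1:
        raise ValueError(f"{op} needs exactly one value")
    return f"{field} {op} {render(values[0])}"


# Combine clauses with AND to form a single predicate string.
predicate = " AND ".join([
    realtime_clause("country", "=", "Canada"),
    realtime_clause("activeUsers", ">=", 10),
])
print(predicate)
```

A string built this way could be passed as the `FILTER_PREDICATE` connection option when reading from the entity.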

 **Partitioning queries** 

1.  **Filter-based partition** 

    The additional Spark options `PARTITION_FIELD`, `LOWER_BOUND`, `UPPER_BOUND`, and `NUM_PARTITIONS` can be provided if you want to utilize concurrency in Spark. With these parameters, the original query is split into `NUM_PARTITIONS` sub-queries that Spark tasks can execute concurrently. 
   +  `PARTITION_FIELD`: the name of the field to be used to partition the query. 
   +  `LOWER_BOUND`: an inclusive lower bound value of the chosen partition field. 

      For date, we accept the Spark date format used in Spark SQL queries. Example of valid values: `"2024-02-06"`. 
   +  `UPPER_BOUND`: an exclusive upper bound value of the chosen partition field. 
   +  `NUM_PARTITIONS`: number of partitions. 

    **Example** 

   ```
   # Assumes glueContext is an existing awsglue.context.GlueContext instance.
   googleAnalytics4_read = glueContext.create_dynamic_frame.from_options(
       connection_type="GoogleAnalytics4",
       connection_options={
           "connectionName": "connectionName",
           "ENTITY_NAME": "entityName",
           "API_VERSION": "v1beta",
           "PARTITION_FIELD": "date",
           "LOWER_BOUND": "2022-01-01",
           "UPPER_BOUND": "2024-01-02",
           "NUM_PARTITIONS": "10"
       }
   )
   ```

1.  **Record-based partition** 

    The additional Spark option `NUM_PARTITIONS` can be provided if you want to utilize concurrency in Spark. With this parameter, the original query is split into `NUM_PARTITIONS` sub-queries that Spark tasks can execute concurrently. 
   +  `NUM_PARTITIONS`: number of partitions. 

    **Example** 

   ```
   # Assumes glueContext is an existing awsglue.context.GlueContext instance.
   googleAnalytics4_read = glueContext.create_dynamic_frame.from_options(
       connection_type="GoogleAnalytics4",
       connection_options={
           "connectionName": "connectionName",
           "ENTITY_NAME": "entityName",
           "API_VERSION": "v1beta",
           "NUM_PARTITIONS": "10"
       }
   )
   ```
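
To illustrate filter-based partitioning, the sketch below splits a date range with an inclusive lower bound and an exclusive upper bound into contiguous sub-ranges, one per partition. It shows the general idea only. How the connector itself computes sub-query boundaries is not described here, so the even split and the function name `split_date_range` are assumptions.

```python
from datetime import date, timedelta


def split_date_range(lower_bound, upper_bound, num_partitions):
    """Split [lower_bound, upper_bound) into contiguous date sub-ranges."""
    start = date.fromisoformat(lower_bound)
    end = date.fromisoformat(upper_bound)
    total_days = (end - start).days
    if total_days <= 0 or num_partitions < 1:
        raise ValueError("need lower_bound < upper_bound and num_partitions >= 1")

    # Never create more partitions than there are days in the range.
    parts = min(num_partitions, total_days)
    base, extra = divmod(total_days, parts)

    ranges = []
    cursor = start
    for i in range(parts):
        # The first `extra` partitions take one additional day each.
        width = base + (1 if i < extra else 0)
        nxt = cursor + timedelta(days=width)
        ranges.append((cursor.isoformat(), nxt.isoformat()))
        cursor = nxt
    return ranges


# Each sub-range keeps the inclusive-lower, exclusive-upper convention.
for low, high in split_date_range("2024-01-01", "2024-01-11", 4):
    print(f'date >= "{low}" AND date < "{high}"')
```

Because every sub-range ends where the next one begins, each date in the original range falls into exactly one sub-query.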

# Google Analytics 4 connection options
<a name="googleanalytics-connection-options"></a>

The following are connection options for Google Analytics 4:
+  `ENTITY_NAME` (String) - (Required) Used for Read. The name of your object in Google Analytics 4. 
+  `API_VERSION` (String) - (Required) Used for Read. The Google Analytics 4 REST API version you want to use. 
+  `SELECTED_FIELDS` (List<String>) - Default: empty (SELECT \*). Used for Read. Columns you want to select for the object. 
+  `FILTER_PREDICATE` (String) - Default: empty. Used for Read. It should be in the Spark SQL format. 
+  `QUERY` (String) - Default: empty. Used for Read. Full Spark SQL query. 
+  `PARTITION_FIELD` (String) - Used for Read. Field to be used to partition the query. 
+  `LOWER_BOUND` (String) - Used for Read. An inclusive lower bound value of the chosen partition field. 
+  `UPPER_BOUND` (String) - Used for Read. An exclusive upper bound value of the chosen partition field. 
+  `NUM_PARTITIONS` (Integer) - Default: 1. Used for Read. Number of partitions for read. 
+  `INSTANCE_URL` (String) - (Optional) Used for Read. The URL of the Google Analytics 4 instance you want to connect to. 
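
The options above are passed to AWS Glue as a single dictionary. The following sketch uses a hypothetical helper, `build_read_options`, to assemble that dictionary and to check that the partition options are supplied together. The validation rule is an assumption made for illustration and is not documented connector behavior.

```python
def build_read_options(connection_name, entity_name, api_version="v1beta",
                       selected_fields=None, filter_predicate=None,
                       partition_field=None, lower_bound=None,
                       upper_bound=None, num_partitions=None):
    """Assemble a connection_options dictionary for a read (hypothetical helper)."""
    options = {
        "connectionName": connection_name,
        "ENTITY_NAME": entity_name,
        "API_VERSION": api_version,
    }
    if selected_fields:
        options["SELECTED_FIELDS"] = list(selected_fields)
    if filter_predicate:
        options["FILTER_PREDICATE"] = filter_predicate

    bounds = (partition_field, lower_bound, upper_bound)
    if any(value is not None for value in bounds):
        # Assumed rule: filter-based partitioning needs all four options.
        if any(value is None for value in bounds) or num_partitions is None:
            raise ValueError(
                "PARTITION_FIELD, LOWER_BOUND, UPPER_BOUND, and NUM_PARTITIONS "
                "must be provided together"
            )
        options["PARTITION_FIELD"] = partition_field
        options["LOWER_BOUND"] = lower_bound
        options["UPPER_BOUND"] = upper_bound
    if num_partitions is not None:
        # The examples in this guide pass the partition count as a string.
        options["NUM_PARTITIONS"] = str(num_partitions)
    return options


options = build_read_options(
    "connectionName", "entityName",
    partition_field="date", lower_bound="2022-01-01",
    upper_bound="2024-01-02", num_partitions=10,
)
print(options)
```

The returned dictionary can be passed as `connection_options` to `glueContext.create_dynamic_frame.from_options`, as in the read examples in this guide.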

# Creating a Google Analytics 4 account
<a name="googleanalytics-create-account"></a>

 Follow the steps to create a Google Analytics 4 account: [ https://support.google.com/analytics/answer/9304153?hl=en ](https://support.google.com/analytics/answer/9304153?hl=en) 

# Steps to create a client app and OAuth 2.0 credentials
<a name="googleanalytics-client-app-oauth-credentials"></a>

 For more information, see the [Google Analytics 4 API documentation](https://developers.google.com/analytics/devguides/reporting/data/v1). 

1.  Create and set up your account by logging in to your [Google Analytics account](https://analytics.google.com/) with your credentials. Then navigate to **Admin** > **Create Account**. 

1.  Create a property for the account by choosing **Create Property**. Set up the property with the required details. After you provide all the details, the corresponding property ID is generated. 

1.  Add a data stream for the property by choosing **Data Streams** > **Add Stream** > **Web** from the drop-down. Provide the website details, such as the URL and other required fields. After you provide all the details, the corresponding **stream ID** and **measurement ID** are generated. 

1.  Set up Google Analytics on your website by copying the measurement ID and adding it to your website's configuration. 

1.  Create a report in Google Analytics by navigating to **Reports** and generating the required report. 

1.  Authorize your app by navigating to [console.cloud.google.com](https://console.cloud.google.com), searching for the Google Analytics Data API, and then enabling the API. 

   1.  Navigate to the API and Services page and choose **Credentials** > **setup OAuth 2.0 Client IDs**. 

   1.  Provide the redirect URL by adding the AWS Glue redirect URL. 

1.  Copy the client ID and client secret. You need them later to create the connection. 

# Limitations and considerations
<a name="googleanalytics-connector-limitations"></a>

The following are limitations for the Google Analytics 4 connector:
+  For the Core Report entity, a request can include at most 9 dimension fields and 10 metric fields. If a request exceeds these limits, it fails and the connector returns an error message. 
+  For the Real-Time Report entity, a request can include at most 4 dimension fields. If a request exceeds this limit, it fails and the connector returns an error message. 
+  Google Analytics 4 is a free tool in beta, so expect regular updates, including new features, entity enhancements, new fields, and deprecation of existing fields. 
+  Core Report fields are populated dynamically, so fields can be added, deprecated, or renamed, and new limits on fields can be imposed at any time. 
+  The default start date is 30 days ago and the default end date is yesterday (one day before the current date). These dates are overridden in the filter expression if you set a value or if the flow is incremental. 
+  According to the Google documentation, the Real-Time Report entity returns 10,000 records if no limit is passed in the request. Otherwise, the API returns a maximum of 250,000 rows per request, no matter how many you request. For more information, see [Method: properties.runRealtimeReport](https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/properties/runRealtimeReport) in the Google Analytics documentation. 
+  The Real-Time Report entity does not support record-based partitioning because it does not support pagination. It also does not support field-based partitioning because none of its fields fulfill the required criteria. 
+  Because the number of fields that can be passed in a request is limited, the connector sets default dimension and metric fields within the designated limits. If "select all" is chosen, only the data from those predetermined fields is retrieved. 
  +  Core Report 
    +  Limit imposed by the SaaS application: a request can contain up to 9 dimensions and up to 10 metrics (that is, a maximum of 19 fields, metrics + dimensions). 
    +  Connector implementation: if you use SELECT\_ALL or select more than 25 fields, the default fields are passed in the request. 
    +  The following fields are the default fields for Core Report: "country", "city", "eventName", "cityId", "browser", "date", "currencyCode", "deviceCategory", "transactionId", "active1DayUsers", "active28DayUsers", "active7DayUsers", "activeUsers", "averagePurchaseRevenue", "averageRevenuePerUser", "averageSessionDuration", "engagedSessions", "eventCount", "engagementRate". 
  +  Real-Time Report 
    +  Limit imposed by the SaaS application: a request can contain up to 4 dimensions. 
    +  If you use SELECT\_ALL or select more than 15 fields, the default fields are passed in the request. 
    +  The following fields are the default fields for Real-Time Report: "country", "deviceCategory", "city", "cityId", "activeUsers", "conversions", "eventCount", "screenPageViews". 
+  For the Core Report entity, if a partition on the date field and a filter on startDate are both present, the dateRange value would be overridden by the startDate filter value. Because the partition always takes priority, the connector discards the startDate filter when a partition on the date field is present. 
+  Because cohortSpecs is now part of the Core Report request body, the Core Report entity has been enhanced to support the cohortSpec attribute. In the cohortSpecs request body, nearly all fields require user input. To address this, the connector sets default values for those fields and lets you override these values if needed. 
<a name="google-analytics-connector-limitations-table"></a>[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/glue/latest/dg/googleanalytics-connector-limitations.html)
+  You can pass all of these filters together, or combine them with other filters. 
  +  Example 1 - filterPredicate: startDate between "2023-05-09" and "2023-05-10" AND startOffset=1 AND endOffset=2 AND granularity="WEEKLY" 
  +  Example 2 - filterPredicate: city="xyz" AND startOffset=1 AND endOffset=2 AND granularity="WEEKLY" 
+  In a cohort request: 
  +  If `cohortNthMonth` is passed in the request, the granularity value is set internally to "MONTHLY". 
  +  Similarly, if `cohortNthWeek` is passed, the granularity value is set to "WEEKLY". 
  +  For `cohortNthDay`, the granularity value is set to "DAILY". For more information, see: 
    +  [ https://developers.google.com/analytics/devguides/reporting/data/v1/advanced ](https://developers.google.com/analytics/devguides/reporting/data/v1/advanced) 
    +  [ https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/CohortSpec ](https://developers.google.com/analytics/devguides/reporting/data/v1/rest/v1beta/CohortSpec) 
  +  You can override the default dateRange and granularity values. Refer to the preceding table. 
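
The Core Report field limits and the default-field fallback described earlier in this section can be summarized in code. The sketch below mirrors the documented numbers (9 dimensions, 10 metrics, and a fallback to default fields on select-all or above 25 selected fields). The function name and the way the caller separates dimensions from metrics are hypothetical and do not reflect the connector's internal implementation.

```python
# Default fields and limits as listed for the Core Report entity above.
CORE_DEFAULT_DIMENSIONS = [
    "country", "city", "eventName", "cityId", "browser", "date",
    "currencyCode", "deviceCategory", "transactionId",
]
CORE_DEFAULT_METRICS = [
    "active1DayUsers", "active28DayUsers", "active7DayUsers", "activeUsers",
    "averagePurchaseRevenue", "averageRevenuePerUser",
    "averageSessionDuration", "engagedSessions", "eventCount",
    "engagementRate",
]
MAX_DIMENSIONS = 9
MAX_METRICS = 10
FALLBACK_THRESHOLD = 25


def resolve_core_report_fields(dimensions=None, metrics=None):
    """Return the (dimensions, metrics) that a request would carry."""
    dimensions = list(dimensions or [])
    metrics = list(metrics or [])

    # Select-all, or too many selected fields, falls back to the defaults.
    select_all = not dimensions and not metrics
    if select_all or len(dimensions) + len(metrics) > FALLBACK_THRESHOLD:
        return list(CORE_DEFAULT_DIMENSIONS), list(CORE_DEFAULT_METRICS)

    # Otherwise the per-request limits apply and the request would fail.
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError(f"at most {MAX_DIMENSIONS} dimension fields per request")
    if len(metrics) > MAX_METRICS:
        raise ValueError(f"at most {MAX_METRICS} metric fields per request")
    return dimensions, metrics


dims, mets = resolve_core_report_fields()  # select-all case
print(len(dims), len(mets))
```

The same pattern would apply to the Real-Time Report entity with its own limit of 4 dimensions, its fallback threshold of 15 fields, and its own default field list.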