

# Authoring code with AWS Glue Studio notebooks
<a name="notebooks-chapter"></a>

 Data engineers can author AWS Glue jobs faster and more easily than before using the interactive notebook interface in AWS Glue Studio or interactive sessions in AWS Glue. 

## Limitations
<a name="notebooks-chapter-limitations"></a>
+  AWS Glue Studio notebooks do not support Scala. 

**Topics**
+ [Limitations](#notebooks-chapter-limitations)
+ [Overview of using notebooks](using-notebooks-overview.md)
+ [Creating an ETL job using notebooks in AWS Glue Studio](create-notebook-job.md)
+ [Notebook editor components](notebook-components.md)
+ [Saving your notebook and job script](save-notebook.md)
+ [Managing notebook sessions](manage-notebook-sessions.md)
+ [Using Amazon Q Developer with AWS Glue Studio notebooks](glue-studio-notebooks-amazon-q-developer.md)

# Overview of using notebooks
<a name="using-notebooks-overview"></a>

 AWS Glue Studio allows you to interactively author jobs in a notebook interface based on Jupyter Notebooks. Through notebooks in AWS Glue Studio, you can edit job scripts or data integration code and view the output without having to run a full job. You can also add markdown and save notebooks as .ipynb files and job scripts. You can start a notebook without installing software locally or managing servers. When you are satisfied with your code, AWS Glue Studio can convert your notebook to an AWS Glue job with the click of a button. 

 Some benefits of using notebooks include: 
+  No cluster to provision or manage 
+  No idle clusters to pay for 
+  No up-front configuration required 
+  No installation of Jupyter notebooks required 
+  The same runtime/platform as AWS Glue ETL 

 When you start a notebook through AWS Glue Studio, all the configuration steps are done for you so that you can explore your data and start developing your job script after only a few seconds. AWS Glue Studio configures a Jupyter notebook with the AWS Glue Jupyter kernel. You don’t have to configure VPCs, network connections, or development endpoints to use this notebook. 

 To create jobs using the notebook interface: 
+  Configure the necessary IAM permissions. 
+  Start a notebook session to create a job. 
+  Write code in the cells in the notebook. 
+  Run and test the code to view the output. 
+  Save the job. 

 After your notebook is saved, it is a full AWS Glue job. You can manage all aspects of the job, such as scheduling job runs, setting job parameters, and viewing the job run history, right alongside your notebook. 

# Creating an ETL job using notebooks in AWS Glue Studio
<a name="create-notebook-job"></a>

**To start using notebooks in the AWS Glue Studio console**

1.  Attach AWS Identity and Access Management policies to the AWS Glue Studio user and create an IAM role for your ETL job and notebook. 

1.  Configure additional IAM security for notebooks, as described in [Granting permissions for the IAM role](notebook-getting-started.md#studio-notebook-permissions). 

1.  Open the AWS Glue Studio console at [https://console.aws.amazon.com/gluestudio/](https://console.aws.amazon.com/gluestudio/). 
**Note**  
Check that your browser does not block third-party cookies. Any browser that blocks third-party cookies, either by default or as a user-enabled setting, will prevent notebooks from launching. For more information on managing cookies, see:
   + [Chrome](https://support.alertlogic.com/hc/en-us/articles/360018127132-Turn-Off-Block-Third-Party-Cookies-in-Chrome-for-Windows)
   + [Firefox](https://support.mozilla.org/en-US/kb/third-party-cookies-firefox-tracking-protection)
   + [Safari](https://support.apple.com/guide/safari/manage-cookies-sfri11471/mac)

1. Choose the **Jobs** link in the left-side navigation menu. 

1.  Choose **Jupyter notebook** and then choose **Create** to start a new notebook session. 

1.  On the **Create job in Jupyter notebook** page, provide the job name, and choose the IAM role to use. Choose **Create job**. 

    After a short time period, the notebook editor appears. 

1.  After you add the code, you must execute the cell to initiate a session. There are multiple ways to execute the cell: 
   + Press the play button.
   +  Use a keyboard shortcut: 
     +  On macOS, **Command** + **Enter** to run the cell. 
     +  On Windows, **Shift** + **Enter** to run the cell. 

    For information about writing code using a Jupyter notebook interface, see *[The Jupyter Notebook User Documentation](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html)*. 

1.  To test your script, run the entire script or individual cells. Any command output is displayed in the area beneath the cell. 

1.  After you have finished developing your notebook, you can save the job and then run it. You can find the script in the **Script** tab. Any magics you added to the notebook are stripped out and are not saved as part of the script of the generated AWS Glue job. AWS Glue Studio automatically adds `job.commit()` to the end of the script generated from the notebook contents.

   For more information about running jobs, see [Start a job run](managing-jobs-chapter.md#start-jobs). 

   

# Notebook editor components
<a name="notebook-components"></a>

 The notebook editor interface has the following main sections. 
+  Notebook interface (main panel) and toolbar 
+  Job editing tabs 

## The notebook editor
<a name="notebook-editor"></a>

 The AWS Glue Studio notebook editor is based on the Jupyter Notebook Application. The AWS Glue Studio notebook interface is similar to that provided by Jupyter Notebooks, which is described in the section [Notebook user interface](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html?highlight=toolbar#notebook-user-interface). The notebook used by interactive sessions is a Jupyter Notebook. 

 Although the AWS Glue Studio notebook is similar to Jupyter Notebooks, it differs in a few key ways: 
+  Currently, the AWS Glue Studio notebook cannot install extensions. 
+  You cannot use multiple tabs; there is a 1:1 relationship between a job and a notebook. 
+  The AWS Glue Studio notebook does not have the same top file menu that exists in Jupyter Notebooks. 
+  Currently, the AWS Glue Studio notebook only runs with the AWS Glue kernel. Note that you cannot update the kernel on your own. 

## AWS Glue Studio job editing tabs
<a name="notebook-job-tabs"></a>

 The tabs that you use to interact with the ETL job are at the top of the notebook page. They are similar to tabs that appear in the visual job editor of AWS Glue Studio, and they perform the same actions. 
+  **Notebook** – Use this tab to view the job script using the notebook interface. 
+  **Job details** – Configure the environment and properties for the job runs. 
+  **Runs** – View information about previous runs of this job. 
+  **Schedules** – Configure a schedule for running your job at specific times. 

# Saving your notebook and job script
<a name="save-notebook"></a>

 You can save your notebook and the job script you are creating at any time. Simply choose the **Save** button in the upper right corner, the same as if you were using the visual or script editor. 

 When you choose **Save**, the job script and the notebook file are saved in the default locations: 
+  By default, the job script is saved to the Amazon S3 location indicated in the **Job details** tab, under **Advanced properties**, in the Job details property **Script path**. Job scripts are saved in a subfolder named `Scripts`. 
+  By default, the notebook file (`.ipynb`) is saved to the Amazon S3 location indicated in the **Job details** tab, under **Advanced properties**, in the Job details property **Script path**. Notebook files are saved in a subfolder named `Notebooks`. 

**Note**  
 When you save the job, the job script contains only the code cells from the notebook. The Markdown cells and magics aren't included in the job script. However, the `.ipynb` file will contain any markdown and magics. 
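
 As a rough illustration of what ends up in the saved job script, the following Python sketch reads the JSON of an `.ipynb` file, keeps only the code cells, drops magics, and appends `job.commit()`. It is not the AWS Glue Studio implementation. The function name `notebook_to_script` and the sample cell contents are hypothetical, and treating every line that starts with `%` as a magic is a simplifying assumption. 

```python
import json


def notebook_to_script(ipynb_text: str) -> str:
    """Approximate a job script: code cells only, magics removed, job.commit() appended."""
    notebook = json.loads(ipynb_text)
    kept_lines = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # Markdown (and raw) cells are not part of the job script
        source = cell.get("source", [])
        # The notebook format stores source as a list of lines or as a single string.
        text = "".join(source) if isinstance(source, list) else source
        lines = text.splitlines()
        first = next((ln for ln in lines if ln.strip()), "")
        if first.lstrip().startswith("%%"):
            continue  # assumption: a cell magic owns the whole cell, so skip it
        for line in lines:
            if line.lstrip().startswith("%"):
                continue  # assumption: any line starting with % is a line magic
            kept_lines.append(line)
    kept_lines.append("job.commit()")
    return "\n".join(kept_lines) + "\n"


# Hypothetical notebook: one Markdown cell, one cell of magics, one cell of code.
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Notes\n"]},
        {"cell_type": "code", "source": ["%idle_timeout 15\n", "%worker_type G.2X\n"]},
        {"cell_type": "code", "source": ["df = load()\n", "df.show()\n"]},
    ]
})
script = notebook_to_script(sample)
print(script)
```

 Only the two lines from the last cell survive, followed by `job.commit()`; the `.ipynb` file itself would still hold all three cells. 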

 After you save the job, you can then run the job using the script that you created in the notebook. 

# Managing notebook sessions
<a name="manage-notebook-sessions"></a>

 Notebooks in AWS Glue Studio are based on the interactive sessions feature of AWS Glue. There is a cost for using interactive sessions. To help manage your costs, you can monitor the sessions created for your account, and configure the default settings for all sessions. 

## Change the default timeout for all notebook sessions
<a name="change-default-timeout"></a>

 By default, the provisioned AWS Glue Studio notebook times out after 12 hours if the notebook was launched and no cells have been executed. There is no cost associated with it, and the timeout is not configurable. 

 Executing a cell starts an interactive session. This session has a default timeout of 48 hours. You can configure this timeout by passing an `%idle_timeout` magic before executing a cell. 

**To modify the default session timeout for notebooks in AWS Glue Studio**

1.  In the notebook, enter the `%idle_timeout` magic in a cell and specify the timeout value in minutes. 

1.  For example: `%idle_timeout 15` changes the default timeout to 15 minutes. If the session is not used for 15 minutes, the session is automatically stopped. 
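
 Because `%idle_timeout` takes its value in minutes, it can help to convert from hours before writing the magic. The helper below is a minimal sketch; `idle_timeout_magic` is a hypothetical name and is not part of AWS Glue. 

```python
def idle_timeout_magic(hours: float) -> str:
    """Build an %idle_timeout line from a duration given in hours."""
    minutes = round(hours * 60)  # the magic expects whole minutes
    if minutes < 1:
        raise ValueError("timeout must be at least one minute")
    return f"%idle_timeout {minutes}"


# A quarter of an hour gives the 15-minute example from the step above.
print(idle_timeout_magic(0.25))
# The 48-hour default expressed in minutes.
print(idle_timeout_magic(48))
```

 Paste the resulting line into a cell and run it before the cell that starts the session. 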

## Installing additional Python modules
<a name="specify-default-modules"></a>

 To install additional modules in your session using pip, use the `%additional_python_modules` magic: 

```
%additional_python_modules awswrangler, s3://amzn-s3-demo-bucket/mymodule.whl
```

 All arguments to `%additional_python_modules` are passed to `pip3 install -m <>`. 

 For a list of available Python modules, see [Using Python libraries with AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html). 

## Changing AWS Glue configuration
<a name="specify-default-modules"></a>

 You can use magics to control AWS Glue job configuration values. To change a job configuration value, use the corresponding magic in the notebook. See [Magics supported by AWS Glue interactive sessions for Jupyter](https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions-magics.html). 

**Note**  
 Overriding properties for a running session is no longer available. To change a session’s configuration, stop the session, set the new configuration, and then start a new session. 

 AWS Glue supports various worker types. You can set the worker type with `%worker_type`. For example: `%worker_type G.2X`. Available worker types include G.1X, G.2X, G.4X, G.8X, G.12X, G.16X, R.1X, R.2X, R.4X, and R.8X. The default is G.1X. 

 You can also specify the number of workers with `%number_of_workers`. For example, to specify 40 workers: `%number_of_workers 40`. 
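
 If you generate notebook cells programmatically, a small check against the worker types listed above can catch typos before a session starts. The following is a sketch only: `session_magics` and `WORKER_TYPES` are hypothetical names, and the set simply mirrors the list on this page rather than querying AWS Glue. 

```python
# Worker types copied from the list on this page (not fetched from AWS Glue).
WORKER_TYPES = {
    "G.1X", "G.2X", "G.4X", "G.8X", "G.12X", "G.16X",
    "R.1X", "R.2X", "R.4X", "R.8X",
}


def session_magics(worker_type="G.1X", number_of_workers=None):
    """Return the magic lines for a configuration cell as a single string."""
    if worker_type not in WORKER_TYPES:
        raise ValueError(f"unknown worker type: {worker_type}")
    lines = [f"%worker_type {worker_type}"]
    if number_of_workers is not None:
        if number_of_workers < 1:
            raise ValueError("number_of_workers must be positive")
        lines.append(f"%number_of_workers {number_of_workers}")
    return "\n".join(lines)


# The two examples from the text, combined into one cell.
cell = session_magics("G.2X", 40)
print(cell)
```

 Because properties of a running session cannot be overridden, a cell like this belongs before the first cell that starts the session. 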

 For more information, see [Defining Job Properties](https://docs.aws.amazon.com/glue/latest/dg/add-job.html). 

## Stop a notebook session
<a name="stop-notebook-session"></a>

 To stop a notebook session, use the magic `%stop_session`. 

 If you navigate away from the notebook in the AWS console, you will receive a warning message where you can choose to stop the session. 
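
 Sessions can also be reviewed from outside the notebook. The sketch below assumes the AWS SDK for Python (Boto3), whose AWS Glue client provides `list_sessions` and `stop_session`. The function name, the choice to target only sessions in the `READY` state, and the `dry_run` flag are illustrative assumptions. The client is passed in as an argument, so the function itself has no dependencies. 

```python
def stop_ready_sessions(glue_client, dry_run=True):
    """Find interactive sessions in the READY state and optionally stop them.

    glue_client is assumed to behave like boto3.client("glue").
    Returns the IDs of the sessions that were (or would be) stopped.
    """
    matched = []
    kwargs = {}
    while True:
        page = glue_client.list_sessions(**kwargs)
        for session in page.get("Sessions", []):
            if session.get("Status") == "READY":
                if not dry_run:
                    glue_client.stop_session(Id=session["Id"])
                matched.append(session["Id"])
        token = page.get("NextToken")
        if not token:
            break  # no more pages
        kwargs = {"NextToken": token}
    return matched


# Usage, assuming Boto3 is installed and credentials are configured:
#   import boto3
#   print(stop_ready_sessions(boto3.client("glue")))  # dry run: lists IDs only
```

 Running with the default `dry_run=True` first shows which sessions would be affected without stopping anything. 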

# Using Amazon Q Developer with AWS Glue Studio notebooks
<a name="glue-studio-notebooks-amazon-q-developer"></a>

 AWS Glue Studio allows you to interactively author jobs in a notebook interface based on Jupyter Notebooks. Using Amazon Q Developer improves the authoring experience within AWS Glue Studio notebooks. 

 The Amazon Q Developer extension supports writing code by generating code recommendations and suggesting improvements related to code issues. Amazon Q Developer supports both Python and Scala, the two languages used for coding ETL scripts for Spark jobs in AWS Glue Studio notebooks. 

## What is Amazon Q Developer?
<a name="w2aac33c15c36b9"></a>

 Amazon Q Developer is a service powered by machine learning that helps improve developer productivity. Amazon Q Developer achieves this by generating code recommendations based on developers’ comments in natural language and their code in the IDE. The service integrates with JupyterLab, Amazon SageMaker AI Studio, Amazon SageMaker AI notebook instances, and other integrated development environments (IDEs). 

 For more information, see [Using Amazon Q Developer with AWS Glue Studio](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/glue-setup.html). 