After careful consideration, we decided to end support for Amazon FinSpace, effective October 7, 2026. Amazon FinSpace will no longer accept new customers beginning October 7, 2025. As an existing customer with an Amazon FinSpace environment created before October 7, 2025, you can continue to use the service as normal. After October 7, 2026, you will no longer be able to use Amazon FinSpace. For more information, see [Amazon FinSpace end of support](https://docs.aws.amazon.com/finspace/latest/userguide/amazon-finspace-end-of-support.html). 

# Adding and managing data in Amazon FinSpace
<a name="finspace-add-data"></a>

**Important**  
Amazon FinSpace Dataset Browser will be discontinued on *March 26, 2025*. Starting *November 29, 2023*, FinSpace will no longer accept the creation of new Dataset Browser environments. Customers using [Amazon FinSpace with Managed Kdb Insights](https://aws.amazon.com/finspace/features/managed-kdb-insights/) will not be affected. For more information, review the [FAQ](https://aws.amazon.com/finspace/faqs/) or contact [AWS Support](https://aws.amazon.com/contact-us/) to assist with your transition.

People in different roles, such as Analyst, Data Scientist, Data Engineer, Data Governor, and Audit personnel, use Amazon FinSpace for data organization, governance, preparation, and analysis. FinSpace supports data of any file format, with additional features for structured data formats such as CSV.

FinSpace represents data in the catalog using a structure called a dataset. A dataset is a logical container of semantically identical data and schema.

![\[A diagram that shows the dataset meta model.\]](http://docs.aws.amazon.com/finspace/latest/userguide/images/05-add-and-manage-data/dataset-meta-model.png)


The first step is loading data into FinSpace, often referred to as ingesting data. FinSpace supports loading data in a variety of formats and from a variety of sources. You can load data by connecting your data feeds or by uploading ad hoc data through the web application.

After your data is available in FinSpace, you can do the following:
+ Describe datasets to provide business context by using fields specified from Attribute Sets.
+ Control who can access the data by assigning permissions to permission groups.
+ Create data views that allow users to query data in FinSpace notebooks.
+ Use notebooks to create derived data by joining data or from the results of analyzing a dataset.
+ Generate audit reports on activity.

**Topics**
+ [Loading data into Amazon FinSpace](load-data-into-finspace.md)
+ [Supported data types and file formats in Amazon FinSpace](supported-data-types.md)
+ [Working with datasets in Amazon FinSpace](working-with-datasets.md)

# Loading data into Amazon FinSpace
<a name="load-data-into-finspace"></a>

Data can be loaded into FinSpace from the following sources:
+ Amazon S3
+ On-premises data stores
+ Local desktop

Data can be loaded using the following methods:
+  [FinSpace web application](tutorial-load-data-analyze-finspace.md) 
+  [SDK to connect your data feeds](https://docs.aws.amazon.com/finspace/latest/data-api/fs-api-welcome.html)

# Supported data types and file formats in Amazon FinSpace
<a name="supported-data-types"></a>

Amazon FinSpace supports a variety of column data types for structured data and a variety of file formats.

## Supported column types and values for structured data
<a name="supported-column-types-and-values-for-structured-data"></a>

FinSpace currently supports the following data types for the columns of structured data:
+ String
+ Char
+ Integer
+ Tiny Integer
+ Small Integer
+ Big Integer
+ Float
+ Double
+ Date. The supported Date format is yyyy-MM-dd. For example, 2016-12-31
+ Datetime. The supported Datetime format is yyyy-MM-dd HH:mm:ss. For example, 2016-12-31 15:30:00
+ Boolean
+ Binary
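
Before loading a file, it can help to check that date and datetime values follow the patterns listed above. The following is a minimal sketch rather than FinSpace's own validation logic. It assumes that the Python `strptime` patterns `%Y-%m-%d` and `%Y-%m-%d %H:%M:%S` correspond to yyyy-MM-dd and yyyy-MM-dd HH:mm:ss. The helper name `matches_format` is hypothetical, and rejecting unpadded values such as `2016-1-5` is an assumption about strictness.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"               # assumed equivalent of yyyy-MM-dd
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed equivalent of yyyy-MM-dd HH:mm:ss


def matches_format(value: str, fmt: str) -> bool:
    """Return True if value parses with fmt and round-trips unchanged."""
    try:
        parsed = datetime.strptime(value, fmt)
    except ValueError:
        return False
    # strptime accepts unpadded fields such as "2016-1-5";
    # comparing the round trip rejects them.
    return parsed.strftime(fmt) == value


print(matches_format("2016-12-31", DATE_FORMAT))               # True
print(matches_format("2016-12-31 15:30:00", DATETIME_FORMAT))  # True
print(matches_format("31/12/2016", DATE_FORMAT))               # False
```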

## Supported file formats
<a name="supported-file-formats"></a>

Files of any format can be ingested into FinSpace, but data view creation is only supported for the following formats:
+ CSV – Only UTF-8 encoding is supported
+ JSON
+ Parquet
+ XML

## Format options for loading data
<a name="format-options-for-loading-data"></a>

FinSpace supports the following formatting options when loading data in the supported format types. Currently, format options are supported only for CSV, JSON, Parquet, and XML.

**Note**  
For data view creation, the FinSpace web application supports ingestion of CSV files only, with a comma delimiter and the `withHeader` option. Other formats are supported through the SDK.

## CSV
<a name="formattypecsv"></a>

This value designates comma-separated-values as the data format (for example, see RFC 4180 and RFC 7111).

You can use the following `formatParams` values with `FormatType="csv"`:

1.  `separator` – Specifies the delimiter character. The default is a comma "," but any other character can be specified.

1.  `escaper` – Specifies a character to use for escaping. This option is used only when reading CSV files. The default value is none. If enabled, the character that immediately follows is used as-is, except for a small set of well-known escapes (`\n`, `\r`, `\t`, and `\0`).

1.  `quoteChar` – Specifies the character to use for quoting. The default is a double quote ("). Set this to -1 to disable quoting entirely.

1.  `multiLine` – A Boolean value that specifies whether a single record can span multiple lines. This can occur when a field contains a quoted new-line character. You must set this option to "True" if any record spans multiple lines. The default value is "False", which allows for more aggressive file-splitting during parsing.

1.  `withHeader` – A Boolean value that specifies whether to treat the first line as a header. The default value is "True".

1.  `skipFirst` – A Boolean value that specifies whether to skip the first data line. The default value is "False".

**Note**  
If any of the default values are changed, all format values must be supplied.
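
Because all format values must be supplied once any default changes, one approach is to start from a complete set of defaults and apply overrides on top. The following is a minimal sketch under stated assumptions: the `formatType` key and its `"CSV"` value, the string encoding of Boolean values, and the `"none"` placeholder for the `escaper` default are all assumptions, and `csv_format_params` is a hypothetical helper, not part of the FinSpace SDK.

```python
# Defaults taken from the option list above. Values are strings because
# formatParams is assumed to be a string-to-string map.
CSV_DEFAULTS = {
    "separator": ",",
    "escaper": "none",     # placeholder; the exact representation is an assumption
    "quoteChar": '"',
    "multiLine": "False",
    "withHeader": "True",
    "skipFirst": "False",
}


def csv_format_params(**overrides):
    """Return a complete CSV formatParams map with the given overrides applied."""
    unknown = set(overrides) - set(CSV_DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported CSV options: {sorted(unknown)}")
    # "formatType" is an assumed key name for selecting the CSV format.
    params = {"formatType": "CSV", **CSV_DEFAULTS}
    params.update({key: str(value) for key, value in overrides.items()})
    return params


# Changing two options still yields every format value.
params = csv_format_params(separator="|", multiLine=True)
print(params)
```

Building the full map in one place avoids requests that change a single option and silently omit the rest.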

## JSON
<a name="formattypejson"></a>

This value designates a JavaScript Object Notation data format.

You can use the following `formatParams` values with `FormatType="json"`:

1.  `jsonPath` – A JsonPath expression that identifies an object to be read into records. This is particularly useful when a file contains records nested inside an outer array. For example, the following JsonPath expression targets the id field of a JSON object.

 `format="json", format_options={"jsonPath": "$.id"}` 
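
To illustrate what a JsonPath expression selects, the following sketch resolves simple dotted paths against a parsed JSON document. It is a deliberately simplified stand-in, not a full JsonPath implementation and not the parser that FinSpace uses. The `select` helper and the sample document are hypothetical.

```python
import json


def select(document, path):
    """Resolve a simple dotted path such as "$.id" or "$.data.records"."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    node = document
    for key in path[1:].split("."):
        if key:  # skip the empty segment that follows "$"
            node = node[key]
    return node


doc = json.loads('{"id": 7, "data": {"records": [{"sym": "AMZN"}, {"sym": "IBM"}]}}')

print(select(doc, "$.id"))            # 7
print(select(doc, "$.data.records"))  # [{'sym': 'AMZN'}, {'sym': 'IBM'}]
```

The second call shows the nested case described above: the records sit inside an outer object, and the path identifies the array to read.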

## Parquet
<a name="formattypeparquet"></a>

This value designates Apache Parquet as the data format.

There are no `formatParams` values for `FormatType="parquet"`.

## XML
<a name="formattypexml"></a>

This value designates XML as the data format, parsed through a fork of the [XML data source for Apache Spark](https://github.com/databricks/spark-xml) parser.

You can use the following `formatParams` values with `FormatType="xml"`:

1.  `rowTag` – Specifies the XML tag in the file to treat as a row. Row tags cannot be self-closing.

1.  `encoding` – Specifies the character encoding. The default value is "UTF-8".

1.  `excludeAttribute` – A Boolean value that specifies whether to exclude attributes in elements. The default value is "false".

1.  `treatEmptyValuesAsNulls` – A Boolean value that specifies whether to treat white space as a null value. The default value is "false".

1.  `attributePrefix` – A prefix for attributes to differentiate them from elements. This prefix is used for field names. The default value is "_".

1.  `valueTag` – The tag used for a value when there are attributes in the element that have no child. The default is "_VALUE".

1.  `ignoreSurroundingSpaces` – A Boolean value that specifies whether the white space that surrounds values should be ignored. The default value is "false".
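
The effect of several of these options can be approximated with the Python standard library. The following sketch only mimics the documented behavior of `rowTag`, `attributePrefix`, `excludeAttribute`, and `ignoreSurroundingSpaces`. It is not the parser that FinSpace uses, and the `rows_from_xml` function and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<catalog>
  <book id="b1"><title> Dune </title><price>9.99</price></book>
  <book id="b2"><title>Emma</title><price>5.00</price></book>
</catalog>
""".strip()


def rows_from_xml(xml_text, row_tag, attribute_prefix="_",
                  exclude_attribute=False, ignore_surrounding_spaces=False):
    """Turn every element named row_tag into a flat dictionary (one row)."""
    rows = []
    for element in ET.fromstring(xml_text).iter(row_tag):
        row = {}
        if not exclude_attribute:
            # Attributes become fields whose names carry the prefix.
            for name, value in element.attrib.items():
                row[attribute_prefix + name] = value
        for child in element:
            text = child.text or ""
            row[child.tag] = text.strip() if ignore_surrounding_spaces else text
        rows.append(row)
    return rows


rows = rows_from_xml(SAMPLE, "book", ignore_surrounding_spaces=True)
print(rows[0])  # {'_id': 'b1', 'title': 'Dune', 'price': '9.99'}
```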

# Working with datasets in Amazon FinSpace
<a name="working-with-datasets"></a>

An Amazon FinSpace dataset is a logical container of semantically identical data and schema. A dataset keeps track of all the data that is ingested into it and of the data views that are generated from that data. Data views are used to access the data for analysis within notebooks. A dataset can contain structured data with a schema or unstructured data such as PDF files or blobs.

**Topics**
+ [Dataset details page](dataset-details-page.md)
+ [Creating a dataset](creating-dataset.md)
+ [Creating changesets in a dataset](creating-changeset-in-a-dataset.md)
+ [Corrections to a dataset](corrections-to-a-dataset.md)
+ [Removing a dataset](deleting-a-dataset.md)

# Dataset details page
<a name="dataset-details-page"></a>

The dataset details page contains detailed information about a dataset. The page presents an overview of the dataset, all the data views created for it, its schema, and its permissions under the following tabs:
+ [Data Overview](#data-overview-tab)
+ [All Data Views](#all-data-views-tab)
+ [Schema](#schema-tab)
+ [Permissions](#permissions-tab)

From the right side of the page, you can edit the dataset description or remove the dataset by choosing the **More** menu.

You can also view the information related to when the dataset was created and the user who created this dataset.

![\[A screenshot that shows the owner information on dataset details page.\]](http://docs.aws.amazon.com/finspace/latest/userguide/images/05-add-and-manage-data/owner-information.png)


From the **See Related** section, you can navigate to related datasets in the application. Each label in this section corresponds to an attribute value or category value associated with the dataset. The labels match the values of the attributes that you select at the bottom of the **Data Overview** tab. Choosing a label takes you to the data browser, where the results show other datasets with the same label.

![\[A screenshot that shows the related tags on dataset details page.\]](http://docs.aws.amazon.com/finspace/latest/userguide/images/05-add-and-manage-data/see-related.png)


## Data Overview
<a name="data-overview-tab"></a>

This tab shows the description of the dataset, latest data views, and associated attribute sets that describe the dataset.

![\[A screenshot of the dataset overview tab in FinSpace.\]](http://docs.aws.amazon.com/finspace/latest/userguide/images/05-add-and-manage-data/dataset-overview.png)


## All Data Views
<a name="all-data-views-tab"></a>

This tab shows the details of all the data that is ingested into the dataset as changesets, and all the data views that have been created. 

![\[A screenshot of the All data views tab in FinSpace.\]](http://docs.aws.amazon.com/finspace/latest/userguide/images/05-add-and-manage-data/dataset-alldataviews.png)


In this tab, you can do the following:
+ View the list of data views under the **Data Views** section. Choose **Details** to view detailed information about a specific data view.
+ Load a data view for analysis. Choose the **Analyze in Notebook** button to open the data view in a FinSpace notebook. Choose the **External API Access** button to access the data view externally using the FinSpace API.
+ Create new data views by choosing the **Create Data View** button. For more information, see [Create data view](create-data-view.md).
+ View the dataset update history and make corrections to datasets.
+ Load data into the dataset by uploading a file or through the FinSpace API.
+ Create changesets of type `Append` or `Replace`. For more information, see [Creating changesets in a dataset](creating-changeset-in-a-dataset.md).

## Schema
<a name="schema-tab"></a>

This tab shows the schema of the dataset. The existing schema can only be edited if no data views have been created.

![\[A screenshot of the dataset schema tab in FinSpace.\]](http://docs.aws.amazon.com/finspace/latest/userguide/images/05-add-and-manage-data/dataset-schema.png)


## Permissions
<a name="permissions-tab"></a>

This tab shows the list of permission groups that are entitled to use the dataset. From this section, you can assign new permission groups to the dataset by choosing **Assign Permission Group**.

![\[A screenshot of the dataset permissions tab in FinSpace.\]](http://docs.aws.amazon.com/finspace/latest/userguide/images/05-add-and-manage-data/dataset-permissions.png)


# Creating a dataset
<a name="creating-dataset"></a>

**Note**  
To create and manage datasets, you must be a superuser or a member of a permission group that has the **Create Datasets** permission.

A dataset can be created by loading a file using the Amazon FinSpace web application.

 **To create a dataset** 

1. Sign in to the FinSpace web application. For more information, see [Signing in to the Amazon FinSpace web application](signing-into-amazon-finspace.md).

1. On the left navigation bar of the home page, choose **Add Data**.

1. Drag and drop a .csv file or choose **Browse Files** to select a file. After the web application detects the file, it displays the file's schema. The column names are read from the file and the data types are inferred.  
![\[A screenshot that shows the Add Data page in FinSpace.\]](http://docs.aws.amazon.com/finspace/latest/userguide/images/05-add-and-manage-data/add-data.png)

1. Change the data types as required by choosing **Edit Derived Schema**. Take note of the data types and formats that are supported.

1. Choose **Save Schema**.

1. Choose **Confirm Schema & Upload File**. This action starts the following process:

   1. A dataset is created with the name of the .csv file that was loaded, and you are taken to the [dataset details page](dataset-details-page.md).

   1. After the upload of the sample data file is complete, a changeset is created with the content of the data file. To verify, check the **Dataset Update History** table under the **All Data Views** tab.

   1. After the upload of the sample data file is complete, a process starts to create a data view that can be analyzed in a notebook.

      For small files of up to 100 megabytes, data view creation takes approximately 2 minutes. For larger files of around 1 gigabyte, expect data view creation to take approximately 3-4 minutes. Views with partitioning and sorting schemes may take longer.

      Once a dataset is created, you can start adding data to it. A new set of data added to a dataset creates a corresponding changeset.

# Creating changesets in a dataset
<a name="creating-changeset-in-a-dataset"></a>

Data files are added to datasets and tracked as changesets. A changeset is created in a dataset when one or more data files are ingested in a single operation. All changesets in a dataset are preserved unless the dataset itself is deleted. Each changeset is assigned a unique identifier and a system timestamp at the time of creation.

 **A changeset is created as one of the following types** 
+  **Append** – The new changeset is considered an addition to the end of the previously ingested changesets. For example, the addition of a new daily file.
+  **Replace** – The new changeset is considered a replacement for all previously ingested changesets in the dataset. The previously ingested changesets are not deleted, but they are not considered when a view is created.

![\[A screenshot that shows the changeset types.\]](http://docs.aws.amazon.com/finspace/latest/userguide/images/05-add-and-manage-data/append-and-replace-data.png)
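
The Append and Replace semantics described above can be sketched as a small function that decides which changesets contribute to a view. This is an illustration of the rule, not FinSpace's implementation. The dictionary layout, the `changesets_for_view` name, and the changeset IDs are hypothetical, and the input is assumed to be ordered by creation time.

```python
def changesets_for_view(changesets):
    """Return the changesets that contribute to a view, oldest first."""
    contributing = []
    for changeset in changesets:  # assumed to be ordered by creation time
        if changeset["type"] == "Replace":
            # A Replace supersedes everything ingested before it.
            contributing = [changeset]
        elif changeset["type"] == "Append":
            contributing.append(changeset)
        else:
            raise ValueError(f"unknown changeset type: {changeset['type']}")
    return contributing


history = [
    {"id": "cs-1", "type": "Append"},
    {"id": "cs-2", "type": "Append"},
    {"id": "cs-3", "type": "Replace"},
    {"id": "cs-4", "type": "Append"},
]

print([c["id"] for c in changesets_for_view(history)])  # ['cs-3', 'cs-4']
```

The earlier changesets remain in `history`, which mirrors the statement that replaced changesets are preserved but not used for view creation.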


## Replace data
<a name="create-a-changeset-with-replace-type"></a>

 **To create a changeset with type as Replace** 

1. From the homepage, search for the dataset in which you want to replace data.

1. Choose the dataset name to view the dataset details page.

1. Choose the **All Data Views** tab.

1. Scroll down and choose **Replace Data**.

1. Choose **Select CSV File** to select and upload a file from your desktop. 

1. Once the file is uploaded, choose the input format for the ingested data from the following options:
   + **Delimiter** – Specifies the delimiter character. The default value is *Comma*.
   + **Escape Character** – Specifies a character to use for escaping. The default value is *None*.
   + **Quotes** – Specifies the character to use for quoting. The default value is *Double Quotes* (").
   + **Multiline Records** – Specifies whether a single record can span multiple lines. By default this option is disabled. Enable this option if you want any record to span multiple lines.
   + **Treat First Line As Header** – Specifies whether to treat the first line as a header. By default this option is disabled.
   + **Skip First Data Line** – Specifies whether to skip the first data line. By default this option is disabled.

1. Choose **Replace Data**.

1. Once the file upload is complete, you should see a new entry for a changeset of type *Replace* under the **Dataset Update History** table with a **Pending** status. Once the status is set to **Available**, a data view that includes the new changeset can be created.
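
When changesets are created programmatically rather than through the web application, the same wait for an **Available** status can be automated with a polling loop. The following is a generic sketch: `get_status` is a hypothetical callable that you would implement with your SDK of choice, and the status strings are assumed to match the labels shown in the **Dataset Update History** table.

```python
import time


def wait_until_available(get_status, poll_seconds=5.0, timeout_seconds=600.0,
                         sleep=time.sleep):
    """Poll get_status() until it returns "Available" or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status == "Available":
            return status
        if status not in ("Pending", "Running"):
            raise RuntimeError(f"changeset ended in unexpected status: {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"changeset still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Demonstration with a canned sequence of statuses instead of a real service call.
fake_statuses = iter(["Pending", "Pending", "Available"])
result = wait_until_available(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(result)  # Available
```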

## Append data
<a name="create-a-changeset-with-append-type"></a>

 **To create a changeset with type as Append** 

1. From the homepage, search for the dataset to which you want to append data.

1. Choose the dataset name to view the dataset details page.

1. Choose the **All Data Views** tab.

1. Scroll down and choose **Append Data**.

1. Choose **Select CSV File** to select and upload a file from your desktop. 

1. Once the file is uploaded, choose the input format for the ingested data from the following options:
   + **Delimiter** – Specifies the delimiter character. The default value is *Comma*.
   + **Escape Character** – Specifies a character to use for escaping. The default value is *None*.
   + **Quotes** – Specifies the character to use for quoting. The default value is *Double Quotes* (").
   + **Multiline Records** – Specifies whether a single record can span multiple lines. By default this option is disabled. Enable this option if you want any record to span multiple lines.
   + **Treat First Line As Header** – Specifies whether to treat the first line as a header. By default this option is disabled.
   + **Skip First Data Line** – Specifies whether to skip the first data line. By default this option is disabled.

1. Choose **Append Data**.

1. Once the file upload is complete, you should see a new entry for a changeset of type *Append* under the **Dataset Update History** table with a **Pending** status. Once the status is set to **Available**, a data view that includes the new changeset can be created.

# Corrections to a dataset
<a name="corrections-to-a-dataset"></a>

A changeset can be ingested as a correction to an existing changeset. This action does not delete the previously ingested changeset. Instead, the correcting changeset is used in its place when a view is created, provided that both changesets fall within the date and time specified for the view.
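
The correction rule can be sketched as follows. This is an interpretation of the behavior described above, not FinSpace's implementation: the `corrects` field, the `changesets_as_of` function, and the use of integers in place of timestamps are all hypothetical.

```python
def changesets_as_of(changesets, as_of):
    """Return the changesets a view would use at the given point in time."""
    # Only changesets created at or before the view's time are considered.
    visible = [c for c in changesets if c["created"] <= as_of]
    # A changeset is superseded only if its correction is also visible.
    corrected = {c["corrects"] for c in visible if c.get("corrects") is not None}
    return [c for c in visible if c["id"] not in corrected]


history = [
    {"id": "cs-1", "created": 1},
    {"id": "cs-2", "created": 2},
    {"id": "cs-3", "created": 5, "corrects": "cs-1"},  # correction to cs-1
]

print([c["id"] for c in changesets_as_of(history, as_of=3)])  # ['cs-1', 'cs-2']
print([c["id"] for c in changesets_as_of(history, as_of=6)])  # ['cs-2', 'cs-3']
```

A view dated before the correction still uses the original changeset, which is why the original is preserved rather than deleted.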

 **To create a changeset that is a replacement to an existing changeset** 

1. From the homepage, search for the dataset that you want to make corrections to.

1. Choose the dataset name to view the dataset details page.

1. Choose the **All Data Views** tab.

1. Under the **Dataset Update History** table, from the list of changesets identify the changeset to be replaced and then choose the corrections icon (![\[Two curved arrows forming a circular shape, indicating a refresh or sync operation.\]](http://docs.aws.amazon.com/finspace/latest/userguide/images/05-add-and-manage-data/corrections-icon.png)).

1. Choose **Choose CSV File** to select and upload a file from your desktop.

1. Once the file is uploaded, choose the input format for the ingested data from the following options:
   + **Delimiter** – Specifies the delimiter character. The default value is *Comma*.
   + **Escape Character** – Specifies a character to use for escaping. The default value is *None*.
   + **Quotes** – Specifies the character to use for quoting. The default value is *Double Quotes* (").
   + **Multiline Records** – Specifies whether a single record can span multiple lines. By default this option is disabled. Enable this option if you want any record to span multiple lines.
   + **Treat First Line As Header** – Specifies whether to treat the first line as a header. By default this option is disabled.
   + **Skip First Data Line** – Specifies whether to skip the first data line. By default this option is disabled.

1. Choose **Save**. The changeset is added to the **Dataset Update History** table with a **Pending** or **Running** status that changes to **Available** once the update is successful.

# Removing a dataset
<a name="deleting-a-dataset"></a>

A dataset can be permanently removed from your Amazon FinSpace environment.

 **To remove a dataset** 

1. From the homepage, search for the dataset that you want to remove.

1. Choose the dataset name to view the dataset details page.

1. On the top-right corner, choose the **More** menu and then choose **Remove Dataset**.

1. In the confirmation dialog box, choose **Remove Dataset**.

**Note**  
This action is irreversible. Once removed, a dataset cannot be recovered.