# Understanding Collations in Babelfish for Aurora PostgreSQL
<a name="babelfish-collations"></a>

When you create an Aurora PostgreSQL DB cluster with Babelfish, you choose a collation for your data. A *collation* specifies the sort order and bit patterns that produce the text or characters in a given written human language. A collation includes rules comparing data for a given set of bit patterns. Collation is related to localization. Different locales affect character mapping, sort order, and the like. Collation attributes are reflected in the names of various collations. For information about attributes, see the [Babelfish collation attributes table](#bfish-collation-attributes-table). 

Babelfish maps SQL Server collations to comparable collations provided by Babelfish. Babelfish predefines Unicode collations with culturally sensitive string comparisons and sort orders. Babelfish also provides a way to translate the collations in your SQL Server DB to the closest-matching Babelfish collation. Locale-specific collations are provided for different languages and regions. 

Some collations specify a code page that corresponds to a client-side encoding. Babelfish automatically translates from the server encoding to the client encoding depending on the collation of each output column. 

Babelfish supports the collations listed in the [Babelfish supported collations table](#bfish-collations-table). Babelfish maps SQL Server collations to comparable collations provided by Babelfish. 

Babelfish uses version 153.80 of the International Components for Unicode (ICU) collation library. For more information about ICU collations, see [ Collation](https://unicode-org.github.io/icu/userguide/collation/) in the ICU documentation. To learn more about PostgreSQL and collation, see [Collation Support](https://www.postgresql.org/docs/current/collation.html) in the the PostgreSQL documentation.

**Topics**
+ [DB cluster parameters that control collation and locale](#babelfish-collations.parameters)
+ [Deterministic and nondeterministic collations in Babelfish](#babelfish-collations.deterministic-nondeterministic)
+ [Collations supported at database level in Babelfish](#babelfish-collations.database-level)
+ [Server and object Collations in Babelfish](#babelfish-collations.reference-tables-supported-collations)
+ [Default Collation behavior in Babelfish](#babelfish-collations-default)
+ [Managing collations](collation.managing.md)
+ [Collation limitations and behavior differences](collation.limitations.md)

## DB cluster parameters that control collation and locale
<a name="babelfish-collations.parameters"></a><a name="collation-related-parameters"></a>

The following parameters affect collation behavior. 

**babelfishpg\$1tsql.default\$1locale**  
This parameter specifies the default locale used by the collation. This parameter is used in combination with attributes listed in the [Babelfish collation attributes table](#bfish-collation-attributes-table) to customize collations for a specific language and region. The default value for this parameter is `en-US`.  
The default locale applies to all Babelfish collation names that start with "BBF" and to all SQL Server collations that are mapped to Babelfish collations. Changing the setting for this parameter on an existing Babelfish DB cluster doesn't affect the locale of existing collations. For the list of collations, see the [Babelfish supported collations table](#bfish-collations-table). 

**babelfishpg\$1tsql.server\$1collation\$1name**  
This parameter specifies the default collation for the server (Aurora PostgreSQL DB cluster instance) and the database. The default value is `sql_latin1_general_cp1_ci_as`. The `server_collation_name` has to be a `CI_AS` collation because in T-SQL, the server collation determines how identifiers are compared.  
When you create your Babelfish DB cluster, you choose the **Collation name** from the selectable list. These include the collations listed in the [Babelfish supported collations table](#bfish-collations-table). Don't modify the `server_collation_name` after the Babelfish database is created.

The settings you choose when you create your Babelfish for Aurora PostgreSQL DB cluster are stored in the DB cluster parameter group associated with the cluster for these parameters and set its collation behavior.

## Deterministic and nondeterministic collations in Babelfish
<a name="babelfish-collations.deterministic-nondeterministic"></a>

Babelfish supports deterministic and nondeterministic collations:
+ A *deterministic collation* evaluates characters that have identical byte sequences as equal. That means that `x` and `X` aren't equal in a deterministic collation. Deterministic collations can be case-sensitive (CS) and accent-sensitive (AS).
+ A *nondeterministic collation* doesn't need an identical match. A nondeterministic collation evaluates `x` and `X` as equal. Nondeterministic collations are case-insensitive (CI) or accent-insensitive (AI), or both.

In the table following, you can find some behavior differences between Babelfish and PostgreSQL when using nondeterministic collations.


| Babelfish | PostgreSQL | 
| --- | --- | 
|  Supports the LIKE clause for CI\$1AS collations.  |  Doesn't support the LIKE clause on nondeterministic collations.  | 
|  Supports the LIKE clause only on the following AI collations from Babelfish version 4.2.0: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/babelfish-collations.html)  |  Doesn't support the LIKE clause on nondeterministic collations.  | 

For a list of other limitations and behavior differences for Babelfish compared to SQL Server and PostgreSQL, see [Collation limitations and behavior differences](collation.limitations.md). 

Babelfish and SQL Server follow a naming convention for collations that describe the collation attributes, as shown in the table following.<a name="bfish-collation-attributes-table"></a>


| Attribute | Description | 
| --- | --- | 
| AI | Accent-insensitive. | 
| AS | Accent-sensitive. | 
| BIN2 | BIN2 requests data to be sorted in code point order. Unicode code point order is the same character order for UTF-8, UTF-16, and UCS-2 encodings. Code point order is a fast deterministic collation. | 
| CI | Case-insensitive. | 
| CS | Case-sensitive. | 
| PREF | To sort uppercase letters before lowercase letters, use a PREF collation. If comparison is case-insensitive, the uppercase version of a letter sorts before the lowercase version, if there is no other distinction. The ICU library supports uppercase preference with `colCaseFirst=upper`, but not for CI\$1AS collations. PREF can be applied only to `CS_AS` deterministic collations. | 

## Collations supported at database level in Babelfish
<a name="babelfish-collations.database-level"></a>

The following collations are supported at database level in Babelfish:
+ bbf\$1unicode\$1bin2
+ bbf\$1unicode\$1cp1\$1ci\$1ai
+ bbf\$1unicode\$1cp1\$1ci\$1as
+ bbf\$1unicode\$1cp1250\$1ci\$1ai
+ bbf\$1unicode\$1cp1250\$1ci\$1as
+ bbf\$1unicode\$1cp1257\$1ci\$1ai
+ bbf\$1unicode\$1cp1257\$1ci\$1as
+ estonian\$1ci\$1ai
+ estonian\$1ci\$1as
+ finnish\$1swedish\$1ci\$1ai
+ finnish\$1swedish\$1ci\$1as
+ french\$1ci\$1ai
+ french\$1ci\$1as
+ latin1\$1general\$1bin2
+ latin1\$1general\$1ci\$1ai
+ latin1\$1general\$1ci\$1as
+ latin1\$1general\$190\$1bin2
+ latin1\$1general\$1100\$1bin2
+ latin1\$1general\$1140\$1bin2
+ modern\$1spanish\$1ci\$1ai
+ modern\$1spanish\$1ci\$1as
+ polish\$1ci\$1ai
+ polish\$1ci\$1as
+ sql\$1latin1\$1general\$1cp1\$1ci\$1ai
+ sql\$1latin1\$1general\$1cp1\$1ci\$1as
+ sql\$1latin1\$1general\$1cp1250\$1ci\$1as
+ sql\$1latin1\$1general\$1cp1251\$1ci\$1as
+ sql\$1latin1\$1general\$1cp1257\$1ci\$1as
+ traditional\$1spanish\$1ci\$1ai
+ traditional\$1spanish\$1ci\$1as

**Note**  
To use a different collation at the database level, ensure it matches the server-level collation. For more information, see [Server and object Collations in Babelfish](#babelfish-collations.reference-tables-supported-collations)

## Server and object Collations in Babelfish
<a name="babelfish-collations.reference-tables-supported-collations"></a>

Use the following collations as a server collation or an object collation.<a name="bfish-collations-table"></a>


| Collation ID | Notes | 
| --- | --- | 
|  bbf\$1unicode\$1general\$1ci\$1as  |  Supports case-insensitive comparison and the LIKE operator.  | 
|  bbf\$1unicode\$1cp1\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt) also known as CP1252.  | 
|  bbf\$1unicode\$1CP1250\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1250.txt) used to represent texts in Central European and Eastern European languages that use Latin script.  | 
|  bbf\$1unicode\$1CP1251\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1251.txt) for languages that use the Cyrillic script.  | 
|  bbf\$1unicode\$1cp1253\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1253.txt) used to represent modern Greek.  | 
|  bbf\$1unicode\$1cp1254\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1254.txt) that supports Turkish.  | 
|  bbf\$1unicode\$1cp1255\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1255.txt) that supports Hebrew.  | 
|  bbf\$1unicode\$1cp1256\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1256.txt) used to write languages that use Arabic script.  | 
|  bbf\$1unicode\$1cp1257\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1257.txt) used to support Estonian, Latvian, and Lithuanian languages.  | 
|  bbf\$1unicode\$1cp1258\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1258.txt) used to write Vietnamese characters.  | 
|  bbf\$1unicode\$1cp874\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit874.txt) used to write Thai characters.  | 
|  sql\$1latin1\$1general\$1cp1250\$1ci\$1as  |  [Nondeterministic single-byte character encoding](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1250.txt) used to represent Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1251\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1251.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1253\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1253.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1254\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1254.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1255\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1255.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1256\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1256.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1257\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1257.txt) that supports Latin characters.  | 
|  sql\$1latin1\$1general\$1cp1258\$1ci\$1as  |  [Nondeterministic collation](https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1258.txt) that supports Latin characters.  | 
|  chinese\$1prc\$1ci\$1as  |  Nondeterministic collation that supports Chinese (PRC).  | 
|  cyrillic\$1general\$1ci\$1as  |  Nondeterministic collation that supports Cyrillic.  | 
|  finnish\$1swedish\$1ci\$1as  |  Nondeterministic collation that supports Finnish.  | 
|  french\$1ci\$1as  |  Nondeterministic collation that supports French.  | 
|  japanese\$1ci\$1as  | Nondeterministic collation that supports Japanese. Supported in Babelfish 2.1.0 and higher releases. | 
|  korean\$1wansung\$1ci\$1as  |  Nondeterministic collation that supports Korean (with dictionary sort).  | 
|  latin1\$1general\$1ci\$1as  |  Nondeterministic collation that supports Latin characters.  | 
|  modern\$1spanish\$1ci\$1as  |  Nondeterministic collation that supports Modern Spanish.  | 
|  polish\$1ci\$1as  |  Nondeterministic collation that supports Polish.  | 
|  thai\$1ci\$1as  |  Nondeterministic collation that supports Thai.  | 
|  traditional\$1spanish\$1ci\$1as  |  Nondeterministic collation that supports Spanish (traditional sort).  | 
|  turkish\$1ci\$1as  |  Nondeterministic collation that supports Turkish.  | 
|  ukrainian\$1ci\$1as  |  Nondeterministic collation that supports Ukrainian.  | 
|  vietnamese\$1ci\$1as  |  Nondeterministic collation that supports Vietnamese.  | 

You can use the following collations as object collations.<a name="bfish-icu-collations-table"></a>


| Dialect | Deterministic options | Nondeterministic options | 
| --- | --- | --- | 
|  Arabic  |  Arabic\$1CS\$1AS  |  Arabic\$1CI\$1AS Arabic\$1CI\$1AI  | 
|  Arabic script  |  BBF\$1Unicode\$1CP1256\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1256\$1CS\$1AS  |  BBF\$1Unicode\$1CP1256\$1CI\$1AI BBF\$1Unicode\$1CP1256\$1CS\$1AI  | 
|  Binary  |  latin1\$1general\$1bin2 BBF\$1Unicode\$1BIN2  |  –  | 
|  Central European and Eastern European languages that use Latin script  |  BBF\$1Unicode\$1CP1250\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1250\$1CS\$1AS  |  BBF\$1Unicode\$1CP1250\$1CI\$1AI BBF\$1Unicode\$1CP1250\$1CS\$1AI  | 
|  Chinese  |  Chinese\$1PRC\$1CS\$1AS  |  Chinese\$1PRC\$1CI\$1AS Chinese\$1PRC\$1CI\$1AI  | 
|  Cyrillic\$1General  |  Cyrillic\$1General\$1CS\$1AS  |  Cyrillic\$1General\$1CI\$1AS Cyrillic\$1General\$1CI\$1AI  | 
|  Cyrillic script  |  BBF\$1Unicode\$1CP1251\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1251\$1CS\$1AS  |  BBF\$1Unicode\$1CP1251\$1CI\$1AI BBF\$1Unicode\$1CP1251\$1CS\$1AI  | 
|  Estonian  |  Estonian\$1CS\$1AS  |  Estonian\$1CI\$1AS Estonian\$1CI\$1AI  | 
|  Estonian, Latvian, and Lithuanian  |  BBF\$1Unicode\$1CP1257\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1257\$1CS\$1AS  |  BBF\$1Unicode\$1CP1257\$1CI\$1AI BBF\$1Unicode\$1CP1257\$1CS\$1AI  | 
|  Finnish\$1Swedish  |  Finnish\$1Swedish\$1CS\$1AS  |  Finnish\$1Swedish\$1CI\$1AS Finnish\$1Swedish\$1CI\$1AI  | 
|  French  |  French\$1CS\$1AS  |  French\$1CI\$1AS French\$1CI\$1AI  | 
|  Greek  |  Greek\$1CS\$1AS  |  Greek\$1CI\$1AS Greek\$1CI\$1AI  | 
|  Hebrew  |  BBF\$1Unicode\$1CP1255\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1255\$1CS\$1AS Hebrew\$1CS\$1AS  |  BBF\$1Unicode\$1CP1255\$1CI\$1AI BBF\$1Unicode\$1CP1255\$1CS\$1AI Hebrew\$1CI\$1AS Hebrew\$1CI\$1AI  | 
|  Japanese (Babelfish 2.1.0 and higher)  | Japanese\$1CS\$1AS | Japanese\$1CI\$1AI Japanese\$1CI\$1AS | 
|  Korean\$1Wamsung  |  Korean\$1Wamsung\$1CS\$1AS  |  Korean\$1Wamsung\$1CI\$1AS Korean\$1Wamsung\$1CI\$1AI  | 
|  Latin characters for code page CP1252  |  latin1\$1general\$1cs\$1as BBF\$1Unicode\$1General\$1CS\$1AS BBF\$1Unicode\$1General\$1Pref\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1\$1CS\$1AS BBF\$1Unicode\$1CP1\$1CS\$1AS  |  latin1\$1general\$1ci\$1as latin1\$1general\$1ci\$1ai latin1\$1general\$1cs\$1ai BBF\$1Unicode\$1General\$1CI\$1AI BBF\$1Unicode\$1General\$1CS\$1AI BBF\$1Unicode\$1CP1\$1CI\$1AI BBF\$1Unicode\$1CP1\$1CS\$1AI  | 
|  Modern Greek  |  BBF\$1Unicode\$1CP1253\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1253\$1CS\$1AS  |  BBF\$1Unicode\$1CP1253\$1CI\$1AI BBF\$1Unicode\$1CP1253\$1CS\$1AI  | 
|  Modern\$1Spanish  |  Modern\$1Spanish\$1CS\$1AS  |  Modern\$1Spanish\$1CI\$1AS Modern\$1Spanish\$1CI\$1AI  | 
|  Mongolian  |  Mongolian\$1CS\$1AS  |  Mongolian\$1CI\$1AS Mongolian\$1CI\$1AI  | 
|  Polish  |  Polish\$1CS\$1AS  |  Polish\$1CI\$1AS Polish\$1CI\$1AI  | 
|  Thai  |  BBF\$1Unicode\$1CP874\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP874\$1CS\$1AS Thai\$1CS\$1AS  |  BBF\$1Unicode\$1CP874\$1CI\$1AI BBF\$1Unicode\$1CP874\$1CS\$1AI Thai\$1CI\$1AS, Thai\$1CI\$1AI  | 
|  Traditional\$1Spanish  |  Traditional\$1Spanish\$1CS\$1AS  |  Traditional\$1Spanish\$1CI\$1AS Traditional\$1Spanish\$1CI\$1AI  | 
|  Turkish  |  BBF\$1Unicode\$1CP1254\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1254\$1CS\$1AS Turkish\$1CS\$1AS  |  BBF\$1Unicode\$1CP1254\$1CI\$1AI BBF\$1Unicode\$1CP1254\$1CS\$1AI Turkish\$1CI\$1AS, Turkish\$1CI\$1AI  | 
|  Ukranian  |  Ukranian\$1CS\$1AS  |  Ukranian\$1CI\$1AS Ukranian\$1CI\$1AI  | 
|  Vietnamese  |  BBF\$1Unicode\$1CP1258\$1CS\$1AS BBF\$1Unicode\$1Pref\$1CP1258\$1CS\$1AS Vietnamese\$1CS\$1AS  |  BBF\$1Unicode\$1CP1258\$1CI\$1AI BBF\$1Unicode\$1CP1258\$1CS\$1AI Vietnamese\$1CI\$1AS Vietnamese\$1CI\$1AI  | 

## Default Collation behavior in Babelfish
<a name="babelfish-collations-default"></a>

Earlier, the default collation of the collatable datatypes was `pg_catalog.default`. The datatypes and the objects that depends on these datatypes follows cases-sensitive collation. This condition potentially impacts the T-SQL objects of the data set with case-insensitive collation. Starting with Babelfish 2.3.0, the default collation for the collatable data types (except TEXT and NTEXT) is the same as the collation in the `babelfishpg_tsql.server_collation_name` parameter. When you upgrade to Babelfish 2.3.0, the default collation is picked automatically at the time of DB cluster creation, which doesn't create any visible impact. 

# Managing collations
<a name="collation.managing"></a>

The ICU library provides collation version tracking to ensure that indexes that depend on collations can be reindexed when a new version of ICU becomes available. To see if your current database has collations that need refreshing, you can use the following query after connecting using `psql` or or `pgAdmin`:

```
SELECT pg_describe_object(refclassid, refobjid,
    refobjsubid) AS "Collation", 
    pg_describe_object(classid, objid, objsubid) AS "Object" 
    FROM pg_depend d JOIN pg_collation c ON refclassid = 'pg_collation'::regclass
    AND refobjid = c.oid WHERE c.collversion <> pg_collation_actual_version(c.oid) 
    ORDER BY 1, 2;
```

This query returns output such as the following:

```
 Collation | Object
-----------+--------
(0 rows)
```

In this example, no collations need to be updated.

To get a listing of the predefined collations in your Babelfish database, you can use `psql` or `pgAdmin` with the following query:

```
SELECT * FROM pg_collation;
```

Predefined collations are stored in the `sys.fn_helpcollations` table. You can use the following command to display information about a collation (such as its lcid, style, and collate flags). To get a listing of all collations by using `sqlcmd`, connect to the T-SQL port (1433, by default) and run the following query: 

```
1> :setvar SQLCMDMAXVARTYPEWIDTH 40
2> :setvar SQLCMDMAXFIXEDTYPEWIDTH 40
3> SELECT * FROM fn_helpcollations()
4> GO
name                                     description
---------------------------------------- ----------------------------------------
arabic_cs_as                             Arabic, case-sensitive, accent-sensitive
arabic_ci_ai                             Arabic, case-insensitive, accent-insensi
arabic_ci_as                             Arabic, case-insensitive, accent-sensiti
bbf_unicode_bin2                         Unicode-General, case-sensitive, accent-
bbf_unicode_cp1250_ci_ai                 Default locale, code page 1250, case-ins
bbf_unicode_cp1250_ci_as                 Default locale, code page 1250, case-ins
bbf_unicode_cp1250_cs_ai                 Default locale, code page 1250, case-sen
bbf_unicode_cp1250_cs_as                 Default locale, code page 1250, case-sen
bbf_unicode_pref_cp1250_cs_as            Default locale, code page 1250, case-sen
bbf_unicode_cp1251_ci_ai                 Default locale, code page 1251, case-ins
bbf_unicode_cp1251_ci_as                 Default locale, code page 1251, case-ins
bbf_unicode_cp1254_ci_ai                 Default locale, code page 1254, case-ins
...
(124 rows affected)
```

Lines 1 and 2 shown in the example narrow the output for documentation readability purposes only. 

```
1> SELECT SERVERPROPERTY('COLLATION')
2> GO
serverproperty
---------------------------------------------------------------------------
sql_latin1_general_cp1_ci_as

(1 rows affected)
1>
```

# Collation limitations and behavior differences
<a name="collation.limitations"></a>

Babelfish uses the ICU library for collation support. PostgreSQL is built with a specific version of ICU and can match at most one version of a collation. Variations across versions are unavoidable, as are minor variations across time as languages evolve. In the following list you can find known limitations and behavior variations of Babelfish collations:
+ **Indexes and collation type dependency** – An index on a user-defined type that depends on the International Components for Unicode (ICU) collation library (the library used by Babelfish) isn't invalidated when the library version changes.
+ **COLLATIONPROPERTY function** – Collation properties are implemented only for the supported Babelfish BBF collations. For more information, see the [Babelfish supported collations table](babelfish-collations.md#bfish-collations-table).
+ **Unicode sorting rule differences** – SQL collations for SQL Server sort Unicode-encoded data (`nchar` and `nvarchar`) differently than data that's not Unicode-encoded (`char` and `varchar`). Babelfish databases are always UTF-8 encoded and always apply Unicode sorting rules consistently, regardless of data type, so the sort order for `char` or `varchar` is the same as it is for `nchar` or `nvarchar`.
+ **Secondary-equal collations and sorting behavior** – The default ICU Unicode secondary-equal (`CI_AS`) collation sorts punctuation marks and other nonalphanumeric characters before numeric characters, and numeric characters before alphabetic characters. However, the order of punctuation and other special characters is different. 
+ **Tertiary collations, workaround for ORDER BY** – SQL collations, such as `SQL_Latin1_General_Pref_CP1_CI_AS`, support the `TERTIARY_WEIGHTS` function and the ability to sort strings that compare equally in a `CI_AS` collation to be sorted uppercase first: `ABC`, `ABc`, `AbC`, `Abc`, `aBC`, `aBc`, `abC`, and finally `abc`. Thus, the `DENSE_RANK OVER (ORDER BY column)` analytic function assesses these strings as having the same rank but orders them uppercase first within a partition.

  You can get a similar result with Babelfish by adding a `COLLATE` clause to the `ORDER BY` clause that specifies a tertiary `CS_AS` collation that specifies `@colCaseFirst=upper`. However, the `colCaseFirst` modifier applies only to strings that are tertiary-equal (rather than secondary-equal such as with `CI_AS` collation). Thus, you can't emulate tertiary SQL collations using a single ICU collation. 

  As a workaround, we recommend that you modify applications that use the `SQL_Latin1_General_Pref_CP1_CI_AS` collation to use the `BBF_SQL_Latin1_General_CP1_CI_AS` collation first. Then add `COLLATE BBF_SQL_Latin1_General_Pref_CP1_CS_AS` to any `ORDER BY` clause for this column.
+ **Character expansion** – A character expansion treats a single character as equal to a sequence of characters at the primary level. SQL Server's default `CI_AS` collation supports character expansion. ICU collations support character expansion for accent-insensitive collations only.

  When character expansion is required, then use a `AI` collation for comparisons. However, such collations aren't currently supported by the LIKE operator.
+ **char and varchar encoding** – When SQL collations are used for `char` or `varchar` data types, the sort order for characters preceding ASCII 127 is determined by the specific code page for that SQL collation. For SQL collations, strings declared as `char` or `varchar` might sort differently than strings declared as `nchar` or `nvarchar`.

  PostgreSQL encodes all strings with the database encoding, so all characters are converted to UTF-8 and sorted using Unicode rules.

  Because SQL collations sort nchar and nvarchar data types using Unicode rules, Babelfish encodes all strings on the server using UTF-8. Babelfish sorts nchar and nvarchar strings the same way it sorts char and varchar strings, using Unicode rules.
+ **Supplementary character** – The SQL Server functions `NCHAR`, `UNICODE`, and `LEN` support characters for code-points outside the Unicode Basic Multilingual Plane (BMP). In contrast, non-SC collations use surrogate pair characters to handle supplementary characters. For Unicode data types, SQL Server can represent up to 65,535 characters using UCS-2, or the full Unicode range (1,114,114 characters) if supplementary characters are used. 
+ **Kana-sensitive (KS) collations** – A Kana-sensitive (KS) collation is one that treats `Hiragana` and `Katakana` Japanese Kana characters differently. ICU supports the Japanese collation standard `JIS X 4061`. The now deprecated `colhiraganaQ [on | off]` locale modifier might provide the same functionality as KS collations. However, KS collations of the same name as SQL Server aren't currently supported by Babelfish.
+ **Width-sensitive (WS) collations** – When a single-byte character (half-width) and the same character represented as a double-byte character (full-width) are treated differently, the collation is called *width-sensitive (WS)*. WS collations with the same name as SQL Server aren't currently supported by Babelfish.
+ **Variation-selector sensitive (VSS) collations** – Variation-selector sensitive (VSS) collations distinguish between ideographic variation selectors in Japanese collations `Japanese_Bushu_Kakusu_140` and `Japanese_XJIS_140`. A variation sequence is made up of a base character plus an additional variation selector. If you don't select the `_VSS` option, the variation selector isn't considered in the comparison.

  VSS collations aren't currently supported by Babelfish.
+ **BIN and BIN2 collations** – A BIN2 collation sorts characters according to code point order. The byte-by-byte binary order of UTF-8 preserves Unicode code point order, so this is also likely to be the best-performing collation. If Unicode code point order works for an application, consider using a BIN2 collation. However, using a BIN2 collation can result in data being displayed on the client in an order that is culturally unexpected. New mappings to lowercase characters are added to Unicode as time progresses, so the `LOWER` function might perform differently on different versions of ICU. This is a special case of the more general collation versioning problem rather than as something specific to the BIN2 collation. 

  Babelfish provides the `BBF_Latin1_General_BIN2` collation with the Babelfish distribution to collate in Unicode code point order. In a BIN collation only the first character is sorted as a wchar. Remaining characters are sorted byte-by-byte, effectively in code point order according to its encoding. This approach doesn't follow Unicode collation rules and isn't supported by Babelfish.
+ **Non-deterministic collations and CHARINDEX limitation** – For Babelfish releases older than version 2.1.0, you can't use CHARINDEX with non-deterministic collations. By default, Babelfish uses a case-insensitive (non-deterministic) collation. Using CHARINDEX for older versions of Babelfish raises the following runtime error:

  ```
  nondeterministic collations are not supported for substring searches
  ```
**Note**  
This limitation and workaround apply to Babelfish version 1.x only (Aurora PostgreSQL 13.x versions). Babelfish 2.1.0 and higher releases don't have this issue.

  You can work around this issue in one of the following ways:
  + Explicitly convert the expression to a case-sensitive collation and case-fold both arguments by applying LOWER or UPPER. For example, `SELECT charindex('x', a) FROM t1` would become the following:

    ```
    SELECT charindex(LOWER('x'), LOWER(a COLLATE sql_latin1_general_cp1_cs_as)) FROM t1
    ```
  + Create a SQL function f\$1charindex, and replace CHARINDEX calls with calls to the following function:

    ```
    CREATE function f_charindex(@s1 varchar(max), @s2 varchar(max)) RETURNS int
    AS
    BEGIN
    declare @i int = 1
    WHILE len(@s2) >= len(@s1)
    BEGIN
      if LOWER(@s1) = LOWER(substring(@s2,1,len(@s1))) return @i
      set @i += 1
      set @s2 = substring(@s2,2,999999999)
    END
    return 0
    END
    go
    ```