MSCK Optimization
Hive stores a list of partitions for each table in its metastore. However, when
                partitions are directly added to or removed from the file system, the Hive metastore
                is unaware of these changes. The  MSCK command
MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];
Hive implements this command as follows:
- 
                    
Hive retrieves all the partitions for the table from the metastore. From the list of partition paths that do not exist in the file system then creates a list of partitions to drop from the metastore.
 - 
                    
Hive gathers the partition paths present in the file system, compares them with the list of partitions from the metastore, and generates a list of partitions that need to be added to the metastore.
 - 
                    
Hive updates the metastore using
ADD,DROP, orSYNCmode. 
Note
When there are many partitions in the metastore, the step to check if a
                    partition does not exist in the file system takes a long time to run because the
                    file system's exists API call must be made for each
                    partition.
In Amazon EMR 6.5.0, Hive introduced a flag called
                    hive.emr.optimize.msck.fs.check. When enabled, this flag causes
                Hive to check for the presence of a partition from the list of partition paths from
                the file system that is generated in step 2 above instead of making file system API
                calls. In Amazon EMR 6.8.0, Hive enabled this optimization by default, eliminating the
                need to set the flag hive.emr.optimize.msck.fs.check.