sqoop - Sqoop Validation - apache sqoop - sqoop tutorial - sqoop hadoop



What is Sqoop-Validation?

  • Validate the data copied, either import or export by comparing the row counts from the source and the target post copy.
  • The Sqoop validate option is used to compare the row counts between source and target after data imported into HDFS.
  • When the rows are deleted or added during the imports, Sqoop tracks this change and updates the log file.

SQOOP process:

 Sqoop Process

Learn Sqoop - Sqoop tutorial - Sqoop Process - Sqoop examples - Sqoop programs

Sqoop related tags : sqoop import , sqoop interview questions , sqoop export , sqoop commands , sqoop user guide , sqoop documentation

SQOOP process with Validate option:

 Sqoop Process Validate

Learn Sqoop - Sqoop tutorial - Sqoop Process Validate - Sqoop examples - Sqoop programs

  • There are 3 basic interfaces:
    • ValidationThreshold - Determines if the error margin between the source and target are acceptable: Absolute, Percentage Tolerant, etc. Default implementation is AbsoluteValidationThreshold which ensures the row counts from source and targets are the same.
    • ValidationFailureHandler - Responsible for handling failures: log an error/warning, abort, etc. Default implementation is LogOnFailureHandler that logs a warning message to the configured logger.
    • Validator - Drives the validation logic by delegating the decision to ValidationThreshold and delegating failure handling to ValidationFailureHandler. The default implementation is RowCountValidator which validates the row counts from source and the target.

Syntax:

$ sqoop import (generic-args) (import-args)
$ sqoop export (generic-args) (export-args)
Click "Copy code" button to copy into clipboard - By wikitechy - sqoop tutorial - team
  • Validation arguments are part of import and export arguments.
Sqoop related tags : sqoop import , sqoop interview questions , sqoop export , sqoop commands , sqoop user guide , sqoop documentation

Configuration:

  • The validation framework is extensible and pluggable. It comes with default implementations but the interfaces can be extended to allow custom implementations by passing them as part of the command line arguments as described below.

Validator

  • Property: validator
  • Description: Driver for validation, must implement org.apache.sqoop.validation.Validator
  • Supported values: The value has to be a fully qualified class name.
  • Default value: org.apache.sqoop.validation.RowCountValidator

Validation Threshold

  • Property: validation-threshold
  • Description: Drives the decision based on the validation meeting the threshold or not. Must implement org.apache.sqoop.validation.ValidationThreshold
  • Supported values: The value has to be a fully qualified class name.
  • Default value: org.apache.sqoop.validation.AbsoluteValidationThreshold

Validation Failure Handler

  • Property: validation-failurehandler
  • Description: Responsible for handling failures, must implement org.apache.sqoop.validation.ValidationFailureHandler
  • Supported values: The value has to be a fully qualified class name.
  • Default value: org.apache.sqoop.validation.LogOnFailureHandler

Limitations:

  • Validation currently only validates data copied from a single table into HDFS. The following are the limitations in the current implementation:
    • all-tables option
    • free-form query option
    • Data imported into Hive or HBase
    • table import with --where argument
    • incremental imports

Example Invocations:

A basic import of a table named EMPLOYEES in the corp database that uses validation to validate the row counts:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp  \
    --table EMPLOYEES --validate
Click "Copy code" button to copy into clipboard - By wikitechy - sqoop tutorial - team

A basic export to populate a table named bar with validation enabled:

$ sqoop export --connect jdbc:mysql://db.example.com/foo --table bar  \
    --export-dir /results/bar_data --validate
Click "Copy code" button to copy into clipboard - By wikitechy - sqoop tutorial - team

Another example that overrides the validation args:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --validate --validator org.apache.sqoop.validation.RowCountValidator \
    --validation-threshold \
          org.apache.sqoop.validation.AbsoluteValidationThreshold \
    --validation-failurehandler \
          org.apache.sqoop.validation.LogOnFailureHandler
Click "Copy code" button to copy into clipboard - By wikitechy - sqoop tutorial - team


Related Searches to Sqoop Validation