What is Cloudera Impala?

July 19, 2021

Cloudera’s Impala

Impala, released to the public in April 2013, was one of the first engines to bring interactive SQL querying to Hadoop. It comes with a number of interesting features:

Impala can query many file formats, such as Parquet, Avro, text, RCFile and SequenceFile;
Impala supports data stored in HDFS, Apache HBase and Amazon S3;
Impala supports multiple compression codecs:
- Snappy (recommended for its good balance between compression ratio and decompression speed),
- Gzip (recommended when the highest level of compression is required),
- Deflate (not supported for text files), Bzip2, and LZO (for text files only);
Impala provides security through:
- Authorization based on Sentry (using the OS user ID): defining which users are allowed to access which resources, and which operations they are allowed to perform;
- Authentication based on Kerberos, plus the ability to specify an Active Directory username/password: how Impala verifies the identity of users to confirm that they are allowed to exercise the privileges assigned to them;
- Auditing: which operations were attempted and whether they succeeded, which makes it possible to track down suspicious activity; audit data are collected by Cloudera Manager;
Impala supports SSL encryption of network traffic between Impala and client programs, and between the Impala-related daemons running on different nodes in the cluster;
Impala supports user-defined functions (UDFs) and user-defined aggregate functions (UDAFs);
Impala automatically reorders joins to choose the most efficient join order;
Impala provides admission control: prioritization and queueing of queries within Impala;
Impala supports multiple users running queries concurrently;
Impala caches frequently accessed data in memory;
Impala computes table and column statistics with COMPUTE STATS;
Impala provides window (analytic) functions, such as aggregate functions with OVER (PARTITION BY ...), RANK, LEAD, LAG and NTILE, for more advanced SQL analytics (since version 2.0);
Impala supports disk-based (external) joins and aggregations (since version 2.0): these operations can spill to disk when their internal state exceeds the available memory;
Impala allows subqueries inside WHERE clauses;
Impala supports incremental statistics (COMPUTE INCREMENTAL STATS): statistics are computed only for new or changed partitions, which makes statistics computation faster;
Impala supports queries on complex nested types, including maps, structs and arrays;
Impala supports merging updates into existing tables (MERGE);
Impala supports some OLAP functions (ROLLUP, CUBE, GROUPING SETS);
Impala can be used to insert and update data in HBase tables.
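
The window functions in the list above are easier to picture with a concrete case. The following sketch is plain Python, not Impala code: it emulates what RANK() OVER (PARTITION BY region ORDER BY amount) and LAG(amount, 1) over the same window return. The helper rank_and_lag and the sales rows are hypothetical, made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def rank_and_lag(rows, partition_key, order_key, lag_col):
    """Emulate, in plain Python (hypothetical helper, not an Impala API):
    RANK() OVER (PARTITION BY partition_key ORDER BY order_key) and
    LAG(lag_col, 1) OVER (PARTITION BY partition_key ORDER BY order_key).

    rows is a list of dicts; returns a new list of dicts with "rank" and "lag" added.
    """
    result = []
    # Sorting by (partition, order key) makes each partition contiguous and ordered,
    # which is what PARTITION BY ... ORDER BY ... describes.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, partition in groupby(ordered, key=itemgetter(partition_key)):
        prev = None
        rank = 0
        for position, row in enumerate(partition, start=1):
            # RANK(): rows with equal order-key values share a rank, and the next
            # distinct value takes its 1-based position, leaving gaps (1, 1, 3, ...).
            if prev is None or row[order_key] != prev[order_key]:
                rank = position
            out = dict(row)
            out["rank"] = rank
            # LAG(col, 1): the previous row's value in the same partition,
            # or None (SQL NULL) for the first row of the partition.
            out["lag"] = prev[lag_col] if prev is not None else None
            result.append(out)
            prev = row
    return result


sales = [
    {"region": "east", "amount": 10},
    {"region": "east", "amount": 30},
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
]
for r in rank_and_lag(sales, "region", "amount", "amount"):
    print(r)
```

The two tied "east" rows share rank 1 and the next row gets rank 3, which is how RANK differs from DENSE_RANK; which tied row comes first (and therefore what LAG returns for it) is not guaranteed in SQL, while this sketch simply keeps the input order.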
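
To show what it means for an aggregation to spill to disk, here is a toy external hash aggregation in Python. It is a simplified sketch, not Impala's implementation: the memory limit is modeled by a hypothetical max_groups_in_memory parameter (a number of groups rather than bytes), and when the in-memory hash table outgrows it, the partial sums are written to temporary files partitioned by key hash and merged one partition at a time at the end.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def external_sum(pairs, max_groups_in_memory, num_partitions=4):
    """SUM(value) ... GROUP BY key over (key, value) pairs, spilling to disk when the
    in-memory hash table holds more than max_groups_in_memory groups (a hypothetical
    stand-in for a memory limit)."""
    table = defaultdict(int)
    spill_dir = tempfile.mkdtemp()
    paths = [os.path.join(spill_dir, f"part{i}.pkl") for i in range(num_partitions)]
    spilled = False

    def spill():
        # Append the partial sums to per-partition files, routed by hash(key),
        # then free the in-memory table.
        files = [open(p, "ab") for p in paths]
        try:
            for key, total in table.items():
                pickle.dump((key, total), files[hash(key) % num_partitions])
        finally:
            for f in files:
                f.close()
        table.clear()

    for key, value in pairs:
        table[key] += value
        if len(table) > max_groups_in_memory:
            spill()
            spilled = True

    if not spilled:
        os.rmdir(spill_dir)  # everything fit in memory; nothing was written
        return dict(table)

    spill()  # flush what is left so that every partial sum is on disk
    result = {}
    # All partial sums for a given key land in the same partition, so partitions can
    # be merged one at a time. (A real engine would re-partition a partition that is
    # still too large; this sketch does not.)
    for path in paths:
        merged = defaultdict(int)
        with open(path, "rb") as f:
            while True:
                try:
                    key, total = pickle.load(f)
                except EOFError:
                    break
                merged[key] += total
        os.remove(path)
        result.update(merged)
    os.rmdir(spill_dir)
    return result


print(external_sum(((i % 10, 1) for i in range(100)), max_groups_in_memory=3))
```

The result is the same as a purely in-memory GROUP BY; only the peak size of the hash table changes, at the cost of extra disk I/O.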
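
The idea behind incremental statistics can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Impala stores its statistics: each partition carries a hypothetical version number that changes whenever its data changes, per-partition statistics are cached, and only new or changed partitions are rescanned. Row counts and min/max values combine exactly across partitions; distinct-value counts would not, and would need a mergeable summary instead.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    row_count: int
    min_value: int
    max_value: int


def scan_partition(values):
    # Full scan of one (non-empty) partition: the expensive step to avoid repeating.
    return ColumnStats(len(values), min(values), max(values))


def incremental_stats(partitions, cache):
    """partitions maps a partition name to (version, values), where the hypothetical
    version changes whenever the partition's data changes. cache maps a partition name
    to (version, ColumnStats) from earlier runs and is updated in place.
    Returns (table-level ColumnStats, names of the partitions that were rescanned)."""
    for name in list(cache):
        if name not in partitions:
            del cache[name]  # the partition was dropped
    rescanned = []
    for name, (version, values) in partitions.items():
        cached = cache.get(name)
        if cached is None or cached[0] != version:
            cache[name] = (version, scan_partition(values))
            rescanned.append(name)
    # Row counts add up and min/max combine exactly, so the table-level numbers
    # come from the cached per-partition stats without rescanning anything.
    per_partition = [stats for _, stats in cache.values()]
    table = ColumnStats(
        row_count=sum(s.row_count for s in per_partition),
        min_value=min(s.min_value for s in per_partition),
        max_value=max(s.max_value for s in per_partition),
    )
    return table, rescanned


cache = {}
partitions = {"2021-07-18": (1, [3, 9, 4]), "2021-07-19": (1, [7, 1])}
print(incremental_stats(partitions, cache))
partitions["2021-07-20"] = (1, [12])  # a new partition arrives
print(incremental_stats(partitions, cache))  # only "2021-07-20" is rescanned
```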
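
ROLLUP, CUBE and GROUPING SETS are SQL extensions of GROUP BY that compute several levels of subtotals in one query. The Python sketch below emulates the result of GROUP BY ROLLUP over a hypothetical sales table; rollup_sum and the sample rows are made up for illustration, and the code only shows the semantics without calling Impala.

```python
from collections import defaultdict


def rollup_sum(rows, group_cols, value_col):
    """Emulate SELECT <group_cols>, SUM(<value_col>) ... GROUP BY ROLLUP(<group_cols>).

    ROLLUP(a, b) is shorthand for GROUPING SETS ((a, b), (a), ()): one grouping per
    prefix of the column list, down to the grand total. Rolled-up columns are reported
    as None (SQL NULL). Assumes the grouping columns never hold real None values;
    SQL uses the GROUPING() function to tell the two apart.
    """
    totals = defaultdict(int)
    for n in range(len(group_cols), -1, -1):
        kept = set(group_cols[:n])  # the grouping set for this level
        for row in rows:
            key = tuple(row[c] if c in kept else None for c in group_cols)
            totals[key] += row[value_col]
    return dict(totals)


sales = [
    {"region": "east", "year": 2020, "amount": 10},
    {"region": "east", "year": 2021, "amount": 5},
    {"region": "west", "year": 2021, "amount": 7},
]
for key, total in rollup_sum(sales, ["region", "year"], "amount").items():
    print(key, total)
```

CUBE(a, b) would instead aggregate over every subset of the columns, (a, b), (a), (b) and (), while GROUPING SETS lets the query list the groupings explicitly.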

Editor
