python tutorial - Python Hashing (Hash Tables And Hashlib) - learn python - python programming




Hash Tables

  • While an array can be used to construct hash tables, array indexes its elements using integers.
  • However, if we want to store data and use keys other than integer, such as 'string', we may want to use dictionary.
  • Dictionaries in Python are implemented using hash tables. It is an array whose indexes are obtained using a hash function on the keys.
  • We declare an empty dictionary like this:
>>> D = {}
click below button to copy the code. By Python tutorial team
  • Then, we can add its elements:
>>> D['a'] = 1
>>> D['b'] = 2
>>> D['c'] = 3
>>> D
{'a': 1, 'c': 3, 'b': 2}
click below button to copy the code. By Python tutorial team
  • It's a structure with (key, value) pair:
D[key] = value
click below button to copy the code. By Python tutorial team
  • The string used to "index" the hash table D is called the "key". To access the data stored in the table, we need to know the key:
>>> D['b'] 
2
click below button to copy the code. By Python tutorial team

How we loop through the hash table ?

>>> for k in D.keys():
...     print D[k]
... 
1
3
2
click below button to copy the code. By Python tutorial team
  • If we want to print the (key, value) pair:
>>> for k,v in D.items():
...     print k,':',v
... 
a : 1
c : 3
b : 2
click below button to copy the code. By Python tutorial team

Hashing from two arrays

  • Using two Arrays of equal length, create a Hash object where the elements from one array (the keys) are associated with the elements of the other (the values):
>>> keys = ['a', 'b', 'c']
>>> values = [1, 2, 3]
>>> hash = {k:v for k, v in zip(keys, values)}
>>> hash
{'a': 1, 'c': 3, 'b': 2}
click below button to copy the code. By Python tutorial team

Hashing

  • Here are some hashing samples using built-in hash function:
>>> map(hash, [0, 1, 2, 3])
[0, 1, 2, 3]
>>> map(hash, ['0','1','2','3'])
[6144018481, 6272018864, 6400019251, 6528019634]
>>> hash('0')
6144018481
click below button to copy the code. By Python tutorial team
  • As we can see from the example, Python is using different hash() function depending on the type of data.
  • Python provides hashlib for secure hashes and message digests:

md5(), sha*():

>>> import hashlib

>>> hashlib.md5('a')

>>> hashlib.md5('a').digest()
'\x0c\xc1u\xb9\xc0\xf1\xb6\xa81\xc3\x99\xe2iw&a;'
>>> hashlib.md5('a').hexdigest()
'0cc175b9c0f1b6a831c399e269772661'

>>> hashlib.sha512('a')

>>> hashlib.sha512('a').digest()
'\[email protected]\xfc\x92\xda$\x16\x94u\ty\xeel\xf5\x82\xf2\xd5\xd7\xd2\x8e\x183]\xe0Z\xbcT\xd0V\x0e\x0fS\x02\x86\x0ce+\xf0\x8dV\x02R\xaa^t!\x05F\xf3i\xfb\xbb\xce\x8c\x12\xcf\xc7\x95{&R;\xfe\x9au'
>>> hashlib.sha512('a').hexdigest()
'1f40fc92da241694750979ee6cf582f2d5d7d28e18335de05abc54d0560e0f5302860c652bf08d560252aa5e74210546f369fbbbce8c12cfc7957b2652fe9a75'
>>> 
click below button to copy the code. By Python tutorial team

Hashing Sample Code

  • In this section, we used 64 bit integer (hash value from hash()) for the comparison of shingles instead of directly working on the string.
from __future__ import division
import itertools
import re
import hashlib

# a shingle in this code is a string with K-words
K = 4

def jaccard_set(s1, s2):
    " takes two sets and returns Jaccard coefficient"
    u = s1.union(s2)
    i = s1.intersection(s2)
    return len(i)/len(u)

def make_a_set_of_tokens(doc):
    """makes a set of K-tokens"""

    # replace non-alphanumeric char with a space, and then split
    tokens = re.sub("[^\w]", " ",  doc).split()

    sh = set()
    for i in range(len(tokens)-K):
        t = tokens[i]
        for x in tokens[i+1:i+K]:
            t += ' ' + x 
        sh.add(t)
    return sh

if __name__ == '__main__':

    documents = ["The legal system is made up of civil courts, criminal courts and specialty courts
such as family law courts and bankruptcy court...."] shingles = [] # handle documents one by one # makes a list of sets which are compresized of a list of K words string for doc in documents: # makes a set of tokens # sh = set([' ', ..., ' ']) sh = make_a_set_of_tokens(doc) print("sh = %s") %(sh) # hasing bucket = map(hash, sh) # print("bucket = %s") %(bucket) # shingles : list of sets (sh) shingles.append(set(bucket)) #print("shingles=%s") %(shingles) combinations = list( itertools.combinations([x for x in range(len(shingles))], 2) ) print("combinations=%s") %(combinations) # compare each pair in combinations tuple of shingles for c in combinations: i1 = c[0] i2 = c[1] jac = jaccard_set(shingles[i1], shingles[i2]) print("%s : jaccard=%s") %(c,jac)
click below button to copy the code. By Python tutorial team
  • Output is exactly the same as the one we got using string comparison:
combinations=[(0, 1), (0, 2), (1, 2)]
(0, 1) : jaccard=0.0196078431373
(0, 2) : jaccard=0.0
(1, 2) : jaccard=0.0

Related Searches to Python Hashing (Hash Tables And Hashlib)