python tutorial - Python Hashing (Hash Tables And Hashlib) - learn python - python programming




Hash Tables

  • While an array can be used to construct hash tables, array indexes its elements using integers.
  • However, if we want to store data and use keys other than integer, such as 'string', we may want to use dictionary.
  • Dictionaries in Python are implemented using hash tables. It is an array whose indexes are obtained using a hash function on the keys.
  • We declare an empty dictionary like this:
>>> D = {}
click below button to copy the code. By Python tutorial team
  • Then, we can add its elements:
>>> D['a'] = 1
>>> D['b'] = 2
>>> D['c'] = 3
>>> D
{'a': 1, 'c': 3, 'b': 2}
click below button to copy the code. By Python tutorial team
  • It's a structure with (key, value) pair:
D[key] = value
click below button to copy the code. By Python tutorial team
  • The string used to "index" the hash table D is called the "key". To access the data stored in the table, we need to know the key:
>>> D['b'] 
2
click below button to copy the code. By Python tutorial team

How we loop through the hash table ?

>>> for k in D.keys():
...     print D[k]
... 
1
3
2
click below button to copy the code. By Python tutorial team
  • If we want to print the (key, value) pair:
>>> for k,v in D.items():
...     print k,':',v
... 
a : 1
c : 3
b : 2
click below button to copy the code. By Python tutorial team

Hashing from two arrays

  • Using two Arrays of equal length, create a Hash object where the elements from one array (the keys) are associated with the elements of the other (the values):
>>> keys = ['a', 'b', 'c']
>>> values = [1, 2, 3]
>>> hash = {k:v for k, v in zip(keys, values)}
>>> hash
{'a': 1, 'c': 3, 'b': 2}
click below button to copy the code. By Python tutorial team

Hashing

  • Here are some hashing samples using built-in hash function:
>>> map(hash, [0, 1, 2, 3])
[0, 1, 2, 3]
>>> map(hash, ['0','1','2','3'])
[6144018481, 6272018864, 6400019251, 6528019634]
>>> hash('0')
6144018481
click below button to copy the code. By Python tutorial team
  • As we can see from the example, Python is using different hash() function depending on the type of data.
  • Python provides hashlib for secure hashes and message digests:

md5(), sha*():

>>> import hashlib

>>> hashlib.md5('a')

>>> hashlib.md5('a').digest()
'\x0c\xc1u\xb9\xc0\xf1\xb6\xa81\xc3\x99\xe2iw&a;'
>>> hashlib.md5('a').hexdigest()
'0cc175b9c0f1b6a831c399e269772661'

>>> hashlib.sha512('a')

>>> hashlib.sha512('a').digest()
'\[email protected]\xfc\x92\xda$\x16\x94u\ty\xeel\xf5\x82\xf2\xd5\xd7\xd2\x8e\x183]\xe0Z\xbcT\xd0V\x0e\x0fS\x02\x86\x0ce+\xf0\x8dV\x02R\xaa^t!\x05F\xf3i\xfb\xbb\xce\x8c\x12\xcf\xc7\x95{&R;\xfe\x9au'
>>> hashlib.sha512('a').hexdigest()
'1f40fc92da241694750979ee6cf582f2d5d7d28e18335de05abc54d0560e0f5302860c652bf08d560252aa5e74210546f369fbbbce8c12cfc7957b2652fe9a75'
>>> 
click below button to copy the code. By Python tutorial team

Hashing Sample Code

  • In this section, we used 64 bit integer (hash value from hash()) for the comparison of shingles instead of directly working on the string.
from __future__ import division
import itertools
import re
import hashlib

# a shingle in this code is a string with K-words
K = 4

def jaccard_set(s1, s2):
    " takes two sets and returns Jaccard coefficient"
    u = s1.union(s2)
    i = s1.intersection(s2)
    return len(i)/len(u)

def make_a_set_of_tokens(doc):
    """makes a set of K-tokens"""

    # replace non-alphanumeric char with a space, and then split
    tokens = re.sub("[^\w]", " ",  doc).split()

    sh = set()
    for i in range(len(tokens)-K):
        t = tokens[i]
        for x in tokens[i+1:i+K]:
            t += ' ' + x 
        sh.add(t)
    return sh

if __name__ == '__main__':

    documents = ["The legal system is made up of civil courts, criminal courts and specialty courts
such as family law courts and bankruptcy court...."] shingles = [] # handle documents one by one # makes a list of sets which are compresized of a list of K words string for doc in documents: # makes a set of tokens # sh = set([' ', ..., ' ']) sh = make_a_set_of_tokens(doc) print("sh = %s") %(sh) # hasing bucket = map(hash, sh) # print("bucket = %s") %(bucket) # shingles : list of sets (sh) shingles.append(set(bucket)) #print("shingles=%s") %(shingles) combinations = list( itertools.combinations([x for x in range(len(shingles))], 2) ) print("combinations=%s") %(combinations) # compare each pair in combinations tuple of shingles for c in combinations: i1 = c[0] i2 = c[1] jac = jaccard_set(shingles[i1], shingles[i2]) print("%s : jaccard=%s") %(c,jac)
click below button to copy the code. By Python tutorial team
  • Output is exactly the same as the one we got using string comparison:
combinations=[(0, 1), (0, 2), (1, 2)]
(0, 1) : jaccard=0.0196078431373
(0, 2) : jaccard=0.0
(1, 2) : jaccard=0.0

Related Searches to Python Hashing (Hash Tables And Hashlib)

Adblocker detected! Please consider reading this notice.

We've detected that you are using AdBlock Plus or some other adblocking software which is preventing the page from fully loading.

We don't have any banner, Flash, animation, obnoxious sound, or popup ad. We do not implement these annoying types of ads!

We need money to operate the site, and almost all of it comes from our online advertising.

Please add wikitechy.com to your ad blocking whitelist or disable your adblocking software.

×