{"id":25425,"date":"2017-10-15T20:14:03","date_gmt":"2017-10-15T14:44:03","guid":{"rendered":"https:\/\/www.wikitechy.com\/technology\/?p=25425"},"modified":"2017-10-15T20:14:03","modified_gmt":"2017-10-15T14:44:03","slug":"c-programming-searching-patterns-set-2-kmp-algorithm-2-2","status":"publish","type":"post","link":"https:\/\/www.wikitechy.com\/technology\/c-programming-searching-patterns-set-2-kmp-algorithm-2-2\/","title":{"rendered":"C programming for Searching for Patterns Set 2 KMP Algorithm"},"content":{"rendered":"<p>Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n &gt; m.<\/p>\n<p>Examples:<\/p>\n<pre>Input:  txt[] = \"THIS IS A TEST TEXT\"\r\n        pat[] = \"TEST\"\r\nOutput: Pattern found at index 10\r\n\r\nInput:  txt[] =  \"AABAACAADAABAABA\"\r\n        pat[] =  \"AABA\"\r\nOutput: Pattern found at index 0\r\n        Pattern found at index 9\r\n        Pattern found at index 12\r\n<img fetchpriority=\"high\" decoding=\"async\" class=\"aligncenter size-full wp-image-25432\" src=\"https:\/\/www.wikitechy.com\/technology\/wp-content\/uploads\/2017\/05\/Searching-for-Patterns-Set-2-KMP-Algorithm.png\" alt=\"C programming for Searching for Patterns Set 2 KMP Algorithm\" width=\"704\" height=\"384\" srcset=\"https:\/\/www.wikitechy.com\/technology\/wp-content\/uploads\/2017\/05\/Searching-for-Patterns-Set-2-KMP-Algorithm.png 704w, https:\/\/www.wikitechy.com\/technology\/wp-content\/uploads\/2017\/05\/Searching-for-Patterns-Set-2-KMP-Algorithm-300x164.png 300w\" sizes=\"(max-width: 704px) 100vw, 704px\" \/>\r\n<\/pre>\n<p>Pattern searching is an important problem in computer science. When we do search for a string in notepad\/word file or browser or database, pattern searching algorithms are used to show the search results.<\/p>\n<p>We have discussed Naive pattern searching algorithm in the previous post. The worst case complexity of Naive algorithm is O(m(n-m+1)). Time complexity of KMP algorithm is O(n) in worst case.<\/p>\n[ad type=&#8221;banner&#8221;]\n<p>KMP (Knuth Morris Pratt) Pattern Searching<br \/>\nThe Naive pattern searching algorithm doesn\u2019t work well in cases where we see many matching characters followed by a mismatching character. Following are some examples.<\/p>\n<p>txt[] = &#8220;AAAAAAAAAAAAAAAAAB&#8221;<br \/>\npat[] = &#8220;AAAAB&#8221;<\/p>\n<p>txt[] = &#8220;ABABABCABABABCABABABC&#8221;<br \/>\npat[] = &#8220;ABABAC&#8221; (not a worst case, but a bad case for Naive)<br \/>\nThe KMP matching algorithm uses degenerating property (pattern having same sub-patterns appearing more than once in the pattern) of the pattern and improves the worst case complexity to O(n). The basic idea behind KMP\u2019s algorithm is: whenever we detect a mismatch (after some matches), we already know some of the characters in the text of next window. We take advantage of this information to avoid matching the characters that we know will anyway match. Let us consider below example to understand this.<\/p>\n<p>Matching Overview<br \/>\ntxt = &#8220;AAAAABAAABA&#8221;<br \/>\npat = &#8220;AAAA&#8221;<\/p>\n<p>We compare first window of txt with pat<br \/>\ntxt = &#8220;AAAAABAAABA&#8221;<br \/>\npat = &#8220;AAAA&#8221; [Initial position]\nWe find a match. This is same as Naive String Matching.<\/p>\n<p>In the next step, we compare next window of txt with pat.<br \/>\ntxt = &#8220;AAAAABAAABA&#8221;<br \/>\npat = &#8220;AAAA&#8221; [Pattern shifted one position]\nThis is where KMP does optimization over Naive. In this<br \/>\nsecond window, we only compare fourth A of pattern<br \/>\nwith fourth character of current window of text to decide<br \/>\nwhether current window matches or not. Since we know<br \/>\nfirst three characters will anyway match, we skipped<br \/>\nmatching first three characters.<\/p>\n[ad type=&#8221;banner&#8221;]\n<p>Need of Preprocessing?<br \/>\nAn important question arises from above explanation,<br \/>\nhow to know how many characters to be skipped. To know<br \/>\nthis, we pre-process pattern and prepare an integer array<br \/>\nlps[] that tells us count of characters to be skipped.<br \/>\nPreprocessing Overview:<\/p>\n<p>KMP algorithm does preproceses pat[] and constructs an auxiliary lps[] of size m (same as size of pattern) which is used to skip characters while matching.<br \/>\nname lps indicates longest proper prefix which is also suffix.. A proper prefix is prefix with whole string not allowed. For example, prefixes of \u201cABC\u201d are \u201c\u201d, \u201cA\u201d, \u201cAB\u201d and \u201cABC\u201d. Proper prefixes are \u201c\u201d, \u201cA\u201d and \u201cAB\u201d. Suffixes of the string are \u201c\u201d, \u201cC\u201d, \u201cBC\u201d and \u201cABC\u201d.<br \/>\nFor each sub-pattern pat[0..i] where i = 0 to m-1, lps[i] stores length of the maximum matching proper prefix which is also a suffix of the sub-pattern pat[0..i].<br \/>\nlps[i] = the longest proper prefix of pat[0..i]\nwhich is also a suffix of pat[0..i].<br \/>\nNote : lps[i] could also be defined as longest prefix which is also proper suffix. We need to use proper at one place to make sure that the whole substring is not considered.<\/p>\n<p>Examples of lps[] construction:<br \/>\nFor the pattern \u201cAAAA\u201d,<br \/>\nlps[] is [0, 1, 2, 3]\n<p>For the pattern \u201cABCDE\u201d,<br \/>\nlps[] is [0, 0, 0, 0, 0]\n<p>For the pattern \u201cAABAACAABAA\u201d,<br \/>\nlps[] is [0, 1, 0, 1, 2, 0, 1, 2, 3, 4, 5]\n<p>For the pattern \u201cAAACAAAAAC\u201d,<br \/>\nlps[] is [0, 1, 2, 0, 1, 2, 3, 3, 3, 4]\n<p>For the pattern \u201cAAABAAA\u201d,<br \/>\nlps[] is [0, 1, 2, 0, 1, 2, 3]\nSearching Algorithm:<br \/>\nUnlike Naive algorithm, where we slide the pattern by one and compare all characters at each shift, we use a value from lps[] to decide the next characters to be matched. The idea is to not match character that we know will anyway match.<\/p>\n<p>How to use lps[] to decide next positions (or to know number of characters to be skipped)?<\/p>\n<p>We start comparison of pat[j] with j = 0 with characters of current window of text.<br \/>\nWe keep matching characters txt[i] and pat[j] and keep incrementing i and j while pat[j] and txt[i] keep matching.<br \/>\nWhen we see a mismatch<br \/>\nWe know that characters pat[0..j-1] match with txt[i-j+1\u2026i-1] (Note that j starts with 0 and increment it only when there is a match).<br \/>\nWe also know (from above definition) that lps[j-1] is count of characters of pat[0\u2026j-1] that are both proper prefix and suffix.<br \/>\nFrom above two points, we can conclude that we do not need to match these lps[j-1] characters with txt[i-j\u2026i-1] because we know that these characters will anyway match. Let us consider above example to understand this.<br \/>\ntxt[] = &#8220;AAAAABAAABA&#8221;<br \/>\npat[] = &#8220;AAAA&#8221;<br \/>\nlps[] = {0, 1, 2, 3}<\/p>\n<p>i = 0, j = 0<br \/>\ntxt[] = &#8220;AAAAABAAABA&#8221;<br \/>\npat[] = &#8220;AAAA&#8221;<br \/>\ntxt[i] and pat[j[ match, do i++, j++<\/p>\n<p>i = 1, j = 1<br \/>\ntxt[] = &#8220;AAAAABAAABA&#8221;<br \/>\npat[] = &#8220;AAAA&#8221;<br \/>\ntxt[i] and pat[j[ match, do i++, j++<\/p>\n<p>i = 2, j = 2<br \/>\ntxt[] = &#8220;AAAAABAAABA&#8221;<br \/>\npat[] = &#8220;AAAA&#8221;<br \/>\npat[i] and pat[j[ match, do i++, j++<\/p>\n<p>i = 3, j = 3<br \/>\ntxt[] = &#8220;AAAAABAAABA&#8221;<br \/>\npat[] = &#8220;AAAA&#8221;<br \/>\ntxt[i] and pat[j[ match, do i++, j++<\/p>\n<p>i = 4, j = 4<br \/>\nSince j == M, print pattern found and resset j,<br \/>\nj = lps[j-1] = lps[3] = 3<\/p>\n<p>Here unlike Naive algorithm, we do not match first three<br \/>\ncharacters of this window. Value of lps[j-1] (in above<br \/>\nstep) gave us index of next character to match.<br \/>\ni = 4, j = 3<br \/>\ntxt[] = &#8220;AAAAABAAABA&#8221;<br \/>\npat[] = &#8220;AAAA&#8221;<br \/>\ntxt[i] and pat[j[ match, do i++, j++<\/p>\n<p>i = 5, j = 4<br \/>\nSince j == M, print pattern found and reset j,<br \/>\nj = lps[j-1] = lps[3] = 3<\/p>\n<p>Again unlike Naive algorithm, we do not match first three<br \/>\ncharacters of this window. Value of lps[j-1] (in above<br \/>\nstep) gave us index of next character to match.<br \/>\ni = 5, j = 3<br \/>\ntxt[] = &#8220;AAAAABAAABA&#8221;<br \/>\npat[] = &#8220;AAAA&#8221;<br \/>\ntxt[i] and pat[j] do NOT match and j &gt; 0, change only j<br \/>\nj = lps[j-1] = lps[2] = 2<\/p>\n<p>i = 5, j = 2<br \/>\ntxt[] = &#8220;AAAAABAAABA&#8221;<br \/>\npat[] = &#8220;AAAA&#8221;<br \/>\ntxt[i] and pat[j] do NOT match and j &gt; 0, change only j<br \/>\nj = lps[j-1] = lps[1] = 1<\/p>\n<p>i = 5, j = 1<br \/>\ntxt[] = &#8220;AAAAABAAABA&#8221;<br \/>\npat[] = &#8220;AAAA&#8221;<br \/>\ntxt[i] and pat[j] do NOT match and j &gt; 0, change only j<br \/>\nj = lps[j-1] = lps[0] = 0<\/p>\n<p>i = 5, j = 0<br \/>\ntxt[] = &#8220;AAAAABAAABA&#8221;<br \/>\npat[] = &#8220;AAAA&#8221;<br \/>\ntxt[i] and pat[j] do NOT match and j is 0, we do i++.<\/p>\n<p>i = 6, j = 0<br \/>\ntxt[] = &#8220;AAAAABAAABA&#8221;<br \/>\npat[] = &#8220;AAAA&#8221;<br \/>\ntxt[i] and pat[j] match, do i++ and j++<\/p>\n<p>i = 7, j = 1<br \/>\ntxt[] = &#8220;AAAAABAAABA&#8221;<br \/>\npat[] = &#8220;AAAA&#8221;<br \/>\ntxt[i] and pat[j] match, do i++ and j++<\/p>\n<p>&nbsp;<\/p>\n<div class=\"code-embed-wrapper\"> <div class=\"code-embed-infos\"> <span class=\"code-embed-name\">c<\/span> <\/div> <pre class=\"language-c code-embed-pre line-numbers\"  data-start=\"1\" data-line-offset=\"0\"><code class=\"language-c code-embed-code\">\/\/ C++ program for implementation of KMP pattern searching<br\/>\/\/ algorithm<br\/>#include&lt;bits\/stdc++.h&gt;<br\/> <br\/>void computeLPSArray(char *pat, int M, int *lps);<br\/> <br\/>\/\/ Prints occurrences of txt[] in pat[]<br\/>void KMPSearch(char *pat, char *txt)<br\/>{<br\/>    int M = strlen(pat);<br\/>    int N = strlen(txt);<br\/> <br\/>    \/\/ create lps[] that will hold the longest prefix suffix<br\/>    \/\/ values for pattern<br\/>    int lps[M];<br\/> <br\/>    \/\/ Preprocess the pattern (calculate lps[] array)<br\/>    computeLPSArray(pat, M, lps);<br\/> <br\/>    int i = 0;  \/\/ index for txt[]<br\/>    int j  = 0;  \/\/ index for pat[]<br\/>    while (i &lt; N)<br\/>    {<br\/>        if (pat[j] == txt[i])<br\/>        {<br\/>            j++;<br\/>            i++;<br\/>        }<br\/> <br\/>        if (j == M)<br\/>        {<br\/>            printf(&quot;Found pattern at index %d \\n&quot;, i-j);<br\/>            j = lps[j-1];<br\/>        }<br\/> <br\/>        \/\/ mismatch after j matches<br\/>        else if (i &lt; N &amp;&amp; pat[j] != txt[i])<br\/>        {<br\/>            \/\/ Do not match lps[0..lps[j-1]] characters,<br\/>            \/\/ they will match anyway<br\/>            if (j != 0)<br\/>                j = lps[j-1];<br\/>            else<br\/>                i = i+1;<br\/>        }<br\/>    }<br\/>}<br\/> <br\/>\/\/ Fills lps[] for given patttern pat[0..M-1]<br\/>void computeLPSArray(char *pat, int M, int *lps)<br\/>{<br\/>    \/\/ length of the previous longest prefix suffix<br\/>    int len = 0;<br\/> <br\/>    lps[0] = 0; \/\/ lps[0] is always 0<br\/> <br\/>    \/\/ the loop calculates lps[i] for i = 1 to M-1<br\/>    int i = 1;<br\/>    while (i &lt; M)<br\/>    {<br\/>        if (pat[i] == pat[len])<br\/>        {<br\/>            len++;<br\/>            lps[i] = len;<br\/>            i++;<br\/>        }<br\/>        else \/\/ (pat[i] != pat[len])<br\/>        {<br\/>            \/\/ This is tricky. Consider the example.<br\/>            \/\/ AAACAAAA and i = 7. The idea is similar <br\/>            \/\/ to search step.<br\/>            if (len != 0)<br\/>            {<br\/>                len = lps[len-1];<br\/> <br\/>                \/\/ Also, note that we do not increment<br\/>                \/\/ i here<br\/>            }<br\/>            else \/\/ if (len == 0)<br\/>            {<br\/>                lps[i] = 0;<br\/>                i++;<br\/>            }<br\/>        }<br\/>    }<br\/>}<br\/> <br\/>\/\/ Driver program to test above function<br\/>int main()<br\/>{<br\/>    char *txt = &quot;ABABDABACDABABCABAB&quot;;<br\/>    char *pat = &quot;ABABCABAB&quot;;<br\/>    KMPSearch(pat, txt);<br\/>    return 0;<br\/>}<\/code><\/pre> <\/div>\n<p><strong>Output:<\/strong><br \/>\nFound pattern at index 10<\/p>\n[ad type=&#8221;banner&#8221;]\n<strong>Preprocessing Algorithm:<\/strong><br \/>\nIn the preprocessing part, we calculate values in lps[]. To do that, we keep track of the length of the longest prefix suffix value (we use len variable for this purpose) for the previous index. We initialize lps[0] and len as 0. If pat[len] and pat[i] match, we increment len by 1 and assign the incremented value to lps[i]. If pat[i] and pat[len] do not match and len is not 0, we update len to lps[len-1]. See computeLPSArray () in the below code for details.<\/p>\n<p>Illustration of preprocessing (or construction of lps[])<\/p>\n<p>pat[] = &#8220;AAACAAAA&#8221;<\/p>\n<p>len = 0, i = 0.<br \/>\nlps[0] is always 0, we move<br \/>\nto i = 1<\/p>\n<p>len = 0, i = 1.<br \/>\nSince pat[len] and pat[i] match, do len++,<br \/>\nstore it in lps[i] and do i++.<br \/>\nlen = 1, lps[1] = 1, i = 2<\/p>\n<p>len = 1, i = 2.<br \/>\nSince pat[len] and pat[i] match, do len++,<br \/>\nstore it in lps[i] and do i++.<br \/>\nlen = 2, lps[2] = 2, i = 3<\/p>\n<p>len = 2, i = 3.<br \/>\nSince pat[len] and pat[i] do not match, and len &gt; 0,<br \/>\nset len = lps[len-1] = lps[1] = 1<\/p>\n<p>len = 1, i = 3.<br \/>\nSince pat[len] and pat[i] do not match and len &gt; 0,<br \/>\nlen = lps[len-1] = lps[0] = 0<\/p>\n<p>len = 0, i = 3.<br \/>\nSince pat[len] and pat[i] do not match and len = 0,<br \/>\nSet lps[3] = 0 and i = 4.<\/p>\n<p>len = 0, i = 4.<br \/>\nSince pat[len] and pat[i] match, do len++,<br \/>\nstore it in lps[i] and do i++.<br \/>\nlen = 1, lps[4] = 1, i = 5<\/p>\n<p>len = 1, i = 5.<br \/>\nSince pat[len] and pat[i] match, do len++,<br \/>\nstore it in lps[i] and do i++.<br \/>\nlen = 2, lps[5] = 2, i = 6<\/p>\n<p>len = 2, i = 6.<br \/>\nSince pat[len] and pat[i] match, do len++,<br \/>\nstore it in lps[i] and do i++.<br \/>\nlen = 3, lps[6] = 3, i = 7<\/p>\n<p>len = 3, i = 7.<br \/>\nSince pat[len] and pat[i] do not match and len &gt; 0,<br \/>\nset len = lps[len-1] = lps[2] = 2<\/p>\n<p>len = 2, i = 7.<br \/>\nSince pat[len] and pat[i] match, do len++,<br \/>\nstore it in lps[i] and do i++.<br \/>\nlen = 3, lps[7] = 3, i = 8<\/p>\n[ad type=&#8221;banner&#8221;]\n","protected":false},"excerpt":{"rendered":"<p>C programming for Searching for Patterns Set 2 KMP Algorithm &#8211; Searching and sorting &#8211; Given a text txt[0..n-1] and a pattern pat[0..m-1].<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[69969,69866,1,71670],"tags":[71318,71323,71314,73319,73263,72306,73273,73321,73300,73324,73320,73322,71316,73294,73282,73323,73315,73295,73183,73194,73283,73272,73189,73285,73265,73298,73280,73318,73270,73279,73296,73274,73281,73317,73288,73264,70272,70099,70315,70308,73184,73299,73188,73287,73316,73271,73275,73276,73293],"class_list":["post-25425","post","type-post","status-publish","format-standard","hentry","category-algorithm","category-c-programming","category-coding","category-searching-and-sorting","tag-algorithm-c","tag-algorithm-implementation-in-c","tag-algorithm-in-c","tag-algorithm-in-c-programming","tag-algorithm-to-compare-two-strings","tag-algorithm-visualization","tag-best-pattern-matching-algorithm","tag-boyer-moore-algorithm-example","tag-boyer-moore-algorithm-explanation","tag-boyer-moore-algorithm-in-c","tag-boyer-moore-pattern-matching-algorithm","tag-boyer-moore-pattern-matching-algorithm-example","tag-c-algorithms","tag-clrs-pdf","tag-first-pattern-matching-algorithm","tag-hashing-algorithm-in-c","tag-implementation-of-algorithm-in-c","tag-introduction-to-algorithms-pdf","tag-kmp","tag-kmp-algorithm","tag-kmp-algorithm-explained","tag-kmp-program-in-c","tag-knuth-algorithm","tag-knuth-morris-pratt-example","tag-morris-chair","tag-morris-table","tag-mp-algorithm","tag-online-string-compare","tag-pattern-matching-algorithm","tag-pattern-matching-algorithm-in-c","tag-pattern-matching-algorithm-in-data-structure","tag-pattern-matching-algorithm-ppt","tag-pattern-matching-ppt","tag-pattern-matching-program-in-c","tag-rabin-karp-algorithm-animation","tag-rabin-karp-algorithm-code-in-c","tag-search-algorithms","tag-search-inc","tag-searching-algorithms-in-java","tag-searching-c","tag-string-algorithms","tag-string-algorithms-pdf","tag-string-match","tag-string-pattern-matching-algorithms","tag-string-pattern-programs-in-c","tag-strstr-implementation-in-c","tag-text-comparison-algorithm","tag-what-is-pattern-matching-algorithm","tag-word-matching-algorithm"],"_links":{"self":[{"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/posts\/25425","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/comments?post=25425"}],"version-history":[{"count":0,"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/posts\/25425\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/media?parent=25425"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/categories?post=25425"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/tags?post=25425"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}