{"id":25500,"date":"2017-10-25T19:53:20","date_gmt":"2017-10-25T14:23:20","guid":{"rendered":"https:\/\/www.wikitechy.com\/technology\/?p=25500"},"modified":"2017-10-25T19:53:20","modified_gmt":"2017-10-25T14:23:20","slug":"c-programming-for-pattern-searching-set-7-boyer-moore-algorithm-bad-character-heuristic","status":"publish","type":"post","link":"https:\/\/www.wikitechy.com\/technology\/c-programming-for-pattern-searching-set-7-boyer-moore-algorithm-bad-character-heuristic\/","title":{"rendered":"C++ programming for Pattern Searching Set 7 Boyer Moore Algorithm \u2013 Bad Character Heuristic"},"content":{"rendered":"<p>Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n &gt; m.<span id=\"more-19262\"><\/span><\/p>\n<p>Examples:<\/p>\n<pre>Input:  txt[] = \"THIS IS A TEST TEXT\"\r\n        pat[] = \"TEST\"\r\nOutput: Pattern found at index 10\r\n\r\nInput:  txt[] =  \"AABAACAADAABAABA\"\r\n        pat[] =  \"AABA\"\r\nOutput: Pattern found at index 0\r\n        Pattern found at index 9\r\n        Pattern found at index 12\r\n<img fetchpriority=\"high\" decoding=\"async\" class=\"aligncenter size-full wp-image-25432\" src=\"https:\/\/www.wikitechy.com\/technology\/wp-content\/uploads\/2017\/05\/Searching-for-Patterns-Set-2-KMP-Algorithm.png\" alt=\"C++ programming for Pattern Searching Set 7 Boyer Moore Algorithm \u2013 Bad Character Heuristic\" width=\"704\" height=\"384\" srcset=\"https:\/\/www.wikitechy.com\/technology\/wp-content\/uploads\/2017\/05\/Searching-for-Patterns-Set-2-KMP-Algorithm.png 704w, https:\/\/www.wikitechy.com\/technology\/wp-content\/uploads\/2017\/05\/Searching-for-Patterns-Set-2-KMP-Algorithm-300x164.png 300w\" sizes=\"(max-width: 704px) 100vw, 704px\" \/><\/pre>\n<p>Pattern searching is an important problem in computer science. When we do search for a string in notepad\/word file or browser or database, pattern searching algorithms are used to show the search results.<\/p>\n[ad type=&#8221;banner&#8221;]\n<p>We have discussed the following algorithms in the previous posts:<\/p>\n<p>Naive Algorithm<br \/>\nKMP Algorithm<br \/>\nRabin Karp Algorithm<br \/>\nFinite Automata based Algorithm<\/p>\n<p>In this post, we will discuss Boyer Moore pattern searching algorithm. Like KMP and Finite Automata algorithms, Boyer Moore algorithm also preprocesses the pattern.<br \/>\nBoyer Moore is a combination of following two approaches.<br \/>\n1) Bad Character Heuristic<br \/>\n2) Good Suffix Heuristic<br \/>\nBoth of the above heuristics can also be used independently to search a pattern in a text. Let us first understand how two independent approaches work together in the Boyer Moore algorithm. If we take a look at the Naive algorithm, it slides the pattern over the text one by one. KMP algorithm does preprocessing over the pattern so that the pattern can be shifted by more than one. The Boyer Moore algorithm does preprocessing for the same reason. It preporcesses the pattern and creates different arrays for both heuristics. At every step, it slides the pattern by max of the slides suggested by the two heuristics. So it uses best of the two heuristics at every step. Unlike the previous pattern searching algorithms, Boyer Moore algorithm starts matching from the last character of the pattern.<\/p>\n[ad type=&#8221;banner&#8221;]\n<p>In this post, we will discuss bad character heuristic, and discuss Good Suffix heuristic in the next post.<\/p>\n<p>The idea of bad character heuristic is simple. The character of the text which doesn\u2019t match with the current character of pattern is called the Bad Character. Whenever a character doesn\u2019t match, we slide the pattern in such a way that aligns the bad character with the last occurrence of it in pattern. We preprocess the pattern and store the last occurrence of every possible character in an array of size equal to alphabet size. If the character is not present at all, then it may result in a shift by m (length of pattern). Therefore, the bad character heuristic takes O(n\/m) time in the best case.<\/p>\n<div class=\"code-embed-wrapper\"> <div class=\"code-embed-infos\"> <span class=\"code-embed-name\">C++<\/span> <\/div> <pre class=\"language-cpp code-embed-pre line-numbers\"  data-start=\"1\" data-line-offset=\"0\"><code class=\"language-cpp code-embed-code\">\/* Program for Bad Character Heuristic of Boyer Moore String Matching Algorithm *\/<br\/> <br\/># include &lt;limits.h&gt;<br\/># include &lt;string.h&gt;<br\/># include &lt;stdio.h&gt;<br\/> <br\/># define NO_OF_CHARS 256<br\/> <br\/>\/\/ A utility function to get maximum of two integers<br\/>int max (int a, int b) { return (a &gt; b)? a: b; }<br\/> <br\/>\/\/ The preprocessing function for Boyer Moore&#039;s bad character heuristic<br\/>void badCharHeuristic( char *str, int size, int badchar[NO_OF_CHARS])<br\/>{<br\/>    int i;<br\/> <br\/>    \/\/ Initialize all occurrences as -1<br\/>    for (i = 0; i &lt; NO_OF_CHARS; i++)<br\/>         badchar[i] = -1;<br\/> <br\/>    \/\/ Fill the actual value of last occurrence of a character<br\/>    for (i = 0; i &lt; size; i++)<br\/>         badchar[(int) str[i]] = i;<br\/>}<br\/> <br\/>\/* A pattern searching function that uses Bad Character Heuristic of<br\/>   Boyer Moore Algorithm *\/<br\/>void search( char *txt,  char *pat)<br\/>{<br\/>    int m = strlen(pat);<br\/>    int n = strlen(txt);<br\/> <br\/>    int badchar[NO_OF_CHARS];<br\/> <br\/>    \/* Fill the bad character array by calling the preprocessing<br\/>       function badCharHeuristic() for given pattern *\/<br\/>    badCharHeuristic(pat, m, badchar);<br\/> <br\/>    int s = 0;  \/\/ s is shift of the pattern with respect to text<br\/>    while(s &lt;= (n - m))<br\/>    {<br\/>        int j = m-1;<br\/> <br\/>        \/* Keep reducing index j of pattern while characters of<br\/>           pattern and text are matching at this shift s *\/<br\/>        while(j &gt;= 0 &amp;&amp; pat[j] == txt[s+j])<br\/>            j--;<br\/> <br\/>        \/* If the pattern is present at current shift, then index j<br\/>           will become -1 after the above loop *\/<br\/>        if (j &lt; 0)<br\/>        {<br\/>            printf(&quot;\\n pattern occurs at shift = %d&quot;, s);<br\/> <br\/>            \/* Shift the pattern so that the next character in text<br\/>               aligns with the last occurrence of it in pattern.<br\/>               The condition s+m &lt; n is necessary for the case when<br\/>               pattern occurs at the end of text *\/<br\/>            s += (s+m &lt; n)? m-badchar[txt[s+m]] : 1;<br\/> <br\/>        }<br\/> <br\/>        else<br\/>            \/* Shift the pattern so that the bad character in text<br\/>               aligns with the last occurrence of it in pattern. The<br\/>               max function is used to make sure that we get a positive<br\/>               shift. We may get a negative shift if the last occurrence<br\/>               of bad character in pattern is on the right side of the<br\/>               current character. *\/<br\/>            s += max(1, j - badchar[txt[s+j]]);<br\/>    }<br\/>}<br\/> <br\/>\/* Driver program to test above funtion *\/<br\/>int main()<br\/>{<br\/>    char txt[] = &quot;ABAAABCD&quot;;<br\/>    char pat[] = &quot;ABC&quot;;<br\/>    search(txt, pat);<br\/>    return 0;<br\/>}<\/code><\/pre> <\/div>\n<p><strong>Output:<\/strong><\/p>\n<p>pattern occurs at shift = 4<br \/>\nThe Bad Character Heuristic may take O(mn) time in worst case. The worst case occurs when all characters of the text and pattern are same. For example, txt[] = \u201cAAAAAAAAAAAAAAAAAA\u201d and pat[] = \u201cAAAAA\u201d.<\/p>\n[ad type=&#8221;banner&#8221;]\n","protected":false},"excerpt":{"rendered":"<p>C++ programming for Pattern Searching Set 7 Boyer Moore Algorithm \u2013 Bad Character Heuristic &#8211; Searching and Sorting &#8211; When we do search for a string in file.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[69969,83515,1,83558,71670],"tags":[71314,73383,73273,73385,73609,73321,73300,73320,73322,73596,7378,73594,70706,73603,73599,73388,73591,73606,73595,73605,73598,73282,73194,73283,73270,73600,73279,73296,73604,73602,73274,73281,73588,73601,73264,73597,70272,70315,73184,73299,73287,73316,73271,73592,73607,73608,73275,73593,73276,73293],"class_list":["post-25500","post","type-post","status-publish","format-standard","hentry","category-algorithm","category-c-programming-3","category-coding","category-moore-algorithm","category-searching-and-sorting","tag-algorithm-in-c","tag-alphabet-pattern-programs-in-c","tag-best-pattern-matching-algorithm","tag-best-searching-algorithm-in-c","tag-boyer-moore-algorithm-c-code","tag-boyer-moore-algorithm-example","tag-boyer-moore-algorithm-explanation","tag-boyer-moore-pattern-matching-algorithm","tag-boyer-moore-pattern-matching-algorithm-example","tag-boyer-moore-program-in-c","tag-c-string","tag-cmore","tag-design-and-analysis-of-algorithms-ppt","tag-dna-matching-algorithm","tag-dna-pattern-matching","tag-dna-pattern-matching-algorithm","tag-example-of-brute-force-algorithm","tag-fast-bowel-movement","tag-fast-search","tag-fast-strings","tag-faststring","tag-first-pattern-matching-algorithm","tag-kmp-algorithm","tag-kmp-algorithm-explained","tag-pattern-matching-algorithm","tag-pattern-matching-algorithm-for-intrusion-detection-system","tag-pattern-matching-algorithm-in-c","tag-pattern-matching-algorithm-in-data-structure","tag-pattern-matching-algorithm-in-data-structure-ppt","tag-pattern-matching-algorithm-in-image-processing","tag-pattern-matching-algorithm-ppt","tag-pattern-matching-ppt","tag-python-string-substring","tag-quick-search-match","tag-rabin-karp-algorithm-code-in-c","tag-ruby-string","tag-search-algorithms","tag-searching-algorithms-in-java","tag-string-algorithms","tag-string-algorithms-pdf","tag-string-pattern-matching-algorithms","tag-string-pattern-programs-in-c","tag-strstr-implementation-in-c","tag-strstr-in-c","tag-substring-swift","tag-text-bm","tag-text-comparison-algorithm","tag-what-is-algorithm-in-c","tag-what-is-pattern-matching-algorithm","tag-word-matching-algorithm"],"_links":{"self":[{"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/posts\/25500","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/comments?post=25500"}],"version-history":[{"count":0,"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/posts\/25500\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/media?parent=25500"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/categories?post=25500"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wikitechy.com\/technology\/wp-json\/wp\/v2\/tags?post=25500"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}