Find non-unique characters in a given string in O(n) time with constant space i.e with no extra auxiliary array

Tag: string , algorithm Author: a943509525 Date: 2014-02-01

Given a string s containing only lower case alphabets (a - z), find (i.e print) the characters that are repeated.

For ex, if string s = "aabcacdddec"

Output: a c d

3 approaches to this problem exists:

  1. [brute force] Check every char of string (i.e s[i] with every other char and print if both are same) Time complexity: O(n^2) Space complexity: O(1)

  2. [sort and then compare adjacent elements] After sorting (in O(n log(n) time), traverse the string and check if s[i] ans s[i + 1] are equal Time complexity: O(n logn) + O(n) = O(n logn) Space complexity: O(1)

  3. [store the character count in an array] Create an array of size 26 (to keep track of a - z) and for every s[i], increment value stored at index = s[i] - 26 in the array. Finally traverse the array and print all elements (i.e 'a' + i) with value greater than 1 Time complexity: O(n) Space complexity: O(1) but we have a separate array for storing the frequency of each element.

Is there a O(n) approach that DOES NOT use any array/hash table/map (etc)?

HINT: Use BIT Vectors

Other Answer1

This is the element distinctness problem, so generally speaking - no there is no way to solve it in O(n) without extra space.

However, if you regard the alphabet as constant size (a-z characters only is pretty constant) you can either create a bitset of these characters, in O(1) space [ it is constant!] or check for each character in O(n) if it repeats more than once, it will be O(constant*n), which is still in O(n).

Pseudo code for 1st solution:

bit seen[] = new bit[SIZE_OF_ALPHABET] //contant!
bit printed[] =  new bit[SIZE_OF_ALPHABET] //so is this!
for each i in seen.length: //init:
    seen[i] = 0
    printed[i] = 0
for each character c in string: //traverse the string:
    i = intValue(c)
    //already seen it and didn't print it? print it now!
    if seen[i] == 1 and printed[i] == 0: 
        print c
        printed[i] = 1
    else:
        seen[i] = 1

Pseudo code for 2nd solution:

for each character c from a-z: //constant number of repeats is O(1)
   count = 0
   for each character x in the string: //O(n)
        if x==c:
           count += 1
   if count > 1
        print count

comments:

Using a bitset means you can do this in 1 pass, even better than my answer.

Other Answer2

The size of the character set is a constant, so you could scan the input 26 times. All you need is a counter to store the number of times you've seen the character corresponding to the current iteration. At the end of each iteration, print that character if your counter is greater than 1.

It's O(n) in runtime and O(1) in auxiliary space.

Other Answer3

Implementation in Java

public static void findDuplicate(String str) {
        int checker = 0;
        char c = 'a';
        for (int i = 0; i < str.length(); ++i) {
            int val = str.charAt(i) - c;
            if ((checker & (1 << val)) > 0) {
                System.out.println((char)(c+val));
            }else{
                       checker |= (1 << val);                        
                    }

        }
    }

Uses as int as storage and performs bit wise operator to find the duplicates.

it is in O(n) .. explanation follows

Input as "abddc"

i==0

  STEP #1 : val = 98 - 98  (0)  str.charAt(0) is a and conversion char to int is 98 ( ascii of 'a')

  STEP #2 : 1 << val equal to ( 1 << 0 ) equal to 1  finally 1 & 0 is 0 
  STEP #3 :  checker = 0 | ( 1 << 0)  equal to 0 | 1 equal to 1 checker is 1

i==1

  STEP #1 : val = 99 - 98  (1)  str.charAt(1) is b and conversion char to int is 99 ( ascii of 'b')

  STEP #2 :  1 << val equal to ( 1 << 1 ) equal to 2  finally 1 & 2 is 0 
  STEP #3 :  checker = 2 | ( 1 << 1)  equal to 2 | 1 equal to 2 finally checker is 2

i==2

  STEP #1 : val = 101 - 98  (3)  str.charAt(2) is d and conversion char to int is 101 ( ascii of 'd')

  STEP #2 :  1 << val equal to ( 1 << 3 ) equal to 8  finally 2 & 8 is 0 
  STEP #3 :  checker = 2 | ( 1 << 3)  equal to 2 | 8 equal to 8 checker is 8

i==3

  STEP #1 : val = 101 - 98  (3)  str.charAt(3) is d and conversion char to int is 101 ( ascii of 'd')

  STEP #2 :  1 << val equal to ( 1 << 3 ) equal to 8  finally 8 & 8 is 8
     Now print 'd'  since the value > 0

You can also use the Bit Vector, depends upon the language it would space efficient. In java i would prefer to use int for this fixed ( just 26) constant case