Algorithm get a new list containing no duplicated item by adding any 2 elements in a big array

Tag: algorithm Author: liguo7123 Date: 2009-08-07

I can only think of this naive algorithm. Any better way? C/C++, Ruby ,Haskell is OK.

arry = [1,5,.....4569895] //1000000 elements ,sorted , no duplicated
newArray = Hash.new
for (i = 0 ; i < arry.length ;i++ )
{
       for (j = 0 ; j < arry.length ;j ++ )
       {
                elem = arry[i] + arry[j]
                if (! newArray.key?(elem))
                {
                    newArray [elem] = arry[i] + arry[j]
                }

       }

}

EDIT : sorry. I have discrete value in the array , instead of [1..1000000]

note that you can optimize the inner for: for (j = i; ...)
you should explain better what you have and what you want
In my interpretation: [1,3,5,7] -> [1+3, 1+5, 1+7=3+5, 3+7, 5+7] = [4,6,8,10,12]
Acutally this is an quite interesting problem. Depending on the values of the array you could get less than the worst case 10^12 elements in the new array, for the given example it could be around 9*10^6 elements. Maybe the other way around is an option, first generation all numbers up to the maximum and then finding value paris which sum is the searched value. This has O(N^2*log N) in the worst case using a binary search, but could potentially be (N*log N).

Best Answer

It would be more efficient to separate the algorithm into two distinct steps. (Warning: pseudocode ahead)

First create n-1 lists by adding the rest of the elements to the ith element. This can be done in parallel for each list. Note that the resulting lists will be sorted.

newArray = array(array.length);
for (i = 0 ; i < array.length ;i++ ) {
  newArray[i] = array(array.length - i - 1);
  for (j = 0; j < array.length - i; j++) {
    newArray[i][j] = array[i] + array[j + i];
  }
}

Second use merge sort in to merge the resulted lists. You can do this in parallel, e.g. merge newArray[0] - newArray[i], newArray[2] - newArray[1-i], ... and then again until you only have one list.

Other Answer1

If the condition says that you should be able to add any item in the range, then the only way i can think of is to check if the sum is not yet in the result list. Since for any number x, there are x different additions that lead to x. (Or x/2 if you think that 1 + 2 and 2 + 1 is the same addition).

Other Answer2

There is one obvious optimization: make the second loop start at the indice i, that way you will avoid having x+y and y+x.

Then if you don't want to use a set, you could use the fact that the items are sorted, so you could build N lists, and merge them while removing the duplicates.

Other Answer3

I'm afraid the best worst-case time complexity is O(n2). For input {20, 21, 22, ...}, you won't get any duplicate adding these numbers. Assuming hash insertions are O(1), you already have the best algorithm...

comments:

Just going through the list adding the numbers is O(n2) already, isn't it ?