How to find the difference between two slices of strings in Golang?

Tag: go Author: hsn9933 Date: 2013-09-28

Here is my desired outcome

slice1 := []string{"foo", "bar","hello"}
slice2 := []string{"foo", "bar"}

difference(slice1, slice2)
=> ["hello"]

I am looking for the difference between the two string slices!

Can we assume order of strings doesn't matter?
@ANisus I presumed that I had to compare each index with the equivalent index. Otherwise it'd involve sorting the slices, or doing a very slow comparison of every slice member to every member of the other slice. Hopefully that's all that's required!
@Intermernet Yes, my answer was with the approach that order index position doesn't matter. Of course, mine is just a simple and "stupid" loop with time O(n*m) . For larger ones, maybe some sort or map solution is better.

Best Answer

Depending on the size of the slices, different solutions might be best.

My answer assumes order doesn't matter.

Using simple loops, only to be used with smaller slices:

package main

import "fmt"

func difference(slice1 []string, slice2 []string) []string {
    var diff []string

    // Loop two times, first to find slice1 strings not in slice2,
    // second loop to find slice2 strings not in slice1
    for i := 0; i < 2; i++ {
        for _, s1 := range slice1 {
            found := false
            for _, s2 := range slice2 {
                if s1 == s2 {
                    found = true
                    break
                }
            }
            // String not found. We add it to return slice
            if !found {
                diff = append(diff, s1)
            }
        }
        // Swap the slices, only if it was the first loop
        if i == 0 {
            slice1, slice2 = slice2, slice1
        }
    }

    return diff
}

func main() {
    slice1 := []string{"foo", "bar", "hello"}
    slice2 := []string{"foo", "world", "bar", "foo"}

    fmt.Printf("%+v\n", difference(slice1, slice2))
}

Output:

[hello world]

Playground: http://play.golang.org/p/KHTmJcR4rg

comments:

I just noticed that if you do something like slice1 := []string{"foo"} slice2 := []string{"foo", "foo"} you don't get any difference. I wish we had some more direction on the finer points of what constitutes a "difference'! Is it data + index, just data, or unique data? Anyway, very nice answer.
@Intermernet In my example, I intentionally added two "foo" in the second slice just to make it clear on how the function behaves in such cases. But yeah, both our answers are correct depending on how OP defines "difference".
Yes I noticed, that was what triggered my thought, and I'm glad that you're illustrating the fact in the example. It's interesting how such a seemingly simple question involves so many caveats :-) Maybe we'll get some elucidation from the OP!

Other Answer1

As mentioned by ANisus, different approaches will suit different sizes of input slices. This solution will work in linear time O(n) independent of input size, but assumes that the "equality" includes index position.

Thus, in the OP's examples of:

slice1 := []string{"foo", "bar","hello"}
slice2 := []string{"foo", "bar"}

The entries foo and bar are equal not just due to value, but also due to their index in the slice.

Given these conditions, you can do something like:

package main

import "fmt"

func difference(s1, s2 []string) string {
    var (
        lenMin  int
        longest []string
        out     string
    )
    // Determine the shortest length and the longest slice
    if len(s1) < len(s2) {
        lenMin = len(s1)
        longest = s2
    } else {
        lenMin = len(s2)
        longest = s1
    }
    // compare common indeces
    for i := 0; i < lenMin; i++ {
        if s1[i] != s2[i] {
            out += fmt.Sprintf("=>\t%s\t%s\n", s1[i], s2[i])
        }
    }
    // add indeces not in common
    for _, v := range longest[lenMin:] {
        out += fmt.Sprintf("=>\t%s\n", v)
    }
    return out
}

func main() {
    slice1 := []string{"foo", "bar", "hello"}
    slice2 := []string{"foo", "bar"}
    fmt.Print(difference(slice1, slice2))
}

Produces:

=> hello

Playground

If you change the slices to be:

func main() {
    slice1 := []string{"foo", "baz", "hello"}
    slice2 := []string{"foo", "bar"}    
    fmt.Print(difference(slice1, slice2))
}

It will produce:

=> baz bar
=> hello

comments:

But what about: slice1 := []string{"foo", "bar", "hello"} and slice2 := []string{"foox","foo", "bar"}? The program outputs foo and bar, which are in both slices.
@topskip Intermernet made the presumption that OP wanted to compare each index with the equivalent index. OP wasn't very clear in what kind of comparison he wanted considering order and index values.
@ANisus you're right, I've missed that. I think I will leave my comment though, so that future users are aware of that issue.
@topskip , Yes, I presumed that the OP wanted to compare index by index. The solution provided by ANisus is probably what the OP was after, although I couldn't think of a way to achieve it while dealing with large slices without degrading performance.

Other Answer2

I use the map to solve this problem

package main

import "fmt"

func main() {
    slice1 := []string{"foo", "bar","hello"}
    slice2 := []string{"foo", "bar","world"}

    diffStr := difference(slice1, slice2)

    for _, diffVal := range diffStr {
        fmt.Println(diffVal)
    }

}

func difference(slice1 []string, slice2 []string) ([]string){
    diffStr := []string{}
    m :=map [string]int{}

    for _, s1Val := range slice1 {
        m[s1Val] = 1
    }
    for _, s2Val := range slice2 {
        m[s2Val] = m[s2Val] + 1
    }

    for mKey, mVal := range m {
        if mVal==1 {
            diffStr = append(diffStr, mKey)
        }
    }

    return diffStr
}

output:
hello
world