zero memory allocation slice filtering

次のように、あるスライスをフィルタリングする関数を書くことがあると思います。

1
2
3
4
5
6
7
8
9
func FilterFoo(arr []string) []string {
    b := []string{}
    for _, e := range arr {
        if IsFoo(e) {
            b = append(b, e)
        }
    }
    return b
}

簡単なベンチマークを書くとわかるように、この関数は返値となるスライスの長さ+1回のメモリアロケーションを行います。一般に、メモリアロケーションの回数は少ない方がパフォーマンスがよく、可能ならばアロケーション回数0を目指したいものです。

今回の場合、次のように書くとメモリアロケーション回数0回の関数を書くことができます。

追記 b := arr[:0]とすると、基底配列に影響が出るので一概に比較できない、とご指摘を受けました。実際に使用する際は副作用に注意しましょう。

このやりかたって引数に副作用あるので、わかってないで使うと危ないような…https://t.co/iKXrXHUD3N https://t.co/CMrAYGJrdA
— Yoichiro Shimizu (@budougumi0617) February 4, 2019

append は引数を弄ってしまうので動作が異なりますね。 / “zero memory allocation slice filtering” https://t.co/JFFDJlfIQA
— mattn (@mattn_jp) February 5, 2019

追記終わり

1
2
3
4
5
6
7
8
9
func FilterFoo(arr []string) []string {
    b := arr[:0]
    for _, e := range arr {
        if IsFoo(e) {
            b = append(b, e)
        }
    }
    return b
}

違うのは一行目だけですが、ベンチマークを取ってみると、速度面では大きな違いがあります。次のようなベンチマークを実行してみます。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
package benchmark_test

import (
	"strings"
	"testing"
)

var a = []string{"Hello", "foo.go", "hoge", "bar.go", "baz.go", "fuga"}

func IsGoFilename(filename string) bool {
	return strings.HasSuffix(filename, ".go")
}

func naive(arr []string) []string {
	var b []string
	for _, x := range arr {
		if IsGoFilename(x) {
			b = append(b, x)
		}
	}
	return b
}

func BenchmarkNaive(b *testing.B) {
	for i := 0; i < b.N; i++ {
		naive(a)
	}
}

func woAlloc(arr []string) []string {
	b := arr[:0]
	for _, x := range arr {
		if IsGoFilename(x) {
			b = append(b, x)
		}
	}
	return b
}

func BenchmarkWithoutAlloc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		woAlloc(a)
	}
}

結果は次のようになります。

1
2
3
4
5
6
7
8
$ go test -bench . -benchmem
goos: darwin
goarch: amd64
pkg: practice/go-filtering-without-allocating
BenchmarkNaive-8                 5000000               252 ns/op             240 B/op          4 allocs/op
BenchmarkWithoutAlloc-8         50000000                34.3 ns/op             0 B/op          0 allocs/op
PASS
ok      practice/go-filtering-without-allocating        3.269s