Go Performance Observations
In the course of optimizing the pgx PostgreSQL driver, I observed a number of performance characteristics of Go that I hope you will find useful.
Measure First
"Premature optimization is the root of all evil" -- Donald Knuth
Go has two tools that are invaluable in performance tuning: a profiler and a benchmarking tool. The profiler helps find the trouble spots, and benchmarks show the results of an optimization. See How to write benchmarks in Go by Dave Cheney and Profiling Go Programs by Russ Cox for introductions to these tools. Below are several specific techniques I found with benchmarks and the profiler. Source code for the benchmarks is on GitHub.
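For reference, a profiled benchmark run looks something like this: go test leaves the compiled .test binary in the current directory when profiling is enabled, and go tool pprof inspects the profile against it.

    go test -test.bench=. -cpuprofile=cpu.out
    go tool pprof go_pgx_perf_observations.test cpu.out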
Reuse Memory
Every allocation of memory has several potential costs. The Go runtime must ensure the memory is initialized to its zero value. The garbage collector must track references to the value and eventually clean it up. Additional memory usage also makes CPU cache hits less likely.
This simple example fills a slice of up to 1024 bytes with 1s.
func BenchmarkNewBuffers(b *testing.B) {
    for i := 0; i < b.N; i++ {
        n := rand.Intn(1024)
        buf := make([]byte, n)

        // Do something with buffer
        for j := 0; j < n; j++ {
            buf[j] = 1
        }
    }
}

func BenchmarkReuseBuffers(b *testing.B) {
    sharedBuf := make([]byte, 1024)

    for i := 0; i < b.N; i++ {
        n := rand.Intn(1024)
        buf := sharedBuf[0:n]

        // Do something with buffer
        for j := 0; j < n; j++ {
            buf[j] = 1
        }
    }
}
Note the -test.benchmem flag for measuring memory allocations.
jack@hk-47~/dev/go/src/github.com/jackc/go_pgx_perf_observations$ go test -test.bench=Buffers -test.benchmem
testing: warning: no tests to run
PASS
BenchmarkNewBuffers 2000000 1033 ns/op 540 B/op 0 allocs/op
BenchmarkReuseBuffers 5000000 436 ns/op 0 B/op 0 allocs/op
ok github.com/jackc/go_pgx_perf_observations 5.704s
Allocating a new buffer each iteration is substantially slower. Obviously, the more work done on the buffer, the less relative impact eliminating the allocation has. Surprisingly, both versions show 0 allocs/op. How can that be? Let's rerun the test with the -gcflags=-m option to ask Go for the details.
jack@hk-47~/dev/go/src/github.com/jackc/go_pgx_perf_observations$ go test -gcflags=-m -test.bench=Buffers -test.benchmem
# github.com/jackc/go_pgx_perf_observations_test
<snip/>
./bench_test.go:15: BenchmarkNewBuffers b does not escape
./bench_test.go:18: BenchmarkNewBuffers make([]byte, n) does not escape
./bench_test.go:27: BenchmarkReuseBuffers b does not escape
./bench_test.go:28: BenchmarkReuseBuffers make([]byte, 1024) does not escape
<snip/>
The Go compiler performs escape analysis. If an allocation does not escape the function, it can be stored on the stack and avoid the garbage collector entirely.
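As a rough illustration (these two functions are mine, not from the benchmark file), a value that is returned must escape to the heap, while one used only inside the function can stay on the stack:

    // escapes to the heap: the caller keeps a reference to the slice
    func newBuffer() []byte {
        return make([]byte, 1024)
    }

    // does not escape: the slice never leaves the function
    func fillLocal() byte {
        buf := make([]byte, 1024)
        for j := range buf {
            buf[j] = 1
        }
        return buf[0]
    }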
So in a real-world system, where many allocations do escape to the heap, reducing allocations can have an even bigger impact.
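When a buffer must escape anyway, for example because it outlives a single call or is shared between goroutines, sync.Pool is one way to reuse memory. A minimal sketch (the bufPool and process names are illustrative, not from the benchmark code):

    var bufPool = sync.Pool{
        New: func() interface{} { return make([]byte, 1024) },
    }

    func process(n int) {
        buf := bufPool.Get().([]byte) // reuse a previously allocated buffer when available
        defer bufPool.Put(buf)        // return it for the next caller
        if n > len(buf) {
            n = len(buf)
        }
        for j := 0; j < n; j++ {
            buf[j] = 1
        }
    }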
Buffered IO
Go does not buffer IO by default. The bufio package provides buffered IO. This can make a massive difference in performance.
func BenchmarkUnbufferedFileWrite(b *testing.B) {
    file, err := os.Create("unbuffered.test")
    if err != nil {
        b.Fatalf("Unable to create file: %v", err)
    }
    defer func() {
        file.Close()
        os.Remove(file.Name())
    }()

    for i := 0; i < b.N; i++ {
        fmt.Fprintln(file, "Hello world")
    }
}

func BenchmarkBufferedFileWrite(b *testing.B) {
    file, err := os.Create("buffered.test")
    if err != nil {
        b.Fatalf("Unable to create file: %v", err)
    }
    defer func() {
        file.Close()
        os.Remove(file.Name())
    }()

    writer := bufio.NewWriter(file)
    defer writer.Flush()

    for i := 0; i < b.N; i++ {
        fmt.Fprintln(writer, "Hello world")
    }
}
jack@hk-47~/dev/go/src/github.com/jackc/go_pgx_perf_observations$ go test -test.bench=Write
testing: warning: no tests to run
PASS
BenchmarkUnbufferedFileWrite 1000000 2588 ns/op
BenchmarkBufferedFileWrite 10000000 271 ns/op
ok github.com/jackc/go_pgx_perf_observations 5.626s
A simple test of writing "Hello world" repeatedly to a file shows a greater than 9x performance improvement from using a buffered writer.
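The same technique applies to network connections, where each unbuffered write can turn into a separate system call. A sketch (sendGreetings is a made-up example, assuming conn is an established net.Conn):

    func sendGreetings(conn net.Conn) error {
        writer := bufio.NewWriter(conn) // coalesce many small writes into fewer syscalls
        for i := 0; i < 100; i++ {
            fmt.Fprintln(writer, "Hello world")
        }
        return writer.Flush() // nothing is guaranteed to reach the wire until Flush
    }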
Binary vs. Text Formats
PostgreSQL allows data to be transmitted in either binary or text format. The binary format is far faster to process than the text format, because the only processing typically needed is a conversion from network byte order. The binary format should also be more efficient for the PostgreSQL server, and it may be a more compact transmission format. Here, however, we will isolate our benchmarks to the parsing of int32 and time.Time values.
func BenchmarkParseInt32Text(b *testing.B) {
    s := "12345678"
    expected := int32(12345678)
    for i := 0; i < b.N; i++ {
        n, err := strconv.ParseInt(s, 10, 32)
        if err != nil {
            b.Fatalf("strconv.ParseInt failed: %v", err)
        }
        if int32(n) != expected {
            b.Fatalf("strconv.ParseInt decoded %v instead of %v", n, expected)
        }
    }
}

func BenchmarkParseInt32Binary(b *testing.B) {
    buf := make([]byte, 4)
    binary.BigEndian.PutUint32(buf, 12345678)
    expected := int32(12345678)
    for i := 0; i < b.N; i++ {
        n := int32(binary.BigEndian.Uint32(buf))
        if n != expected {
            b.Fatalf("Got %v instead of %v", n, expected)
        }
    }
}
func BenchmarkParseTimeText(b *testing.B) {
    s := "2011-10-25 09:12:34.345921-05"
    expected, _ := time.Parse("2006-01-02 15:04:05.999999-07", s)
    for i := 0; i < b.N; i++ {
        t, err := time.Parse("2006-01-02 15:04:05.999999-07", s)
        if err != nil {
            b.Fatalf("time.Parse failed: %v", err)
        }
        if t != expected {
            b.Fatalf("time.Parse decoded %v instead of %v", t, expected)
        }
    }
}

// PostgreSQL binary format is an int64 of the number of microseconds since Y2K
func BenchmarkParseTimeBinary(b *testing.B) {
    microsecFromUnixEpochToY2K := int64(946684800 * 1000000)
    s := "2011-10-25 09:12:34.345921-05"
    expected, _ := time.Parse("2006-01-02 15:04:05.999999-07", s)

    microsecSinceUnixEpoch := expected.Unix()*1000000 + int64(expected.Nanosecond())/1000
    microsecSinceY2K := microsecSinceUnixEpoch - microsecFromUnixEpochToY2K

    buf := make([]byte, 8)
    binary.BigEndian.PutUint64(buf, uint64(microsecSinceY2K))

    for i := 0; i < b.N; i++ {
        microsecSinceY2K := int64(binary.BigEndian.Uint64(buf))
        microsecSinceUnixEpoch := microsecFromUnixEpochToY2K + microsecSinceY2K
        t := time.Unix(microsecSinceUnixEpoch/1000000, (microsecSinceUnixEpoch%1000000)*1000)
        if t != expected {
            b.Fatalf("Got %v instead of %v", t, expected)
        }
    }
}
jack@hk-47~/dev/go/src/github.com/jackc/go_pgx_perf_observations$ go test -test.bench=Parse
testing: warning: no tests to run
PASS
BenchmarkParseInt32Text 50000000 62.8 ns/op
BenchmarkParseInt32Binary 500000000 3.40 ns/op
BenchmarkParseTimeText 2000000 775 ns/op
BenchmarkParseTimeBinary 100000000 15.4 ns/op
ok github.com/jackc/go_pgx_perf_observations 9.159s
Parsing an int32 from text takes over 18x longer than simply reading it in binary. Parsing a time from text takes roughly 50x longer. The absolute numbers are small, but they add up. In general, binary protocols are vastly faster than text protocols.
More Binary Tricks
When reading or writing a binary stream, binary.Read with an io.Reader or binary.Write with an io.Writer is very convenient. But working directly with a []byte via binary.BigEndian.Uint* and binary.BigEndian.PutUint* is more efficient.
func BenchmarkBinaryWrite(b *testing.B) {
    buf := &bytes.Buffer{}
    for i := 0; i < b.N; i++ {
        buf.Reset()
        for j := 0; j < 10; j++ {
            binary.Write(buf, binary.BigEndian, int32(j))
        }
    }
}

func BenchmarkBinaryPut(b *testing.B) {
    var writebuf [1024]byte
    for i := 0; i < b.N; i++ {
        buf := writebuf[0:0]
        for j := 0; j < 10; j++ {
            tmp := make([]byte, 4)
            binary.BigEndian.PutUint32(tmp, uint32(j))
            buf = append(buf, tmp...)
        }
    }
}
jack@hk-47~/dev/go/src/github.com/jackc/go_pgx_perf_observations$ go test -test.bench=BenchmarkBinary -test.benchmem
testing: warning: no tests to run
PASS
BenchmarkBinaryWrite 1000000 1075 ns/op 80 B/op 5 allocs/op
BenchmarkBinaryPut 20000000 113 ns/op 0 B/op 0 allocs/op
ok github.com/jackc/go_pgx_perf_observations 3.485s
Not only is binary.Write much slower, it also incurs additional memory allocations. Just this change made a substantial improvement to pgx performance.
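The temporary four byte slice in BenchmarkBinaryPut is not even necessary: a variation of the inner loop grows the destination slice first and writes directly into its tail.

    for j := 0; j < 10; j++ {
        buf = append(buf, 0, 0, 0, 0)                           // grow buf by four bytes
        binary.BigEndian.PutUint32(buf[len(buf)-4:], uint32(j)) // encode into the tail
    }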
Measure Last
Let me close with another warning to measure before committing to optimizations. One use case I wanted to optimize was a web API that served JSON produced directly by PostgreSQL. The normal way to do this is to read the JSON into a string, then write that string to the HTTP io.Writer. But wouldn't it be so much faster to copy directly from the PostgreSQL io.Reader to the HTTP io.Writer? It seems obvious that it should be faster, but that intuition turned out to be wrong: benchmarks revealed the copy approach was actually slower in the vast majority of cases, and only marginally faster in the best cases.
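For illustration, the two approaches look roughly like this (serveJSON and serveJSONCopy are hypothetical names, with io.Reader standing in for the PostgreSQL result and http.ResponseWriter for the HTTP side):

    func serveJSON(w http.ResponseWriter, jsonReader io.Reader) error {
        // The normal way: read the JSON into memory, then write it out.
        body, err := ioutil.ReadAll(jsonReader)
        if err != nil {
            return err
        }
        _, err = w.Write(body)
        return err
    }

    func serveJSONCopy(w http.ResponseWriter, jsonReader io.Reader) error {
        // The "obvious" optimization: stream straight through.
        // Benchmarks showed this was usually slower, not faster.
        _, err := io.Copy(w, jsonReader)
        return err
    }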
So once again: measure first and measure last.