利用了 profiling 工具解決了一些 Go-HttpBench 的 bug 並做了些效能優化
以 ./gb -c 100 -n 200000 -k http://localhost/10k.dat 來比較版本 0.1.0 與 0.1.6 的結果(based on Go1.1beta2)
Mem(RES) Requests/s Alloc(Rate) GC-Overhead Pause(Avg) Pause(Max)
0.1.0 354m 12119.99 397.00M/s 25.40% 56.72ms 153.25ms
0.1.6 23m 16976.76 67.48M/s 2.50% 5.12ms 10.07ms
delta(%) ↓93% ↑40% ↓83% ↓22.9% ↓91% ↓93%
Response time 0.1.0 0.1.6
50% 4 4
66% 6 5
75% 7 6
80% 8 7
90% 10 9
95% 12 11
98% 57 13
99% 70 15
100% 179 133
優化過程主要做了下列四項改變,以下將包含總結分五個章節來發佈
- 重複使用 Buffer
- 程式化停止 Timer
- 正常結束 Goroutine
- 快取計算結果
- 結論
1. 重複使用 Buffer
先在程式內加入 CPU Profiling 程式,分析有沒有明顯的效能瓶頸 (用的指令請參考附件 profilingtool.go)
func main() {
StartCPUProfile()
// ignored
StopCPUProfile()
PrintGCSummary()
}
$ ./gb -c 100 -n 200000 -k http://localhost/10k.dat
This is GoHttpBench, Version 0.1.0, https://github.com/parkghost/gohttpbench
Author: Brandon Chen, Email: parkghost@gmail.com
Licensed under the Apache License, Version 2.0
Benchmarking localhost (be patient)
Completed 20000 requests
Completed 40000 requests
Completed 60000 requests
Completed 80000 requests
Completed 100000 requests
Completed 120000 requests
Completed 140000 requests
Completed 160000 requests
Completed 180000 requests
Completed 200000 requests
Finished 200000 requests
Server Software: nginx/1.2.1
Server Hostname: localhost
Server Port: 80
Document Path: /10k.dat
Document Length: 10240 bytes
Concurrency Level: 100
Time taken for tests: 19.00 seconds
Complete requests: 200000
Failed requests: 0
HTML transferred: 2048000000 bytes
Requests per second: 10526.49 [#/sec] (mean)
Time per request: 9.500 [ms] (mean)
Time per request: 0.095 [ms] (mean, across all concurrent requests)
HTML Transfer rate: 105264.94 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Total: 0 0 12.79 5 164
Percentage of the requests served within a certain time (ms)
50% 5
66% 7
75% 8
80% 9
90% 12
95% 16
98% 60
99% 77
100% 164 (longest request)
NumGC:77 Pause:41.17ms Pause(Avg):61.92ms Overhead:24.21% Alloc:48.57M Sys:614.81M Alloc(Rate):346.57M/s Histogram:108.41ms 124.72ms 124.72ms
用 top 搜集 gb 相關系統資源使用狀況
$ echo " PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND"; top -d 1 -b | grep gb
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27047 brandon 20 0 597m 10m 2840 S 4.9 0.1 0:00.05 gb
27047 brandon 20 0 1054m 326m 2848 S 123.9 2.1 0:01.30 gb
27047 brandon 20 0 1343m 349m 2852 S 114.8 2.2 0:02.46 gb
27047 brandon 20 0 1343m 349m 2852 S 100.0 2.2 0:03.47 gb
27047 brandon 20 0 1343m 350m 2852 S 125.1 2.2 0:04.74 gb
27047 brandon 20 0 1343m 350m 2852 S 134.4 2.2 0:06.10 gb
27047 brandon 20 0 1343m 350m 2852 S 132.8 2.2 0:07.44 gb
27047 brandon 20 0 1343m 350m 2852 S 124.7 2.2 0:08.70 gb
27047 brandon 20 0 1343m 351m 2852 S 95.9 2.2 0:09.67 gb
27047 brandon 20 0 1487m 351m 2852 S 99.0 2.2 0:10.67 gb
27047 brandon 20 0 1487m 351m 2852 S 109.6 2.2 0:11.78 gb
27047 brandon 20 0 1487m 351m 2852 S 85.2 2.2 0:12.64 gb
27047 brandon 20 0 1488m 353m 2852 S 104.5 2.2 0:13.70 gb
27047 brandon 20 0 1488m 353m 2852 S 128.1 2.2 0:15.00 gb
27047 brandon 20 0 1488m 353m 2852 S 129.6 2.2 0:16.31 gb
27047 brandon 20 0 1488m 353m 2852 S 129.3 2.2 0:17.62 gb
27047 brandon 20 0 1488m 354m 2852 S 130.6 2.2 0:18.94 gb
27047 brandon 20 0 1488m 354m 2852 S 130.7 2.2 0:20.26 gb
27047 brandon 20 0 1634m 356m 2852 S 127.5 2.3 0:21.55 gb
27047 brandon 20 0 1634m 356m 2852 S 131.5 2.3 0:22.88 gb
用內建的 profiling 工具來找程式熱點
$ go tool pprof ./gb cpu-27047.pprof
Welcome to pprof! For help, type 'help'.
(pprof) top10
Total: 1569 samples
183 11.7% 11.7% 356 22.7% scanblock
74 4.7% 16.4% 74 4.7% runtime.memmove
60 3.8% 20.2% 60 3.8% runtime.casp
59 3.8% 24.0% 59 3.8% runtime.xchg
51 3.3% 27.2% 72 4.6% sweepspan
50 3.2% 30.4% 50 3.2% runtime.memclr
45 2.9% 33.3% 56 3.6% markonly
39 2.5% 35.8% 66 4.2% flushptrbuf
34 2.2% 37.9% 55 3.5% runtime.markallocated
32 2.0% 40.0% 32 2.0% net/textproto.canonicalMIMEHeaderKey
scanblock method 佔用了 11% 的 cpu 資源,來查看看
(pprof) web scanblock
Total: 1569 samples
After focusing on 'scanblock': 356 samples of 1569 (22.7%)
Dropping nodes with <= 1 samples; edges with <= 0 abs(samples)
Loading web page file:////tmp/pprof27099.1.svg
Created new window in existing browser session.
(pprof)
按 scanblock 上方粗線追朔,看起來是 makeslice 建立許多 slice 使的 gc 很忙, 延粗線再往上找到 main package 的 main.(*HttpWorker).send 可能是問題來源
(pprof) list main.*.send
Total: 1569 samples
ROUTINE ====================== main.(*HttpWorker).send in /home/brandon/workspaces/go/src/bench/gb/http.go
7 798 Total samples (flat / cumulative)
. . 49: }
. . 50: }
. . 51:
. . 52: }
. . 53:
---
. . 54: func (h *HttpWorker) send(request *http.Request, executionResult chan<- *Record) {
. . 55:
. 4 56: record := &Record{}
. . 57: sw := &StopWatch{}
. . 58: sw.Start()
. . 59:
. . 60: var contentSize int
. . 61:
. . 62: defer func() {
. . 63: if r := recover(); r != nil {
. . 64: record.Error = &ExceptionError{errors.New(fmt.Sprint(r))}
. . 65: } else {
. . 66: record.contentSize = contentSize
. . 67: record.responseTime = sw.Elapsed
. . 68: }
. . 69:
. . 70: if record.Error != nil {
. . 71: TraceException(record.Error)
. . 72: }
. . 73:
. . 74: executionResult <- record
1 1 75: }()
. . 76:
1 188 77: resp, err := h.client.Do(request)
. . 78: if err != nil {
. 21 79: record.Error = &ConnectError{err}
. . 80: return
. . 81: } else {
. 3 82: defer resp.Body.Close()
. . 83:
. . 84: if resp.StatusCode < 200 || resp.StatusCode > 300 {
. . 85: record.Error = &ResponseError{err}
. . 86: return
. . 87: }
. . 88:
1 575 89: body, err := ioutil.ReadAll(resp.Body)
. . 90:
. . 91: if err != nil {
. . 92: record.Error = &ReceiveError{err}
. . 93: return
. . 94: }
. . 95: contentSize = len(body)
. . 96:
4 4 97: if contentSize != h.config.contentSize {
. . 98: record.Error = &LengthError{invalidContnetSize}
. . 99: return
. . 100: }
. . 101:
. . 102: }
. . 103:
. . 104: sw.Stop()
. . 105:
. 2 106: }
---
. . 107:
. . 108: func detectHost(config *Config) error {
. . 109: var err error
. . 110: defer func() {
. . 111: if r := recover(); r != nil {
(pprof)
ioutil.ReadAll(resp.Body) 內部佔用了最長的時間,可是沒看到有 make([]) 之類的語句
(pprof) list ioutil.ReadAll
(略)
(pprof) list ioutil.readAll
Total: 1569 samples
ROUTINE ====================== io/ioutil.readAll in /usr/lib/go/src/pkg/io/ioutil/ioutil.go
4 574 Total samples (flat / cumulative)
. . 12: "sort"
. . 13: )
. . 14:
. . 15: // readAll reads from r until an error or EOF and returns the data it read
. . 16: // from the internal buffer allocated with a specified capacity.
---
1 1 17: func readAll(r io.Reader, capacity int64) (b []byte, err error) {
1 41 18: buf := bytes.NewBuffer(make([]byte, 0, capacity))
. . 19: // If the buffer overflows, we will get bytes.ErrTooLarge.
. . 20: // Return that as an error. Any other panic remains.
. . 21: defer func() {
. . 22: e := recover()
. . 23: if e == nil {
. . 24: return
. . 25: }
. . 26: if panicErr, ok := e.(error); ok && panicErr == bytes.ErrTooLarge {
. . 27: err = panicErr
. . 28: } else {
. . 29: panic(e)
. . 30: }
1 2 31: }()
1 527 32: _, err = buf.ReadFrom(r)
. 1 33: return buf.Bytes(), err
. 2 34: }
---
. . 35:
. . 36: // ReadAll reads from r until an error or EOF and returns the data it read.
. . 37: // A successful call returns err == nil, not err == EOF. Because ReadAll is
. . 38: // defined to read from src until EOF, it does not treat an EOF from Read
. . 39: // as an error to be reported.
(pprof) list bytes.*ReadFrom
Total: 1569 samples
ROUTINE ====================== bytes.(*Buffer).ReadFrom in /usr/lib/go/src/pkg/bytes/buffer.go
5 526 Total samples (flat / cumulative)
. . 145:
. . 146: // ReadFrom reads data from r until EOF and appends it to the buffer, growing
. . 147: // the buffer as needed. The return value n is the number of bytes read. Any
. . 148: // error except io.EOF encountered during the read is also returned. If the
. . 149: // buffer becomes too large, ReadFrom will panic with ErrTooLarge.
---
. . 150: func (b *Buffer) ReadFrom(r io.Reader) (n int64, err error) {
. . 151: b.lastRead = opInvalid
. . 152: // If buffer is empty, reset to recover space.
. . 153: if b.off >= len(b.buf) {
1 2 154: b.Truncate(0)
. . 155: }
. . 156: for {
. . 157: if free := cap(b.buf) - len(b.buf); free < MinRead {
. . 158: // not enough space at end
. . 159: newBuf := b.buf
. . 160: if b.off+free < MinRead {
. . 161: // not enough space using beginning of buffer;
. . 162: // double buffer capacity
2 379 163: newBuf = makeSlice(2*cap(b.buf) + MinRead)
. . 164: }
. 13 165: copy(newBuf, b.buf[b.off:])
. . 166: b.buf = newBuf[:len(b.buf)-b.off]
. . 167: b.off = 0
. . 168: }
1 131 169: m, e := r.Read(b.buf[len(b.buf):cap(b.buf)])
. . 170: b.buf = b.buf[0 : len(b.buf)+m]
. . 171: n += int64(m)
1 1 172: if e == io.EOF {
. . 173: break
. . 174: }
. . 175: if e != nil {
. . 176: return n, e
. . 177: }
. . 178: }
. . 179: return n, nil // err is EOF, so return nil explicitly
---
. . 180: }
. . 181:
. . 182: // makeSlice allocates a slice of size n. If the allocation fails, it panics
. . 183: // with ErrTooLarge.
. . 184: func makeSlice(n int) []byte {
ioutil.readAll 的 L:18 可以分析問題是 main.(*HttpWorker).send 在每回呼叫 ioutil.ReadAll 都會動態產生一塊 Buffer 若在讀取資料的時候遇到 buffer size 不足,就再 makeSlice 產生更大的 buffer(bytes.ReadFrom:163)
為了避免一直創建 buffer 做了以下修改
type HttpWorker struct {
// ignored
readBuf *bytes.Buffer
}
func NewHttpWorker(...ignored...) *HttpWorker {
buf := bytes.NewBuffer(make([]byte, 0, config.contentSize+bytes.MinRead))
return &HttpWorker{...ignored..., buf}
}
func (h *HttpWorker) send(request *http.Request, executionResult chan<- *Record) {
// ignored
resp, _ := h.client.Do(request)
defer resp.Body.Close()
defer h.readBuf.Reset()
var err error
contentSize, err = h.readBuf.ReadFrom(resp.Body)
// ignored
}
執行結果
$ ./gb -c 100 -n 200000 -k http://localhost/10k.dat
(略)
Document Path: /10k.dat
Document Length: 10240 bytes
Concurrency Level: 100
Time taken for tests: 11.70 seconds
Complete requests: 200000
Failed requests: 0
HTML transferred: 2048000000 bytes
Requests per second: 17094.59 [#/sec] (mean)
Time per request: 5.850 [ms] (mean)
Time per request: 0.058 [ms] (mean, across all concurrent requests)
HTML Transfer rate: 170945.87 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Total: 0 0 5.16 4 118
Percentage of the requests served within a certain time (ms)
50% 4
66% 5
75% 6
80% 7
90% 9
95% 10
98% 12
99% 14
100% 118 (longest request)
NumGC:12 Pause:58.63ms Pause(Avg):40.91ms Overhead:3.96% Alloc:84.97M Sys:611.67M Alloc(Rate):77.23M/s Histogram:99.82ms 99.82ms 99.82ms
$ echo " PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND"; top -d 1 -b | grep gb
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27989 brandon 20 0 753m 176m 2844 S 57.9 1.1 0:00.59 gb
27989 brandon 20 0 1012m 234m 2852 S 116.8 1.5 0:01.78 gb
27989 brandon 20 0 1058m 280m 2852 S 107.6 1.8 0:02.87 gb
27989 brandon 20 0 1105m 327m 2852 S 107.7 2.1 0:03.96 gb
27989 brandon 20 0 1194m 346m 2852 S 116.9 2.2 0:05.14 gb
27989 brandon 20 0 1195m 347m 2852 S 107.0 2.2 0:06.22 gb
27989 brandon 20 0 1195m 347m 2852 S 103.9 2.2 0:07.27 gb
27989 brandon 20 0 1275m 348m 2852 S 116.6 2.2 0:08.45 gb
27989 brandon 20 0 1277m 350m 2852 S 111.7 2.2 0:09.58 gb
27989 brandon 20 0 1277m 350m 2852 S 110.8 2.2 0:10.70 gb
27989 brandon 20 0 1349m 351m 2852 S 113.6 2.2 0:11.85 gb
27989 brandon 20 0 1359m 353m 2852 S 115.9 2.2 0:13.02 gb
$ go tool pprof ./gb cpu-27989.pprof
Welcome to pprof! For help, type 'help'.
(pprof) top10
Total: 967 samples
61 6.3% 6.3% 61 6.3% runtime.xchg
44 4.6% 10.9% 44 4.6% runtime.memmove
34 3.5% 14.4% 34 3.5% net/textproto.canonicalMIMEHeaderKey
29 3.0% 17.4% 31 3.2% runtime.settype_flush
27 2.8% 20.2% 227 23.5% runtime.mallocgc
26 2.7% 22.9% 26 2.7% runtime.casp
25 2.6% 25.4% 87 9.0% runtime.MCache_Alloc
25 2.6% 28.0% 46 4.8% runtime.markallocated
23 2.4% 30.4% 23 2.4% bytes.IndexByte
23 2.4% 32.8% 92 9.5% selectgo
跟改變前相比 Alloc(Rate):346.57M/s 降到 Alloc(Rate):77.23M/s CPU 利用率也略有下降,相關 gc 停頓數值下降,throughput 也有較大的提升, 原先 top10 裡佔據 11.7% scanblock 也消失了
UPDATE
更省記憶體的改良辦法
const MaxBufferSize = 8192
type HTTPWorker struct {
//ignored
discard io.ReaderFrom
}
func NewHttpWorker(...ignored...) *HttpWorker {
var buf []byte
contentSize := context.GetInt(FieldContentSize)
if contentSize < MaxBufferSize {
buf = make([]byte, contentSize)
} else {
buf = make([]byte, MaxBufferSize)
}
return &HTTPWorker{
//ignored
&Discard{buf},
}
}
func (h *HttpWorker) send(request *http.Request, executionResult chan<- *Record) {
// ignored
resp, _ := h.client.Do(request)
defer resp.Body.Close()
contentSize, err = h.discard.ReadFrom(resp.Body)
// ignored
}
type Discard struct {
blackHole []byte
}
func (d *Discard) ReadFrom(r io.Reader) (n int64, err error) {
readSize := 0
for {
readSize, err = r.Read(d.blackHole)
n += int64(readSize)
if err != nil {
if err == io.EOF {
return n, nil
}
return
}
}
}
附件 profiling 工具