Apr 18, 2013

golang - Profiling Go Programs 記錄(Part 2)

2. 程式化停止 Timer

$ go tool pprof ./gb cpu-27989.pprof 
Welcome to pprof!  For help, type 'help'.
(pprof) top10
Total: 967 samples
      61   6.3%   6.3%       61   6.3% runtime.xchg
      44   4.6%  10.9%       44   4.6% runtime.memmove
      34   3.5%  14.4%       34   3.5% net/textproto.canonicalMIMEHeaderKey
      29   3.0%  17.4%       31   3.2% runtime.settype_flush
      27   2.8%  20.2%      227  23.5% runtime.mallocgc
      26   2.7%  22.9%       26   2.7% runtime.casp
      25   2.6%  25.4%       87   9.0% runtime.MCache_Alloc
      25   2.6%  28.0%       46   4.8% runtime.markallocated
      23   2.4%  30.4%       23   2.4% bytes.IndexByte
      23   2.4%  32.8%       92   9.5% selectgo
$ echo "  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND";  top -d 1 -b | grep gb
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
27989 brandon   20   0  753m 176m 2844 S  57.9  1.1   0:00.59 gb
27989 brandon   20   0 1012m 234m 2852 S 116.8  1.5   0:01.78 gb
27989 brandon   20   0 1058m 280m 2852 S 107.6  1.8   0:02.87 gb
27989 brandon   20   0 1105m 327m 2852 S 107.7  2.1   0:03.96 gb
27989 brandon   20   0 1194m 346m 2852 S 116.9  2.2   0:05.14 gb
27989 brandon   20   0 1195m 347m 2852 S 107.0  2.2   0:06.22 gb
27989 brandon   20   0 1195m 347m 2852 S 103.9  2.2   0:07.27 gb
27989 brandon   20   0 1275m 348m 2852 S 116.6  2.2   0:08.45 gb
27989 brandon   20   0 1277m 350m 2852 S 111.7  2.2   0:09.58 gb
27989 brandon   20   0 1277m 350m 2852 S 110.8  2.2   0:10.70 gb
27989 brandon   20   0 1349m 351m 2852 S 113.6  2.2   0:11.85 gb
27989 brandon   20   0 1359m 353m 2852 S 115.9  2.2   0:13.02 gb

這時的 cpu profiling 已經比較分散平均,看不太出明顯熱點,另外從 top 數據來看 350m 的記憶體用量有些高,改用 heap profiler 觀察看看是否有 memory leak 的狀況

在程式內加入 Mem Profiling 程式

func main() {
    SetMemProfileRate(1) //(請參考附件 profilingtool.go)
    // ignored
    GC()    
    DumpHeap()
    PrintGCSummary()
}
$ ./gb -c 100 -n 200000 -k http://localhost/10k.dat
(略)
Concurrency Level:      100
Time taken for tests:   25.33 seconds
Complete requests:      200000
Failed requests:        0
HTML transferred:       2048000000 bytes
Requests per second:    7896.70 [#/sec] (mean)
Time per request:       12.664 [ms] (mean)
Time per request:       0.127 [ms] (mean, across all concurrent requests)
HTML Transfer rate:     78966.96 [Kbytes/sec] received

Connection Times (ms)
              min   mean[+/-sd] median  max
Total:        0         0   76.91   6   1866

Percentage of the requests served within a certain time (ms)
 50%     6
 66%     8
 75%     10
 80%     11
 90%     14
 95%     16
 98%     18
 99%     21
 100%    1866 (longest request)

NumGC:13 Pause:731.86ms Pause(Avg):604.31ms Overhead:27.92% Alloc:46.39M Sys:696.17M Alloc(Rate):33.91M/s Histogram:1.85s 1.85s 1.85s

這裡用設定的 sample rate 比較小,搜集的資料比較完整,相對也比較多效能消耗,這裡的平均停頓時間就高了許多(604.31ms vs 40.91ms)

brandon@brandon-laptop:~/workspaces/go/src/bench/gb$ echo "  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND";  top -d 1 -b | grep gb
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
29236 brandon   20   0  487m  48m 2944 S  48.1  0.3   0:00.49 gb
29236 brandon   20   0  595m 156m 2944 S 102.1  1.0   0:01.52 gb
29236 brandon   20   0  808m 233m 2956 S 123.9  1.5   0:02.77 gb
29236 brandon   20   0  916m 278m 2956 S 137.8  1.8   0:04.16 gb
29236 brandon   20   0  960m 322m 2956 S 137.7  2.0   0:05.55 gb
29236 brandon   20   0 1004m 366m 2956 S 135.7  2.3   0:06.92 gb
29236 brandon   20   0 1047m 410m 2956 S 136.7  2.6   0:08.30 gb
29236 brandon   20   0 1072m 434m 2956 S 153.2  2.7   0:09.85 gb
29236 brandon   20   0 1072m 434m 2956 S 171.4  2.7   0:11.58 gb
29236 brandon   20   0 1072m 434m 2956 S 141.8  2.7   0:13.01 gb
29236 brandon   20   0 1072m 435m 2956 S 137.7  2.8   0:14.40 gb
29236 brandon   20   0 1072m 436m 2956 S 132.6  2.8   0:15.74 gb
29236 brandon   20   0 1072m 436m 2956 S 135.7  2.8   0:17.11 gb
29236 brandon   20   0 1072m 436m 2956 S 135.3  2.8   0:18.48 gb
29236 brandon   20   0 1072m 436m 2956 S 168.4  2.8   0:20.18 gb
29236 brandon   20   0 1072m 436m 2956 S 153.7  2.8   0:21.73 gb
29236 brandon   20   0 1073m 438m 2956 S 126.6  2.8   0:23.01 gb
29236 brandon   20   0 1073m 438m 2956 S 130.8  2.8   0:24.33 gb
29236 brandon   20   0 1073m 438m 2956 S 132.8  2.8   0:25.67 gb
29236 brandon   20   0 1073m 438m 2956 S 136.6  2.8   0:27.05 gb
29236 brandon   20   0 1073m 438m 2956 S 156.5  2.8   0:28.63 gb
29236 brandon   20   0 1073m 438m 2956 S 143.8  2.8   0:30.08 gb
29236 brandon   20   0 1073m 439m 2956 S 126.5  2.8   0:31.36 gb
29236 brandon   20   0 1076m 441m 2956 S 134.6  2.8   0:32.72 gb
29236 brandon   20   0 1076m 441m 2956 S 140.6  2.8   0:34.14 gb
29236 brandon   20   0 1076m 441m 2956 S 153.5  2.8   0:35.69 gb
29236 brandon   20   0 1076m 441m 2956 S 139.6  2.8   0:37.10 gb
29236 brandon   20   0 1076m 441m 2956 S 131.7  2.8   0:38.43 gb
$ go tool pprof ./gb heap-29236-1.pprof
(pprof) top10
Total: 42.4 MB
    33.6  79.1%  79.1%     35.6  83.9% time.NewTimer
     2.0   4.8%  83.9%      2.0   4.8% addtimer
     1.5   3.6%  87.5%      1.5   3.6% main.NewBenchmark
     1.5   3.6%  91.1%      2.9   6.7% main.(*Benchmark).Run
     1.2   2.8%  93.9%      1.2   2.9% main.NewHttpWorker
     0.6   1.5%  95.4%      0.6   1.5% newdefer
     0.6   1.4%  96.8%      0.6   1.4% bufio.NewReaderSize
     0.4   1.0%  97.8%      0.4   1.0% bufio.NewWriterSize
     0.3   0.7%  98.5%      1.6   3.8% net/http.(*Transport).dialConn
     0.1   0.3%  98.8%      0.1   0.3% runtime.malg

程式要結束前還有 42.4 MB 沒被回收,有 memory leak

(pprof) web time.NewTimer

SVG File

從圖中來看 timer.NewTimer 是 main.(*HttpWorker).Run 裡面的 time.After(d timer.Duration) 建立的

(pprof) list main.\(\*HttpWorker\).Run
Total: 42.4 MB
ROUTINE ====================== main.(*HttpWorker).Run in /home/brandon/workspaces/go/src/bench/gb/http.go
   0.0   35.6 Total MB (flat / cumulative)
     .      .   29: 
     .      .   30: var (
     .      .   31:     invalidContnetSize = errors.New("invalid content size")
     .      .   32: )
     .      .   33: 
---
     .      .   34: func (h *HttpWorker) Run() {
     .      .   35:     <-h.start
     .      .   36: 
     .      .   37:     for job := range h.jobs {
     .      .   38: 
   0.0    0.0   39:         executionResult := make(chan *Record)
     .      .   40: 
     .    0.0   41:         go h.send(job, executionResult)
     .      .   42: 
     .      .   43:         select {
     .      .   44:         case record := <-executionResult:
     .      .   45:             h.collector <- record
     .      .   46: 
     .   35.6   47:         case <-time.After(time.Duration(MAX_RESPONSE_TIMEOUT) * time.Second):
     .      .   48:             h.collector <- &Record{Error: errors.New("execution timeout")}
     .      .   49: 
     .      .   50:         case <-h.stop:
     .      .   51:             return
     .      .   52:         }
     .      .   53:     }
     .      .   54: 
     .      .   55: }
---
     .      .   56: 
     .      .   57: func (h *HttpWorker) send(request *http.Request, executionResult chan<- *Record) {
     .      .   58: 
     .      .   59:     record := &Record{}
     .      .   60:     sw := &StopWatch{}

select 裡包 time.After(Duration)這種 timeout control用法,在官方文件裡有很多地方都有使用,追查過後才知道,這方法使用的 Timer 等時間過期就會被 GC 回收了,但這裡的狀況是需要對每組 http request/response 各別做 timeout control,會大量使用到 Timer 若要放給 GC 回收會有點浪費記憶體,也會額外增加 GC 掃描的時間

修正方法

func (h *HttpWorker) Run() {
    <-h.start

    for job := range h.jobs {

        timeout := time.NewTimer(time.Duration(MAX_RESPONSE_TIMEOUT) * time.Second)
        executionResult := make(chan *Record)

        go h.send(job, executionResult)

        select {
        case record := <-executionResult:
            h.collector <- record

        case <-timeout.C:
            h.collector <- &Record{Error: errors.New("execution timeout")}

        case <-h.stop:
            timeout.Stop()  
            return
        }
        timeout.Stop()      
    }

}

執行結果

$ ./gb -c 100 -n 200000 -k http://localhost/10k.dat 
$ go tool pprof ./gb heap-6438-1.pprof
(pprof) top10
Total: 8.6 MB
     1.5  17.8%  17.8%      3.0  35.3% main.(*Benchmark).Run
     1.5  17.8%  35.6%      1.5  17.8% main.(*Monitor).Run
     1.5  17.8%  53.4%      1.5  17.8% main.NewBenchmark
     1.5  17.5%  70.9%      1.5  17.5% net/textproto.MIMEHeader.Set
     1.0  11.7%  82.5%      1.0  11.7% bufio.NewReaderSize
     1.0  11.7%  94.2%      1.0  11.7% bufio.NewWriterSize
     0.5   5.8% 100.0%      0.5   5.8% newdefer
     0.0   0.0% 100.0%      0.5   5.8% bufio.(*Writer).Flush
     0.0   0.0% 100.0%      1.0  11.7% bufio.NewReader
     0.0   0.0% 100.0%      1.0  11.7% bufio.NewWriter

time.NewTimer 從清單中消失了,接下來還有 8.6 MB 的空間在程式結束前沒被回收,下一篇繼續來追蹤

No comments: