
Re: Memory allocation performance



Robert Watson wrote:
> Hence my request for drilling down a bit on profiling -- the question I'm asking is whether profiling shows things running or taking time that shouldn't be.

I have not yet understood why it happens, but hwpmc shows a huge number of "p4-resource-stall" samples in UMA functions:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 45.2    2303.00  2303.00        0  100.00%           uma_zfree_arg [1]
 41.2    4402.00  2099.00        0  100.00%           uma_zalloc_arg [2]
  1.4    4472.00    70.00        0  100.00%           uma_zone_exhausted_nolock [3]
  0.9    4520.00    48.00        0  100.00%           ng_snd_item [4]
  0.8    4562.00    42.00        0  100.00%           __qdivrem [5]
  0.8    4603.00    41.00        0  100.00%           ether_input [6]
  0.6    4633.00    30.00        0  100.00%           ng_ppp_prepend [7]

This probably explains why "p4-global-power-events" also shows many hits in them:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 20.0   37984.00 37984.00        0  100.00%           uma_zfree_arg [1]
 17.8   71818.00 33834.00        0  100.00%           uma_zalloc_arg [2]
  4.0   79483.00  7665.00        0  100.00%           ng_snd_item [3]
  3.0   85256.00  5773.00        0  100.00%           __mcount [4]
  2.3   89677.00  4421.00        0  100.00%           bcmp [5]
  2.2   93853.00  4176.00        0  100.00%           generic_bcopy [6]

By contrast, "p4-instr-retired" does not:
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 11.1    5351.00  5351.00        0  100.00%           ng_apply_item [1]
  7.9    9178.00  3827.00        0  100.00%           legacy_pcib_alloc_msi [2]
  4.1   11182.00  2004.00        0  100.00%           init386 [3]
  4.0   13108.00  1926.00        0  100.00%           rn_match [4]
  3.5   14811.00  1703.00        0  100.00%           uma_zalloc_arg [5]
  2.6   16046.00  1235.00        0  100.00%           SHA256_Transform [6]
  2.2   17130.00  1084.00        0  100.00%           ng_add_hook [7]
  2.0   18111.00   981.00        0  100.00%           ng_rmhook_self [8]
  2.0   19054.00   943.00        0  100.00%           em_encap [9]

For now I have come up with two possible explanations. One is that, because of UMA's cyclic block allocation order, the allocated items do not fit well in the CPU caches; the other is that it is somehow related to critical_exit(), which may cause a context switch. Does anybody have a better explanation of how a function that is so small and simple in this part can cause such results?

--
Alexander Motin
_______________________________________________
freebsd-performance_(_at_)_freebsd_(_dot_)_org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "freebsd-performance-unsubscribe_(_at_)_freebsd_(_dot_)_org"

