
Re: Memory allocation performance



Julian Elischer wrote:
Alexander Motin wrote:
Hi.

While profiling netgraph operation on a UP router running HEAD, I found that a huge amount of time is spent on memory allocation and deallocation:

        0.14  0.05  132119/545292      ip_forward <cycle 1> [12]
        0.14  0.05  133127/545292      fxp_add_rfabuf [18]
        0.27  0.10  266236/545292      ng_package_data [17]
[9]14.1 0.56  0.21  545292         uma_zalloc_arg [9]
        0.17  0.00  545292/1733401     critical_exit <cycle 2> [98]
        0.01  0.00  275941/679675      generic_bzero [68]
        0.01  0.00  133127/133127      mb_ctor_pack [103]

        0.15  0.06  133100/545266      mb_free_ext [22]
        0.15  0.06  133121/545266      m_freem [15]
        0.29  0.11  266236/545266      ng_free_item [16]
[8]15.2 0.60  0.23  545266         uma_zfree_arg [8]
        0.17  0.00  545266/1733401     critical_exit <cycle 2> [98]
        0.00  0.04  133100/133100      mb_dtor_pack [57]
        0.00  0.00  134121/134121      mb_dtor_mbuf [111]

I have already optimized away every allocation call I could, and those that remain are practically unavoidable. Even so, kgmon reports that about 30% of CPU time is consumed by memory management.

So I have some questions:
1) Is this a real effect or just a profiler artifact?
2) If it is real, why is UMA so slow? I tried replacing it in some places with a preallocated TAILQ of the required memory blocks, protected by a mutex, and according to the profiler I got _much_ better results. Would it be good practice to replace relatively small UMA zones with a preallocated queue to avoid some of the UMA calls?
3) I have seen that UMA does some kind of CPU cache affinity, but is that so expensive that it accounts for 30% of CPU time on a UP router?

Given this information, I would add an 'item cache' in ng_base.c
(hmm, do I already have one?)

That was actually my second question. As there are only 512 items by default and they are small, I can easily preallocate them all at boot. But is that a good approach? Why can't UMA do the same when I have created a zone with a specified element size and a maximum number of objects? What is the fundamental difference?
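
The preallocated-queue idea discussed above can be sketched roughly as follows. This is a userland illustration only, not the netgraph code: it assumes a pthread mutex in place of a kernel mutex, and the names item_pool, pool_item, pool_init, pool_get and pool_put, as well as the 64-byte payload size, are hypothetical. Only the count of 512 items and the use of a mutex-protected TAILQ come from the discussion.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

#define POOL_ITEMS 512  /* default item count mentioned in the thread */
#define ITEM_SIZE  64   /* hypothetical payload size, chosen for illustration */

/* One preallocated block; the TAILQ linkage lives inside the block itself. */
struct pool_item {
    TAILQ_ENTRY(pool_item) link;
    unsigned char payload[ITEM_SIZE];
};

TAILQ_HEAD(pool_head, pool_item);

/* Hypothetical pool: all storage is reserved up front, nothing is allocated later. */
struct item_pool {
    struct pool_head free_list;
    pthread_mutex_t lock;
    size_t nfree;
    struct pool_item storage[POOL_ITEMS];
};

/* Put every preallocated block on the free list once, e.g. at startup. */
static void pool_init(struct item_pool *p)
{
    TAILQ_INIT(&p->free_list);
    pthread_mutex_init(&p->lock, NULL);
    for (size_t i = 0; i < POOL_ITEMS; i++)
        TAILQ_INSERT_TAIL(&p->free_list, &p->storage[i], link);
    p->nfree = POOL_ITEMS;
}

/* Take one block from the free list; returns NULL when the pool is exhausted. */
static struct pool_item *pool_get(struct item_pool *p)
{
    pthread_mutex_lock(&p->lock);
    struct pool_item *it = TAILQ_FIRST(&p->free_list);
    if (it != NULL) {
        TAILQ_REMOVE(&p->free_list, it, link);
        p->nfree--;
    }
    pthread_mutex_unlock(&p->lock);
    return it;
}

/* Return a block to the head of the list so the most recently used one is reused first. */
static void pool_put(struct item_pool *p, struct pool_item *it)
{
    assert(it != NULL);
    pthread_mutex_lock(&p->lock);
    TAILQ_INSERT_HEAD(&p->free_list, it, link);
    p->nfree++;
    pthread_mutex_unlock(&p->lock);
}
```

A caller would run pool_init() once, then pair each pool_get() with a pool_put(). Unlike a general-purpose allocator, this sketch has no fallback when the pool runs dry, so the caller has to handle a NULL return.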

--
Alexander Motin
_______________________________________________
freebsd-performance_(_at_)_freebsd_(_dot_)_org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-performance
To unsubscribe, send any mail to "freebsd-performance-unsubscribe_(_at_)_freebsd_(_dot_)_org"