
Re: Memory allocation performance

Alexander Motin wrote:
Julian Elischer wrote:
Alexander Motin wrote:

While profiling netgraph operation on a UP HEAD router, I found that a huge amount of time is spent on memory allocation and deallocation:

        0.14  0.05  132119/545292      ip_forward <cycle 1> [12]
        0.14  0.05  133127/545292      fxp_add_rfabuf [18]
        0.27  0.10  266236/545292      ng_package_data [17]
[9]14.1 0.56  0.21  545292         uma_zalloc_arg [9]
        0.17  0.00  545292/1733401     critical_exit <cycle 2> [98]
        0.01  0.00  275941/679675      generic_bzero [68]
        0.01  0.00  133127/133127      mb_ctor_pack [103]

        0.15  0.06  133100/545266      mb_free_ext [22]
        0.15  0.06  133121/545266      m_freem [15]
        0.29  0.11  266236/545266      ng_free_item [16]
[8]15.2 0.60  0.23  545266         uma_zfree_arg [8]
        0.17  0.00  545266/1733401     critical_exit <cycle 2> [98]
        0.00  0.04  133100/133100      mb_dtor_pack [57]
        0.00  0.00  134121/134121      mb_dtor_mbuf [111]

I have already optimized every allocation call I could, and those that remain are practically unavoidable. But even after this, kgmon reports that 30% of CPU time is consumed by memory management.

So I have some questions:
1) Is this a real situation or just a profiler mistake?
2) If it is real, why is UMA so slow? I have tried replacing it in some places with a preallocated TAILQ of the required memory blocks, protected by a mutex, and according to the profiler I got _much_ better results. Would it be good practice to replace relatively small UMA zones with a preallocated queue to avoid some of the UMA calls?
3) I have seen that UMA does some kind of CPU cache affinity, but is that so expensive that it costs 30% of CPU time on a UP router?
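For readers unfamiliar with the approach described in question 2, the following is a minimal userland sketch of a preallocated TAILQ free list protected by a mutex. It is not the code used in the thread and not kernel code: the names `item_pool`, `pool_get`, `pool_put`, and the 64-byte payload are hypothetical, and a pthread mutex stands in for a kernel mutex. The count of 512 is borrowed from the default item count mentioned later in the thread.

```c
/* Userland illustration only; all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

#define POOL_ITEMS 512              /* assumed pool size */

struct item {
    TAILQ_ENTRY(item) link;         /* free-list linkage */
    char payload[64];               /* assumed payload size */
};

TAILQ_HEAD(item_list, item);

struct item_pool {
    struct item_list free;          /* items currently available */
    pthread_mutex_t lock;           /* protects 'free' and 'nfree' */
    struct item *storage;           /* single backing allocation */
    size_t nfree;                   /* number of items on 'free' */
};

/* Allocate every item once, up front. Returns 0 on success, -1 on failure. */
static int pool_init(struct item_pool *p)
{
    p->storage = calloc(POOL_ITEMS, sizeof(*p->storage));
    if (p->storage == NULL)
        return -1;
    TAILQ_INIT(&p->free);
    pthread_mutex_init(&p->lock, NULL);
    for (size_t i = 0; i < POOL_ITEMS; i++)
        TAILQ_INSERT_TAIL(&p->free, &p->storage[i], link);
    p->nfree = POOL_ITEMS;
    return 0;
}

/* Take one item from the pool; returns NULL when the pool is exhausted. */
static struct item *pool_get(struct item_pool *p)
{
    struct item *it;

    pthread_mutex_lock(&p->lock);
    it = TAILQ_FIRST(&p->free);
    if (it != NULL) {
        TAILQ_REMOVE(&p->free, it, link);
        p->nfree--;
    }
    pthread_mutex_unlock(&p->lock);
    return it;
}

/* Return an item to the head of the list so it is the next one reused. */
static void pool_put(struct item_pool *p, struct item *it)
{
    assert(it != NULL);
    pthread_mutex_lock(&p->lock);
    TAILQ_INSERT_HEAD(&p->free, it, link);
    p->nfree++;
    pthread_mutex_unlock(&p->lock);
}

/* Release the backing storage; no item may be in use afterwards. */
static void pool_destroy(struct item_pool *p)
{
    pthread_mutex_destroy(&p->lock);
    free(p->storage);
    p->storage = NULL;
    p->nfree = 0;
}
```

In this sketch each get or put costs one lock/unlock pair and a few pointer updates, so the overhead is easy to reason about. The trade-off is that the pool cannot grow, and the caller must handle a NULL return when all items are in use.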

Given this information, I would add an 'item cache' in ng_base.c
(hmm, do I already have one?)

That was actually my second question. As there are only 512 items by default and they are small, I can easily preallocate them all at boot. But is that a good approach? Why can't UMA do the same when I have created a zone with a specified element size and maximum number of objects? What is the principal difference?

Who knows what UMA does... but if you do it yourself, you know what the overhead is. :-)

freebsd-performance_(_at_)_freebsd_(_dot_)_org mailing list