PWN之堆內存管理

Author：jmpews（知道創宇404安全實驗室)

本文涉及到的相關資料可以參考：https://github.com/jmpews/pwn2exploit/

關于堆的分配原理我覺的這篇文章《glibc內存管理ptmalloc源代碼分析》已經說得很詳細。但是我盡力用 glibc 源碼和自己的理解總結去概述, 本文章在說明時盡可能引用 glibc-2.19 中具體的代碼和介紹, 用一些實例代碼作為驗證, 以及自己對 ptmalloc 的理解。

分析堆的相關工具

在 Phrack 的一篇文章中《Advanced Doug Lea's malloc exploits》, 有一小節講到 Heap layout analysis 作者利用了 main_arena 這個靜態全局變量, 進行 heap dump 工作, 這里需要注意的是, 需要安裝 libc6-dbg 以獲取 debugging symbols, 此細節部分請查看 參考資料/glibc的調試相關.

這里介紹幾個工具, 用于堆空間分配的分析.

下面幾個工具都是解析堆空間分配的

下面幾個工具大同小異, 簡單介紹下原理, 都是采用 python 的 gdb 的 API, 關于 API 有一篇文章: https://sourceware.org/gdb/onlinedocs/gdb/Breakpoints-In-Python.html#Breakpoints-In-Python

之后通過 cat /proc/PID/maps 獲取 heap base, 通過 gdb 的 x/ 查看內存, 通過 debugging symbols 獲取 main_arena 地址

https://github.com/cloudburst/libheap
https://github.com/Mipu94/peda-heap
https://github.com/hugsy/gef
https://github.com/pwndbg/pwndbg

ltrace:
通過 ltrace 函數跟蹤庫函數調用。關于 ltrace 的內部實現有一篇介紹文章, ltrace_internals.pdf, 這里說一下大致原理, 起一個進程執行命令后, 根據 PID 拿到可執行文件, 之后按照 ELF 解析可執行文件, 拿到符號列表, 之后使用 ptrace attach 到 PID 上, 并在所有函數符號上插入斷點。
通過 LD_PRELOAD 的 hook 方式跟蹤內存分配函數, 這也是 Phrack 中《Advanced Doug Lea's malloc exploits》利用的方法, 缺點就是需要重新執行程序: https://github.com/nihilus/HeapTracer/blob/master/linux-native

堆內存分配(ptmalloc設計)的思考

下面僅僅是個人在閱讀完 ptmalloc 的分配和釋放算法之后的一些關于 ptmalloc 設計上的一些想法, 畢竟之前是做開發的,

0. 為什么需要 ptmalloc

首先內存的分配和回收很頻繁的, 這也就是其他語言希望實現高效的 GC, 針對頻繁的操作, 第一個想到的解決方法就是緩存, 這也就是為什么 ptmalloc 存在各種各樣的緩沖區. 假如不存在緩沖區, 每次分配都需要觸發系統調用賊慢. 接下來就要引出 ptmalloc 涉及到的幾種緩存, 這里只是概念性的解釋幾種緩存, 具體會在下文詳細介紹.

1. Bins

為了避免每次觸發系統調用, 首先想到的解決方法就是釋放的內存暫時不歸還給系統, 標記為空閑, 等下一次再需要相同大小時, 直接使用這塊空閑內存即可. （存儲結構是雙向環鏈表, 類似 hash 表, hash 算法就是 chunk 的長度, 用雙向環鏈表解決 hash 沖突)

這就涉及到, 剛剛釋放的內存什么時候加到 Bins ? 相鄰的兩個空閑 chunk 什么時候合并? 怎么合并?

2. Top

另一個應該想到的就是, 可以先利用系統調用 brk() 分配一塊比較大的內存作為緩存, 之后即使沒有在 Bins 中也找不到, 也不需要每次觸發系統調用, 直接切割這塊大的內存即可.

這就涉及到 '這塊大內存' 什么時候重新補充大小(不斷切割會導致 top 變小)? 什么時候需要縮小(歸還給系統)?

3. Fastbins

Bins 和 Top 緩存是最基本的, 如果想要做進一步的優化, 其實就是更細分的緩存, 也就是更準確的命中緩存, 這里 Fastbins 存在的更具體的原因是 避免 chunk 重復切割合并.

如果了解過 Python 源碼的同學可能會更理解, 這里的 Fastbins 類似于 Python 中整數對象 PyIntObject 的小整數 small_ints, 這里也只是理念類似, small_ints 準確的說是預先初始化, 可以一直重復使用而不被釋放.

Ok, 再回到 Fastbins 的討論, 對于長度很小的 chunk 在釋放后不會放到 Bins, 也不會標記為空閑, 這就避免了合并, 下次分配內存時首先查找 Fastbins, 這就避免了切割.

4. Unsorted bin

Unsorted 是更細粒度的緩存, 屬于 '剛剛釋放的內存'與 Bins 之間的緩存.

在 1. Bins 中提到一個問題, 剛剛釋放的內存什么時候加到 Bins ? 這其實就與 Unsorted 有關, 剛剛釋放的內存會先放到 Unsorted 緩存, 在下一次內存分配時, 會優先于 Bins 查找, 如果能命中 Unsorted 緩沖最好, 否則就把 Unsorted 中的 chunk 統一整理到對應 Bins.

5. last_remainder

這其實也是一個緩存, 是針對于切割時使用的, 大致就是希望一直切割同一個 chunk. 在遍歷 Unsorted 時使用, 但是它的使用是有條件的.

以上就是在閱讀完 ptmalloc 分配內存那一部分代碼后對 ptmalloc 的緩存設計上的一些想法. 下面會具體介紹 ptmalloc 在進行堆內存用到的各種具體的數據結構.

chunk 結構

這里先貼出一段 glibc-2.19/malloc/malloc.c 中關于 chunk 的解釋. 不再詳細解釋.

boundary tag 邊界標記, 關于它下文會進行介紹

INTERNAL_SIZE_T 頭部損耗, 參考 eglibc-2.19/malloc/malloc.c:299, 其實也就是 size_t.

eglibc-2.19/malloc/malloc.c:1094
/*
  -----------------------  Chunk representations -----------------------
*/


/*
  This struct declaration is misleading (but accurate and necessary).
  It declares a "view" into memory allowing access to necessary
  fields at known offsets from a given base. See explanation below.
*/

// 一個 chunk 的完整結構體
struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};


/*
   malloc_chunk details:

    (The following includes lightly edited explanations by Colin Plumb.)

    // chunk 的內存管理采用邊界標識的方法, 空閑 chunk 的 size 在該 chunk 的 size 字段和下一個 chunk 的 pre_size 字段都有記錄
    Chunks of memory are maintained using a `boundary tag' method as
    described in e.g., Knuth or Standish.  (See the paper by Paul
    Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
    survey of such techniques.)  Sizes of free chunks are stored both
    in the front of each chunk and at the end.  This makes
    consolidating fragmented chunks into bigger chunks very fast.  The
    size fields also hold bits representing whether chunks are free or
    in use.

    An allocated chunk looks like this:

    // 正在使用的 chunk 布局
    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Size of previous chunk, if allocated            | |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             User data starts here...                          .
      .                                                               .
      .             (malloc_usable_size() bytes)                      .
      .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Size of chunk                                     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    // 幾個術語規定, 'chunk' 就是整個 chunk 開頭, 'mem' 就是用戶數據的開始, 'Nextchunk' 就是下一個 chunk 的開頭
    Where "chunk" is the front of the chunk for the purpose of most of
    the malloc code, but "mem" is the pointer that is returned to the
    user.  "Nextchunk" is the beginning of the next contiguous chunk.

    // chunk 是雙字長對齊
    Chunks always begin on even word boundaries, so the mem portion
    (which is returned to the user) is also on an even word boundary, and
    thus at least double-word aligned.

    // 空閑 chunk 被存放在雙向環鏈表
    Free chunks are stored in circular doubly-linked lists, and look like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Size of previous chunk                            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                         |P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Forward pointer to next chunk in list             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Back pointer to previous chunk in list            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |             Unused space (may be 0 bytes long)                .
      .                                                               .
      .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    // P 標志位不能放在 size 字段的低位字節, 用于表示前一個 chunk 是否在被使用, 如果為 0, 表示前一個 chunk 空閑, 同時 pre_size 也表示前一個空閑 chunk 的大小, 可以用于找到前一個 chunk 的地址, 方便合并空閑 chunk, 但 chunk 剛一開始分配時默認 P 為 1. 如果 P 標志位被設置, 也就無法獲取到前一個 chunk 的 size, 也就拿不到前一個 chunk 地址, 也就無法修改正在使用的 chunk, 但是這是無法修改前一個 chunk, 但是可以通過本 chunk 的 size 獲得下一個 chunk 的地址. 
    The P (PREV_INUSE) bit, stored in the unused low-order bit of the
    chunk size (which is always a multiple of two words), is an in-use
    bit for the *previous* chunk.  If that bit is *clear*, then the
    word before the current chunk size contains the previous chunk
    size, and can be used to find the front of the previous chunk.
    The very first chunk allocated always has this bit set,
    preventing access to non-existent (or non-owned) memory. If
    prev_inuse is set for any given chunk, then you CANNOT determine
    the size of the previous chunk, and might even get a memory
    addressing fault when trying to do so.

    Note that the `foot' of the current chunk is actually represented
    as the prev_size of the NEXT chunk. This makes it easier to
    deal with alignments etc but can be very confusing when trying
    to extend or adapt this code.

    The two exceptions to all this are

    // 這里的 the trailing size 是指下一個 chunk 的 pre_size, 因為 top 位于最高地址, 不存在相鄰的下一個 chunk, 同時這里也解答了上面關于 top 什么時候重新填滿
     1. The special chunk `top' doesn't bother using the
  trailing size field since there is no next contiguous chunk
  that would have to index off it. After initialization, `top'
  is forced to always exist.  If it would become less than
  MINSIZE bytes long, it is replenished.

     2. Chunks allocated via mmap, which have the second-lowest-order
  bit M (IS_MMAPPED) set in their size fields.  Because they are
  allocated one-by-one, each must contain its own trailing size field.

*/

閱讀文檔, 是理解的最快的方式之一.

P (PREV_INUSE) 標志位表示前一個 chunk 是否在使用, 0 為沒有在使用.

prev_size 表示前一個 chunk 的大小, 僅在 P (PREV_INUSE) 為 0 時有效, 也就是前一個 chunk 為空閑狀態.

size 表示該整個 chunk 大小, 并非 malloc 返回值.

fd, bk, fd_nextsize, fd_nextsize 是對于空閑 chunk 而言, 對于正在使用的 chunk, 從當前位置開始就是 malloc 返回給用戶可用的空間.

fd, bk 組成了 Bins 的雙向環鏈表

對于空閑的 chunk 空間布局, 見上, 是環形雙向鏈表. 存放在空閑 chunk 容器中.

關于 chunk 有一些操作, 判斷前一個是否在使用, 判斷下一個 chunk 是否正在使用, 是不是 mmap 分配的, 以及對標志位 P 等的操作, 可以參考 glibc-2.19/malloc/malloc.c:1206 中 Physical chunk operations 一小節(直接搜素該關鍵字即可).

邊界標示

對于 chunk 的空間布局組織采用邊界標示的方法, chunk 的存儲是一段連續的內存, 其實就是 chunk 頭部保存長度信息, 可以在適當的時候獲取到前一個和后一個 chunk.

這里涉及到 chunk 到用戶請求 mem 的想換轉化操作, 以及對齊操作等. 請參考 glibc-2.19/malloc/malloc.c:1258

空間復用

對于正在使用 chunk, 它的下一個 chunk 的 prev_size 是無效的, 所以這塊內存被當前 chunk 給借用了, 因此對于請求分配 chunk 大小分配公式是 chunk_size = (用戶請求大小 + (2 - 1) * sizeof(INTERNAL_SIZE_T)) align to 2 * sizeof(size_t)

最后請參考 eglibc-2.19/malloc/malloc.c:44, 會指出一些默認參數值, 以及關于 chunk 的最小 size 和對齊的相關說明. 這里列出來了一小部分.

  Supported pointer representation:       4 or 8 bytes
  Supported size_t  representation:       4 or 8 bytes
       Note that size_t is allowed to be 4 bytes even if pointers are 8.
       You can adjust this by defining INTERNAL_SIZE_T

  Alignment:                              2 * sizeof(size_t) (default)
       (i.e., 8 byte alignment with 4byte size_t). This suffices for
       nearly all current machines and C compilers. However, you can
       define MALLOC_ALIGNMENT to be wider than this if necessary.

  Minimum overhead per allocated chunk:   4 or 8 bytes
       Each malloced chunk has a hidden word of overhead holding size
       and status information.

  Minimum allocated size: 4-byte ptrs:  16 bytes    (including 4 overhead)
              8-byte ptrs:  24/32 bytes (including, 4/8 overhead)

       When a chunk is freed, 12 (for 4byte ptrs) or 20 (for 8 byte
       ptrs but 4 byte size) or 24 (for 8/8) additional bytes are
       needed; 4 (8) for a trailing size field and 8 (16) bytes for
       free list pointers. Thus, the minimum allocatable size is
       16/24/32 bytes.

       Even a request for zero bytes (i.e., malloc(0)) returns a
       pointer to something of the minimum allocatable size.

       The maximum overhead wastage (i.e., number of extra bytes
       allocated than were requested in malloc) is less than or equal
       to the minimum size, except for requests >= mmap_threshold that
       are serviced via mmap(), where the worst case wastage is 2 *
       sizeof(size_t) bytes plus the remainder from a system page (the
       minimal mmap unit); typically 4096 or 8192 bytes.

翻譯幾個關鍵的點, chunk 的大小需要按照 Alignment 進行對齊, 每一個被分配的 chunk 都有一個字的頭部消耗, 包含該 chunk 的大小以及狀態信息, 具體會在 chunk 結構和邊界標示說明.

空閑容器(緩存)

下面會介紹 ptmalloc 中存在的各種空閑容器

Bins

eglibc-2.19/malloc/malloc.c:1341
/*
   -------------------- Internal data structures --------------------

   All internal state is held in an instance of malloc_state defined
   below. There are no other static variables, except in two optional
   cases:
 * If USE_MALLOC_LOCK is defined, the mALLOC_MUTEx declared above.
 * If mmap doesn't support MAP_ANONYMOUS, a dummy file descriptor
     for mmap.

   Beware of lots of tricks that minimize the total bookkeeping space
   requirements. The result is a little over 1K bytes (for 4byte
   pointers and size_t.)
 */

/*
   Bins
    // Bins 就是由空閑 chunk - bin 組成數組, 每一個 bin 都是雙向鏈表. Bin 存放是整理過的 chunks, 并且 bin 中合并過的空閑 chunk 是不存在相鄰的, 所以 bin 中的每一個 chunk 都是可以被使用, 并且都是緊挨著正在使用的 chunk 或者 heap  內存末尾.
    An array of bin headers for free chunks. Each bin is doubly
    linked.  The bins are approximately proportionally (log) spaced.
    There are a lot of these bins (128). This may look excessive, but
    works very well in practice.  Most bins hold sizes that are
    unusual as malloc request sizes, but are more usual for fragments
    and consolidated sets of chunks, which is what these bins hold, so
    they can be found quickly.  All procedures maintain the invariant
    that no consolidated chunk physically borders another one, so each
    chunk in a list is known to be preceeded and followed by either
    inuse chunks or the ends of memory.

    // bins 中的 chunk 是按照大小排序的. FIFO, small bins 是不存在按大小排序的, 因為每一個 small bin 都是相同 size 的. 但是對于 large bin 是需要按照順序插入的. 這樣可以在內存分配時很快查找到合適內存.
    Chunks in bins are kept in size order, with ties going to the
    approximately least recently used chunk. Ordering isn't needed
    for the small bins, which all contain the same-sized chunks, but
    facilitates best-fit allocation for larger chunks. These lists
    are just sequential. Keeping them in order almost never requires
    enough traversal to warrant using fancier ordered data
    structures.

    // FIFO, 從頭部插入節點, 尾部取節點. 這樣有個特定就是更容易內存的合并.
    Chunks of the same size are linked with the most
    recently freed at the front, and allocations are taken from the
    back.  This results in LRU (FIFO) allocation order, which tends
    to give each chunk an equal opportunity to be consolidated with
    adjacent freed chunks, resulting in larger free chunks and less
    fragmentation.

    To simplify use in double-linked lists, each bin header acts
    as a malloc_chunk. This avoids special-casing for headers.
    But to conserve space and improve locality, we allocate
    only the fd/bk pointers of bins, and then use repositioning tricks
    to treat these as the fields of a malloc_chunk*.
 */

ptmalloc 采用分箱式管理空閑 chunk, 也就是 Bins. Bins 本身就是一個數組, 每一個存放的是一個對應長度的 chunk 雙向環鏈表的頭結點和尾節點. 相同 Size 的 chunk 才能組成一個環,Bins 是按大小依次進行存放.

關于 Bins 為什么定義為 mchunkptr bins[NBINS * 2 - 2] 而不是 mchunkptr bins[NBINS * 4 - 2], 是如何少一倍的空間實現的雙向鏈表, 可以參考 glibc內存管理ptmalloc源代碼分析.pdf, 這里大致說一下, 對于雙向環的的標志頭節點, 它的 prev_size 和 size 是無用的, 所以直接省略, 但是還要把它當成正確的 chunk 結構. 這里的 trick 就在于 bin_at 宏, 返回了偽造的 fake chunk 的地址, 這里和 Double Free 以及 unlink繞過的利用手法類似, 之后會在 Double Free 漏洞詳細說明.

/* addressing -- note that bin_at(0) does not exist */
#define bin_at(m, i) \
  (mbinptr) (((char *) &((m)->bins[((i) - 1) * 2]))               \
             - offsetof (struct malloc_chunk, fd))

這里舉一個例子, 只摘取一部分, 完整的例子, 在下方的 ptmalloc 利用部分.

# 查看 unsorted bin 的地址, 其實也就是 bin[1] 的地址
(gdb) heap -b
===================================Heap Dump===================================

unsorted bin @ 0x7ffff7dd1b88
        free chunk @ 0x602160 - size 0x90

        free chunk @ 0x6020b0 - size 0x90

        free chunk @ 0x602000 - size 0x90
# 這里的 0x7ffff7dd1B78 也就是 bin_at 返回的地址, 返回了一個偽造的 chunk 的地址
# 其實這里的 fd 和 bk 才真正屬于 bin[1] 的內容.
(gdb) p *(mfastbinptr)0x7ffff7dd1B78
$17 = {prev_size = 6300176, size = 0, fd = 0x602160, bk = 0x602000, fd_nextsize = 0x7ffff7dd1b88 <main_arena+104>, bk_nextsize = 0x7ffff7dd1b88 <main_arena+104>}

small bins, large bins

對于 chunk size < 512, 是存放在 small bins, 有 64 個, 每個 bin 是以 8 bytes 作為分割邊界, 也就相當于等差序列, 舉個例子: small bins 中存放的第一個 chunk 雙向環鏈表 全部都是由 size 為 16 bytes 大小的 chunk 組成的, 第二個 chunk 雙向環鏈表 都是由 size 為 16+8 bytes 大小的 chunk 組成的. 但是對于 large bins, 分割邊界是遞增的, 舉個簡單例子: 前 32 個 large bins 的分割邊界都是 64 bytes, 之后 16 個 large bins 的分割邊界是 512 bytes. 以上僅為字長為 32 位的情況下, 具體請參考如下.

eglibc-2.19/malloc/malloc.c:1436
/*
   Indexing

    Bins for sizes < 512 bytes contain chunks of all the same size, spaced
    8 bytes apart. Larger bins are approximately logarithmically spaced:

    64 bins of size       8
    32 bins of size      64
    16 bins of size     512
     8 bins of size    4096
     4 bins of size   32768
     2 bins of size  262144
     1 bin  of size what's left

    There is actually a little bit of slop in the numbers in bin_index
    for the sake of speed. This makes no difference elsewhere.

    The bins top out around 1MB because we expect to service large
    requests via mmap.

    Bin 0 does not exist.  Bin 1 is the unordered list; if that would be
    a valid chunk size the small bins are bumped up one.
 */

這涉及到如何根據 Size 在 Bins 中查到對應的 bin, 可以參考 eglibc-2.19/malloc/malloc.c:1460

large bin 有些特殊, 空閑 chunk 的存放需要排序, large_bin->bk 為最小 size 的 chunk, large_bin->fd 為最大 size 的 chunk.

Fastbins

關于 Fastbins 的介紹, 可以參考 eglibc-2.19/malloc/malloc.c:1570

/*
   Fastbins
    // 單向鏈表, LIFO 規則
    An array of lists holding recently freed small chunks.  Fastbins
    are not doubly linked.  It is faster to single-link them, and
    since chunks are never removed from the middles of these lists,
    double linking is not necessary. Also, unlike regular bins, they
    are not even processed in FIFO order (they use faster LIFO) since
    ordering doesn't much matter in the transient contexts in which
    fastbins are normally used.

    Chunks in fastbins keep their inuse bit set, so they cannot
    be consolidated with other free chunks. malloc_consolidate
    releases all chunks in fastbins and consolidates them with
    other free chunks.
 */

當進行內存分配時先從 Fastbins 中進行查找, 之后才在 Bins 進行查找; 釋放內存時, 當chunk size < max_fast 會先存放到 Fastbins.

另一個需要注意的點就是 Fastbins 的合并(清空), 也就是 malloc_consolidate 這個函數的工作.

何時會觸發 malloc_consolidate(僅對 _int_malloc 函數而言) ?
small bins 尚未初始化
需要 size 大于 small bins
malloc_consolidate 如何進行合并 ?

遍歷 Fastbins 中的 chunk, 設置每個 chunk 的空閑標志位為 0, 并合并相鄰的空閑 chunk, 之后把該 chunk 存放到 unsorted bin 中.

Fastbins 是單向鏈表, 可以通過 fastbin->fd 遍歷 Fastbins.

unsorted bin

只有一個 unsorted bin, 進行內存分配查找時先在 Fastbins, small bins 中查找, 之后會在 unsorted bin 中進行查找, 并整理 unsorted bin 中所有的 chunk 到 Bins 中對應的 Bin. unsorted bin 位于 bin[1].

unsorted_bin->fd 指向雙向環鏈表的頭結點, unsorted_bin->bk 指向雙向環鏈表的尾節點, 在頭部插入新的節點.

top chunk

以下引用來自《glibc內存管理ptmalloc源代碼分析》。

對于非主分配區會預先從 mmap 區域分配一塊較大的空閑內存模擬 sub-heap,通過管理 sub-heap 來響應用戶的需求,因為內存是按地址從低向高進行分配的,在空閑內存的最高處,必然存在著一塊空閑 chunk,叫做 top chunk.當 bins 和 fast bins 都不能滿足分配需要的時候,ptmalloc 會設法在 top chunk 中分出一塊內存給用戶,如果 top chunk 本身不夠大, 分配程序會重新分配一個 sub-heap,并將 top chunk 遷移到新的 sub-heap 上, 新的 sub-heap 與已有的 sub-heap 用單向鏈表連接起來,然后在新的 top chunk 上分配所需的內存以滿足分配的需要,實際上,top chunk 在分配時總是在 fast bins 和 bins 之后被考慮,所以,不論 top chunk 有多大,它都不會被放到 fast bins 或者是 bins 中. top chunk 的大小是隨著分配和回收不停變換的,如果從 top chunk 分配內存會導致 top chunk 減小,如果回收的 chunk 恰好與 top chunk 相鄰,那么這兩個 chunk 就會合并成新的 top chunk,從而使 top chunk 變大. 如果在 free 時回收的內存大于某個閾值,并且 top chunk 的大小也超過了收縮閾值,ptmalloc 會收縮 sub-heap,如果 top-chunk 包含了整個 sub-heap,ptmalloc 會調用 munmap 把整個 sub-heap 的內存返回給操作系統.

由于主分配區是唯一能夠映射進程 heap 區域的分配區,它可以通過 sbrk()來增大或是收縮進程 heap 的大小,ptmalloc 在開始時會預先分配一塊較大的空閑內存 (也就是所謂的 heap), 主分配區的 top chunk 在第一次調用 mallocd 時會分配一塊(chunk_size + 128KB) align 4KB 大小的空間作為初始的 heap,用戶從 top chunk 分配內存時,可以直接取出一塊內存給用戶.在回收內存時,回收的內存恰好與 top chunk 相鄰則合并成新的 top chunk,當該次回收的空閑內存大小達到某個閾值,并且 top chunk 的大小也超過了收縮閾值,會執行內存收縮,減小 top chunk 的大小,但至少要保留一個頁大小的空閑內存,從而把內存歸還給操作系統.如果向主分配區的 top chunk 申請內存,而 top chunk 中沒有空閑內存, ptmalloc 會調用 sbrk()將的進程 heap 的邊界 brk 上移, 然后修改 top chunk 的大小.

top chunk 位于最高地址.

mmaped chunk

當需要分配的 chunk 足夠大,而且 fast bins 和 bins 都不能滿足要求,甚至 top chunk 本身也不能滿足分配需求時,ptmalloc 會使用 mmap 來直接使用內存映射來將頁映射到進程空間.這樣分配的 chunk 在被 free 時將直接解除映射,于是就將內存歸還給了操作系統,再次對這樣的內存區的引用將導致 segmentation fault 錯誤.這樣的 chunk 也不會包含在任何 bin 中.

Last remainder

Last remainder 是另外一種特殊的 chunk,就像 top chunk 和 mmaped chunk 一樣,不會在任何 bins 中找到這種 chunk.當需要分配一個 small chunk, 但在 small bins 中找不到合適的 chunk, 如果 last remainder chunk 的大小大于所需的 small chunk 大小,last remainder chunk 被分裂成兩個 chunk, 其中一個 chunk 返回給用戶, 另一個 chunk 變成新的 last remainder chuk.

需要注意的是, 僅在請求 small chunk 才使用. 具體可以參考 eglibc-2.19/malloc/malloc.c:3459

malloc_state

只存在一個主分區, 但是允許多個非主分區, 主分配區域可以訪問 heap 區域和 mmap 區域, 非主分區只能訪問 mmap 區域, 每次用 mmap 分配一塊大小的內存當做 sub-heap, 用于模擬 heap. 具體細節可以參考 <glibc內存管理ptmalloc源代碼分析.pdf>, 每次進行內存分配必須加鎖請求一個分配區.

eglibc-2.19/malloc/malloc.c:1663
/*
   ----------- Internal state representation and initialization -----------
 */

struct malloc_state
{
  /* Serialize access.  */
  mutex_t mutex;

  /* Flags (formerly in max_fast).  */
  int flags;

#if THREAD_STATS
  /* Statistics for locking.  Only used if THREAD_STATS is defined.  */
  long stat_lock_direct, stat_lock_loop, stat_lock_wait;
#endif

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas.  */
  struct malloc_state *next_free;

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

關于 malloc_init_state 的定義在:

eglibc-2.19/malloc/malloc.c:1768
/*
   Initialize a malloc_state struct.

   This is called only from within malloc_consolidate, which needs
   be called in the same contexts anyway.  It is never called directly
   outside of malloc_consolidate because some optimizing compilers try
   to inline it at all call points, which turns out not to be an
   optimization at all. (Inlining it in malloc_consolidate is fine though.)
 */

static void
malloc_init_state (mstate av)
{

在 eglibc-2.19/malloc/malloc.c:1741 有一個已經初始化的主分配區 main_arena, 根據 ELF 的結構解析, 已初始化的全局變量存放在 .data 段, 下圖作為實踐.

# 33 是 Section 的 Index
λ : readelf -s /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.23.so | grep main_arena 
   915: 00000000003c3b20  2192 OBJECT  LOCAL  DEFAULT   33 main_arena
# 對應 33 的 Section 恰好為 .data
λ : readelf -S /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.23.so | grep .data
  [16] .rodata           NOBITS           0000000000174720  000002b4
  [23] .tdata            NOBITS           00000000003bf7c0  001bf7c0
  [29] .data.rel.ro      NOBITS           00000000003bf900  001bf7c0
  [33] .data             NOBITS           00000000003c3080  001bf7c0

_int_malloc() 分析

先獲取分配區指針, 這個過程設計到分配區初始化和分配區加鎖, 之后使用 _int_malloc 進行核心的內存分配.

eglibc-2.19/malloc/malloc.c:3295
/*
   ------------------------------ malloc ------------------------------
 */

static void *
_int_malloc (mstate av, size_t bytes)
{

本來不想閱讀, 發現不讀根本不了解原理, 這一段分析來自《glibc內存管理ptmalloc源代碼分析》但是對其中幾個步驟做了補充和添加, 可以對比看一下 (以下針對 32 位字長)

ptmalloc 的響應用戶內存分配要求的具體步驟為:

獲取分配區的鎖, 為了防止多個線程同時訪問同一個分配區, 在進行分配之前需要取得分配區域的鎖. 線程先查看線程私有實例中是否已經存在一個分配區, 如果存在嘗試對該分配區加鎖, 如果加鎖成功, 使用該分配區分配內存, 否則, 該線程搜索分配區循環鏈表試圖獲得一個空閑（沒有加鎖）的分配區. 如果所有的分配區都已經加鎖, 那么 ptmalloc 會開辟一個新的分配區, 把該分配區加入到全局分配區循環鏈表和線程的私有實例中并加鎖, 然后使用該分配區進行分配操作. 開辟出來的新分配區一定為非主分配區, 因為主分配區是從父進程那里繼承來的. 開辟非主分配區時會調用 mmap()創建一個 sub-heap, 并設置好 top chunk.
將用戶的請求大小轉換為實際需要分配的 chunk 空間大小. 具體查看 request2size 宏 (malloc.c:3332)
判斷所需分配 chunk 的大小是否滿足 chunk_size <= max_fast (max_fast 默認為 64B), 如果是的話, 則轉下一步, 否則跳到第 5 步. (malloc.c:3340)
首先嘗試在 Fastbins 中查找所需大小的 chunk 分配給用戶. 如果可以找到, 則分配結束. 否則轉到下一步. (malloc.c:3340)
判斷所需大小是否處在 small bins 中, 即判斷 chunk_size < 512B 是否成立. 如果 chunk 大小處在 small bins 中, 則轉下一步, 否則轉到第 7 步. (malloc.c:3377)
根據所需分配的 chunk 的大小, 找到具體所在的某個 small bin, 從該 Bin 的尾部摘取一個恰好滿足大小的 chunk. 若成功, 則分配結束, 否則, 轉到 8. (malloc.c:3377)
到了這一步, 說明需要分配的是一塊大的內存, 于是, ptmalloc 首先會遍歷 Fastbins 中的 chunk, 將相鄰的空閑 chunk 進行合并, 并鏈接到 unsorted bin 中. 對于 Fastbins 的合并是由 malloc_consolidate 做處理. (malloc.c:3421)
遍歷 unsorted bin 中的 chunk, 如果請求的 chunk 是一個 small chunk, 且 unsorted bin 只有一個 chunk, 并且這個 chunk 在上次分配時被使用過(也就是 last_remainder), 并且 chunk 的大小大于 (分配的大小 + MINSIZE), 這種情況下就直接將該 chunk 進行切割, 分配結束, 否則繼續遍歷, 如果發現一個 unsorted bin 的 size 恰好等于需要分配的 size, 命中緩存, 分配結束, 否則將根據 chunk 的空間大小將其放入對應的 small bins 或是 large bins 中, 遍歷完成后, 轉入下一步. (malloc.c:3442)
到了這一步說明需要分配的是一塊大的內存, 并且 Fastbins 和 unsorted bin 中所有的 chunk 都清除干凈了. 從 large bins 中按照 “smallest-first, best-fit”(最小&合適, 也就是說大于或等于所需 size 的最小 chunk) 原則, 找一個合適的 chunk, 從中劃分一塊合適大小的 chunk 進行切割, 并將剩下的部分放到 unsorted bin, 若操作成功, 則分配結束, 否則轉到下一步. (malloc.c:3576)
到了這一步說明在對應的 bin 上沒有找到合適的大小, 無論是 small bin 還是 large bin, 對于 small bin, 如果沒有對應大小的 small bin, 只能 idx+1. 對于 large bin,在上一步的 large bin 并不一定能找到合適的 chunk 進行切割, 因為 large bins 間隔是很大的, 假如當前的 idx 的 large bin 只有一個 chunk, 但是所需 size 大于該 chunk, 這就導致找不到合適的, 只能繼續 idx+1, 最后都需要根據 bitmap 找到之后第一個非空閑的 bin. 在這兩種情況下找到的 bin 中的 chunk 一定可以進行切割或者全部分配(剩余的 size < MINSIZE) (malloc.c:3649)
如果仍然都沒有找到合適的 chunk, 那么就需要操作 top chunk 來進行分配了. 判斷 top chunk 大小是否滿足所需 chunk 的大小, 如果是, 則從 top chunk 中分出一塊來. 否則轉到下一步. (malloc.c:3749)
到了這一步, 說明 top chunk 也不能滿足分配要求, 所以, 于是就有了兩個選擇: 如果是主分配區, 調用 sbrk(), 增加 top chunk 大小；如果是非主分配區, 調用 mmap 來分配一個新的 sub-heap, 增加 top chunk 大小；或者使用 mmap()來直接分配. 在這里, 需要依靠 chunk 的大小來決定到底使用哪種方法. 判斷所需分配的 chunk 大小是否大于等于 mmap 分配閾值, 如果是的話, 則轉下一步, 調用 mmap 分配, 否則跳到第 13 步, 增加 top chunk 的大小. (malloc.c:3800)
使用 mmap 系統調用為程序的內存空間映射一塊 chunk_size align 4kB 大小的空間. 然后將內存指針返回給用戶.
判斷是否為第一次調用 malloc, 若是主分配區, 則需要進行一次初始化工作, 分配一塊大小為(chunk_size + 128KB) align 4KB 大小的空間作為初始的 heap. 若已經初始化過了, 主分配區則調用 sbrk()增加 heap 空間, 分主分配區則在 top chunk 中切割出一個 chunk, 使之滿足分配需求, 并將內存指針返回給用戶.

_int_free() 分析

eglibc-2.19/malloc/malloc.c:3808
/*
   ------------------------------ free ------------------------------
 */

static void
_int_free (mstate av, mchunkptr p, int have_lock)
{

下面分析具體過程, 同樣是參考《glibc內存管理ptmalloc源代碼分析》但是對其中幾個步驟做了補充和添加, 可以對比看一下 (以下針對 32 位字長)

free()函數同樣首先需要獲取分配區的鎖, 來保證線程安全.
判斷傳入的指針是否為 0, 如果為 0, 則什么都不做, 直接 return.否則轉下一步.
判斷 chunk 的大小和所處的位置, 若 chunk_size <= max_fast, 并且 chunk 并不位于 heap 的頂部, 也就是說并不與 Top chunk 相鄰, 則轉到下一步, 否則跳到第 5 步.（因為與 top chunk 相鄰的 chunk(fastbin) ,會與 top chunk 進行合并, 所以這里不僅需要判斷大小, 還需要判斷相鄰情況）
將 chunk 放到 Fastbins 中, chunk 放入到 Fastbins 中時, 并不修改該 chunk 使用狀態位 P.也不與相鄰的 chunk 進行合并.只是放進去, 如此而已.這一步做完之后釋放便結束了, 程序從 free()函數中返回.
判斷所需釋放的 chunk 是否為 mmaped chunk, 如果是, 則調用 munmap()釋放 mmaped chunk, 解除內存空間映射, 該該空間不再有效.如果開啟了 mmap 分配閾值的動態調整機制, 并且當前回收的 chunk 大小大于 mmap 分配閾值, 將 mmap 分配閾值設置為該 chunk 的大小, 將 mmap 收縮閾值設定為 mmap 分配閾值的 2 倍, 釋放完成, 否則跳到下一步.
判斷前一個 chunk 是否處在使用中, 如果前一個塊也是空閑塊, 則合并.并轉下一步.
判斷當前釋放 chunk 的下一個塊是否為 top chunk, 如果是, 則轉第 9 步, 否則轉下一步.
判斷下一個 chunk 是否處在使用中, 如果下一個 chunk 也是空閑的, 則合并, 并將合并后的 chunk 放到 unsorted bin 中.注意, 這里在合并的過程中, 要更新 chunk 的大小, 以反映合并后的 chunk 的大小.并轉到第 10 步.
如果執行到這一步, 說明釋放了一個與 top chunk 相鄰的 chunk.則無論它有多大, 都將它與 top chunk 合并, 并更新 top chunk 的大小等信息.轉下一步. (malloc.c:3950)
判斷合并后的 chunk 的大小是否大于 FASTBIN_CONSOLIDATION_THRESHOLD（默認 64KB）, 如果是的話, 則會觸發進行 Fastbins 的合并操作(malloc_consolidate), Fastbins 中的 chunk 將被遍歷, 并與相鄰的空閑 chunk 進行合并, 合并后的 chunk 會被放到 unsorted bin 中. Fastbins 將變為空, 操作完成之后轉下一步.
判斷 top chunk 的大小是否大于 mmap 收縮閾值（默認為 128KB）, 如果是的話, 對于主分配區, 則會試圖歸還 top chunk 中的一部分給操作系統.但是最先分配的 128KB 空間是不會歸還的, ptmalloc 會一直管理這部分內存, 用于響應用戶的分配請求；如果為非主分配區, 會進行 sub-heap 收縮, 將 top chunk 的一部分返回給操作系統, 如果 top chunk 為整個 sub-heap, 會把整個 sub-heap 還回給操作系統.做完這一步之后, 釋放結束, 從 free() 函數退出.可以看出, 收縮堆的條件是當前 free 的 chunk 大小加上前后能合并 chunk 的大小大于 64k, 并且要 top chunk 的大小要達到 mmap 收縮閾值, 才有可能收縮堆.

特殊的分配情況舉例說明

下面幾個的演示例子中沒有使用到一些 heap 分析插件, 會在 ptmalloc 的利用那一步使用到 heap 分析的插件.

下面一段表明小于 Fastbins的size 在釋放后不會進行合并, 如果使用 gdb 查看 chunk 信息可以看到 P 標志位為 1, 這里需要注意的是看下一個 chunk 的 P 標志位, 而不是當前 chunk 的標志位, 這里就不進行演示了.

#include<stdio.h>
#include<stdlib.h>
void main()
{
  void *m1 = malloc(24);
  int t = 0;
  void * ms[200];

  for(t = 0; t < 200; t++)
    ms[t] = malloc(120); // default fastbin size

  malloc(24);

  for(t = 0; t < 200; t++)
    free(ms[t]);
  void *m2 = malloc(24);
  printf("%p\n",m1);
  printf("%p\n",m2);
}

// result:
λ : gcc -g -o test2 test2.c && ./test2
0x17c2010
0x17c8450

下面例子表明, 當 fast bin 的相鄰為空閑 chunk, 以及相鄰 top chunk 的情況, 都不會進行合并, 但是對于 top chunk 的情況有些特殊.

/*
  TRIM_FASTBINS controls whether free() of a very small chunk can
  immediately lead to trimming. Setting to true (1) can reduce memory
  footprint, but will almost always slow down programs that use a lot
  of small chunks.

  Define this only if you are willing to give up some speed to more
  aggressively reduce system-level memory footprint when releasing
  memory in programs that use many small chunks.  You can get
  essentially the same effect by setting MXFAST to 0, but this can
  lead to even greater slowdowns in programs using many small chunks.
  TRIM_FASTBINS is an in-between compile-time option, that disables
  only those chunks bordering topmost memory from being placed in
  fastbins.
*/

當設置 TRIM_FASTBINS=1 fast bin 會與相鄰的 top chunk 進行合并

λ : cat test5.c
#include<stdio.h>
#include<stdlib.h>
void main()
{
    void *m1 = malloc(500);
    void *m2 = malloc(40);
    malloc(1);
    void *m3 = malloc(80);
    free(m1);
    free(m2);
    void *m4 = malloc(40);

    free(m3);
    void *m5 = malloc(80);
    printf("m1, %p\n",m1);
    printf("m2, %p\n",m2);
    printf("m3, %p\n",m3);
    printf("m4, %p\n",m4);
    printf("m5, %p\n",m5);

}
// result:
λ : gcc -g -o test5 test5.c && ./test5
m1, 0x8b1010
m2, 0x8b1210
m3, 0x8b1260
m4, 0x8b1210
m5, 0x8b1260

下面的例子表明 small bin 在釋放后會相鄰合并的例子.

#include<stdio.h>
#include<stdlib.h>
void main()
{
  void *m1 = malloc(24);
  int t = 0;
  void * ms[200];

  for(t = 0; t < 200; t++)
    ms[t] = malloc(121); // small bin size

  malloc(24);

  for(t = 0; t < 200; t++)
    free(ms[t]);
  void *m2 = malloc(24);
  printf("%p\n",m1);
  printf("%p\n",m2);
}

// result:
λ : gcc -g -o test2 test2.c && ./test2
0xeab010
0xeab030

下面舉例說明 malloc_consolidate 的作用, 以及如何觸發 malloc_consolidate. 請仔細理解 m6, m7 和 m8.

#include<stdio.h>
#include<stdlib.h>
void main()
{
        void *m0 = malloc(24);
        void *m1 = malloc(24);
        void *m2 = malloc(0x200);
        void *m3 = malloc(0x100);
        void *m4 = malloc(24);
        void *m5 = malloc(24);
        malloc(121);
        free(m0);
        free(m1);
        free(m2);
        free(m3);
        free(m4);
        free(m5);


        malloc(0x350);
        void *m6 = malloc(0x360);
        malloc(1210); // 觸發 Fastbins 合并
        void *m7 = malloc(0x360);
        void *m8 = malloc(24);

        printf("m0,%p\n", m0);
        printf("m1,%p\n", m1);
        printf("m2,%p\n", m2);
        printf("m3,%p\n", m3);
        printf("m4,%p\n", m4);
        printf("m5,%p\n", m5);
        printf("m6,%p\n", m6);
        printf("m7,%p\n", m7);
        printf("m8,%p\n", m8);
}

result:
λ : gcc -g -o test3 test3.c && ./test3
m0,0x1bf7010
m1,0x1bf7030
m2,0x1bf7050
m3,0x1bf7260
m4,0x1bf7370
m5,0x1bf7390
m6,0x1bf77a0
m7,0x1bf7010
m8,0x1bf7380

下面舉例說明, 當 small bins 和 large bins 沒有找到對應合適 size 的 Bin, 需要切割的情況.

#include <stdio.h>
#include <stdlib.h>

void main()
{
    void * m1 = malloc(0x200);
    malloc(121);
    void * m2 = malloc(0x401);
    malloc(121);
    free(m2);
    void * m3 = malloc(24);
    free(m1);
    void * m4 = malloc(24);
    printf("m1, %p\n", m1);
    printf("m2, %p\n", m2);
    printf("m3, %p\n", m3);
    printf("m4, %p\n", m4);
    printf("sizeof(size_t) = %ld\n", sizeof(size_t));
}

result:
λ : gcc -g -o test1 test1.c && ./test1
m1, 0x1a66010
m2, 0x1a662b0
m3, 0x1a662b0 //切割 small bins
m4, 0x1a66010 //切割 large bins
sizeof(size_t) = 8

exploit 在 ptmalloc 中

首先明確大部分的關注點, 是在 leak infomation 和 aa4bmo.

對于 leak infomation, 需要所 dump 的地址內存放關鍵信息, 比如: 釋放后的 chunk 的 fd 和 bk.

對于 aa4bmo, 這一塊在另一篇《PWN之堆觸發》有完善的介紹和總結.

下面的一些分析實例會用到 heap 的分析插件, 并且會提到一些具體的實踐以對應之前的理論.

Leak Information (泄露關鍵信息)

Q: 什么是關鍵信息?

A: libc 地址, heap 地址

通過 ptmalloc 獲得的內存 chunk 在釋放后會變成上面提到的幾種緩存類型, 這里主要提一下 Fastbins, Bins 能夠泄漏什么關鍵信息.

分配區 main_arena 是已經初始化靜態全局變量存放在 libc.so.6 的 .data 位置, 可以通過 main_arena 泄露 libc 的基址.

下面是一個關于 Fastbins 的例子, Fastbins 是單向鏈表, 通過 fd 指針進行遍歷, 每次插入鏈表頭位置, 可以通過已經釋放的 Fastbin chunk 的 fd 指針 dump 到 heap 地址.

#include<stdio.h>                                                                                                                                                                                                  
#include<stdlib.h>                                                                                                                                                                                                 
#include<string.h>                                                                                                                                                                                                 
void main()                                                                                                                                                                                                        
{                                                                                                                                                                                                                  
    void * m1 = malloc(0x80-8);                                                                                                                                                                                    
    void * m2 = malloc(0x80-8);                                                                                                                                                                                    
    memset(m1, 65, 0x80-8);                                                                                                                                                                                        
    memset(m2, 65, 0x80-8);                                                                                                                                                                                        
    malloc(1);                                                                                                                                                                                                     
    free(m1);                                                                                                                                                                                                      
    free(m2);                                                                                                                                                                                                      
    printf("m1: %p\n", m1);                                                                                                                                                                                        
    printf("m2: %p\n", m2);                                                                                                                                                                                        
    printf("sizeof(size_t): %ld\n", sizeof(size_t));                                                                                                                                                               
}

# 主分配區
(gdb) P &main_arena 
$3 = (struct malloc_state *) 0x7ffff7dd1b20 <main_arena>
(gdb) p main_arena 
$2 = {mutex = 0, flags = 0, fastbinsY = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x602080, 0x0, 0x0, 0x0}, top = 0x602120, last_remainder = 0x0, bins = {...more... }, binmap = {0, 0, 0, 0}, next = 0x7ffff7dd1b20 <main_arena>, next_free = 0x0, attached_threads = 1, system_mem = 135168, max_system_mem = 135168}
# 同上
(gdb) heap
===================================Heap Dump===================================

Arena(s) found:
         arena @ 0x7ffff7dd1b20
# Fastbins 在釋放后, P 標志位不會被清空
(gdb) heap -l
===================================Heap Dump===================================

          ADDR             SIZE         STATUS
sbrk_base 0x602000
chunk     0x602000         0x80         (inuse)
chunk     0x602080         0x80         (inuse)
chunk     0x602100         0x20         (inuse)
chunk     0x602120         0x20ee0      (top)
sbrk_end  0x602001
# 查看 bins
(gdb) heap -b
===================================Heap Dump===================================
fast bin 6 @ 0x602080
        free chunk @ 0x602080 - size 0x80 
        free chunk @ 0x602000 - size 0x80 
# 通過觀察源碼和這里 Fastbins 的順序應該可以發現 Fastbins 是頭插入
(gdb) heap -f
====================================Fastbins====================================

[ fb 0 ] 0x7ffff7dd1b28  -> [ 0x0 ] 
[ fb 1 ] 0x7ffff7dd1b30  -> [ 0x0 ] 
[ fb 2 ] 0x7ffff7dd1b38  -> [ 0x0 ] 
[ fb 3 ] 0x7ffff7dd1b40  -> [ 0x0 ] 
[ fb 4 ] 0x7ffff7dd1b48  -> [ 0x0 ] 
[ fb 5 ] 0x7ffff7dd1b50  -> [ 0x0 ] 
[ fb 6 ] 0x7ffff7dd1b58  -> [ 0x602080 ] (128)
                              [ 0x602000 ] (128)
[ fb 7 ] 0x7ffff7dd1b60  -> [ 0x0 ] 
[ fb 8 ] 0x7ffff7dd1b68  -> [ 0x0 ] 
[ fb 9 ] 0x7ffff7dd1b70  -> [ 0x0 ]
# Fastbins 是根據 fd 指針進行遍歷
(gdb) p *(mchunkptr)0x602080
$4 = {prev_size = 4702111234474983745, size = 129, fd = 0x602000, bk = 0x4141414141414141, fd_nextsize = 0x4141414141414141, bk_nextsize = 0x4141414141414141}
# 這里 dump 之前 chunk 的內容可以拿到 heap 的地址
(gdb) x/wx 0x602090
0x602090:       0x00602000

下面是一個關于 Bins 的例子, Bins 是雙向環鏈表, 頭插入, 可以通過已經釋放的 Bin chunk 泄漏 libc 和 heap 地址.

這里需要理解一下由 malloc(0xB0-8); 的作用, 以及 Unstored bin 轉為 small bins 的過程. 這里如果不清楚可以對應 libc 源碼查看上面提到的 _int_malloc()的過程.

include<stdio.h>                                                                                                                                                                                                  
#include<stdlib.h>                                                                                                                                                                                                 
#include<string.h>                                                                                                                                                                                                 
void main()                                                                                                                                                                                                        
{                                                                                                                                                                                                                  
    void * m1 = malloc(0x90-8);                                                                                                                                                                                    
    malloc(1);                                                                                                                                                                                                     
    void * m2 = malloc(0x90-8);                                                                                                                                                                                    
    malloc(1);                                                                                                                                                                                                     
    void * m3 = malloc(0xA0-8);                                                                                                                                                                                    
    malloc(1);                                                                                                                                                                                                     
    memset(m1, 65, 0x90-8);                                                                                                                                                                                        
    memset(m2, 65, 0x90-8);                                                                                                                                                                                        
    memset(m3, 65, 0xA0-8);                                                                                                                                                                                        

    free(m1);                                                                                                                                                                                                      
    free(m2);                                                                                                                                                                                                      
    free(m3);                                                                                                                                                                                                      
    malloc(0xB0-8);                                                                                                                                                                                                
    printf("m1: %p\n", m1);                                                                                                                                                                                        
    printf("m2: %p\n", m2);                                                                                                                                                                                        
    printf("m3: %p\n", m3);                                                                                                                                                                                        
    printf("sizeof(size_t): %ld\n", sizeof(size_t));                                                                                                                                                               
} 

λ : gdb -q test2
Reading symbols from test2...done.
(gdb) b 19
Breakpoint 1 at 0x4006ac: file test2.c, line 19.
(gdb) r
Starting program: /home/spiderzz/Desktop/pwn/malloc/test2 

Breakpoint 1, main () at test2.c:19
19          malloc(0xB0-8);
(gdb) heap
===================================Heap Dump===================================

Arena(s) found:
         arena @ 0x7ffff7dd1b20

# Unsorted bin 是雙向環鏈表, 這里需要觀察, 雙向環鏈表的兩個端點 chunk 的 FD 和 BK 的地址不同之處, 因為一個在 libc 的空間, 一個在 heap 的空間.
(gdb) heap -l
===================================Heap Dump===================================

          ADDR             SIZE         STATUS
sbrk_base 0x602000
chunk     0x602000         0x90         (F) FD 0x7ffff7dd1b78 BK 0x6020b0 
chunk     0x602090         0x20         (inuse)
chunk     0x6020b0         0x90         (F) FD 0x602000 BK 0x602160 
chunk     0x602140         0x20         (inuse)
chunk     0x602160         0xa0         (F) FD 0x6020b0 BK 0x7ffff7dd1b78 
chunk     0x602200         0x20         (inuse)
chunk     0x602220         0x20de0      (top)
sbrk_end  0x602001
(gdb) heap -b
===================================Heap Dump===================================

unsorted bin @ 0x7ffff7dd1b88
        free chunk @ 0x602160 - size 0xa0

        free chunk @ 0x6020b0 - size 0x90

        free chunk @ 0x602000 - size 0x90
# 這個也就是返回的 fake chunk 的地址, 這地址其實就是 bin_at 的返回值
(gdb) p *(mfastbinptr)0x7ffff7dd1B78
$1 = {prev_size = 6300192, size = 0, fd = 0x602160, bk = 0x602000, fd_nextsize = 0x7ffff7dd1b88 <main_arena+104>, bk_nextsize = 0x7ffff7dd1b88 <main_arena+104>}         
(gdb) n
20          printf("m1: %p\n", m1);
# 這里需要理解 Bins 的 FD 和 BK.
(gdb) heap -l
===================================Heap Dump===================================

          ADDR             SIZE         STATUS
sbrk_base 0x602000
chunk     0x602000         0x90         (F) FD 0x7ffff7dd1bf8 BK 0x6020b0 
chunk     0x602090         0x20         (inuse)
chunk     0x6020b0         0x90         (F) FD 0x602000 BK 0x7ffff7dd1bf8 
chunk     0x602140         0x20         (inuse)
chunk     0x602160         0xa0         (F) FD 0x7ffff7dd1c08 BK 0x7ffff7dd1c08 (LC)
chunk     0x602200         0x20         (inuse)
chunk     0x602220         0xb0         (inuse)
chunk     0x6022d0         0x20d30      (top)
sbrk_end  0x602001
# 這里需要理解 Unsorted bin 是如何變為 small bin
(gdb) heap -b
===================================Heap Dump===================================

small bin 9 @ 0x7ffff7dd1c08
        free chunk @ 0x6020b0 - size 0x90

        free chunk @ 0x602000 - size 0x90
small bin 10 @ 0x7ffff7dd1c18
        free chunk @ 0x602160 - size 0xa0
# bin_at 的返回, 需要聯合上面的兩條命令返回的結果一起理解
(gdb) p *(mfastbinptr)0x7ffff7dd1BF8 
$3 = {prev_size = 140737351850984, size = 140737351850984, fd = 0x6020b0, bk = 0x602000, fd_nextsize = 0x602160, bk_nextsize = 0x602160}
# bin_at 的返回, 需要聯合上面的兩條命令返回的結果一起理解
(gdb) p *(mfastbinptr)0x7ffff7dd1C08
$2 = {prev_size = 6299824, size = 6299648, fd = 0x602160, bk = 0x602160, fd_nextsize = 0x7ffff7dd1c18 <main_arena+248>, bk_nextsize = 0x7ffff7dd1c18 <main_arena+248>}

上面提到如何使用 Bins 泄露 libc 和 heap 的地址, 這一部分其實在 Phrack 的 <Advanced Doug Lea's malloc exploits> 的 4.5 Abusing the leaked information 一小部分有提到. 可以通過 "find a lonely chunk in the heap" 去泄露, 相當于上面例子中的 m3, 位于 small bin 10, 釋放后會修改 FD, BK 為該 Bin 的地址, 進而泄露 libc 的地址. 還有一種方法就是 "find the first or last chunk of a bin", 相當于上面例子中的 m1, m2, 釋放后, 會造成 FD 和 BK 一個在 ptr_2_libc's_memory, 一個在 ptr_2_process'_heap.

下面說明如何使用一個 lonely chunk, 拿到關鍵函數的地址, 在 <Advanced Doug Lea's malloc exploits> 中使用的是 __morecore 這個函數指針, 它指向 __default_morecore, 也就是系統用于增加內存的函數, 默認為 brk(), 這里簡單提一下.

這里直接使用上面的 m3 作為例子舉例, m3 在釋放后變為 lonely chunk, 位于 small bin 10

#0.這里已知該 chunk 所在 bin 的地址 (t0 = 0x7ffff7dd1c08+0x10)(對于為什么需要加 0x10, 是因為 fake chunk, 具體參考上面)
#1.根據 chunk 的 size, 取得對應 bin index, 這里其實也就是 10, 可以查看 bin_index 宏, 查看對應具體實現
#2.根據 bin index, 獲取到該 bin 與 main_arena 的地址差, 從而獲得 main_arena 的地址.
t0 = 0x7ffff7dd1c08 + 0x10
t1 = (long)&main_arena.bins - (long)&main_arena
t2 = (long)&__morecore - (long)&(main_arena)
t3 = (10-1)*2*8 //至于為什么這么算, 請參考源碼 bin_at 宏
&main_arena = t0 - (t3+t1) = 0x7ffff7dd1b20
#3.根據 _morecore 與 main_arena 的地址差, 得到 _morecore 的地址
&__morecore = &main_arena + t2

整個過程用一句話表示 "Using the known size of the chunk, we know in which bin it was placed, so we can get main_arena's address and, finally, __morecore.", 具體的過程也就是上面寫的.

aa4bmo 'almost arbitrary 4 bytes mirrored overwrite' (任意 4 字節寫)

很多情況下 aa4bmo 是由于 chunk overlap ( chunk corruption ) 導致的.

對于 aa4bmo, 這一塊在另一篇《PWN之堆觸發》有完善的介紹和總結.

參考資料

大部分參考資料都在 `refs` 下

#libc源碼, 本文的核心
http://www.eglibc.org/cgi-bin/viewvc.cgi/branches/eglibc-2_19/

#非常詳細, 本文的核心
<refs/glibc內存管理ptmalloc源代碼分析.pdf>

#這篇文章也不錯, 很全面  
http://tyrande000.how/2016/02/20/linux%E4%B8%8B%E7%9A%84%E5%A0%86%E7%AE%A1%E7%90%86/

#阿里聚安全, 只講了及基本的數據結構, 對于具體的分配, 回收算法沒有涉及到
https://jaq.alibaba.com/community/art/show?spm=a313e.7916648.0.0.ZP7WcS&articleid=315

#很多人引用了這篇文章, 關于堆布局的圖都是采用這篇文章里的
https://sploitfun.wordpress.com/2015/02/10/understanding-glibc-malloc/comment-page-1/?spm=a313e.7916648.0.0.H9xzd9

#Phrack
#這篇文章很值的讀, 雖然里面的一些技術不再適用, 但是其中的一些理念很不錯, 比如其中關于如何利用爆破的方法繞過ASLR, 如何跟蹤內存分配, 如何打印堆布局, 如何利用堆泄露關鍵信息.
<Advanced Doug Lea's malloc exploits>

#這篇文章主要講 malloc 原理, 與現在的 glibc 版本有較大差異, 后一部分不建議看, 但是前面一部分舉了一個例子, 如何利用 off-by NUL 的 bug, 總的來說應該算是 chunk corruption, 去完成 a carefully crafted fake chunk, 最終實現 aa4bmo.
<[Phrack]Vudo malloc tricks.pdf>

#glibc的調試相關
http://blog.chinaunix.net/uid-24774106-id-3526766.html
http://blog.chinaunix.net/uid-24774106-id-3642925.html
http://stackoverflow.com/questions/10000335/how-to-use-debug-version-of-libc

Paper 本文由 Seebug Paper 發布，如需轉載請注明來源。本文地址：http://www.bjnorthway.com/255/

Paper - 安全技術精粹

PWN之堆內存管理

目錄

分析堆的相關工具

堆內存分配(ptmalloc設計)的思考

0. 為什么需要 ptmalloc

1. Bins

2. Top

3. Fastbins

4. Unsorted bin

5. last_remainder

chunk 結構

chunk 結構

邊界標示

空間復用

空閑容器(緩存)

Bins

small bins, large bins

Fastbins

unsorted bin

top chunk

mmaped chunk

Last remainder

malloc_state

_int_malloc() 分析

_int_free() 分析

特殊的分配情況舉例說明

exploit 在 ptmalloc 中

Leak Information (泄露關鍵信息)

aa4bmo 'almost arbitrary 4 bytes mirrored overwrite' (任意 4 字節寫)

參考資料

暫無評論