Keywords: KSM, Anonymous Pages, COW, madvise, MERGEABLE, UNMERGEABLE
Kernel Samepage Merging (KSM) is a Linux memory management feature designed to merge identical pages, optimizing memory usage in virtualized environments and other scenarios where duplicate content exists across processes.
How KSM Works
KSM enables the merging of identical anonymous pages—both within a single process and across different processes—without affecting application functionality. This is achieved by:
- Creating a single read-only copy of identical pages
- Freeing the now-redundant duplicate physical pages
- Triggering Copy-On-Write (COW) when applications need to modify the merged page
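The three steps above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: `ToyMemory`, its page table, and its frame numbering are hypothetical names invented for illustration, and real KSM works on physical pages and page table entries rather than dictionaries.

```python
class ToyMemory:
    """Toy model: (pid, vaddr) mappings point at frames; identical frames can be merged."""

    def __init__(self):
        self.frames = {}      # frame id -> page content (bytes)
        self.page_table = {}  # (pid, vaddr) -> frame id
        self.next_frame = 0

    def map_page(self, pid, vaddr, content):
        """Back one virtual page with a fresh private frame."""
        self.frames[self.next_frame] = content
        self.page_table[(pid, vaddr)] = self.next_frame
        self.next_frame += 1

    def merge_identical(self):
        """Point all mappings with equal content at one frame and free the duplicates."""
        canonical = {}  # content -> frame id that is kept
        for key, frame in sorted(self.page_table.items()):
            keep = canonical.setdefault(self.frames[frame], frame)
            if keep != frame:
                self.page_table[key] = keep
        live = set(self.page_table.values())
        for frame in list(self.frames):
            if frame not in live:
                del self.frames[frame]  # the duplicate frame is freed

    def write(self, pid, vaddr, content):
        """Write a page; a shared frame is copied first (Copy-On-Write)."""
        frame = self.page_table[(pid, vaddr)]
        sharers = sum(1 for f in self.page_table.values() if f == frame)
        if sharers > 1:
            self.map_page(pid, vaddr, content)  # writer gets a private copy
        else:
            self.frames[frame] = content

    def read(self, pid, vaddr):
        return self.frames[self.page_table[(pid, vaddr)]]
```

After `merge_identical()`, two processes holding the same content share one frame; a later `write()` by either of them leaves the other's view unchanged.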
Core Implementation Components
KSM's architecture consists of two main parts:
- The ksmd kernel thread, which scans and merges pages while /sys/kernel/mm/ksm/run is set to 1
- The madvise system call, which registers memory regions as merge candidates for ksmd to scan
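From userspace, opting a region in might look like the following sketch. It assumes Linux and Python 3.8 or later (where `mmap.madvise` is available); the helper name `mark_mergeable` is hypothetical. The mapping is private and anonymous because KSM only considers anonymous pages.

```python
import mmap

def mark_mergeable(length):
    """Create an anonymous private mapping and advise the kernel it may be merged.

    Returns (region, accepted). `accepted` is False when the platform does not
    expose MADV_MERGEABLE or the kernel rejects the advice (for example when it
    was built without CONFIG_KSM).
    """
    # fileno=-1 requests an anonymous mapping; MAP_PRIVATE keeps it process-private.
    region = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE)
    advice = getattr(mmap, "MADV_MERGEABLE", None)
    if advice is None:
        return region, False
    try:
        region.madvise(advice)  # covers the whole mapping by default
        return region, True
    except OSError:
        return region, False

if __name__ == "__main__":
    region, accepted = mark_mergeable(16 * mmap.PAGESIZE)
    region[:] = b"\x42" * len(region)  # identical pages are candidates for merging
    print("MADV_MERGEABLE accepted:", accepted)
```

Even when the advice is accepted, nothing is merged until ksmd is running and has scanned the region.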
Userspace interaction flow:
madvise(addr, length, MADV_MERGEABLE) → __ksm_enter() → wakes ksmd
madvise(addr, length, MADV_UNMERGEABLE) → stops merging for specified pages
KSM Implementation Details
Page Eligibility
KSM only considers anonymous pages that lie in regions a process has marked with MADV_MERGEABLE. These include:
- Heap allocations
- Stack memory
- Anonymous mmap() regions
Note: KSM does not merge:
- File-backed pages (page cache, including cached filesystem data)
- Kernel buffers (slab allocations)
Data Structures
KSM uses three primary data structures:
- struct rmap_item: Tracks reverse mappings for virtual addresses
- struct mm_slot: Represents process memory structures being scanned
- struct ksm_scan: Maintains current scanning state
Key relationships:
mm_slot → rmap_list → rmap_item
ksm_scan tracks current position in scanning process
The Merging Process
The ksmd thread follows this workflow:
- Scanning: Iterates through eligible memory regions
- Comparison: Checks page content against stable and unstable trees
- Merging: When matches found, creates merged pages and updates mappings
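The workflow can be sketched as a toy scan pass. This is a simplified model under stated assumptions, not the kernel algorithm: plain dictionaries stand in for the stable and unstable trees, `scan_pass` is a hypothetical name, and the checksum step (skipping pages whose content changed since the previous pass) is included as an assumption about how volatile pages are kept out of the unstable tree. Unmerging after a write is omitted.

```python
import zlib

def scan_pass(pages, stable, old_checksums):
    """Run one full pass over `pages` (dict: page id -> bytes content).

    stable:        dict content -> set of page ids sharing one merged copy
    old_checksums: dict page id -> checksum recorded on an earlier pass
    Returns how many pages were newly attached to a merged copy.
    """
    unstable = {}  # rebuilt on every pass: content -> page id
    merged = 0
    for page_id, content in pages.items():
        # 1. A match in the stable "tree" merges immediately.
        if content in stable:
            if page_id not in stable[content]:
                stable[content].add(page_id)
                merged += 1
            continue
        # 2. Skip pages whose content changed since the last pass.
        checksum = zlib.crc32(content)
        if old_checksums.get(page_id) != checksum:
            old_checksums[page_id] = checksum
            continue
        # 3. A match in the unstable "tree" promotes both pages to stable.
        other = unstable.get(content)
        if other is None:
            unstable[content] = page_id
        else:
            stable[content] = {other, page_id}
            del unstable[content]
            merged += 2
    return merged
```

In this model the first pass only records checksums, so two identical pages are merged on the second pass at the earliest; a page that later shows up with already-merged content joins the stable entry directly.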
ksm_do_scan()
├── scan_get_next_rmap_item()          # Finds next candidate page
└── cmp_and_merge_page()               # Attempts to merge page
    ├── stable_tree_search()           # Checks stable tree
    ├── try_to_merge_with_ksm_page()
    ├── unstable_tree_search_insert()
    └── try_to_merge_two_pages()
KSM vs Regular Anonymous Pages
Identification Differences
Anonymous pages and KSM pages are distinguished by their mapping flags:
| Page Type | Mapping Flags |
|---|---|
| Regular Anon | PAGE_MAPPING_ANON |
| KSM Page | PAGE_MAPPING_ANON + PAGE_MAPPING_KSM |
Address Calculation Differences
For regular anonymous pages:
- Virtual address calculated from page->index (offset in VMA)
For KSM pages:
- Virtual address stored in rmap_item->address
- Maintains correct mapping across different VMAs
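The difference can be made concrete with a small sketch. It assumes 4 KiB pages and models the VMA with two numbers, `vm_start` and `vm_pgoff` (the page offset of the VMA's first page); `RmapItem` is a simplified stand-in for struct rmap_item that keeps only the fields used here.

```python
from collections import namedtuple

PAGE_SHIFT = 12  # assumption: 4 KiB pages

def anon_page_address(vm_start, vm_pgoff, page_index):
    """Virtual address of a regular anonymous page, derived from its index in the VMA."""
    return vm_start + ((page_index - vm_pgoff) << PAGE_SHIFT)

# Simplified stand-in for struct rmap_item (illustrative fields only).
RmapItem = namedtuple("RmapItem", ["mm", "address"])

def ksm_page_addresses(rmap_items):
    """A KSM page has no single index; each mapping's address is stored explicitly."""
    return [(item.mm, item.address) for item in rmap_items]

if __name__ == "__main__":
    print(hex(anon_page_address(0x7f0000000000, 0x7f0000000, 0x7f0000003)))
    print(ksm_page_addresses([RmapItem("mm_a", 0x1000), RmapItem("mm_b", 0x7000)]))
```

One index cannot describe a KSM page because the same physical page may be mapped at unrelated offsets in several VMAs, which is why the address is recorded per mapping.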
Enabling and Configuring KSM
Kernel Configuration
KSM must be enabled via kernel config:
CONFIG_KSM=y
Runtime Control via sysfs
KSM provides several tunable parameters in /sys/kernel/mm/ksm/:
| Parameter | Description | Default Value |
|---|---|---|
| pages_to_scan | Pages scanned per iteration | 100 |
| sleep_millisecs | Delay between scans, in milliseconds | 20 |
| run | Controls KSM operation (0=stop, 1=run, 2=stop and unmerge all merged pages) | 0 |
| merge_across_nodes | Allow merging across NUMA nodes | 1 |
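The current values can be read like any other sysfs file. The helper below is a minimal sketch; `read_ksm_values` is a hypothetical name, and the `base` parameter exists only so the function can be pointed at a different directory for testing. Files that are missing or unreadable (for example on a kernel without CONFIG_KSM) are reported as None.

```python
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")

def read_ksm_values(names, base=KSM_DIR):
    """Return {name: int value or None} for the given files under `base`."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None  # file missing, unreadable, or not an integer
    return values

if __name__ == "__main__":
    tunables = ["run", "pages_to_scan", "sleep_millisecs", "merge_across_nodes"]
    for name, value in read_ksm_values(tunables).items():
        print(f"{name} = {value}")
```

Changing a tunable means writing a new value to the same file, which normally requires root privileges.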
Performance Considerations
Monitoring KSM Effectiveness
Key metrics to evaluate KSM performance:
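- pages_shared: Number of shared (merged) pages in use in the stable tree
- pages_sharing: Number of additional sites sharing those pages, i.e. how many pages are saved
- pages_unshared: Number of unique pages that are repeatedly checked for merging but have no match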
Performance Indicators:
- High pages_sharing/pages_shared ratio indicates effective sharing
- High pages_unshared/pages_sharing ratio suggests poor merging
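These two ratios are simple to compute once the counters have been read from /sys/kernel/mm/ksm/. The function below is a sketch with hypothetical names; the savings figure assumes 4 KiB pages, treats each pages_sharing entry as one page saved, and ignores KSM's own bookkeeping overhead, so it is a rough upper estimate.

```python
def ksm_ratios(pages_shared, pages_sharing, pages_unshared, page_size=4096):
    """Summarise KSM counters; a ratio is None when its denominator is zero."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        # Higher is better: many mappings per merged page.
        "sharing_per_shared": ratio(pages_sharing, pages_shared),
        # Higher is worse: much scanning for little merging.
        "unshared_per_sharing": ratio(pages_unshared, pages_sharing),
        # Rough estimate of memory saved, in bytes.
        "approx_bytes_saved": pages_sharing * page_size,
    }

if __name__ == "__main__":
    print(ksm_ratios(pages_shared=100, pages_sharing=900, pages_unshared=50))
```

For example, 100 shared pages with 900 sharing sites gives a ratio of 9, meaning each merged page stands in for nine additional mappings on average.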
FAQ
Q: Does KSM impact application performance?
A: KSM is transparent to applications, but it is not free. The main cost is the CPU time ksmd spends scanning pages, which is controlled by pages_to_scan and sleep_millisecs; in addition, the first write to a merged page incurs a COW fault.
Q: When should I disable KSM?
A: Consider disabling KSM when:
- Memory is abundant
- Workloads have few duplicate pages
- The overhead of scanning outweighs memory savings
Q: Can KSM merge pages from different NUMA nodes?
A: Yes, unless merge_across_nodes is set to 0, which restricts merging to pages within the same NUMA node.
Q: How does KSM handle writes to merged pages?
A: Writing to a merged page triggers COW (Copy-On-Write), creating a private copy for the writing process while maintaining other mappings to the original page.
Q: What types of workloads benefit most from KSM?
A: Virtualized environments running multiple similar VMs see the most benefit, as do systems running multiple instances of the same application.
Best Practices
- Monitor effectiveness: Regularly check KSM statistics to ensure it's providing value
- Tune parameters: Adjust pages_to_scan and sleep_millisecs based on workload
- Consider NUMA: For NUMA systems, evaluate merge_across_nodes setting
- Selective enabling: Use madvise(MADV_MERGEABLE) for specific memory regions rather than system-wide