Steffen Kaiser put forth on 3/16/2010 5:16 AM:
The "paste" talks about data. You talk about code.
The author's use of the word "data" in that sentence was obviously not in the programmatic context but in the bit bucket context. In many contexts, the word "data" can properly describe the entire contents of RAM, ROM, disk, SSD, thumb drive, tape, or any other storage, without distinguishing between binary code and application data. Often "data" merely means contents. Look up the word "data" in the dictionary, and only one of the many meanings might state "the contents of a computer program's variables". You're looking at the word "data" too narrowly, in the wrong context.
If I display a process map of an active imap process, I have:

mapped: 2656K    writeable/private: 460K    shared: 300K

Essentially you are talking about 460KB of KSM-able data.
Looking at top and pmap data, at bare minimum, 900KB of code is fully duplicated in each imap process and is a candidate for KSM de-duplication. I'm guessing there may be more code that's duplicated as well, some of the .so's maybe?
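For anyone who wants to check figures like these on their own box, here is a rough sketch that totals the shared and private page counts from /proc/<pid>/smaps on Linux. The sample text, its numbers, and the /path/to/imap name are made up for illustration, and summarize_smaps and summarize_pid are hypothetical helper names, not part of any existing tool. As I understand the kernel's accounting, Shared_* counts resident pages that other processes also map, and Private_* counts pages mapped only by this process.

```python
import re

# Hypothetical excerpt in the layout of /proc/<pid>/smaps.
# The path and all sizes are invented for illustration only.
SAMPLE_SMAPS = """\
00400000-004e1000 r-xp 00000000 08:01 131 /path/to/imap
Size:                900 kB
Rss:                 600 kB
Shared_Clean:        600 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
006e1000-00755000 rw-p 00000000 00:00 0 [heap]
Size:                464 kB
Rss:                 460 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       460 kB
"""

# Matches only the four per-mapping counters we want to total.
FIELD = re.compile(
    r"^(Shared_Clean|Shared_Dirty|Private_Clean|Private_Dirty):\s+(\d+) kB\s*$"
)


def summarize_smaps(text):
    """Sum the Shared_* and Private_* counters (in kB) across all mappings."""
    totals = {"shared_kb": 0, "private_kb": 0}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            key = "shared_kb" if match.group(1).startswith("Shared") else "private_kb"
            totals[key] += int(match.group(2))
    return totals


def summarize_pid(pid):
    """Read a live process's smaps (Linux only; needs permission to read it)."""
    with open(f"/proc/{pid}/smaps") as handle:
        return summarize_smaps(handle.read())


if __name__ == "__main__":
    print(summarize_smaps(SAMPLE_SMAPS))
```

Running summarize_pid against the PID of a real imap process gives per-process totals that can be compared with what pmap reports.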
IMHO, all modern Unix-alike systems just "map" code pages read-only into the process and share them among all processes. No KSM required at all.
If this were the case with Linux, then the KSM project would never have been initiated. You attribute far too much intelligence to the stock *nix kernels WRT memory management, specifically de-duplication. AFAIK none of them in a stock configuration do what your humble opinion suggests.
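Whether KSM is actually merging anything on a given machine can be checked rather than argued. Below is a minimal sketch that reads the KSM counters the kernel exposes under /sys/kernel/mm/ksm (present on kernels built with KSM support). read_ksm_counters and saved_kb are hypothetical helper names, and the 4 KB page size is an assumption that holds on typical x86 systems. Note that the counters stay at zero unless KSM is switched on (run = 1) and some process has marked memory with madvise(MADV_MERGEABLE).

```python
from pathlib import Path

# Standard sysfs location for KSM on Linux kernels built with KSM support.
KSM_DIR = Path("/sys/kernel/mm/ksm")


def read_ksm_counters(ksm_dir=KSM_DIR, names=("run", "pages_shared", "pages_sharing")):
    """Return the named KSM counters as ints, or None where a file is unreadable."""
    counters = {}
    for name in names:
        try:
            counters[name] = int((Path(ksm_dir) / name).read_text().strip())
        except (OSError, ValueError):
            # File missing (no KSM support) or not an integer.
            counters[name] = None
    return counters


def saved_kb(counters, page_kb=4):
    """Rough saving estimate: pages_sharing counts the extra mappings that were merged.

    page_kb=4 is an assumption; adjust it for systems with a different page size.
    """
    sharing = counters.get("pages_sharing")
    return None if sharing is None else sharing * page_kb


if __name__ == "__main__":
    counters = read_ksm_counters()
    print(counters, "approx. saved kB:", saved_kb(counters))
```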
-- Stan