1. Introduction
----------------

The current version of Linux Kernel Runtime Guard (LKRG) runs at the same
trust level as the kernel it tries to protect. This makes LKRG bypassable by
design, although at a relatively high cost, and it still provides a good
security layer for a running kernel. There are a few reasons why LKRG was not
implemented at a more trusted level (e.g. a hypervisor) from the beginning:

 a) The nature of LKRG requires heavy synchronization with the kernel itself.
    E.g. if we want to poke a specific page in kernel memory, we must know
    when it is safe to do so:
     - Is this physical page in the kernel's working set?
     - What is its state?
     - Is this page not in the middle of the reclaim algorithm?
     - Do the attributes of the page allow safe access to it?
     - Is this page not being actively modified by another thread?
     - More...
 b) Regardless of where LKRG is implemented, some of the problems remain the
    same, like the one mentioned above.
 c) Similar problems might apply to any dynamic and shared resource which
    LKRG monitors / protects.
 d) Currently, there is a lack of "standardization" of the hypervisor "world"
    in the Linux environment compared to other modern OSes. That's a
    "natural implication" of the open-source effort and results in problems
    like: which hypervisor can LKRG expect from the user? KVM? Xen?
    VirtualBox? VMware? Custom?
 e) "Closed" platforms don't have Linux-like problems. They usually run in a
    very well defined environment, on well defined hardware with few
    variations, etc. That kind of environment favors "standardization" and
    allows implementing less relaxed restrictions and more targeted
    validation.
 f) Not every user runs a Linux box with access to the hypervisor. This
    naturally limits the user base of the project. If someone buys a random
    VPS and installs Linux there, they most likely can't run hypervisor-based
    security solutions. Nested virtualization might mitigate that, but it is
    very rarely supported.
 g) A hypervisor-based solution goes against mass deployment (same as any
    "kernel-patch" solution). LKRG can be massively deployed in the same way
    as any other random package in the system.
 h) Some servers / machines can't be rebooted (rebootless) and they will
    never benefit from / install a security solution which requires
    modification of the hypervisor.

Nevertheless, better self-protection was in scope from the beginning of the
project. When LKRG is more mature and stable, it will be possible to add a
"ring -1" extension which guards the Normal World (NW) LKRG. In that case
LKRG would be able to run in default mode as it does now, but if "ring -1" is
available, LKRG will insert a Trusted component (TZ-LKRG) into "ring -1" with
the necessary assists, which will allow strong protection of LKRG and of the
kernel itself.

2. Enhanced LKRG
-----------------

I think it might be the right time to start working on a stronger version of
LKRG. The current version of LKRG is relatively stable and mature enough. It
is possible to close the weaknesses of the current model by introducing a
Trusted component (let's call it the TZ-LKRG agent). TZ-LKRG can be very
simple and minimalistic and still provide strong protection for LKRG in the
Normal World (NW). TZ-LKRG might export only 2 functionalities to be
effective:

 a) Mark a page as Read-Only (RO) - the hypervisor (x86) or TrustZone (ARM)
    will enforce the PTE attributes and guarantee that the page will never
    be writable in the NW
 b) Update data in an RO page - a new secure communication protocol between
    LKRG and TZ-LKRG should be designed and implemented (with sequence
    numbers). The protocol should make it difficult to call into TZ-LKRG
    from anywhere other than LKRG itself. E.g. the proposed SQN can be a
    secret shared between both sides of the communication which is
    dynamically rotated on every call.

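
To make the two assists above more concrete, here is a rough user-space model
of what the trusted side might track. It is only a sketch under stated
assumptions, not the proposed implementation: all names (tz_state,
tz_mark_ro, tz_update_ro, mix64) are hypothetical, the FNV-style mixing
function is a non-cryptographic placeholder for the real hash, rotating the
SQN by incrementing it is an assumption, and failures return an error code
where the real design would likely panic().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE    4096u
#define MAX_RO_PAGES 8u

enum tz_status { TZ_OK = 0, TZ_DENIED = -1 };

/* One page tracked by the (simulated) trusted component. */
struct ro_page {
    uint64_t pfn;
    uint8_t  data[PAGE_SIZE];
};

/* Hypothetical TZ-LKRG state: shared secret plus the RO page database. */
struct tz_state {
    uint64_t       sqn;            /* rotating secret shared with LKRG */
    uint64_t       integrity_pfn;  /* PFN of integrity_validation()    */
    struct ro_page pages[MAX_RO_PAGES];
    size_t         npages;
};

/* Placeholder for hash(SQN || PFN): FNV-1a style, NOT cryptographic. */
static uint64_t mix64(uint64_t sqn, uint64_t pfn)
{
    uint64_t in[2] = { sqn, pfn };
    const uint8_t *p = (const uint8_t *)in;
    uint64_t h = 0xcbf29ce484222325ULL;

    for (size_t i = 0; i < sizeof(in); i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

static void tz_init(struct tz_state *tz, uint64_t sqn, uint64_t integrity_pfn)
{
    memset(tz, 0, sizeof(*tz));
    tz->sqn = sqn;
    tz->integrity_pfn = integrity_pfn;
}

/* Assist a): start tracking a page as RO for the Normal World. */
static int tz_mark_ro(struct tz_state *tz, uint64_t pfn)
{
    if (tz->npages == MAX_RO_PAGES)
        return TZ_DENIED;
    tz->pages[tz->npages++].pfn = pfn;
    return TZ_OK;
}

/* Assist b): write into a tracked RO page on behalf of the caller. */
static int tz_update_ro(struct tz_state *tz, uint64_t token, uint64_t pfn,
                        size_t off, const void *src, size_t len)
{
    /* The caller must present the token derived from the current SQN. */
    if (token != mix64(tz->sqn, tz->integrity_pfn))
        return TZ_DENIED;
    /* The write must stay inside a single page. */
    if (off > PAGE_SIZE || len > PAGE_SIZE - off)
        return TZ_DENIED;

    for (size_t i = 0; i < tz->npages; i++) {
        if (tz->pages[i].pfn == pfn) {
            memcpy(tz->pages[i].data + off, src, len);
            tz->sqn++;  /* rotate the secret after each accepted call */
            return TZ_OK;
        }
    }
    return TZ_DENIED;   /* untracked page: refuse the "proxy" write */
}
```

In this model the caller keeps its own copy of the SQN and advances it in
lockstep after every accepted update, so a replayed token from an earlier
call is rejected.
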
Using these 2 primitives, LKRG might be a very difficult barrier for the
attacker in both scenarios: Runtime kernel CI enforcement as well as the
Exploit Detection (ED) / Protection feature.

2.1. Proposed protection
------------------------

During initialization, LKRG calls into TZ-LKRG and requests that the entire
LKRG .text section, as well as the other .ro* sections, be marked true-RO.
This guarantees that an attacker with full kernel primitives will never be
able to tamper with LKRG's execution logic. From the functionality
perspective it is equivalent to HVCI under Windows with VSM enabled, but just
for one component - LKRG itself.

To prevent corruption of LKRG's data, whenever a new page needs to be
allocated, LKRG will ask TZ-LKRG to enforce true-RO attributes on this page.
To write any data to that page, LKRG sends what needs to be written to
TZ-LKRG via assist 2 (update data in an RO page), and TZ-LKRG writes it down
on behalf of LKRG.

In such an architecture LKRG is impossible to corrupt from the NW, even by an
attacker with full kernel primitives. In such an environment the attacker
would need to find another bug in the hypervisor / TrustZone itself to bypass
LKRG.

2.1.1. Details
--------------

a. Initialization
   1. During initialization, generate a new dynamic random secret (e.g. an
      SQN) kept in per-CPU data
   2. Find the PA/PFN of the page where the integrity_validation() routine
      lives. Make sure that the Runtime CI and ED validation routines are on
      the same physical page
   3. Request that the entire LKRG .text section, as well as the other .ro*
      sections, be marked true-RO. Share integrity_validation()'s PA/PFN
      together with the newly generated secret
   4. The secret is used during communication between TZ-LKRG <---> LKRG and
      changes on every call. The secret will be in obfuscated form, e.g.
      secret = hash(SQN || PA/PFN of integrity_validation())
   5. Any page marked as RO during initialization can never be modified from
      the Normal World, even if such a request comes from LKRG itself

*b. A new kmem_cache_alloc* interface must be implemented which communicates
    with TZ-LKRG every time a new RO-only page is needed

b. Dynamic RO-only page allocation:
   b1. Allocate a page from the new kmem_cache_alloc* interface:
       1a. Request a new RO-only page from TZ-LKRG. TZ-LKRG writes down the
           PFN of that page and enforces RO attributes. TZ-LKRG maintains its
           own database which keeps track of all allocated RO-only pages
       1b. At the end of the page holding the integrity_validation() routine
           there will be padding where the array of PFNs allocated by TZ-LKRG
           is snapshotted, or the address of the RO page where TZ-LKRG keeps
           that array
       1c. TZ-LKRG knows where the page with integrity_validation() is, and
           knows and maintains the array of those PFNs
       1d. During CI validation, LKRG can check whether every PFN in the
           array matches an allocation made by LKRG. If there is anything
           extra, it can call panic()
   b2. Update data on an RO-only page:
       hypercall(hash, VA_RO, what_to_write, length)
       2a. Is the hash for the page (PFN) of LKRG's integrity_validation()
           routine the expected one? If yes, it's OK, otherwise panic().
           This protects against PFN remapping attacks.
       2b. Is GET_PFN(VA_RO) in the array at the end of the
           integrity_validation() page (or in the page it refers to)? If
           yes, it's OK, otherwise panic(). This protects against a 'proxy'
           attack where the attacker 'asks' TZ-LKRG to do an arbitrary write
           on the attacker's behalf to a random chunk of memory.
       2c. TZ-LKRG does a full stack walk and "knows" that LKRG can only
           "write" to an RO page from function X (the offset doesn't
           matter). Function X, at frame-1, should be called from function Y
           or Z. Frame-2 should hold function A or B. If yes, it's OK,
           otherwise panic(). This protects against an attacker issuing an
           arbitrary hypercall. If the attacker uses ROP to reach the
           hypercall, the stack-walk validation will not match.
       2d. Verify that the CPU which generated the hypercall is in the
           middle of executing a *kprobe. LKRG updates RO pages only from
           the *kprobe hooks which it placed. If yes, it's OK, otherwise
           panic(). This protects against an attacker who 'fakes' the stack
           layout of some thread in such a way that it passes the
           verification described at point 2c. Example:
           2.d.1. The attacker creates process X and forces it to do nothing
           2.d.2. Using a read primitive, the attacker finds X's task_struct
                  in memory
           2.d.3. From the task_struct the attacker finds the stack
           2.d.4. The attacker overwrites the stack layout in such a way
                  that the verification at 2c. passes
           2.d.5. The last IP on the stack points to a ROP gadget which
                  invokes TZ-LKRG's hypercall for an RO-page update
           2.d.6. The attacker wakes up process X
           In such a case the verification at 2d. should catch that and
           panic()
       2e. Before invoking the hypercall, LKRG temporarily disables the
           Debug Register Unit. After the hypercall, LKRG re-enables the
           Debug Register Unit.

2.1.2. Security Weaknesses
--------------------------

 1. Even though the attacker can't modify an RO page, they can modify the
    stack of the thread running the integrity_validation() routine. If
    integrity_validation() propagates some 'error' or 'corruption' result,
    the attacker has a race window to overwrite that information and fake it
    as 'success'. It is a difficult attack, but possible.
 2. The verification at point 2e. protects against unexpected 'redirection'
    during normal execution of LKRG's logic, which might influence e.g.
    stack variables. There might be other methods of mounting similar
    attacks. All of them should be addressed.

2.1.3. Potential improvement ideas
----------------------------------

 1. Randomize the TZ/hypercall calling convention, so that building a
    general-purpose ROP chain to invoke the gadget becomes difficult. E.g.
    the attacker would need to generate 100s of ROP chains, one for each
    combination.
 2. Make sure that TZ/hypercall arguments are only handled in registers and
    are never put on the stack, since that would open a race window for
    overwriting the arguments.

2.2. Problems
-------------

The architecture of enhanced LKRG must be designed in such a way that if the
NW somehow crashes or restarts (e.g. via kexec), LKRG's RO pages are safely
reclaimed.

Another problem is the performance impact. It needs to be carefully measured
and kept as small as possible.

2.3. Work
---------

We might start the implementation of TZ-LKRG as a modification of KVM. Two
new hypercalls can be implemented to provide the described assists, which
NW-LKRG can leverage for better protection. If the NW-LKRG code is refactored
in such a way that it can consume the new KVM hypercalls, then it doesn't
matter how TZ-LKRG is implemented. KVM can be a starting point, but the new
code will be flexible enough to consume any variation of TZ-LKRG. It can run
as an extension to the hypervisor (x86) or as an OP-TEE app in TrustZone
(ARM).

Thanks,
Adam