1. Introduction
----------------
The current version of Linux Kernel Runtime Guard (LKRG) runs at the same trust
level as the kernel it tries to protect. This makes LKRG bypassable by design,
although at a relatively high cost to the attacker, and it still provides a
good security layer for a running kernel.
There are a few reasons why LKRG was not implemented at a more trusted level
(e.g. in a hypervisor) from the beginning:

 a) The nature of LKRG requires heavy synchronization with the kernel itself.
E.g. if we want to poke a specific page in kernel memory, we must be sure when
it is safe to do so:
    - Is this physical page in the kernel's working set?
    - What is its state?
    - Is this page in the middle of the reclaiming algorithm?
    - Do the attributes of the page allow safe access to it?
    - Is this page being actively modified by another thread?
    - More...
 b) Regardless of where LKRG is implemented, some of the problems remain the
same, like the one mentioned above.
 c) Similar problems might apply to any dynamic and shared resources which LKRG
monitors / protects.
 d) Currently, the hypervisor "world" in the Linux environment lacks
"standardization" compared to other modern OSes. That is a "natural
implication" of the open-source effort and results in problems such as: which
hypervisor can LKRG expect from the user? KVM? Xen? VirtualBox? VMware? Custom?
 e) "Closed" platforms don't have Linux-like problems. They usually run in a
very well defined environment, on well defined hardware with few variations,
etc. That kind of environment favors "standardization" and allows stricter
restrictions and more targeted validation to be implemented.
 f) Not every user runs a Linux box with access to the hypervisor. This
naturally limits the user base of the project. If someone buys a random VPS and
installs Linux there, most likely they can't run hypervisor-based security
solutions. Nested virtualization might mitigate that, but it is very rarely
supported.
 g) A hypervisor-based solution goes against mass deployment (same as any
"kernel-patch" solution). LKRG can be massively deployed in the same way as any
other random package in the system.
 h) Some servers / machines can't be rebooted (rebootless), and they will never
install / benefit from a security solution which requires modification of the
hypervisor.

Nevertheless, better self-protection has been in scope since the beginning of
the project. Once LKRG is more mature and stable, it will be possible to add a
"ring -1" extension which guards the Normal World (NW) LKRG. In that case LKRG
would still be able to run in its default mode as it does now, but if "ring -1"
is available, LKRG will insert a Trusted component (TZ-LKRG) into "ring -1"
with the necessary assists, which will allow strong protection of LKRG and the
kernel itself to be implemented.

2. Enhanced LKRG
-----------------
I think it might be the right time to start working on a stronger version of
LKRG. The current version of LKRG is relatively stable and mature enough. It is
possible to close the weaknesses in the current model by introducing a Trusted
component (let's call it the TZ-LKRG agent). TZ-LKRG can be very simple and
minimalistic and still provide strong protection for LKRG in the Normal World
(NW).
TZ-LKRG might need to export only 2 functionalities to be effective:
 a) Mark a page as Read-Only (RO) - the hypervisor (x86) or TrustZone (ARM)
will enforce the PTE attributes and guarantee that the page will never be
writable in the NW.
 b) Update data in an RO page - a new secure communication protocol between
LKRG and TZ-LKRG should be designed and implemented (with sequence numbers,
SQN). The protocol itself should be designed so that it is difficult to call
into TZ-LKRG from anywhere other than LKRG itself. E.g. the proposed SQN can be
a secret shared between the two communicating parties which rotates
dynamically on every call.
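
As a rough illustration of assists a) and b), here is a minimal userspace
sketch in C. It only simulates the trusted side with plain arrays; the names
(tz_state, tz_mark_ro, tz_update_ro), the tiny page size and the "increment by
one" sequence rule are assumptions made for this example and are not part of
LKRG or of any hypervisor API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE 64u   /* tiny simulated "page" to keep the sketch small */
#define SIM_MAX_PAGES 8u

enum tz_status { TZ_OK = 0, TZ_EINVAL = -1, TZ_EPERM = -2 };

/* Simulated trusted-side state (hypothetical layout). */
struct tz_state {
    uint8_t  pages[SIM_MAX_PAGES][SIM_PAGE_SIZE];
    bool     ro[SIM_MAX_PAGES];   /* page is locked read-only for the NW */
    uint64_t sqn;                 /* sequence number expected on next call */
};

/* Assist a): mark a page read-only. There is deliberately no "unmark". */
enum tz_status tz_mark_ro(struct tz_state *tz, size_t page)
{
    if (page >= SIM_MAX_PAGES)
        return TZ_EINVAL;
    tz->ro[page] = true;
    return TZ_OK;
}

/* Assist b): write into a read-only page on behalf of the caller. */
enum tz_status tz_update_ro(struct tz_state *tz, uint64_t sqn, size_t page,
                            size_t off, const void *src, size_t len)
{
    if (page >= SIM_MAX_PAGES || off > SIM_PAGE_SIZE ||
        len > SIM_PAGE_SIZE - off)
        return TZ_EINVAL;
    if (!tz->ro[page] || sqn != tz->sqn)
        return TZ_EPERM;          /* unmanaged page, or stale/guessed SQN */
    memcpy(&tz->pages[page][off], src, len);
    tz->sqn++;                    /* simplification: each call consumes one SQN */
    return TZ_OK;
}
```

A caller that replays an old sequence number, or that targets a page which was
never handed to tz_mark_ro(), gets TZ_EPERM in this model. A real design would
rotate the secret in a less predictable way than a simple increment.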

Using these 2 primitives, LKRG might become a very difficult barrier for the
attacker in both scenarios: Runtime kernel CI enforcement as well as the
Exploit Detection (ED) / Protection feature.

2.1. Proposed protection
------------------------
During initialization, LKRG calls into TZ-LKRG and requests that the entire
LKRG .text section, as well as the other .ro* sections, be marked true-RO. This
guarantees that an attacker with full kernel primitives will never be able to
tamper with LKRG's execution logic. From a functionality perspective it is
equivalent to HVCI under Windows with VSM enabled, but just for one component:
LKRG itself.
To prevent corruption of LKRG's data, whenever a new page needs to be
allocated, LKRG will ask TZ-LKRG to enforce true-RO attributes on this page. To
write any data to that page, LKRG sends what needs to be written to TZ-LKRG via
assist b) from section 2, and TZ-LKRG writes it on behalf of LKRG.
In such an architecture LKRG will be impossible to corrupt from the NW, even by
an attacker with full kernel primitives.
In such an environment the attacker would need to find another bug in the
hypervisor / TrustZone itself to bypass LKRG.

2.1.1. Details
--------------

a. Initialization
  1. During initialization, generate a new dynamic random secret (e.g. an SQN)
     kept in the per-CPU data
  2. Find the PA/PFN of the page where the integrity_validation() routine
     lives. Make sure that the Runtime CI and ED validation routines are on the
     same physical page
  3. Request that the entire LKRG .text section, as well as the other .ro*
     sections, be marked true-RO. Share integrity_validation()'s PA/PFN together
     with the newly generated secret
  4. The secret is used during communication between TZ-LKRG <---> LKRG and is
     changed on every call. The secret will be in obfuscated form, e.g.
        secret = hash(SQN || PA/PFN of integrity_validation())
  5. Any page marked as RO during initialization can never be modified from the
     normal world, even if such a request comes from LKRG itself
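
The token derivation from step 4 could look roughly like the sketch below. It
is only an illustration: FNV-1a is used as a simple NON-cryptographic stand-in
for whatever hash is eventually chosen, and the names (lkrg_token,
lkrg_channel, lkrg_rotate, tz_check_token) as well as the rotation rule are
assumptions invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 64-bit: placeholder hash, NOT suitable as the real primitive. */
uint64_t fnv1a64(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}

/* token = hash(SQN || PFN), both serialized little-endian. */
uint64_t lkrg_token(uint64_t sqn, uint64_t pfn)
{
    uint8_t buf[16];
    for (int i = 0; i < 8; i++) {
        buf[i]     = (uint8_t)(sqn >> (8 * i));
        buf[8 + i] = (uint8_t)(pfn >> (8 * i));
    }
    return fnv1a64(buf, sizeof(buf));
}

/* State each side keeps about the shared channel (hypothetical). */
struct lkrg_channel {
    uint64_t sqn;   /* shared secret, rotated after every accepted call */
    uint64_t pfn;   /* PFN of the page holding integrity_validation() */
};

/* Hypothetical rotation rule: derive the next SQN from the current token. */
void lkrg_rotate(struct lkrg_channel *ch)
{
    ch->sqn = lkrg_token(lkrg_token(ch->sqn, ch->pfn), ch->pfn);
}

/* Trusted side: accept only a token matching its own view, then rotate. */
int tz_check_token(struct lkrg_channel *tz, uint64_t token)
{
    if (token != lkrg_token(tz->sqn, tz->pfn))
        return 0;
    lkrg_rotate(tz);
    return 1;
}
```

After an accepted call the NW side calls lkrg_rotate() as well, so both sides
stay in sync while a replayed token no longer matches.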

b. Dynamic RO-only page allocation:
   (*) A new kmem_cache_alloc* interface must be implemented which communicates
       with TZ-LKRG every time a new RO-only page is needed
  b1. Allocate a page from the new kmem_cache_alloc* interface:
   1a. Request a new RO-only page from TZ-LKRG. TZ-LKRG records the PFN of that
       page and enforces RO attributes. TZ-LKRG maintains its own database which
       keeps track of all allocated RO-only pages
   1b. At the end of the page containing the integrity_validation() routine there
       will be padding which holds a snapshot of the array of PFNs allocated by
       TZ-LKRG, or the address of the RO page where TZ-LKRG keeps that array
   1c. TZ-LKRG knows where the page with integrity_validation() is, and knows and
       maintains the array of those PFNs
   1d. During CI validation, LKRG can check whether every PFN in the array
       matches an allocation which was made by LKRG. If there is anything extra,
       it can call panic()
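
The comparison in step 1d can be sketched as a simple set check. This is a
userspace illustration under assumptions: the function names are invented, the
arrays stand in for TZ-LKRG's snapshot and LKRG's own allocation records, and a
linear scan is used only because it is the shortest correct form.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if pfn appears in set[0..n). */
bool pfn_in_set(const uint64_t *set, size_t n, uint64_t pfn)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == pfn)
            return true;
    return false;
}

/*
 * Count PFNs present in TZ-LKRG's snapshot that LKRG has no record of
 * allocating. Any non-zero result means "something extra" was registered.
 */
size_t pfn_count_unexpected(const uint64_t *snapshot, size_t n_snapshot,
                            const uint64_t *own, size_t n_own)
{
    size_t extra = 0;
    for (size_t i = 0; i < n_snapshot; i++)
        if (!pfn_in_set(own, n_own, snapshot[i]))
            extra++;
    return extra;
}
```

In the kernel, a non-zero return value would be the condition under which LKRG
calls panic().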

  b2. Update data on an RO-only page: hypercall(hash,VA_RO,what_to_write,length)
   2a. Is the hash of the page (PFN) holding LKRG's integrity_validation() routine
       the expected one? If yes, it's OK, otherwise panic(). This protects
       against PFN remapping attacks.
   2b. Is GET_PFN(VA_RO) in the array at the end of the integrity_validation()
       page (or in the page it refers to)? If yes, it's OK, otherwise panic().
       This protects against a 'proxy' attack where the attacker 'asks' TZ-LKRG
       to do an arbitrary write on the attacker's behalf to a random chunk of
       memory.
   2c. TZ-LKRG does a full stack walk and "knows" that LKRG can only "write" to
       an RO page from function X (the offset doesn't matter). Function X at
       frame-1 should be called from function Y or Z. Frame-2 should hold
       function A or B. If yes, it's OK, otherwise panic(). This protects
       against an attacker issuing an arbitrary hypercall: if the attacker uses
       ROP to reach the hypercall, the stack walk validation will not match.
   2d. Verify that the CPU which generated the hypercall is in the middle of
       executing a *kprobe. LKRG updates RO pages only from the *kprobe hooks
       which it placed. If yes, it's OK, otherwise panic(). This protects
       against an attacker who 'fakes' the stack layout of some thread so that
       it passes the verification described in 2c. Example:
         2.d.1. Attacker creates process X and forces it to do nothing
         2.d.2. Using a read primitive, the attacker finds X's task_struct in
                memory
         2.d.3. From task_struct the attacker finds the stack
         2.d.4. Attacker overwrites the stack layout in such a way that the
                verification at 2c. passes
         2.d.5. The last IP on the stack points to a ROP gadget which invokes
                TZ-LKRG's hypercall for the RO-page update
         2.d.6. Attacker wakes up process X
       In such a case the verification at 2d. should catch that and panic()
   2e. Before invoking the hypercall, LKRG temporarily disables the Debug
       Register Unit. After the hypercall, LKRG re-enables it.
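
Checks 2a-2d can be pictured as one ordered validation routine on the trusted
side. The sketch below is a userspace model under assumptions: all type and
function names are invented, the stack walk is represented by an array of
return addresses that is already collected, and "in a kprobe" is reduced to a
flag. How TZ-LKRG would actually obtain these inputs is outside this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ALT 3   /* max alternative functions allowed per stack frame */

enum hc_verdict {
    HC_ACCEPT = 0,
    HC_BAD_PAGE_HASH,   /* check 2a failed */
    HC_UNKNOWN_PFN,     /* check 2b failed */
    HC_BAD_STACK,       /* check 2c failed */
    HC_NOT_IN_KPROBE    /* check 2d failed */
};

/* Address range [start, end) of one function; the offset inside is ignored. */
struct func_range { uint64_t start, end; };

/* Functions that are acceptable at one depth of the stack walk. */
struct frame_rule {
    struct func_range allowed[MAX_ALT];
    size_t n_allowed;
};

struct hc_policy {                 /* what TZ-LKRG expects (hypothetical) */
    uint64_t expected_page_hash;
    const uint64_t *ro_pfns;
    size_t n_ro_pfns;
    const struct frame_rule *rules;
    size_t n_rules;
};

struct hc_context {                /* what TZ-LKRG observed for this call */
    uint64_t page_hash;            /* hash of the integrity_validation() page */
    uint64_t target_pfn;           /* GET_PFN(VA_RO) */
    const uint64_t *ret_addrs;     /* return addresses, innermost first */
    size_t n_ret_addrs;
    bool in_kprobe;
};

static bool frame_ok(const struct frame_rule *r, uint64_t addr)
{
    for (size_t i = 0; i < r->n_allowed && i < MAX_ALT; i++)
        if (addr >= r->allowed[i].start && addr < r->allowed[i].end)
            return true;
    return false;
}

enum hc_verdict tz_validate(const struct hc_policy *p,
                            const struct hc_context *c)
{
    bool known = false;

    if (c->page_hash != p->expected_page_hash)            /* 2a */
        return HC_BAD_PAGE_HASH;
    for (size_t i = 0; i < p->n_ro_pfns; i++)             /* 2b */
        if (p->ro_pfns[i] == c->target_pfn)
            known = true;
    if (!known)
        return HC_UNKNOWN_PFN;
    if (c->n_ret_addrs < p->n_rules)                      /* 2c */
        return HC_BAD_STACK;
    for (size_t i = 0; i < p->n_rules; i++)
        if (!frame_ok(&p->rules[i], c->ret_addrs[i]))
            return HC_BAD_STACK;
    if (!c->in_kprobe)                                    /* 2d */
        return HC_NOT_IN_KPROBE;
    return HC_ACCEPT;
}
```

Any verdict other than HC_ACCEPT corresponds to the panic() branch in the list
above. Returning a distinct code per check is only a convenience for testing
the model.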

2.1.2. Security Weaknesses
--------------------------
1. Even if the attacker can't modify an RO page, they can modify the stack of the
   thread running the integrity_validation() routine. If integrity_validation()
   propagates some 'error' or 'corruption' result, the attacker has a race window
   to overwrite that information and fake it as 'success'. It is a difficult
   attack, but possible.
2. The step at point 2e. protects against unexpected 'redirection' during normal
   execution of LKRG's logic, which might influence e.g. stack variables. There
   might be other methods of mounting similar attacks. All of them should be
   addressed.

2.1.3. Potential improvement ideas
----------------------------------
1. Randomize the TZ/Hypercall calling convention, so that building a general
   purpose ROP chain to invoke the gadget becomes difficult. E.g. the attacker
   would need to generate 100s of ROP chains, one for each combination.
2. Make sure that TZ/Hypercall arguments are only handled in registers and are
   never put on the stack, since that would open a race window for overwriting
   the arguments.
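
One possible reading of idea 1 is a per-boot random permutation of the argument
slots, agreed between LKRG and TZ-LKRG. The sketch below models that in
userspace. Everything in it is an assumption for illustration: the names, the
use of xorshift64 as a toy generator (a real implementation would need a proper
entropy source) and the plain array standing in for registers.

```c
#include <stddef.h>
#include <stdint.h>

#define HC_NARGS 4   /* hash, VA_RO, what_to_write, length */

/* slot_of[i] is the register slot in which logical argument i travels. */
struct hc_convention { uint8_t slot_of[HC_NARGS]; };

/* xorshift64: toy PRNG for the sketch only; state must be non-zero. */
uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Build a random permutation with a Fisher-Yates shuffle. */
void hc_convention_init(struct hc_convention *cv, uint64_t seed)
{
    uint64_t s = seed ? seed : 1;
    for (size_t i = 0; i < HC_NARGS; i++)
        cv->slot_of[i] = (uint8_t)i;
    for (size_t i = HC_NARGS - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&s) % (i + 1));
        uint8_t tmp = cv->slot_of[i];
        cv->slot_of[i] = cv->slot_of[j];
        cv->slot_of[j] = tmp;
    }
}

/* Caller side: place logical arguments into their randomized slots. */
void hc_pack(const struct hc_convention *cv, const uint64_t args[HC_NARGS],
             uint64_t regs[HC_NARGS])
{
    for (size_t i = 0; i < HC_NARGS; i++)
        regs[cv->slot_of[i]] = args[i];
}

/* Trusted side: recover the logical arguments from the slots. */
void hc_unpack(const struct hc_convention *cv, const uint64_t regs[HC_NARGS],
               uint64_t args[HC_NARGS])
{
    for (size_t i = 0; i < HC_NARGS; i++)
        args[i] = regs[cv->slot_of[i]];
}
```

Note that four arguments give only 4! = 24 orderings, so reaching the "100s of
combinations" mentioned above would need more degrees of freedom, for example
extra dummy slots or a randomized call number.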

2.2. Problems
-------------
The architecture of enhanced LKRG must be designed in such a way that if the NW
somehow crashes or restarts (e.g. via kexec), LKRG's RO pages are safely
reclaimed.
Another problem is the performance impact. It needs to be carefully measured
and kept as small as possible.

2.3. Work
---------
We might start implementing TZ-LKRG as a modification of KVM. Two new
hypercalls can be implemented to provide the described assists, which NW-LKRG
can leverage for better protection. If the NW-LKRG code is refactored in such a
way that it can consume the new KVM hypercalls, then it doesn't matter how
TZ-LKRG is implemented. KVM can be a starting point, but the new code will be
flexible enough to consume any variation of TZ-LKRG. It can run as an extension
to a hypervisor (x86) or as an OP-TEE app in TrustZone (ARM).
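
One way to keep NW-LKRG independent of the backend is a small table of function
pointers that each TZ-LKRG variant fills in. The sketch below is a userspace
model with invented names (tz_lkrg_ops, lkrg_store, the stub_* backends). The
stubs only count calls; a real backend would issue a KVM hypercall or a
TrustZone call at those points.

```c
#include <stddef.h>
#include <stdint.h>

/* Backend-neutral view of the two assists (hypothetical interface). */
struct tz_lkrg_ops {
    const char *name;
    int (*mark_ro)(uint64_t pfn);
    int (*update_ro)(uint64_t token, uint64_t va_ro,
                     const void *src, size_t len);
};

/* Call counters so the stub backends have observable behaviour. */
struct stub_stats { unsigned mark_calls, update_calls; };
struct stub_stats stub_hv_stats, stub_tee_stats;

/* Stub "hypervisor" backend: a hypercall would be issued here. */
int stub_hv_mark_ro(uint64_t pfn)
{
    (void)pfn;
    stub_hv_stats.mark_calls++;
    return 0;
}
int stub_hv_update_ro(uint64_t token, uint64_t va_ro,
                      const void *src, size_t len)
{
    (void)token; (void)va_ro; (void)src; (void)len;
    stub_hv_stats.update_calls++;
    return 0;
}

/* Stub "TEE" backend: a call into the trusted app would be issued here. */
int stub_tee_mark_ro(uint64_t pfn)
{
    (void)pfn;
    stub_tee_stats.mark_calls++;
    return 0;
}
int stub_tee_update_ro(uint64_t token, uint64_t va_ro,
                       const void *src, size_t len)
{
    (void)token; (void)va_ro; (void)src; (void)len;
    stub_tee_stats.update_calls++;
    return 0;
}

const struct tz_lkrg_ops stub_hv_ops =
    { "stub-hypervisor", stub_hv_mark_ro, stub_hv_update_ro };
const struct tz_lkrg_ops stub_tee_ops =
    { "stub-tee", stub_tee_mark_ro, stub_tee_update_ro };

/* NW-LKRG side: written once, works with whichever backend is plugged in. */
int lkrg_store(const struct tz_lkrg_ops *ops, uint64_t token, uint64_t pfn,
               uint64_t va_ro, const void *src, size_t len)
{
    int rc = ops->mark_ro(pfn);
    if (rc)
        return rc;
    return ops->update_ro(token, va_ro, src, len);
}
```

With this shape, moving from a KVM prototype to another TZ-LKRG variant would
mean providing a new ops table while lkrg_store() and its callers stay
unchanged.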


Thanks,
Adam