[svsm-devel] X86-64 Syscall API/ABI Document
Dong, Chuanxiao
chuanxiao.dong at intel.com
Fri Jun 21 10:35:03 CEST 2024
Hi Joerg,
Thank you for sharing the design document! It has greatly enhanced our understanding of the user mode design philosophy. Here are some questions and comments from our side:
THREAD_CREATE:
a) Judging from the input parameters, it seems that only creating a thread on the current CPU is supported. Is there a plan to allow creating a thread on a remote CPU, or should this case be avoided?
b) Is there any plan to add CPU affinity to threads?
BATCH:
a) Is the user_data in SysCallIn an identifier used to pair each SysCallOut with the SysCallIn that carries the same user_data value? We would like to better understand how user_data is meant to be used.
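To make the question concrete, here is a minimal Rust sketch of how we currently imagine user space would consume batch results. It assumes (this is not confirmed by the draft) that the kernel copies user_data unchanged from each SysCallIn into the matching SysCallOut and that results may come back in any order. All field names other than user_data are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical shape of one batched request; only `user_data` is from the draft.
#[derive(Debug, Clone, Copy)]
pub struct SysCallIn {
    pub nr: u64,        // hypothetical: syscall number
    pub args: [u64; 4], // hypothetical: syscall arguments
    pub user_data: u64, // opaque cookie chosen by user space
}

/// Hypothetical shape of one batched result.
#[derive(Debug, Clone, Copy)]
pub struct SysCallOut {
    pub ret: i64,       // hypothetical: return value or negative error code
    pub user_data: u64, // assumed to be copied verbatim from the matching SysCallIn
}

/// Pair every result with the request that produced it, keyed by `user_data`.
/// Assumes cookies are unique within a batch: a duplicate cookie makes the
/// later request replace the earlier one in the lookup table.
/// Results whose cookie matches no request are returned separately.
pub fn pair_results(
    reqs: &[SysCallIn],
    outs: &[SysCallOut],
) -> (Vec<(SysCallIn, SysCallOut)>, Vec<SysCallOut>) {
    let by_cookie: HashMap<u64, SysCallIn> =
        reqs.iter().map(|r| (r.user_data, *r)).collect();
    let mut paired = Vec::new();
    let mut orphans = Vec::new();
    for out in outs {
        match by_cookie.get(&out.user_data) {
            Some(req) => paired.push((*req, *out)),
            None => orphans.push(*out),
        }
    }
    (paired, orphans)
}
```

If user_data is meant to be used this way, it would help if the document stated whether the kernel enforces uniqueness or leaves it entirely to user space.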
OPEN_PHYSMEM:
a) We assume the SVSM kernel memory range is excluded, right?
b) Is there any mechanism to let the user-space process know the valid memory range represented by the object handle, e.g., the end address of the range?
CAPABILITIES:
a) For SEV, VMPL0 is the SVSM and VMs start from VMPL1; for TDX, vm-id 0 is the SVSM and VMs start from vm-id 1. From this point of view SEV and TDX look similar, so could we use a unified VM index bitmap format for both?
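A minimal Rust sketch of the unified bitmap we have in mind, assuming bit n stands for VMPL n on SEV and vm-id n on TDX, with bit 0 always being the SVSM itself and never reported as an available VM. The type and method names are hypothetical.

```rust
/// Hypothetical unified VM index bitmap returned by CAPABILITIES.
/// Bit n corresponds to VMPL n on SEV-SNP or vm-id n on TDX.
/// Bit 0 is the SVSM itself and is never set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VmIndexBitmap(pub u64);

impl VmIndexBitmap {
    /// Build the bitmap for `count` guest VMs, numbered from index 1.
    pub fn with_guests(count: u32) -> Self {
        assert!(count < 64, "at most 63 guest indices fit in the bitmap");
        let bits = if count == 0 { 0 } else { ((1u64 << count) - 1) << 1 };
        VmIndexBitmap(bits)
    }

    /// True if VM index `idx` is reported as available.
    pub fn has(self, idx: u32) -> bool {
        idx != 0 && idx < 64 && (self.0 & (1u64 << idx)) != 0
    }

    /// Iterate over the available VM indices in ascending order.
    pub fn indices(self) -> impl Iterator<Item = u32> {
        (1..64).filter(move |&i| self.has(i))
    }
}
```

With this layout the same user-mode code could enumerate VMs on both platforms; only the meaning of "index" (VMPL vs. vm-id) differs.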
VM_OPEN:
a) If a VM index is opened by one user-space process, the same index cannot be opened again by another user-space process, which guarantees that only one process can operate on this VM object handle. However, all threads created by that user-space process can operate on the VM object handle. Is this understanding correct?
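The ownership semantics we are asking about can be modelled with the Rust sketch below. It only captures our understanding and is not a proposal for the kernel implementation: an opened VM index is recorded against the opening process ID, an open from any other process fails, and threads of the owning process use the existing handle rather than opening it again. `VmRegistry`, `Pid`, and `OpenError` are hypothetical.

```rust
use std::collections::HashMap;

pub type Pid = u32; // hypothetical process identifier shared by all its threads

#[derive(Debug, PartialEq, Eq)]
pub enum OpenError {
    Busy, // index already opened by a process
}

/// Models which process owns which VM index.
#[derive(Default)]
pub struct VmRegistry {
    owners: HashMap<u32, Pid>,
}

impl VmRegistry {
    /// VM_OPEN(index) issued by a thread of process `pid`.
    pub fn vm_open(&mut self, pid: Pid, index: u32) -> Result<(), OpenError> {
        if self.owners.contains_key(&index) {
            return Err(OpenError::Busy);
        }
        self.owners.insert(index, pid);
        Ok(())
    }

    /// Any thread of the owning process may use the handle.
    pub fn can_use(&self, pid: Pid, index: u32) -> bool {
        self.owners.get(&index) == Some(&pid)
    }

    /// Release the index when the owner closes the handle or exits.
    pub fn vm_close(&mut self, pid: Pid, index: u32) {
        if self.can_use(pid, index) {
            self.owners.remove(&index);
        }
    }
}
```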
VM_CAPABILITIES:
a) Should it report the VM memory range, or should we always assume the VM can see the entire range covered by the ObjHandle created by OPEN_PHYSMEM?
MMIO/IOIO_EVENT:
a) Is there any way to know the length of the mappable area provided by the evt_obj, or is the length a fixed size defined by kernel mode?
b) We would like to better understand the intended usage. Our assumption is that the MMIO/IOIO ranges set by these syscalls are monitored by a device model thread in user space. When the VM takes a VM exit due to an MMIO/IOIO access, the kernel vCPU thread decodes the MMIO/IOIO address; if the address lies in a range set by these syscalls, the kernel vCPU thread wakes up the monitoring device model thread and then waits. The device model thread in user mode gets the address/data via the mappable area and performs the emulation. After the emulation is complete, the device model thread wakes up the kernel vCPU thread. If the decoding is done in user space instead of kernel space, the vCPU thread has to return to user mode to decode the instruction and then wait to be woken up by the device model thread. Is this the expected flow for handling MMIO/IOIO in user mode?
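The kernel-side decoding variant of that flow can be modelled with two ordinary threads and a shared mailbox standing in for the mappable area of evt_obj. This is a hedged Rust sketch of our understanding only: `Mailbox`, `MmioRequest`, and the handshake states are hypothetical, and a real implementation would use the SVSM's own scheduling primitives rather than `std::sync`.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical contents of the mappable area shared via evt_obj.
#[derive(Debug, Clone, Copy)]
pub struct MmioRequest {
    pub addr: u64,
    pub is_write: bool,
    pub data: u64,
}

#[derive(Debug)]
enum Slot {
    Empty,
    Pending(MmioRequest), // posted by the vCPU thread
    Done(u64),            // completed by the device model thread
}

pub struct Mailbox {
    slot: Mutex<Slot>,
    cv: Condvar,
}

impl Mailbox {
    pub fn new() -> Arc<Self> {
        Arc::new(Mailbox { slot: Mutex::new(Slot::Empty), cv: Condvar::new() })
    }

    /// vCPU side: post the decoded access, wake the device model, wait for the result.
    pub fn post_and_wait(&self, req: MmioRequest) -> u64 {
        let mut slot = self.slot.lock().unwrap();
        *slot = Slot::Pending(req);
        self.cv.notify_all();
        loop {
            match std::mem::replace(&mut *slot, Slot::Empty) {
                Slot::Done(v) => return v,
                other => *slot = other, // not finished yet, put it back
            }
            slot = self.cv.wait(slot).unwrap();
        }
    }

    /// Device model side: wait for a request, emulate it, publish the result.
    pub fn serve_one(&self, emulate: impl Fn(MmioRequest) -> u64) {
        let mut slot = self.slot.lock().unwrap();
        let req = loop {
            match std::mem::replace(&mut *slot, Slot::Empty) {
                Slot::Pending(r) => break r,
                other => *slot = other,
            }
            slot = self.cv.wait(slot).unwrap();
        };
        *slot = Slot::Done(emulate(req));
        self.cv.notify_all(); // wake the waiting vCPU thread
    }
}
```

In the user-space decoding variant, the vCPU side of this handshake would move from the kernel vCPU thread into a user-mode thread, which first decodes the instruction and then posts the request.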
SET_MEM_STATE:
a) What was the consideration behind using paddr + page_size, rather than start_paddr + end_paddr, as the input parameters? Using start_paddr + end_paddr may be more efficient for setting the state of a whole memory region.
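To illustrate the efficiency argument, here is a small Rust sketch that lists the calls each interface would need to change the state of one region. It assumes 4 KiB pages, page-aligned inputs, and an exclusive end_paddr. Both helper functions are hypothetical stand-ins for real syscall wrappers.

```rust
const PAGE_SIZE: u64 = 4096; // assumption: 4 KiB pages only

/// Per-page form: one SET_MEM_STATE(paddr, page_size) call per page.
pub fn calls_per_page(start_paddr: u64, end_paddr: u64) -> Vec<(u64, u64)> {
    assert!(start_paddr % PAGE_SIZE == 0 && end_paddr % PAGE_SIZE == 0);
    (start_paddr..end_paddr)
        .step_by(PAGE_SIZE as usize)
        .map(|paddr| (paddr, PAGE_SIZE))
        .collect()
}

/// Range form: a single SET_MEM_STATE(start_paddr, end_paddr) call.
pub fn calls_per_range(start_paddr: u64, end_paddr: u64) -> Vec<(u64, u64)> {
    assert!(start_paddr % PAGE_SIZE == 0 && end_paddr % PAGE_SIZE == 0);
    assert!(start_paddr <= end_paddr);
    vec![(start_paddr, end_paddr)]
}
```

For a 2 MiB region, the per-page form needs 512 calls at 4 KiB granularity, while the range form needs one. BATCH could reduce the number of kernel entries for the per-page form, but each page would still need its own entry.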
VCPU_CREATE:
a) Regarding "The VCPU can be run with the WAIT_FOR_EVENT() system call", our expectation is that WAIT_FOR_EVENT(vcpu_obj) performs the VM entry in kernel mode. Is this understanding correct?
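Under that reading, the user-mode side of a vCPU reduces to a loop around WAIT_FOR_EVENT(vcpu_obj), where each return corresponds to an exit the kernel could not handle on its own. The Rust sketch below mocks the syscall with a closure. `VcpuEvent` and its variants are hypothetical and not part of the draft.

```rust
/// Hypothetical reasons for WAIT_FOR_EVENT(vcpu_obj) to return to user mode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VcpuEvent {
    Mmio { addr: u64 },
    Ioio { port: u16 },
    Shutdown,
}

/// Run a vCPU until it shuts down. `wait_for_event` stands in for the
/// syscall, which (per our understanding) performs the VM entry in kernel
/// mode and returns only when user mode has to act.
/// Returns the number of exits handled in user mode.
pub fn run_vcpu(mut wait_for_event: impl FnMut() -> VcpuEvent) -> usize {
    let mut handled = 0;
    loop {
        match wait_for_event() {
            VcpuEvent::Mmio { .. } | VcpuEvent::Ioio { .. } => {
                // Emulate the access here, then loop to re-enter the guest.
                handled += 1;
            }
            VcpuEvent::Shutdown => return handled,
        }
    }
}
```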
GET/SET_STATE:
a) The type struct VmsaGpa is defined as AMD-only. Does this mean user mode needs to know whether the platform is SEV or TDX, or will there be a per-architecture user-mode VMM executable?
b) For TDX, all VCPU state is stored in the VMCS, so a new type for TDX is probably needed to get/set VCPU state:
Type   Data Structure     Description
--------------------------------------------------------------------------------------------------------
VMCS   struct VmcsField   The VMCS field encoding and the
                          corresponding data (Intel only)

struct VmcsField {
    encoding: u32,  // VMCS field encoding
    data: u64,      // VMCS field data
}
GET/SET_STATE syscalls can be batched so that multiple VCPU states can be set/retrieved at once.
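For illustration, here is a Rust sketch of the proposed VmcsField type and of a batched GET/SET_STATE payload. The `#[repr(C)]` layout and the batch helper are assumptions. The encodings 0x681C (guest RSP) and 0x681E (guest RIP) are the natural-width guest-state field encodings from the Intel SDM.

```rust
/// One VMCS field as proposed for GET/SET_STATE on TDX.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VmcsField {
    pub encoding: u32, // VMCS field encoding
    pub data: u64,     // VMCS field data
}

pub const GUEST_RSP: u32 = 0x681C; // natural-width guest-state field
pub const GUEST_RIP: u32 = 0x681E; // natural-width guest-state field

/// Hypothetical helper: build the payload for one batched SET_STATE
/// that updates several VMCS fields of a vCPU in a single kernel entry.
pub fn set_state_batch(fields: &[(u32, u64)]) -> Vec<VmcsField> {
    fields
        .iter()
        .map(|&(encoding, data)| VmcsField { encoding, data })
        .collect()
}
```

With `#[repr(C)]`, the struct occupies 16 bytes because of 4 bytes of padding after `encoding`. The ABI document may want to make that padding explicit.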
c) The vLAPIC is emulated in kernel mode. In this case, is it necessary to allow user mode to get the vLAPIC state? A potential use case is user mode emulating the guest's CPUID leaf 01H EBX, which requires the vLAPIC ID.
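The following sketch shows why the vLAPIC ID is needed. CPUID leaf 01H returns the initial APIC ID in EBX bits 31:24, so a user-mode emulator has to splice the vCPU's APIC ID into that field. How user mode obtains `apic_id` is exactly the open question, so in this minimal Rust sketch it is simply a parameter.

```rust
/// Compose the guest-visible CPUID.01H:EBX value.
/// Bits 31:24 hold the initial APIC ID. Bits 23:0 (maximum addressable
/// logical processor IDs, CLFLUSH line size, brand index) are taken from
/// `base_ebx`, e.g. the value the SVSM itself observes.
pub fn cpuid01_ebx(base_ebx: u32, apic_id: u8) -> u32 {
    (base_ebx & 0x00FF_FFFF) | (u32::from(apic_id) << 24)
}
```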
Possible new Class0 syscalls to help user mode execute some privileged instructions:
a) User mode may need to read/write L1's MSRs to emulate virtual MSRs for the VM, but the RDMSR/WRMSR instructions are not allowed in user mode on either SEV or TDX. An ACCESS_MSR syscall can help user mode achieve this:
ACCESS_MSR
Read data from an MSR, or write data to an MSR.
System Call number: 11
Parameters
-------------------------------------------------------------------------------------------------
Parameter       Type      Description
-------------------------------------------------------------------------------------------------
1. MSR index    u32       The MSR to access.
2. read         bool      Indicates whether this is an MSR read or write.
3. data         VirtAddr  The virtual address of the buffer that receives
                          the data read from the MSR, or that holds the
                          data to be written to the MSR.
Returns 0 on success, error code on failure
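The following hedged Rust sketch shows the ACCESS_MSR semantics as proposed. A single entry point uses its `read` flag to select the direction, and `data` acts as an in/out buffer. The MSR file is mocked with a `HashMap`, `&mut u64` stands in for the VirtAddr that the kernel would validate and copy through, and the allow-list and error type are hypothetical. In the real kernel the access would be an RDMSR/WRMSR (or the platform's enlightened equivalent) issued in kernel mode.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum SysError {
    NotPermitted, // hypothetical: MSR not on the allow-list for user mode
}

/// Mock of the kernel-side ACCESS_MSR handler.
pub struct MsrAccess {
    msrs: HashMap<u32, u64>, // stands in for the real MSRs
}

impl MsrAccess {
    pub fn new(allowed: &[(u32, u64)]) -> Self {
        MsrAccess { msrs: allowed.iter().copied().collect() }
    }

    /// ACCESS_MSR(index, read, data): on read, fill `*data`; on write, consume it.
    pub fn access_msr(&mut self, index: u32, read: bool, data: &mut u64) -> Result<(), SysError> {
        let msr = self.msrs.get_mut(&index).ok_or(SysError::NotPermitted)?;
        if read {
            *data = *msr;
        } else {
            *msr = *data;
        }
        Ok(())
    }
}
```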
b) User mode may need to access some MMIO or IO ports emulated by L0 in order to emulate certain MMIO/IOIO events for the VM. User mode can perform these accesses with MOV/IN/OUT instructions, but they can trigger #VC/#VE exceptions from user mode, which then require instruction decoding and emulation. To simplify this, the MMIO/IOIO accesses can be done the enlightened way (VMGEXIT for SEV and TDCALL for TDX). However, on TDX, TDCALL is a privileged instruction that is not allowed in user mode, so it has to be executed in kernel mode. If the enlightened way is preferred for user-mode MMIO/IOIO accesses, new syscalls are necessary for TDX. Should we add them?
ACCESS_IO
Read data from an IO port, or write data to an IO port.
System Call number: 12
Parameters
-------------------------------------------------------------------------------------------------
Parameter       Type      Description
-------------------------------------------------------------------------------------------------
1. port         u16       The IO port to access.
2. size         u8        The IO port size. Only 1 (u8), 2 (u16) and 4 (u32)
                          are supported.
3. read         bool      Indicates whether this is an IO read or write.
4. data         VirtAddr  The virtual address of the buffer that receives
                          the data read from the IO port, or that holds the
                          data to be written to the IO port.
Returns 0 on success, error code on failure
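The size restriction in ACCESS_IO maps naturally onto an enum that rejects anything other than 1, 2 or 4 bytes before the kernel issues the TDCALL or VMGEXIT. The following is a Rust sketch. `IoSize` and the masking helper are hypothetical, and truncating wider write data to the port width is our assumption about the intended semantics.

```rust
use std::convert::TryFrom;

/// Port access width accepted by ACCESS_IO.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IoSize {
    Byte = 1,
    Word = 2,
    Dword = 4,
}

impl TryFrom<u8> for IoSize {
    type Error = u8; // the rejected size value

    fn try_from(size: u8) -> Result<Self, Self::Error> {
        match size {
            1 => Ok(IoSize::Byte),
            2 => Ok(IoSize::Word),
            4 => Ok(IoSize::Dword),
            other => Err(other),
        }
    }
}

impl IoSize {
    /// Keep only the bytes that fit the port width.
    pub fn mask(self, data: u32) -> u32 {
        match self {
            IoSize::Byte => data & 0xFF,
            IoSize::Word => data & 0xFFFF,
            IoSize::Dword => data,
        }
    }
}
```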
ACCESS_MMIO
Read data from an MMIO address, or write data to an MMIO address.
System Call number: 13
Parameters
-------------------------------------------------------------------------------------------------
Parameter       Type      Description
-------------------------------------------------------------------------------------------------
1. paddr        u64       The MMIO address to access.
2. size         u8        The MMIO size. Only 1 (u8), 2 (u16), 4 (u32) and
                          8 (u64) are supported.
3. read         bool      Indicates whether this is an MMIO read or write.
4. data         VirtAddr  The virtual address of the buffer that receives
                          the data read from the MMIO address, or that holds
                          the data to be written to the MMIO address.
Returns 0 on success, error code on failure
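Behind ACCESS_IO and ACCESS_MMIO, the kernel would dispatch to the platform's enlightened path: TDCALL on TDX, VMGEXIT on SEV. The Rust sketch below models only that split, using a trait and two mock backends that record which mechanism would have been used. The trait, the backends, and their method names are hypothetical, and they imply no real TDCALL or GHCB protocol details.

```rust
/// Hypothetical kernel-internal abstraction over the enlightened MMIO path.
pub trait EnlightenedMmio {
    fn mmio_read(&mut self, paddr: u64, size: u8) -> u64;
    fn mmio_write(&mut self, paddr: u64, size: u8, data: u64);
}

/// Mock backends: they only log which mechanism would be used.
#[derive(Default)]
pub struct TdxMock { pub log: Vec<String> }
#[derive(Default)]
pub struct SevMock { pub log: Vec<String> }

impl EnlightenedMmio for TdxMock {
    fn mmio_read(&mut self, paddr: u64, size: u8) -> u64 {
        self.log.push(format!("TDCALL read {:#x}/{}", paddr, size));
        0
    }
    fn mmio_write(&mut self, paddr: u64, size: u8, data: u64) {
        self.log.push(format!("TDCALL write {:#x}/{}={:#x}", paddr, size, data));
    }
}

impl EnlightenedMmio for SevMock {
    fn mmio_read(&mut self, paddr: u64, size: u8) -> u64 {
        self.log.push(format!("VMGEXIT read {:#x}/{}", paddr, size));
        0
    }
    fn mmio_write(&mut self, paddr: u64, size: u8, data: u64) {
        self.log.push(format!("VMGEXIT write {:#x}/{}={:#x}", paddr, size, data));
    }
}

/// Kernel-side ACCESS_MMIO: validate `size`, then read into or write from `data`.
/// Returns the rejected size as the (hypothetical) error value.
pub fn access_mmio(
    hw: &mut dyn EnlightenedMmio,
    paddr: u64,
    size: u8,
    read: bool,
    data: &mut u64,
) -> Result<(), u8> {
    if !matches!(size, 1 | 2 | 4 | 8) {
        return Err(size); // unsupported width
    }
    if read {
        *data = hw.mmio_read(paddr, size);
    } else {
        hw.mmio_write(paddr, size, *data);
    }
    Ok(())
}
```

Because the user-mode call is identical on both platforms, a single user-mode device model binary could work on both.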
Questions for ObjHandle:
a) Although ObjHandle seems to be a common type, a given ObjHandle should only be used as input to specific syscalls. For example, the ObjHandle returned by OPEN cannot be used as input to VM_CAPABILITIES. Is this understanding correct?
b) If a) is true, does the user need to know how a given ObjHandle was created? Otherwise, it is hard to tell which syscalls can take it as input.
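One way user space could avoid tracking handle origins by hand is to wrap the raw ObjHandle in per-kind newtypes, so that only the matching syscall wrappers accept them. The following is a Rust sketch under that assumption. The wrapper types, the `u32` representation of ObjHandle, and the stub functions are hypothetical, and the kernel would still have to reject mismatched handles on its side.

```rust
/// Raw handle value as returned by the kernel (representation assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ObjHandle(pub u32);

/// Handle returned by OPEN_PHYSMEM.
#[derive(Debug, Clone, Copy)]
pub struct PhysMemHandle(ObjHandle);

/// Handle returned by VM_OPEN.
#[derive(Debug, Clone, Copy)]
pub struct VmHandle(ObjHandle);

impl PhysMemHandle {
    pub fn raw(self) -> ObjHandle { self.0 }
}

impl VmHandle {
    pub fn raw(self) -> ObjHandle { self.0 }
}

/// Stub for OPEN_PHYSMEM; a real wrapper would issue the syscall.
pub fn open_physmem() -> PhysMemHandle {
    PhysMemHandle(ObjHandle(1))
}

/// Stub for VM_OPEN; a real wrapper would issue the syscall.
pub fn vm_open(index: u32) -> VmHandle {
    VmHandle(ObjHandle(0x100 + index))
}

/// Only a VmHandle type-checks here, so passing a PhysMemHandle is a compile error.
pub fn vm_capabilities(vm: VmHandle) -> u64 {
    // A real wrapper would pass the raw handle to VM_CAPABILITIES; return a dummy value.
    let ObjHandle(raw) = vm.raw();
    u64::from(raw)
}
```

With these types, `vm_capabilities(open_physmem())` is rejected at compile time.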
Thanks
Chuanxiao
> -----Original Message-----
> From: Svsm-devel <svsm-devel-bounces at coconut-svsm.dev> On Behalf Of Jörg Rödel
> Sent: Wednesday, June 12, 2024 8:24 PM
> To: svsm-devel at coconut-svsm.dev
> Subject: [svsm-devel] X86-64 Syscall API/ABI Document
>
> Hi,
>
> as promised here is an initial draft of the Syscall document I worked on. It is not ready yet, lacks some
> details and is definitely subject to change.
>
> But I think the general ideas are in there already and it is a good enough base to define TDX and SNP
> specific VM management APIs on top of it.
>
> So please review the document and provide your feedback. If time allows we can have some initial
> discussion around it in today's meeting. A more general discussion has to wait for the next meeting,
> when everyone had time to review it.
>
> Regards,
>
> Joerg