[svsm-devel] X86-64 Syscall API/ABI Document

Dong, Chuanxiao chuanxiao.dong at intel.com
Fri Jun 21 10:35:03 CEST 2024


Hi Joerg,

Thank you for sharing the design document! It has greatly enhanced our understanding of the user mode design philosophy. Here are some questions and comments from our side:

THREAD_CREATE:
a) From the input parameters, it seems that THREAD_CREATE only supports creating a thread on the current CPU. Is there a plan to allow creating a thread on a remote CPU, or should we avoid this case?
b) Is there any plan to add CPU affinity to threads?

BATCH:
a) Is the user_data in SysCallIn an identifier used to pair a SysCallOut with the SysCallIn that carries the same user_data value? We would like to better understand how user_data is used.
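
To make the question concrete, here is a minimal Rust sketch of the pairing we have in mind. Only user_data is named in the draft; the other fields of SysCallIn and SysCallOut and the helper match_results are our assumptions for illustration, not the draft's definitions.

```rust
use std::collections::HashMap;

/// Hypothetical batch entry; only `user_data` comes from the draft.
#[derive(Debug, Clone, Copy)]
pub struct SysCallIn {
    pub syscall_nr: u64, // assumed field
    pub user_data: u64,  // opaque tag chosen by the caller
}

/// Hypothetical completion entry, assumed to echo `user_data`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SysCallOut {
    pub result: i64,    // assumed field
    pub user_data: u64, // copied from the matching SysCallIn
}

/// Pair every request with its completion by `user_data`, independent of
/// the order in which completions are reported.
pub fn match_results(
    ins: &[SysCallIn],
    outs: &[SysCallOut],
) -> Vec<(SysCallIn, Option<SysCallOut>)> {
    let by_tag: HashMap<u64, SysCallOut> =
        outs.iter().map(|o| (o.user_data, *o)).collect();
    ins.iter()
        .map(|i| (*i, by_tag.get(&i.user_data).copied()))
        .collect()
}
```

If user_data is meant differently (for example as a pointer-sized cookie that the kernel never interprets), the same pairing would still work as long as the caller keeps the values unique within one batch.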

OPEN_PHYSMEM:
a) We suppose the SVSM kernel memory range is excluded, right?
b) Is there any mechanism to let the user-space process know the valid memory range represented by the object handle, e.g. the end address of the range?

CAPABILITIES:
a) For SEV, VMPL0 is the SVSM and VMs start from VMPL1; for TDX, vm-id 0 is the SVSM and VMs start from vm-id 1. From this point of view, SEV and TDX look similar. So maybe a unified VM index bitmap format could be used for both?
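
A minimal Rust sketch of what such a unified bitmap could look like. The type VmIndexBitmap, its 64-bit width and its methods are hypothetical and only illustrate the idea that index 0 is reserved for the SVSM on both platforms.

```rust
/// Hypothetical unified bitmap: bit N set means VM index N can be opened.
/// Index 0 (VMPL0 on SEV, vm-id 0 on TDX) is the SVSM itself and is
/// therefore never reported as available.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VmIndexBitmap(pub u64);

impl VmIndexBitmap {
    /// Build a bitmap covering guest indices 1..=max_index.
    pub fn for_guests(max_index: u32) -> Self {
        assert!(max_index <= 63, "bitmap holds at most 64 indices");
        let mut bits = 0u64;
        for i in 1..=max_index {
            bits |= 1u64 << i;
        }
        VmIndexBitmap(bits)
    }

    /// True if `index` names a VM that user mode may open.
    pub fn is_available(&self, index: u32) -> bool {
        index < 64 && (self.0 >> index) & 1 == 1
    }
}
```

With this layout the user-mode code would not need to know whether an index maps to a VMPL or to a TDX vm-id.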

VM_OPEN:
a) If one VM index is opened by a user-space process, the same index cannot be opened again by another user-space process, which guarantees that only one process can operate on this VM object handle. But all the threads created by this user-space process can operate on this VM object handle. Is this understanding correct?

VM_CAPABILITIES:
a) Should this report the VM memory range? Or should we always assume the VM can see the entire range covered by the ObjHandle created by OPEN_PHYSMEM?

MMIO/IOIO_EVENT:
a) Is there any way to know the length of the mappable area provided by the evt_obj? Or is the length a fixed size defined by the kernel?
b) We would like to better understand the usage. We assume that the MMIO/IOIO ranges set by these syscalls are monitored by a device model thread in user space. When the VM takes a vmexit due to an MMIO/IOIO access, the kernel vcpu thread decodes the MMIO/IOIO address and, if the address falls in a range set by these syscalls, wakes up the monitoring device model thread and then waits. The device model thread in user mode gets the address/data via the mappable area and does the emulation. After the emulation is completed, the device model thread wakes up the kernel vcpu thread. If the decoding is done in user space rather than in kernel space, then the vcpu should return to user mode to decode the instruction and wait to be woken up by the device model thread. Is this the expected flow for handling MMIO/IOIO in user mode?
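
The kernel-side routing step we assume in b) can be sketched in Rust as follows. MmioRange, ExitAction and route_mmio are hypothetical names, and evt_obj is reduced to a plain integer; this only models the range lookup, not the actual wake-up or the mappable area.

```rust
/// Hypothetical record of one range registered through MMIO_EVENT.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MmioRange {
    pub start: u64,   // first guest-physical address covered
    pub len: u64,     // length of the range in bytes
    pub evt_obj: u32, // stand-in for the event object handle
}

/// What the kernel vcpu thread does after decoding the faulting address.
#[derive(Debug, PartialEq, Eq)]
pub enum ExitAction {
    /// Wake the device model thread behind `evt_obj`, then wait.
    WakeDeviceModel { evt_obj: u32, offset: u64 },
    /// No registered range matched: return to the user-mode VMM.
    ReturnToUser,
}

/// Find the registered range containing `gpa`, if any.
pub fn route_mmio(ranges: &[MmioRange], gpa: u64) -> ExitAction {
    for r in ranges {
        // The subtraction is only evaluated when gpa >= start, so it
        // cannot underflow.
        if gpa >= r.start && gpa - r.start < r.len {
            return ExitAction::WakeDeviceModel {
                evt_obj: r.evt_obj,
                offset: gpa - r.start,
            };
        }
    }
    ExitAction::ReturnToUser
}
```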

SET_MEM_STATE:
a) Is there any consideration behind using paddr + page_size instead of start_paddr + end_paddr as the input parameters? Using start_paddr + end_paddr may be more efficient for setting the state of a whole memory region.

VCPU_CREATE:
a) Regarding "The VCPU can be run with the WAIT_FOR_EVENT() system call": the expectation is that WAIT_FOR_EVENT(vcpu_obj) will perform the vmenter in kernel mode. Is this understanding correct?

GET/SET_STATE:
a) The type struct VmsaGpa is defined as AMD only. Does this mean user mode must know whether the platform is SEV or TDX? Or will there be a per-architecture user-mode VMM binary?
b) For TDX, all the VCPU states are stored in the VMCS. We probably need to introduce a new type for TDX to set/get the vcpu state:

	Type		Data Structure		Description
	------------------------------------------------------------------------
	VMCS		struct VmcsField	The VMCS field encoding and the
						corresponding data (Intel only)

	struct VmcsField {
		encoding: u32,		// VMCS field encoding
		data: u64,		// VMCS field data
	}

	The GET/SET_STATE syscalls can be batched, so that multiple VCPU states can be set or read at once.
c) The vLAPIC is emulated in kernel mode. In this case, is it necessary to allow user mode to get the vLAPIC state? A potential use case is for user mode to emulate the virtual CPUID leaf 01H EBX value, which requires the vLAPIC ID.
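
As a rough Rust sketch of the VmcsField proposal in b): the #[repr(C)] attribute and the helper initial_state are our assumptions. The two constants are the VMCS field encodings of guest RSP and guest RIP from the Intel SDM.

```rust
/// Layout proposed in b); `#[repr(C)]` is an assumption to get a
/// stable layout across the syscall boundary.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VmcsField {
    pub encoding: u32, // VMCS field encoding
    pub data: u64,     // VMCS field data
}

/// VMCS field encodings of the guest RSP and RIP fields.
pub const GUEST_RSP: u32 = 0x681C;
pub const GUEST_RIP: u32 = 0x681E;

/// Hypothetical helper: the entries a VMM could submit as one batch of
/// SET_STATE calls to set up the entry point and stack of a VCPU.
pub fn initial_state(entry: u64, stack: u64) -> [VmcsField; 2] {
    [
        VmcsField { encoding: GUEST_RIP, data: entry },
        VmcsField { encoding: GUEST_RSP, data: stack },
    ]
}
```

Note that with a C layout on x86-64 the struct occupies 16 bytes, because 4 bytes of padding follow the u32 encoding. If the ABI should not contain implicit padding, the encoding could be widened to u64 or an explicit reserved field could be added.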

Possible new syscalls for Class0, to help user mode execute some privileged instructions:
a) User mode may need to read/write L1 MSRs to emulate vMSRs for the VM, but the rdmsr/wrmsr instructions are not allowed in user mode for both SEV and TDX. A syscall wrapping RDMSR/WRMSR can help user mode achieve this:
	ACCESS_MSR
	Read data from an MSR, or write data to an MSR.
	System Call number: 11

	Parameters
	-------------------------------------------------------------------------------------------------
	Parameter	Type		Description
	-------------------------------------------------------------------------------------------------
	1. MSR index	u32		The MSR to access.
	2. read		bool		Indicates whether this is an MSR read or write.
	3. data		VirtAddr	The virtual address of the buffer that
					receives the data read from the MSR, or
					holds the data to be written to the MSR.
	
	Returns  0 on success, error code on failure
	
b) User mode may need to access some MMIO or IO ports emulated by L0 in order to emulate certain MMIO/IOIO events for the VM. The access can be done in user mode via MOV/IOIO instructions, but this can trigger #VC/#VE from user mode and requires instruction decoding and emulation. To simplify this, the MMIO/IOIO access can be done in the enlightened way (VMGEXIT for SEV and TDCALL for TDX). But for TDX, TDCALL is a privileged instruction which is not allowed in user mode, so it has to be done in kernel mode. If the enlightened way is preferred for user-mode MMIO/IOIO access, then new syscalls are necessary for TDX. Shall we add them?
	ACCESS_IO
	Read data from an IO port, or write data to an IO port.
	System Call number: 12
	
	Parameters
	-------------------------------------------------------------------------------------------------
	Parameter	Type		Description
	-------------------------------------------------------------------------------------------------
	1. port		u16		The IO port to access.
	2. size		u8		The IO port size. Only 1(u8)/2(u16)/4(u32)
					are supported.
	3. read		bool		Indicate if this is an IO read or write.
	4. data		VirtAddr	The virtual address of the buffer that
					receives the data read from the IO port, or
					holds the data to be written to the IO port.
	
	Returns  0 on success, error code on failure
		
	ACCESS_MMIO
	Read data from an MMIO address, or write data to an MMIO address.
	System Call number: 13
	
	Parameters
	-------------------------------------------------------------------------------------------------
	Parameter	Type		Description
	-------------------------------------------------------------------------------------------------
	1. paddr	u64		The MMIO address to access.
	2. size		u8		The MMIO size. Only 1(u8)/2(u16)/4(u32)/
					8(u64) are supported.
	3. read		bool		Indicates whether this is an MMIO read or write.
	4. data		VirtAddr	The virtual address of the buffer that
					receives the data read from the MMIO address,
					or holds the data to be written to it.
	
	Returns  0 on success, error code on failure
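
The size rules of the proposed ACCESS_IO and ACCESS_MMIO calls could be checked on the kernel side roughly like this. This is only a Rust sketch: AccessError, check_io and check_mmio are hypothetical names, and the natural-alignment check for MMIO is our assumption, not part of the proposal above.

```rust
/// Hypothetical error codes for the proposed calls.
#[derive(Debug, PartialEq, Eq)]
pub enum AccessError {
    InvalidSize,
    Misaligned,
}

/// ACCESS_IO: only 1-, 2- and 4-byte port accesses are supported.
pub fn check_io(size: u8) -> Result<(), AccessError> {
    match size {
        1 | 2 | 4 => Ok(()),
        _ => Err(AccessError::InvalidSize),
    }
}

/// ACCESS_MMIO: 1-, 2-, 4- and 8-byte accesses are supported.
pub fn check_mmio(paddr: u64, size: u8) -> Result<(), AccessError> {
    match size {
        1 | 2 | 4 | 8 => {}
        _ => return Err(AccessError::InvalidSize),
    }
    // Assumption: accesses must be naturally aligned.
    if paddr % u64::from(size) != 0 {
        return Err(AccessError::Misaligned);
    }
    Ok(())
}
```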
	
Questions for ObjHandle:
a) Although ObjHandle seems to be a common type, a given ObjHandle should only be used as an input for specific syscalls. For example, the ObjHandle returned by OPEN cannot be used as input for VM_CAPABILITIES. Is this understanding correct?
b) If a) is true, it sounds like the user has to know how a given ObjHandle was created; otherwise it is hard to tell which syscalls can take this ObjHandle as an input. Is that right?
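
One way the kernel could enforce a) without the user having to track anything is to record the kind of every object next to its handle. The following Rust sketch uses hypothetical names (ObjKind, HandleTable, HandleError) and is not taken from the draft.

```rust
/// Hypothetical kinds of objects a handle can refer to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObjKind {
    File,
    PhysMem,
    Vm,
    Vcpu,
}

/// Opaque handle as seen by user mode (the integer width is assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ObjHandle(pub u32);

#[derive(Debug, PartialEq, Eq)]
pub enum HandleError {
    BadHandle,
    WrongKind { expected: ObjKind, found: ObjKind },
}

/// Kernel-side table remembering which kind each handle refers to.
pub struct HandleTable {
    kinds: Vec<ObjKind>,
}

impl HandleTable {
    pub fn new() -> Self {
        Self { kinds: Vec::new() }
    }

    /// Register a new object and return its handle.
    pub fn insert(&mut self, kind: ObjKind) -> ObjHandle {
        self.kinds.push(kind);
        ObjHandle((self.kinds.len() - 1) as u32)
    }

    /// Called at syscall entry, e.g. VM_CAPABILITIES expects ObjKind::Vm.
    pub fn check(&self, h: ObjHandle, expected: ObjKind) -> Result<(), HandleError> {
        match self.kinds.get(h.0 as usize) {
            None => Err(HandleError::BadHandle),
            Some(&found) if found == expected => Ok(()),
            Some(&found) => Err(HandleError::WrongKind { expected, found }),
        }
    }
}
```

With such a check a mismatched handle would fail with a well-defined error code, so user mode only needs to handle that error rather than know in advance which syscalls accept a given handle.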

Thanks
Chuanxiao

> -----Original Message-----
> From: Svsm-devel <svsm-devel-bounces at coconut-svsm.dev> On Behalf Of Jörg Rödel
> Sent: Wednesday, June 12, 2024 8:24 PM
> To: svsm-devel at coconut-svsm.dev
> Subject: [svsm-devel] X86-64 Syscall API/ABI Document
> 
> Hi,
> 
> as promised here is an initial draft of the Syscall document I worked on. It is not ready yet, lacks some
> details and is definitely subject to change.
> 
> But I think the general ideas are in there already and it is a good enough base to define TDX and SNP
> specific VM management APIs on top of it.
> 
> So please review the document and provide your feedback. If time allows we can have some initial
> discussion around it in today's meeting. A more general discussion has to wait for the next meeting,
> when everyone had time to review it.
> 
> Regards,
> 
> 	Joerg

