This project provides a Nova-compatible compute driver for PowerVM systems.
The project aims to integrate into OpenStack’s Nova project. Initial development is occurring in a separate project until it has matured and met the Nova core team’s requirements. As such, all development practices should mirror those of the Nova project.
Documentation on Nova can be found at the Nova Devref.
The IBM PowerVM hypervisor provides virtualization on POWER hardware. PowerVM admins can see benefits in their environments by making use of OpenStack. This driver (along with a Neutron ML2 compatible agent and Ceilometer agent) provides the capability for operators of PowerVM to use OpenStack natively.
As ecosystems continue to evolve around the POWER platform, a single OpenStack driver does not meet all of the needs for the various hypervisors. The standard libvirt driver provides support for KVM on POWER systems. This Nova driver provides PowerVM support for OpenStack environments.
This driver meets the following:
This driver makes the following use cases available for PowerVM:
To use the driver, install the nova-powervm project on your NovaLink-based PowerVM system. The nova-powervm project has a minimal set of configuration. See the configuration options section of the dev-ref for more information.
It is recommended that operators also make use of the networking-powervm project. The project ensures that the network bridge supports the VLAN-based networks required for the workloads.
There is also a ceilometer-powervm project that can be included.
Future work will be done to include PowerVM into the various OpenStack deployment models.
The driver enables the following:
The intention is that this driver follows the OpenStack Nova model.
The driver is being promoted into the nova core project in stages, the first of which is represented by blueprint powervm-nova-compute-driver. The coexistence of these two incarnations of the driver raises some Upgrade Considerations.
No REST API impacts.
No known security impacts.
No new notifications. The driver does expect that the Neutron agent will return an event when the VIF plug has occurred, assuming that Neutron is the network service.
The administrator may notice new logging messages in the nova compute logs.
The driver has a similar deployment speed and agility to other hypervisors. It has been tested with up to 10 concurrent deploys with several hundred VMs on a given server.
Most operations are comparable in speed. Deployment, volume attach/detach, lifecycle operations, etc. are quick.
Due to the nature of the project, any performance impacts are limited to the Compute Driver. The API processes for instance are not impacted.
The cloud administrator will need to refer to documentation on how to configure OpenStack for use with a PowerVM hypervisor.
A ‘powervm’ configuration group is used to contain all the PowerVM specific configuration settings. Existing configuration file attributes will be reused as much as possible (e.g. vif_plugging_timeout). This reduces the number of PowerVM specific items that will be needed.
It is the goal of the project to only require minimal additional attributes. The deployer may specify additional attributes to fit their configuration.
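For illustration only (the values shown are examples, not recommendations), a deployment might end up combining a reused core option with the PowerVM-specific group in nova.conf like this:

[DEFAULT]
vif_plugging_timeout = 300

[powervm]
disk_driver = localdisk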
The code for this driver is currently contained within a powervm project. The driver is within the /nova/virt/powervm_ext/ package and extends the nova.virt.driver.ComputeDriver class.
The code interacts with PowerVM through the pypowervm library. This python binding is a wrapper to the PowerVM REST API. All hypervisor operations interact with the PowerVM REST API via this binding. The driver is maintained to support future revisions of the PowerVM REST API as needed.
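As a hedged sketch only (this is not the actual nova-powervm source; the method bodies and the adapter attribute are illustrative), the general shape of such a driver subclass is:

from nova.virt import driver


class PowerVMDriver(driver.ComputeDriver):
    """Illustrative skeleton of a PowerVM compute driver."""

    def __init__(self, virtapi):
        super(PowerVMDriver, self).__init__(virtapi)
        self.adapter = None  # pypowervm REST adapter, built during init_host

    def init_host(self, host):
        # The real driver establishes a pypowervm session/adapter here; all
        # subsequent hypervisor operations flow through that REST binding.
        pass

    def get_info(self, instance):
        # Return power state information for the given instance.
        raise NotImplementedError()

    def spawn(self, context, instance, image_meta, injected_files,
              admin_password, network_info=None, block_device_info=None):
        # Create the LPAR, connect storage and VIFs, and power on the VM.
        raise NotImplementedError()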
For ephemeral disk support, either a Virtual I/O Server hosted local disk or a Shared Storage Pool (a PowerVM clustered file system) is supported. For volume attachments, the driver supports Cinder-based attachments via protocols supported by the hypervisor (e.g. Fibre Channel).
For networking, the networking-powervm project provides Neutron ML2 Agents. The agents provide the necessary configuration on the Virtual I/O Server for networking. The PowerVM Nova driver code creates the VIF for the client VM, but the Neutron agent creates the VIF for VLANs.
Automated functional testing is provided through a third party continuous integration system. It monitors for incoming Nova change sets, runs a set of functional tests (lifecycle operations) against the incoming change, and provides a non-gating vote (+1 or -1).
Developers should not be impacted by these changes unless they wish to try the driver.
The intent of this project is to bring another driver to OpenStack that aligns with the ideals and vision of the community. The intention is to promote this to core Nova.
No alternatives appear viable to bring PowerVM support into the OpenStack community.
Prior to Ocata, only the out-of-tree nova_powervm driver existed. The in-tree driver is introduced in Ocata.
In Liberty and Mitaka, the namespace of the out-of-tree driver is nova_powervm.virt.powervm. In Newton, it was moved to nova.virt.powervm. In Ocata, the new in-tree driver occupies the nova.virt.powervm namespace, and the out-of-tree driver is moved to nova.virt.powervm_ext. Ocata consumers have the option of using the in-tree driver, which will provide limited functionality until it is fully integrated, or the out-of-tree driver, which provides full functionality. Refer to the documentation for the nova.conf settings required to load the desired driver.
In order to use live migration prior to Ocata, it was necessary to run the customized nova_powervm conductor to bring in the PowerVMLiveMigrateData object. In Ocata, this object is included in core nova, so no custom conductor is necessary.
Since the tempest tests should be implementation agnostic, the existing tempest tests should be able to run against the PowerVM driver without issue.
Tempest tests that require functionality that the platform does not yet support (e.g. iSCSI or Floating IPs) will not pass. These should be omitted from the Tempest test suite.
A sample Tempest test configuration for the PowerVM driver has been provided.
Thorough unit tests exist within the project to validate specific functions within this implementation.
A third party functional test environment has been created. It monitors for incoming nova change sets. Once it detects a new change set, it will execute the existing lifecycle API tests. A non-gating vote (+1 or -1) will be provided with information provided (logs) based on the result.
Existing APIs should be valid. All testing is planned within the functional testing system and via unit tests.
See the dev-ref for documentation on how to configure, contribute, use, etc. this driver implementation.
The existing Nova developer documentation should typically suffice. However, until merge into Nova, we will maintain a subset of dev-ref documentation.
Warning
Please note that while this document is still being maintained, it is slowly being updated to re-group and classify features.
When considering which capabilities should be marked as mandatory, the following general guiding principles were applied:
Status: optional.
CLI commands:
nova volume-attach <server> <volume>
Notes: The attach volume operation provides a means to hotplug additional block storage to a running instance. This allows storage capabilities to be expanded without interruption of service. In a cloud model it would be more typical to just spin up a new instance with large storage, so the ability to hotplug extra storage is for those cases where the instance is considered to be more of a pet than cattle. Therefore this operation is not considered to be mandatory to support.
Driver Support:
complete
Status: optional.
CLI commands:
nova volume-attach <server> <volume> [--tag <tag>]
Notes: Attach a block device with a tag to an existing server instance. See “Device tags” for more information.
Driver Support:
missing
Status: optional.
CLI commands:
nova volume-detach <server> <volume>
Notes: See notes for attach volume operation.
Driver Support:
complete
Status: optional.
CLI commands:
cinder extend <volume> <new_size>
Notes: The extend volume operation provides a means to extend the size of an attached volume. This allows volume size to be expanded without interruption of service. In a cloud model it would be more typical to just spin up a new instance with large storage, so the ability to extend the size of an attached volume is for those cases where the instance is considered to be more of a pet than cattle. Therefore this operation is not considered to be mandatory to support.
Driver Support:
partial
Notes: Not supported for rbd volumes.
Status: optional.
CLI commands:
nova interface-attach <server>
Notes: The attach interface operation provides a means to hotplug additional interfaces to a running instance. Hotplug support varies between guest OSes and some guests require a reboot for new interfaces to be detected. This operation allows interface capabilities to be expanded without interruption of service. In a cloud model it would be more typical to just spin up a new instance with more interfaces.
Driver Support:
complete
Status: optional.
CLI commands:
nova interface-attach <server> [--tag <tag>]
Notes: Attach a virtual network interface with a tag to an existing server instance. See “Device tags” for more information.
Driver Support:
missing
Status: optional.
CLI commands:
nova interface-detach <server> <port_id>
Notes: See notes for attach-interface operation.
Driver Support:
complete
Status: optional.
CLI commands:
nova host-update <host>
Notes: This operation allows a host to be placed into maintenance mode, automatically triggering migration of any running instances to an alternative host and preventing new instances from being launched. This is not considered to be a mandatory operation to support. The driver methods to implement are “host_maintenance_mode” and “set_host_enabled”.
Driver Support:
complete
Status: optional.
CLI commands:
nova evacuate <server>
nova host-evacuate <host>
Notes: A possible failure scenario in a cloud environment is the outage of one of the compute nodes. In such a case the instances of the down host can be evacuated to another host. It is assumed that the old host is unlikely ever to be powered back on, otherwise the evacuation attempt will be rejected. When the instances get moved to the new host, their volumes get re-attached and the locally stored data is dropped. That happens in the same way as a rebuild. This is not considered to be a mandatory operation to support.
Driver Support:
complete
Status: optional.
CLI commands:
nova rebuild <server> <image>
Notes: A possible use case is when additional attributes need to be set on the instance: nova purges all existing data from the system and remakes the VM with the given information, such as ‘metadata’ and ‘personalities’. This is not considered to be a mandatory operation to support.
Driver Support:
complete
Status: mandatory.
Notes: Provides realtime information about the power state of the guest instance. Since the power state is used by the compute manager for tracking changes in guests, this operation is considered mandatory to support.
Driver Support:
complete
Status: optional.
Notes: Returns the host uptime since power on; it is used to report hypervisor status.
Driver Support:
complete
Status: optional.
Notes: Returns the IP of this host; it is used when doing resize and migration.
Driver Support:
complete
Status: optional.
CLI commands:
nova live-migration <server>
nova host-evacuate-live <host>
Notes: Live migration provides a way to move an instance off one compute host, to another compute host. Administrators may use this to evacuate instances from a host that needs to undergo maintenance tasks, though of course this may not help if the host is already suffering a failure. In general instances are considered cattle rather than pets, so it is expected that an instance is liable to be killed if host maintenance is required. It is technically challenging for some hypervisors to provide support for the live migration operation, particularly those built on container-based virtualization. Therefore this operation is not considered mandatory to support.
Driver Support:
complete
Status: optional.
CLI commands:
nova live-migration-force-complete <server> <migration>
Notes: Live migration provides a way to move a running instance to another compute host. But it can sometimes fail to complete if an instance has a high rate of memory or disk page access. This operation provides the user with an option to assist the progress of the live migration. The mechanism used to complete the live migration depends on the underlying virtualization subsystem capabilities. If libvirt/qemu is used and the post-copy feature is available and enabled then the force complete operation will cause a switch to post-copy mode. Otherwise the instance will be suspended until the migration is completed or aborted.
Driver Support:
missing
Status: mandatory.
Notes: Importing pre-existing running virtual machines on a host is considered out of scope of the cloud paradigm. Therefore this operation is mandatory to support in drivers.
Driver Support:
complete
Status: optional.
CLI commands:
nova pause <server>
Notes: Stopping an instance’s CPUs can be thought of as roughly equivalent to suspend-to-RAM. The instance is still present in memory, but execution has stopped. The problem, however, is that there is no mechanism to inform the guest OS that this takes place, so upon unpausing, its clocks will no longer report correct time. For this reason hypervisor vendors generally discourage use of this feature and some do not even implement it. Therefore this operation is considered optional to support in drivers.
Driver Support:
missing
Status: optional.
CLI commands:
nova reboot <server>
Notes: It is reasonable for a guest OS administrator to trigger a graceful reboot from inside the instance. A host initiated graceful reboot requires guest co-operation and a non-graceful reboot can be achieved by a combination of stop+start. Therefore this operation is considered optional.
Driver Support:
complete
Status: optional.
CLI commands:
nova rescue <server>
Notes: The rescue operation starts an instance in a special configuration whereby it is booted from a special root disk image. The goal is to allow an administrator to recover the state of a broken virtual machine. In general the cloud model considers instances to be cattle, so if an instance breaks the general expectation is that it be thrown away and a new instance created. Therefore this operation is considered optional to support in drivers.
Driver Support:
complete
Status: optional.
CLI commands:
nova resize <server> <flavor>
Notes: The resize operation allows the user to change a running instance to match the size of a different flavor from the one it was initially launched with. There are many different flavor attributes that potentially need to be updated. In general it is technically challenging for a hypervisor to support the alteration of all relevant config settings for a running instance. Therefore this operation is considered optional to support in drivers.
Driver Support:
complete
Status: optional.
CLI commands:
nova resume <server>
Notes: See notes for the suspend operation
Driver Support:
missing
Status: optional.
CLI commands:
nova set-password <server>
Notes: Provides a mechanism to (re)set the password of the administrator account inside the instance operating system. This requires that the hypervisor has a way to communicate with the running guest operating system. Given the wide range of operating systems in existence it is unreasonable to expect this to be practical in the general case. The configdrive and metadata service both provide a mechanism for setting the administrator password at initial boot time. In the case where this operation were not available, the administrator would simply have to login to the guest and change the password in the normal manner, so this is just a convenient optimization. Therefore this operation is not considered mandatory for drivers to support.
Driver Support:
missing
Status: optional.
CLI commands:
nova image-create <server> <name>
Notes: The snapshot operation allows the current state of the instance root disk to be saved and uploaded back into the glance image repository. The instance can later be booted again using this saved image. This is in effect making the ephemeral instance root disk into a semi-persistent storage, in so much as it is preserved even though the guest is no longer running. In general though, the expectation is that the root disks are ephemeral so the ability to take a snapshot cannot be assumed. Therefore this operation is not considered mandatory to support.
Driver Support:
complete
Status: optional.
CLI commands:
nova suspend <server>
Notes: Suspending an instance can be thought of as roughly equivalent to suspend-to-disk. The instance no longer consumes any RAM or CPUs, with its live running state having been preserved in a file on disk. It can later be restored, at which point it should continue execution where it left off. As with stopping instance CPUs, it suffers from the fact that the guest OS will typically be left with a clock that is no longer telling correct time. For container based virtualization solutions, this operation is particularly technically challenging to implement and is an area of active research. This operation tends to make more sense when thinking of instances as pets, rather than cattle, since with cattle it would be simpler to just terminate the instance instead of suspending. Therefore this operation is considered optional to support.
Driver Support:
missing
Status: optional.
CLI commands:
nova volume-update <server> <attachment> <volume>
Notes: The swap volume operation is a mechanism for changing a running instance so that its attached volume(s) are backed by different storage in the host. An alternative to this would be to simply terminate the existing instance and spawn a new instance with the new storage. In other words this operation is primarily targeted towards the pet use case rather than cattle, however, it is required for volume migration to work in the volume service. This is considered optional to support.
Driver Support:
missing
Status: mandatory.
CLI commands:
nova delete <server>
Notes: The ability to terminate a virtual machine is required in order for a cloud user to stop utilizing resources and thus avoid indefinitely ongoing billing. Therefore this operation is mandatory to support in drivers.
Driver Support:
complete
Status: optional.
CLI commands:
nova trigger-crash-dump <server>
Notes: The trigger crash dump operation is a mechanism for triggering a crash dump in an instance. The feature is typically implemented by injecting an NMI (Non-maskable Interrupt) into the instance. It provides a means to dump the production memory image as a dump file which is useful for users. Therefore this operation is considered optional to support.
Driver Support:
missing
Status: optional.
CLI commands:
nova unpause <server>
Notes: See notes for the “Stop instance CPUs” operation
Driver Support:
missing
Status: optional.
Notes: This allows users to boot a guest with uefi firmware.
Driver Support:
missing
Status: optional.
CLI commands:
nova boot
Notes: This allows users to set tags on virtual devices when creating a server instance. Device tags are used to identify virtual device metadata, as exposed in the metadata API and on the config drive. For example, a network interface tagged with “nic1” will appear in the metadata along with its bus (ex: PCI), bus address (ex: 0000:00:02.0), MAC address, and tag (nic1). If multiple networks are defined, the order in which they appear in the guest operating system will not necessarily reflect the order in which they are given in the server boot request. Guests should therefore not depend on device order to deduce any information about their network devices. Instead, device role tags should be used. Device tags can be applied to virtual network interfaces and block devices.
Driver Support:
missing
Status: optional.
Notes: Quiesce the specified instance to prepare for snapshots. For libvirt, guest filesystems will be frozen through qemu agent.
Driver Support:
missing
Status: optional.
Notes: See notes for the quiesce operation
Driver Support:
missing
Status: optional.
CLI commands:
nova volume-attach <server> <volume>
Notes: The multiattach volume operation is an extension to the attach volume operation. It allows a single volume to be attached to multiple instances. This operation is not considered to be mandatory to support. Note that for the libvirt driver, this is only supported if qemu<2.10 or libvirt>=3.10.
Driver Support:
missing
In the Policies Guide, you will find documented policies for developing with Nova-PowerVM. This includes the processes we use for blueprints and specs, bugs, contributor onboarding, and other procedural items.
Nova-PowerVM maintains all of its bugs in Launchpad. All of the current open Nova-PowerVM bugs can be found in that link.
The process of bug triaging consists of the following steps:
If you would like to contribute to the development of OpenStack, you must follow the steps in the “If you’re a developer” section of this page:
Once those steps have been completed, changes to OpenStack should be submitted for review via the Gerrit tool, following the workflow documented at:
Pull requests submitted through GitHub will be ignored.
Bugs should be filed on Launchpad, not GitHub:
Code reviews are a critical component of all OpenStack projects. Code reviews provide a way to enforce a level of consistency across the project, and also allow for the careful onboarding of contributions from new contributors.
Nova-PowerVM follows the code review guidelines as set forth for all OpenStack projects. It is expected that all reviewers are following the guidelines set forth on that page.
In the Developer Guide, you will find information on how to develop for Nova-PowerVM and how it interacts with Nova compute. You will also find information on setup and usage of Nova-PowerVM
Since nova-powervm strives to be integrated into the upstream Nova project, the source code structure matches a standard driver.
nova_powervm/
virt/
powervm/
disk/
tasks/
volume/
...
tests/
virt/
powervm/
disk/
tasks/
volume/
...
The main directory for the overall driver. Provides the driver implementation, image support, and some high level classes to interact with the PowerVM system (ex. host, vios, vm, etc…)
The driver attempts to utilize TaskFlow for major actions such as spawn. This allows the driver to create atomic elements (within the tasks) to drive operations against the system (with revert capabilities).
The disk folder contains the various ‘nova ephemeral’ disk implementations. These are basic images that do not involve Cinder.
Two disk implementations exist currently: localdisk and Shared Storage Pool (ssp). The localdisk implementation cleans its image cache based on the nova.conf setting image_cache_manager_interval. It also supports file-backed ephemeral storage, which is specified by using the QCOW VG - default volume group. Note: resizing instances with file-backed ephemeral storage is not currently supported.

The standard interface between these two implementations is defined in driver.py. This ensures that the nova-powervm compute driver does not need to know the specifics of which disk implementation it is using.
The task folder contains TaskFlow classes. These implementations simply wrap around other methods, providing logical units that the compute driver can use when building a string of actions.
The tasks in this directory encapsulate this. If anything fails, they have corresponding reverts. The logic to perform these operations is contained elsewhere; these are simple wrappers that enable embedding into Taskflow.
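As a hedged illustration (the task name below is hypothetical, not one of the driver's actual classes), a TaskFlow task pairs an execute method with a revert method and is chained into a flow by the compute driver:

from taskflow import engines
from taskflow import task
from taskflow.patterns import linear_flow


class CreateDisk(task.Task):
    """Hypothetical task; the real tasks wrap calls into the storage layer."""

    def execute(self):
        # Forward action: build the ephemeral disk.
        print('creating disk')

    def revert(self, *args, **kwargs):
        # Compensating action: remove the disk if a later task fails.
        print('deleting disk')


# The driver strings tasks like this into a flow and runs it; a failure
# anywhere triggers the reverts of the tasks that already completed.
flow = linear_flow.Flow('example_spawn')
flow.add(CreateDisk())
engines.run(flow)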
The volume folder contains the Cinder volume connectors. A volume connector is the code that connects a Cinder volume (which is visible to the host) to the Virtual Machine.
The PowerVM Compute Driver has an interface for the volume connectors defined in this folder’s driver.py.
The PowerVM Compute Driver provides two implementations for Fibre Channel attached disks.
- Virtual SCSI (vSCSI): The disk is presented to a Virtual I/O Server and the data is passed through to the VM through a virtualized SCSI connection.
- N-Port ID Virtualization (NPIV): The disk is presented directly to the VM. The VM will have virtual Fibre Channel connections to the disk, and the Virtual I/O Server will not have the disk visible to it.
This page describes how to setup a working Python development environment that can be used in developing Nova-PowerVM.
These instructions assume you’re already familiar with Git and Gerrit, the code repository mirror and code review toolset used by OpenStack. If you aren’t, please see this Git tutorial for an introduction to using Git, and this guide for a tutorial on using Gerrit and Git for code contribution to OpenStack projects.
Grab the code:
git clone git://git.openstack.org/openstack/nova-powervm
cd nova-powervm
The purpose of this project is to provide the ‘glue’ between OpenStack Compute (Nova) and PowerVM. The pypowervm project is used to control PowerVM systems.
It is recommended that you clone down the OpenStack Nova project along with pypowervm into your respective development environment.
Running the tox python targets for tests will automatically clone these down via the requirements.
Additional project requirements may be found in the requirements.txt file.
To make use of the PowerVM drivers, a PowerVM system set up with NovaLink is required. The nova-powervm driver should be installed on the management VM.
Note: Installing the NovaLink software creates the pvm_admin group. In order to function properly, the user executing the Nova compute service must be a member of this group. Use the usermod command to add the user. For example, to add the user stacker to the pvm_admin group, execute:

sudo usermod -a -G pvm_admin stacker

The user must re-login for the change to take effect.
The NovaLink architecture is such that the compute driver runs directly on the PowerVM system. No external management element (e.g. Hardware Management Console or PowerVC) is needed. Management of the virtualization is driven through a thin virtual machine running on the PowerVM system.
Configuration of the PowerVM system and NovaLink is required ahead of time. If the operator is using volumes or Shared Storage Pools, they are required to be configured ahead of time.
After nova-powervm has been installed, the user must enable PowerVM as the compute driver. To do so, set the compute_driver value in the nova.conf file to compute_driver = powervm_ext.driver.PowerVMDriver.
The standard nova configuration options are supported. In particular, to use PowerVM SR-IOV vNIC for networking, the pci_passthrough_whitelist option must be set. See the networking-powervm usage devref for details.

Additionally, a [powervm] section is used to provide additional customization to the driver.
By default, no additional inputs are needed. The base configuration allows for a Nova driver to support ephemeral disks to a local volume group (only one can be on the system in the default config). Connecting Fibre Channel hosted disks via Cinder will use the Virtual SCSI connections through the Virtual I/O Servers.
Operators may change the disk driver (nova based disks - NOT Cinder) via the disk_driver property.

All of these values are under the [powervm] section. The tables are broken out into logical sections.

To generate a sample config file for [powervm], run:

oslo-config-generator --namespace nova_powervm > nova_powervm_sample.conf

The [powervm] section of the sample can then be edited and pasted into the full nova.conf file.
Configuration option = Default Value | Description |
---|---|
proc_units_factor = 0.1 | (FloatOpt) Factor used to calculate the processor units per vcpu. Valid values are: 0.05 - 1.0 |
uncapped_proc_weight = 64 | (IntOpt) The processor weight to assign to newly created VMs. Value should be between 1 and 255. Represents the relative share of the uncapped processor cycles the Virtual Machine will receive when unused processor cycles are available. |
Configuration option = Default Value | Description |
---|---|
disk_driver = localdisk | (StrOpt) The disk driver to use for PowerVM disks. Valid options are: localdisk, ssp If localdisk is specified and only one non-rootvg Volume Group exists on one of the Virtual I/O Servers, then no further config is needed. If multiple volume groups exist, then further specification can be done via the volume_group_name option. Live migration is not supported with a localdisk config. If ssp is specified, then a Shared Storage Pool will be used. If only one SSP exists on the system, no further configuration is needed. If multiple SSPs exist, then the cluster_name property must be specified. Live migration can be done within a SSP cluster. |
cluster_name = None | (StrOpt) Cluster hosting the Shared Storage Pool to use for storage operations. If none specified, the host is queried; if a single Cluster is found, it is used. Not used unless disk_driver option is set to ssp. |
volume_group_name = None | (StrOpt) Volume Group to use for block device operations. Must not be rootvg. If disk_driver is localdisk, and more than one non-rootvg volume group exists across the Virtual I/O Servers, then this attribute must be specified. |
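For example, a host with multiple non-rootvg volume groups might pin the localdisk driver to one of them; the volume group name below is illustrative only:

[powervm]
disk_driver = localdisk
volume_group_name = nova_vg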
Configuration option = Default Value | Description |
---|---|
fc_attach_strategy = vscsi | (StrOpt) The Fibre Channel Volume Strategy defines how FC Cinder volumes should be attached to the Virtual Machine. The options are: npiv or vscsi. It should be noted that if NPIV is chosen, the WWPNs will not be active on the backing fabric during the deploy. Some Cinder drivers will operate without issue. Others may query the fabric and thus will fail attachment. It is advised that if an issue occurs using NPIV, the operator fall back to vscsi based deploys. |
vscsi_vios_connections_required = 1 | (IntOpt) Indicates a minimum number of Virtual I/O Servers that are required to support a Cinder volume attach with the vSCSI volume connector. |
ports_per_fabric = 1 | (IntOpt) (NPIV only) The number of physical ports that should be connected directly to the Virtual Machine, per fabric. Example: 2 fabrics and ports_per_fabric set to 2 will result in 4 NPIV ports being created, two per fabric. If multiple Virtual I/O Servers are available, will attempt to span ports across I/O Servers. |
fabrics = A | (StrOpt) (NPIV only) Unique identifier for each physical FC fabric that is available. This is a comma separated list. If there are two fabrics for multi-pathing, then this could be set to A,B. The fabric identifiers are used for the ‘fabric_<identifier>_port_wwpns’ key. |
fabric_<name>_port_wwpns | (StrOpt) (NPIV only) A comma delimited list of all the physical FC port WWPNs that support the specified fabric. Is tied to the NPIV ‘fabrics’ key. |
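As an illustration only (the WWPN values are placeholders, not real hardware), a dual-fabric NPIV setup might look like:

[powervm]
fc_attach_strategy = npiv
fabrics = A,B
fabric_A_port_wwpns = 10000090FA1B2C3C,10000090FA1B2C3D
fabric_B_port_wwpns = 10000090FA1B4D5E,10000090FA1B4D5F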
Configuration option = Default Value | Description |
---|---|
vopt_media_volume_group = root_vg | (StrOpt) The volume group on the system that should be used to store the config drive metadata that will be attached to the VMs. |
vopt_media_rep_size = 1 | (IntOpt) The size of the media repository (in GB) for the metadata for config drive. Only used if the media repository needs to be created. |
image_meta_local_path = /tmp/cfgdrv/ | (StrOpt) The location where the config drive ISO files should be built. |
This page describes how to run the Nova-PowerVM tests. This page assumes you have already set up an working Python environment for Nova-PowerVM development.
Nova-PowerVM, like other OpenStack projects, uses tox for managing the virtual environments for running test cases. It uses Testr for managing the running of the test cases.
Tox handles the creation of a series of virtualenvs that target specific versions of Python.
Testr handles the parallel execution of series of test cases as well as the tracking of long-running tests and other things.
For more information on the standard tox-based test infrastructure used by OpenStack and how to do some common test/debugging procedures with Testr, see this wiki page:
Running pep8 and unit tests is as easy as executing this in the root directory of the Nova-PowerVM source code:
tox
To run only pep8:
tox -e pep8
To restrict the pylint check to only the files altered by the latest patch changes:
tox -e pep8 HEAD~1
To run only the unit tests:
tox -e py27,py34
Include the URL of your launchpad blueprint:
https://blueprints.launchpad.net/nova-powervm/+spec/example
Introduction paragraph – why are we doing anything? A single paragraph of prose that operators can understand. The title and this first paragraph should be used as the subject line and body of the commit message respectively.
Some notes about the nova-powervm spec and blueprint process:
Some notes about using this template:
Your spec should be in ReSTructured text, like this template.
Please wrap text at 79 columns.
The filename in the git repository should match the launchpad URL, for example: https://blueprints.launchpad.net/nova-powervm/+spec/awesome-thing should be named awesome-thing.rst
Please do not delete any of the sections in this template. If you have nothing to say for a whole section, just write: None
For help with syntax, see http://sphinx-doc.org/rest.html
To test out your formatting, build the docs using tox and see the generated HTML file in doc/build/html/specs/<path_of_your_file>
If you would like to provide a diagram with your spec, ascii diagrams are required. http://asciiflow.com/ is a very nice tool to assist with making ascii diagrams. The reason for this is that the tool used to review specs is based purely on plain text. Plain text will allow review to proceed without having to look at additional files which can not be viewed in gerrit. It will also allow inline feedback on the diagram itself.
If your specification proposes any changes to the Nova REST API such as changing parameters which can be returned or accepted, or even the semantics of what happens when a client calls into the API, then you should add the APIImpact flag to the commit message. Specifications with the APIImpact flag can be found with the following query:
https://review.openstack.org/#/q/status:open+project:openstack/nova-powervm+message:apiimpact,n,z
A detailed description of the problem. What problem is this blueprint addressing?
What use cases does this address? What impact on actors does this change have? Ensure you are clear about the actors in each use case: Developer, End User, Deployer etc.
Here is where you cover the change you propose to make in detail. How do you propose to solve this problem?
If this is one part of a larger effort make it clear where this piece ends. In other words, what’s the scope of this effort?
At this point, if you would like to just get feedback on if the problem and proposed change fit in nova-powervm, you can stop here and post this for review to get preliminary feedback. If so please say: Posting to get preliminary feedback on the scope of this spec.
What other ways could we do this thing? Why aren’t we using those? This doesn’t have to be a full literature review, but it should demonstrate that thought has been put into why the proposed solution is an appropriate one.
Describe any potential security impact on the system. Some of the items to consider include:
For more detailed guidance, please see the OpenStack Security Guidelines as a reference (https://wiki.openstack.org/wiki/Security/Guidelines). These guidelines are a work in progress and are designed to help you identify security best practices. For further information, feel free to reach out to the OpenStack Security Group at openstack-security@lists.openstack.org.
How would the end user be impacted by this change? The “End User” is defined as the users of the deployed cloud.
Describe any potential performance impact on the system, for example how often will new code be called, and is there a major change to the calling pattern of existing code.
Examples of things to consider here include:
Discuss things that will affect how you deploy and configure OpenStack that have not already been mentioned, such as:
Discuss things that will affect other developers working on the driver or OpenStack in general.
Describe any potential upgrade impact on the system, such as:
Who is leading the writing of the code? Or is this a blueprint where you’re throwing it out there to see who picks it up?
If more than one person is working on the implementation, please designate the primary author and contact.
Work items or tasks – break the feature up into the things that need to be done to implement it. Those parts might end up being done by different people, but we’re mostly trying to understand the timeline for implementation.
Please discuss the important scenarios needed to test here, as well as specific edge cases we should be ensuring work correctly. For each scenario please specify if this requires specialized hardware, a full openstack environment, or can be simulated inside the nova-powervm tree.
Please discuss how the change will be tested. We especially want to know what tempest tests will be added. It is assumed that unit test coverage will be added so that doesn’t need to be mentioned explicitly, but discussion of why you think unit tests are sufficient and we don’t need to add more tempest tests would need to be included.
Is this untestable in gate given current limitations (specific hardware / software configurations available)? If so, are there mitigation plans (3rd party testing, gate enhancements, etc).
Which audiences are affected most by this change, and which documentation titles on nova-powervm.readthedocs.io should be updated because of this change? Don’t repeat details discussed above, but reference them here in the context of documentation for multiple audiences. For example, the Operations Guide targets cloud operators, and the End User Guide would need to be updated if the change offers a new feature available through the CLI or dashboard. If a config option changes or is deprecated, note here that the documentation needs to be updated to reflect this specification’s change.
Please add any useful references here. You are not required to have any reference. Moreover, this specification should still make sense when your references are unavailable. Examples of what you could include are:
Currently the PowerVM driver requires a PowerVM specific Neutron agent. This blueprint will add support for additional agent types - specifically the Open vSwitch and Linux Bridge agents provided by Neutron.
PowerVM has support for virtualizing an Ethernet port using the Virtual I/O Server and Shared Ethernet. This is provided using networking-powervm Shared Ethernet Agent. This agent provides key PowerVM use cases such as I/O redundancy.
There is a subset of operators that have asked for VIF support in line with other hypervisors. This would be support for the Neutron Linux Bridge and Open vSwitch agents. While these agents do not provide use cases such as I/O redundancy, they do enable operators to utilize common upstream networking solutions when deploying PowerVM with OpenStack.
An operator should be able to deploy an environment using Linux Bridge or Open vSwitch Neutron agents. In order to do this, the physical I/O must be assigned to the NovaLink partition on the PowerVM system (the partition with virtualization admin authority).
A user should be able to do the standard VIF use cases with either of these agents:
The existing Neutron agents should be used without any changes from PowerVM. All of the changes that should occur will be in nova-powervm. Any limitations of the agents themselves will be limitations to the PowerVM implementation.
There is one exception to the use case support: the Open vSwitch support will enable live migration. There is no plan for Linux Bridge live migration support.
It should be noted that Hybrid VIF plugging will not be supported. Instead, PowerVM will use the conntrack integration in Ubuntu 16.04/OVS 2.5 to support the OVSFirewallDriver. As of OVS 2.5, that allows the firewall function without needing Hybrid VIF Plugging.
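For reference, and assuming the standard Neutron OVS agent configuration layout (not something this blueprint defines), selecting the OVSFirewallDriver is typically a single setting in openvswitch_agent.ini:

[securitygroup]
firewall_driver = openvswitch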
None.
None.
None.
Performance will not be impacted for the deployment of VMs. However, the end user performance may change as it is a new networking technology. Both the Linux Bridge and Open vSwitch support should operate with similar performance characteristics as other platforms that support these technologies.
The deployer will need to do the following:
No major changes are anticipated outside of this. The Shared Ethernet Adapter Neutron agent will not work in conjunction with this on the same system.
None
See Proposed Change
Testing will be done on live systems. Future work will be done to integrate this into the PowerVM Third-Party CI; however, this will not be done initially, as the LB and OVS agents are already heavily tested. The SEA agent continues to need testing by the CI.
Deployer documentation will be built around how to configure this.
https://blueprints.launchpad.net/nova-powervm/+spec/powervm-sriov-nova
This blueprint will add support to nova-powervm for SR-IOV, with SR-IOV VFs attached to VMs via PowerVM vNIC. SR-IOV support was added in the Juno release of OpenStack; this blueprint fits the implementation of this scenario into it.
A separate blueprint for networking-powervm has been made available for design elements regarding networking-powervm.
These blueprints will be implemented during the Newton cycle of OpenStack development. Referring to the Newton schedule, development should be completed during newton-3.
Refer to the glossary section for an explanation of terms.
OpenStack PowerVM drivers currently support the networking aspect of PowerVM virtualization using Shared Ethernet Adapter, Open vSwitch and Linux Bridge. There is a need to support SR-IOV ports with redundancy/failover and migration. It is possible to associate an SR-IOV VF with a VM directly, but that path will not be supported by this design; such a setup would not provide migration support anyway, and it does not utilize the advantages of hardware-level virtualization offered by the SR-IOV architecture. Support for this configuration will be added in the future.
Users should be able to manage a VM with an SR-IOV vNIC as a network interface. This management should include migration of a VM with an SR-IOV vNIC attached to it.
PowerVM has a feature called vNIC which is tied in with SR-IOV. By using vNIC, the following use cases are supported:
- Fail over I/O to a different I/O Server and physical function
- Live Migration with SR-IOV, without significant intervention
The vNIC is exposed to the VM, and the MAC address of the client vNIC will match the neutron port.
In summary, this blueprint will add support for SR-IOV in nova-powervm for these scenarios:
The ability to associate an SR-IOV VF directly with a VM will be added in the future.
Refer to separate blueprint for networking-powervm for changes in networking-powervm component. This blueprint will focus on changes to nova-powervm only.
The changes will be made in two areas:
1. Compute virt driver. PowerVM compute driver is in nova.virt.powervm.driver.PowerVMDriver and it will be enhanced for SR-IOV vNIC support. A dictionary is maintained in virt driver vif code to map between vif type and vif driver class. Based on vif type of vif object that needs to be plugged, appropriate vif driver will be invoked. This dictionary will be modified to include a new vif driver class and its vif type (pvm_sriov).
The PCI Claims process expects to be able to “claim” a VF from the pci_passthrough_devices list each time a vNIC is plugged, and return it to the pool on unplug. Thus the get_available_resource API will be enhanced to populate this device list with a suitable number of VFs.
2. VIF driver. The PowerVM VIF driver is in nova_powervm.virt.powervm.vif.PvmVifDriver. A VIF driver to attach a network interface via vNIC (PvmVnicSriovVifDriver), with plug and unplug methods, will be implemented. The plug and unplug methods will use pypowervm code to create VF/vNIC server/vNIC clients and attach/detach them. A neutron port carries binding:vif_type and binding:vnic_type attributes. The vif type for this implementation will be pvm_sriov. The vnic_type will be ‘direct’.
A VIF driver (PvmVFSriovVifDriver) for VFs directly attached to a VM will be implemented in the future.
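A hedged sketch of the two changes above follows. The existing map entry and the method bodies are illustrative; only the pvm_sriov vif type and the PvmVnicSriovVifDriver plug/unplug responsibilities come from this spec.

from nova_powervm.virt.powervm import vif as pvm_vif

# Illustrative only - the real map lives in the virt driver vif code.
VIF_TYPE_TO_DRIVER = {
    'ovs': 'nova_powervm.virt.powervm.vif.PvmOvsVifDriver',  # existing (example)
    'pvm_sriov': 'nova_powervm.virt.powervm.vif.PvmVnicSriovVifDriver',  # new
}


class PvmVnicSriovVifDriver(pvm_vif.PvmVifDriver):
    """Plugs a neutron port of vnic_type 'direct' via an SR-IOV backed vNIC."""

    def plug(self, vif):
        # Match the network's physical_network to the SR-IOV physical port
        # labels, then use pypowervm to create the backing VF(s) and the
        # vNIC server/client pair using the neutron port's MAC address.
        raise NotImplementedError()

    def unplug(self, vif):
        # Use pypowervm to remove the vNIC and free its backing VF(s).
        raise NotImplementedError()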
Deployment of VM with SR-IOV vNIC will involve picking Physical Port(s), VIOS(es) and a VM and invoking pypowervm library. Similarly, attachment of the same to an existing VM will be implemented. RMC will be required. Evacuate and migration of VM will be supported with changes to compute virt driver and VIF driver via pypowervm library.
Physical Port information will be derived from port label attribute of physical ports on SR-IOV adapters. Port label attribute of physical ports will have to be updated with ‘physical network’ names during configuration of the environment. During attachment of SR-IOV backed vNIC to a VM, physical network attribute of neutron network will be matched with port labels of physical ports to gather a list of physical ports.
Failover/redundancy: VIF plug during deploy (or attach of a network interface to a VM) will pass more than one physical port and VIOS(es) (as stated above in the deploy scenario) to the pypowervm library to create the vNIC on the VIOS with redundancy. It should be noted that failover is handled automatically by the platform when a vNIC is backed by multiple VFs. The redundancy level will be controlled by an AGENT option, vnic_required_vfs, in the ML2 configuration file (see the blueprint for networking-powervm). It will have a default of 2.
Quality of Service: Each VF backing a vNIC can be configured with a capacity value, dictating the minimum percentage of the physical port’s total bandwidth that will be available to that VF. The ML2 configuration file allows a vnic_vf_capacity option in the AGENT section to set the capacity for all vNIC-backing VFs. If omitted, the platform defaults to the capacity granularity for each physical port. See the blueprint for networking-powervm for details of the configuration option, and see section 1.3.3 of the IBM Power Systems SR-IOV Technical Overview and Introduction for details on VF capacity.
For the future implementation of VF - VM direct attach of SR-IOV to a VM, the request will include the physical network name. PvmVFSriovVifDriver can look up the devname(s) associated with it from the port label, get the physical port information, and create an SR-IOV logical port on the corresponding VM. It may also include a configuration option to allow the user to dictate how many ports to attach. Using the NIB technique, users can set up redundancy.
For VF - vNIC - VM attach of an SR-IOV port to a VM, the corresponding neutron network object will include the physical network name. PvmVnicSriovVifDriver can look up the devname(s) associated with it from the port label and get the physical port information. Along with the adapter ID and physical port ID, VIOS information will be added and a vNIC dedicated port will be created on the corresponding VM.
For migration scenario, physical network names should match on source and destination compute nodes, and accordingly in the physical port labels. On the destination, vNICs will be rebuilt based on the SR-IOV port configuration. The platform decides how to reconstruct the vNIC on the destination in terms of number and distribution of backing VFs, etc.
None
None
None
Since the number of VMs deployed on the host will depend on the number of VFs offered by the SR-IOV cards in the environment, scale tests will be limited in VM density.
- The SR-IOV adapters to be used for vNIC must be in Sriov mode. This can be done via the pvmctl command, e.g.:
pvmctl sriov update -i phys_loc=U78C7.001.RCH0004-P1-C1 -s mode=Sriov
- The port label of each SR-IOV physical port must be set to the name of the neutron physical network it serves. This can also be done via the pvmctl command, e.g.:
pvmctl sriov update --port-loc U78C7.001.RCH0004-P1-C1-T1 -s label=prod_net
- The pci_passthrough_whitelist option in the nova configuration file must include entries for each neutron physical network to be enabled for vNIC. Only the physical_network key is required. For example:
pci_passthrough_whitelist = [{"physical_network": "default"}, {"physical_network": "prod_net"}]
- Configuration is also required on the networking side - see the blueprint for networking-powervm for details.
- To deploy a vNIC to a VM, the neutron port(s) must be pre-created with vnic type direct, e.g.:
neutron port-create --vnic-type direct
None
nova-powervm changes:
A VIF driver for an SR-IOV VF connected directly to a VM will be a future work item.
1. Unit test: All developed code will be accompanied by structured unit tests. These tests validate granular function logic.
2. Function test: Functional testing will be performed with the CI infrastructure. Changes implemented for this blueprint will be tested via the existing CI framework used by the IBM team. The CI framework needs to be enhanced with SR-IOV hardware. The tests can be executed in batch mode, probably as nightly jobs.
All use-cases need to be documented in developer docs that accompany nova-powervm.
SR-IOV: | Single Root I/O Virtualization, used for virtual environments where VMs need direct access to a network interface without any hypervisor overhead. |
---|---|
Physical Port: | Represents a physical port on an SR-IOV adapter. This is not the same as a Physical Function; a Physical Port can have many Physical Functions associated with it. To clarify further, if a Physical Port supports RCoE, then it will have two Physical Functions. In other words, there is one Physical Function per protocol that the port supports. |
Virtual Function (VF): | Represents a virtual port belonging to a Physical Port (PF). Either directly or indirectly (using vNIC), a Virtual Function (VF) is connected to a VM. This is otherwise called an SR-IOV logical port. |
Dedicated SR-IOV: | This is equivalent to any regular ethernet card and it can be used with SEA. A logical port of a physical port can be assigned as a backing device for SEA. |
Shared SR-IOV: | Attaching a VF directly to a VM is not supported in the Newton release, but an SR-IOV card in Sriov mode is what will be used for vNIC as described in this blueprint. Also, an SR-IOV card in Sriov mode can have a promiscuous VF assigned to the VIOS and configured for SEA (said configuration to be done outside of the auspices of OpenStack), which can then be used just like any other SEA configuration, and is supported (as described in the next item below). |
Shared Ethernet Adapter: | An alternate technique to provide a network interface to a VM. This involves attachment to a physical interface on the PowerVM host and one or many virtual interfaces that are connected to VMs. A VF of a PF in an SR-IOV based environment can be the physical interface of a Shared Ethernet Adapter. Existing support for this configuration in nova-powervm and networking-powervm will continue. |
vNIC: | A vNIC is an intermediary between a VF of a PF and a VM. It resides on the VIOS and connects to a VF on one end and a vNIC client adapter inside the VM on the other. This is mainly to support migration of VMs across hosts. |
vNIC failover/redundancy: | Multiple vNIC servers (connected to as many VFs belonging to as many PFs, either on the same SR-IOV card or across cards) connected to the same VM as one network interface. Failure of one vNIC/VF/PF path will result in activation of another such path. |
VIOS: | A partition in PowerVM systems dedicated to I/O operations. In the context of this blueprint, the vNIC server will be created on the VIOS. For redundancy management purposes, a specific PowerVM system may employ more than one VIOS partition. |
VM migration types: | |
pypowervm: | A python library that runs on the PowerVM management VM and allows virtualization control of the system. This is similar to the python library for libvirt. |
Release Name | Description |
---|---|
Newton | Introduced |
https://blueprints.launchpad.net/nova-powervm/+spec/image-cache-powervm
The image cache allows for a nova driver to pull an image from glance once, then use a local copy of that image for future VM creation. This saves bandwidth between the compute host and glance. It also improves VM deployment speed and reduces the stress on the overall infrastructure.
Deploy times on PowerVM can be high when using the localdisk driver. This is partially due to not having linked clones. The image cache offers a way to reduce those deploy times by transferring the image to the host once, and then subsequent deploys will reuse that image rather than streaming from glance.
There are complexities with this of course. The cached images take up disk space, but the overall image cache from core Nova takes that into account. The value of using the nova image cache design is that it has hooks in the code to help solve these problems.
Create a subclass of nova.virt.imagecache.ImageManager in the nova-powervm project. It should implement the necessary methods of the cache:
The nova-powervm driver will need to be updated to utilize the cache. This includes:
The localdisk driver within nova-powervm will be updated to have the following logic. It will check the volume group backing the instance. If the volume group has a disk with the name ‘i_<partial uuid of image>’, it will simply copy that disk into a new disk named after the UUID of the instance. Otherwise, it will create a disk with the name ‘i_<partial uuid of image>’ that contains the image.
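A hedged sketch of that decision follows; the class and helper names are hypothetical, and the real implementation performs the copy via pypowervm calls.

class LocalStorage(object):
    """Illustrative sketch only - the helper methods are hypothetical."""

    def create_disk_from_image(self, context, instance, image_meta):
        # 'i_<partial uuid of image>' is the cached image disk name; the
        # length of the partial UUID shown here is illustrative.
        cache_disk = 'i_%s' % image_meta.id[:8]
        if not self._vg_has_disk(cache_disk):
            # Cache miss: stream the image from glance into the cache disk.
            self._upload_image_to_vg(context, image_meta, cache_disk)
        # Copy the cached disk into a new disk named after the instance UUID.
        return self._copy_disk(cache_disk, instance.uuid)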
The image cache manager’s purpose is simply to clean out old images that are not needed by any instances anymore.
Further extension, not part of this blueprint, can be done to manage overall disk space in the volume group to make sure that the image cache is not overwhelming the backing disks.
None
None
Performance of subsequent deploys of the same image should be faster. The deploys will have improved image copy times and reduced network bandwidth requirements.
Performance of single deploys using different images will be slower.
This change will take effect without any deployer impact immediately after merging. The deployer will not need to take any specific upgrade actions to make use of it; however the deployer may need to tune the image cache to make sure it is not using too much disk space.
A conf option may be added to force the image cache off if deemed necessary. This will be based off of operator feedback in the event that we need a way to reduce disk usage.
None
None
The deployer docs will be updated to reflect this.
None
https://blueprints.launchpad.net/nova-powervm/+spec/file-io-cinder-connector
There are several Cinder drivers that support having the file system mounted locally and then connecting in to the VM as a volume (ex. GPFS, NFS, etc…). There is the ability to support this type of volume in PowerVM if the user has mounted the file system to the NovaLink. This blueprint adds support to the PowerVM driver for such Cinder volumes.
The PowerVM driver supports Fibre Channel and iSCSI based volumes. It does not currently support volumes that are presented on a file system as files.
The recent release of PowerVM NovaLink has added support for this in the REST API. This blueprint looks to take advantage of that support.
Add nova_powervm/virt/powervm/volume/fileio.py. This would extend the existing volume drivers. It would store the LUN ID on the SCSI bus.
This does not support traditional VIOS. Like the iSCSI change, it would require running through the NovaLink partition.
None
None.
One may consider the permissions of the file presented by Cinder. The Cinder driver’s BDM will provide a path to a file. The hypervisor will map that file as the root user, so file permissions of the volume should not be a concern. This is consistent with the other hypervisors utilizing these types of Cinder drivers.
None
None
Deployer must set up the backing Cinder driver and connect the file systems to the NovaLink partition in their environment.
None
Unit Testing is obvious.
Manual testing will be driven via connecting to a GPFS back-end.
CI environments will be evaluated to determine if there is a way to add this to the current CI infrastructure.
None. Will update the nova-powervm dev-ref to reflect that ‘file I/O drivers’ are supported, but the support matrix doesn’t go into the detail of what cinder drivers work with nova drivers.
https://blueprints.launchpad.net/nova-powervm/+spec/file-io-driver
The PowerVM driver currently uses logical volumes for localdisk ephemeral storage. This blueprint will add support for using file-backed disks as a localdisk ephemeral storage option.
The PowerVM driver only supports logical volumes for localdisk ephemeral storage. It does not currently support storage that is presented as a file.
Add nova_powervm/virt/powervm/disk/fileio.py. This would extend the existing disk driver. The DISK_DRIVER powervm conf option will be used to select file I/O, and the nova.conf option instances_path will be utilized for the placement of the disk files.
None
None
None
Performance may change as the backing storage methods of VMs will be different.
The deployer must set the DISK_DRIVER conf option to fileio and ensure that the instances_path conf option is set in order to utilize the changes described in the blueprint.
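For illustration, the deployer configuration might look like the following (a minimal sketch; the exact option spelling should be confirmed against the nova-powervm configuration reference, and the path is only an example):

    [powervm]
    disk_driver = fileio

    [DEFAULT]
    instances_path = /var/lib/nova/instances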
None
NovaLink 1.0.0.5
Will update the nova-powervm dev-ref to include File I/O as an additional ephemeral disk option.
None
https://blueprints.launchpad.net/nova-powervm/+spec/srr-capability-dynamic-toggle
Currently, enabling or disabling the SRR capability on a VM requires the VM to be in the shut-off state. We should be able to toggle this field dynamically so that a shutdown of the VM is not needed.
The simplified remote restart (SRR) capability governs whether a VM can be rebuilt (remote restarted) on a different host when the host on which the VM resides is down. Currently this attribute can be changed only when the VM is in shut-off state. This blueprint addresses that by enabling toggle of simplified remote restart capability dynamically (while the VM is still active).
The end user would like to:
- Enable the SRR capability on a VM without shutting it down, so that any workloads on the VM are unaffected.
- Disable the SRR capability, while the VM is still up and running, for a VM that does not need to be rebuilt on another host.
The SRR capability is a VM-level attribute and can be changed using the resize operation. In the case of a resize operation on an active VM (a sketch of this flow follows the list):
- Check whether the hypervisor supports dynamic toggling of the SRR capability.
- If it is supported, proceed with updating the SRR capability if it has been changed.
- Log a warning if updating the SRR capability is not supported.
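A minimal sketch of that flow, using hypothetical names for the capability flag and VM attribute (the real driver would consult the capabilities reported by the PowerVM REST API):

    import logging

    LOG = logging.getLogger(__name__)

    def update_srr_capability(host_caps, vm, desired_srr):
        """Sketch: toggle SRR during a resize of an active VM."""
        if vm.srr_enabled == desired_srr:
            return                           # nothing changed; nothing to do
        if not host_caps.get('dynamic_srr_toggle', False):
            # Dynamic toggle not supported at this hypervisor level: warn only.
            LOG.warning('Dynamic SRR toggle is not supported on this host; '
                        'the requested SRR change was not applied.')
            return
        vm.srr_enabled = desired_srr         # pushed to the VM via the REST API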
None
None
None
A change in the SRR capability is not likely to happen very frequently, so this should not have a major impact. When the change happens, the impact on the performance of any other component (the VM, the compute service, the REST service, etc.) should be negligible.
The end user will be able to dynamically toggle the SRR capability for the VM. The changes can be utilized immediately once they are deployed.
None
NA
We need to work with the PowerVM platform team to ensure that the SRR toggle capability is exposed for the Compute driver to consume.
Testing the change requires a full OpenStack environment with compute resources configured.
- Ensure the SRR state of a VM can be toggled while it is up and running.
- Ensure the SRR state of a VM can be toggled while it is shut off.
- Perform rebuild operations to ensure that the capability is indeed being utilized.
None
None
https://blueprints.launchpad.net/nova-powervm/+spec/device-passthrough
Provide a generic way to identify hardware devices such as GPUs and attach them to VMs.
Deployers want to be able to attach accelerators and other adapters to their VMs. Today in Nova this is possible only in very restricted circumstances. The goal of this blueprint is to enable generic passthrough of devices for consumers of the nova-powervm driver.
While these efforts may enable more, and should be extensible going forward, the primary goal for the current release is to pass through entire physical GPUs. That is, we are not attempting to pass through sub-device resources such as the SR-IOV VFs, vGPUs, and FPGA regions/functions mentioned in the Future notes later in this spec.
As an admin, I wish to be able to configure my host and flavors to allow passthrough of whole physical GPUs to VMs.
As a user, I wish to make use of appropriate flavors to create VMs with GPUs attached.
The administrator can identify and allow (explicitly) or deny (by omission) passthrough of devices by way of a YAML file per compute host.
Note
Future: We may someday figure out a way to support a config file on the controller. This would allow e.g. cloud-wide whitelisting and specification for particular device types by vendor/product ID, which could then be overridden (or not) by the files on the compute nodes.
The path to the config will be hardcoded as /etc/nova/inventory.yaml.
The file shall contain paragraphs, each of which will:
- Identify one or more IOSlots (the NovaLink REST object; in pypowervm, given a ManagedSystem wrapper sys_w, a list of IOSlot wrappers is available via sys_w.asio_config.io_slots). See identification. Any device not identified by any paragraph in the file is denied for passthrough, but see the allow section for future plans around supporting explicit denials.
- Optionally designate the resource class to be used for the device's inventory; if omitted, CUSTOM_IOSLOT is used. See resource_class.
A formal schema is proposed for review.
Here is a summary description of each section.
Each paragraph will be introduced by a key which is a human-readable name for the paragraph. The name has no programmatic significance other than to separate paragraphs. Each paragraph’s name must be unique within the file.
Each paragraph will have an identification section, which is an object containing one or more keys corresponding to IOSlot properties, as follows:
YAML key           IOSlot property        Description
vendor_id          pci_vendor_id          X{4} (four uppercase hex digits)
device_id          pci_dev_id             X{4}
subsys_vendor_id   pci_subsys_vendor_id   X{4}
subsys_device_id   pci_subsys_dev_id      X{4}
class              pci_class              X{4}
revision_id        pci_rev_id             X{2} (two uppercase hex digits)
drc_index          drc_index              X{8} (eight uppercase hex digits)
drc_name           drc_name               String (physical location code)
The values are expected to match those produced by pvmctl ioslot list -d <property> for a given property.
The identification section is required, and must contain at least one of the above keys.
When multiple keys are provided in a paragraph, they are matched with AND logic.
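As a small illustration of the AND semantics (slots represented as plain dicts; the real code would read the pypowervm IOSlot wrappers):

    def slot_matches(identification, slot):
        """True only if the slot satisfies every key in the identification section."""
        return all(slot.get(key) == value for key, value in identification.items())

    # Example: matches only slots whose vendor_id is 'ABCD' AND device_id is 'F00D'.
    ident = {'vendor_id': 'ABCD', 'device_id': 'F00D'}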
Note
It is a stretch goal of this blueprint to allow wildcards in (some of) the values. E.g. drc_name: U78CB.001.WZS0JZB-P1-* would allow everything on the P1 planar of the U78CB.001.WZS0JZB enclosure. If we get that far, a spec amendment will be proposed with the specifics (what syntax, which fields, etc.).
Note
The allow section will not be supported initially, but is documented here because we thought through what it should look like. In the initial implementation, any device encompassed by a paragraph is allowed for passthrough.
Each paragraph will support a boolean allow keyword.
If omitted, the default is true - i.e. devices identified by this paragraph's identification section are permitted for passthrough. (Note, however, that devices not encompassed by the union of all the identification paragraphs in the file are denied for passthrough.)
If allow is false, the only other section allowed is identification, since the rest don't make sense.
A given device can only be represented once across all allow=true paragraphs (implicit or explicit); an "allowed" device found more than once will result in an error.
A given device can be represented zero or more times across all allow=false paragraphs.
We will first apply the allow=true paragraphs to construct a preliminary list of devices, and then apply each allow=false paragraph and remove explicitly denied devices from that list.
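If and when the allow section is implemented, the two-pass merge described above could look roughly like this (a sketch; paragraphs are assumed to be dicts keyed by paragraph name, with an identification section and an optional allow flag, reusing the slot_matches helper sketched earlier):

    def build_allowed_slots(paragraphs, slots):
        """Two-pass merge: allow=true paragraphs first, then allow=false removals."""
        allowed = {}
        for name, para in paragraphs.items():
            if not para.get('allow', True):
                continue
            for slot in slots:
                if slot_matches(para['identification'], slot):
                    if slot['drc_index'] in allowed:
                        # An "allowed" device found more than once is an error.
                        raise ValueError('slot %s matched by multiple allow '
                                         'paragraphs' % slot['drc_index'])
                    allowed[slot['drc_index']] = slot
        for name, para in paragraphs.items():
            if para.get('allow', True):
                continue
            for drc_index in list(allowed):
                if slot_matches(para['identification'], allowed[drc_index]):
                    del allowed[drc_index]
        return allowed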
Note
Again, we're not going to support the allow section at all initially. It will be a stretch goal to add it as part of this release, or it may be added in a subsequent release.
If allow is omitted or true, an optional resource_class key is supported. Its string value allows the author to designate the resource class to be used for the inventory unit representing the device on the resource provider. If omitted, CUSTOM_IOSLOT will be used as the default.
Note
Future: We may be able to get smarter about dynamically defaulting the resource class based on inspecting the device metadata. For now, we have to rely on the author of the config file to tell us what kind of device we’re looking at.
If allow is omitted or true, an optional traits subsection is supported. Its value is an array of strings, each of which is the name of a trait to be added to the resource providers of each device represented by this paragraph. If the traits section is included, it must have at least one value in the list. (If no additional traits are desired, omit the section.)
The values must be valid trait names (either standard from os-traits or custom, matching CUSTOM_[A-Z0-9_]*). These will be in addition to the traits automatically added by the driver - see Generated Traits below. Traits which conflict with automatically-generated traits will result in an error: the driver must be the single source of truth for the traits it generates.
Traits may be used to indicate any static attribute of a device - for example, a capability (CUSTOM_CAPABILITY_WHIZBANG) not otherwise indicated by Generated Traits.
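Tying the sections together, a hypothetical paragraph in /etc/nova/inventory.yaml might look like this (the paragraph name, identification values, resource class, and trait are purely illustrative; the formal schema is still to be proposed for review):

    gpu_slots:
      identification:
        vendor_id: "ABCD"
        device_id: "F00D"
      resource_class: CUSTOM_GPU
      traits:
        - CUSTOM_CAPABILITY_WHIZBANG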
The driver shall create nested resource providers, one per device (slot), as children of the compute node provider generated by Nova.
The provider name shall be generated as PowerVM IOSlot %(drc_index)08X, e.g. PowerVM IOSlot 1C0FFEE1. We shall let the placement service generate the UUID. This naming scheme allows us to identify the full set of providers we "own". This includes identifying providers we may have created on a previous iteration (potentially in a different process) which now need to be purged (e.g. because the slot no longer exists on the system). It also helps us provide a clear migration path in the future if, for example, Cyborg takes over generating these providers. It also paves the way for providers corresponding to things smaller than a slot; e.g. PFs might be namespaced PowerVM PF %(drc_index)08X.
Each device RP shall have an inventory of:
total: 1
reserved: 0
min_unit: 1
max_unit: 1
step_size: 1
allocation_ratio: 1.0
of the resource_class specified in the config file for the paragraph matching this device (CUSTOM_IOSLOT by default).
Note
Future: Some day we will provide SR-IOV VFs, vGPUs, FPGA regions/functions, etc. At that point we will conceivably have inventory of multiple units of multiple resource classes, etc.
The provider for a device shall be decorated with the following automatically-generated traits:
CUSTOM_POWERVM_IOSLOT_VENDOR_ID_%(vendor_id)04X
CUSTOM_POWERVM_IOSLOT_DEVICE_ID_%(device_id)04X
CUSTOM_POWERVM_IOSLOT_SUBSYS_VENDOR_ID_%(subsys_vendor_id)04X
CUSTOM_POWERVM_IOSLOT_SUBSYS_DEVICE_ID_%(subsys_device_id)04X
CUSTOM_POWERVM_IOSLOT_CLASS_%(class)04X
CUSTOM_POWERVM_IOSLOT_REVISION_ID_%(revision_id)02X
CUSTOM_POWERVM_IOSLOT_DRC_INDEX_%(drc_index)08X
CUSTOM_POWERVM_IOSLOT_DRC_NAME_%(drc_name)s
where drc_name is normalized via os_traits.normalize_name. In addition, the driver shall decorate the provider with any traits specified in the config file paragraph identifying this device. If that paragraph specifies any of the above generated traits, an exception shall be raised (we'll blow up the compute service).
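A sketch of how those traits could be built for one slot (the slot is shown as a plain dict with integer PCI fields, which is an assumption; the real driver normalizes drc_name via os_traits.normalize_name, approximated here with a local stand-in):

    import re

    def generated_traits(slot):
        """Build the automatically-generated traits for a single IOSlot (sketch)."""
        traits = [
            'CUSTOM_POWERVM_IOSLOT_VENDOR_ID_%04X' % slot['vendor_id'],
            'CUSTOM_POWERVM_IOSLOT_DEVICE_ID_%04X' % slot['device_id'],
            'CUSTOM_POWERVM_IOSLOT_SUBSYS_VENDOR_ID_%04X' % slot['subsys_vendor_id'],
            'CUSTOM_POWERVM_IOSLOT_SUBSYS_DEVICE_ID_%04X' % slot['subsys_device_id'],
            'CUSTOM_POWERVM_IOSLOT_CLASS_%04X' % slot['class'],
            'CUSTOM_POWERVM_IOSLOT_REVISION_ID_%02X' % slot['revision_id'],
            'CUSTOM_POWERVM_IOSLOT_DRC_INDEX_%08X' % slot['drc_index'],
        ]
        # drc_name is a free-form location code; this is a stand-in for
        # os_traits.normalize_name.
        norm = re.sub(r'[^A-Za-z0-9]+', '_', slot['drc_name']).upper()
        traits.append('CUSTOM_POWERVM_IOSLOT_DRC_NAME_%s' % norm)
        return traits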
The above provider tree structure/data shall be provided to Nova by overriding the ComputeDriver.update_provider_tree method. The algorithm shall be as follows:
- Retrieve the current slot information from the PowerVM REST API (GET /ManagedSystem, pulling out .asio_config.io_slots).
- Load and parse the passthrough config file and ensure a child provider, with the appropriate inventory and traits, for each allowed slot.
- Find any previously created providers whose names begin with PowerVM IOSlot but which no longer correspond to an allowed slot, and delete the resulting "orphans".
This is in addition to the standard update_provider_tree contract of ensuring appropriate VCPU, MEMORY_MB, and DISK_GB resources on the compute node provider.
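A condensed sketch of that override (the underscore-prefixed helpers are hypothetical; the provider_tree calls follow the interface Nova passes to update_provider_tree, and caching, error handling, and the config-file parsing are elided):

    def update_provider_tree(self, provider_tree, nodename, allocations=None):
        # Standard contract: VCPU / MEMORY_MB / DISK_GB on the compute node
        # provider (elided).
        self._update_compute_node_inventory(provider_tree, nodename)

        # One child provider per allowed slot, named 'PowerVM IOSlot %(drc_index)08X'.
        wanted = set()
        for slot in self._allowed_slots():          # REST call + config file
            name = 'PowerVM IOSlot %08X' % slot['drc_index']
            wanted.add(name)
            if not provider_tree.exists(name):
                provider_tree.new_child(name, nodename)
            provider_tree.update_inventory(name, {
                slot.get('resource_class', 'CUSTOM_IOSLOT'): {
                    'total': 1, 'reserved': 0, 'min_unit': 1, 'max_unit': 1,
                    'step_size': 1, 'allocation_ratio': 1.0}})
            provider_tree.update_traits(name, self._traits_for(slot))

        # Purge orphans: providers we own whose slots are gone or now denied.
        for uuid in provider_tree.get_provider_uuids():
            data = provider_tree.data(uuid)
            if data.name.startswith('PowerVM IOSlot') and data.name not in wanted:
                provider_tree.remove(uuid)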
Note
It is a stretch goal of this blueprint to implement caching and/or other enhancements to the above algorithm to optimize performance by minimizing the need to call PowerVM REST and/or process whitelist files every time.
Existing Nova support for generic resource specification via flavor extra specs should “just work”. For example, a flavor requesting two GPUs might look like:
resources:VCPU=1
resources:MEMORY_MB=2048
resources:DISK_GB=100
resources1:CUSTOM_GPU=1
traits1:CUSTOM_POWERVM_IOSLOT_VENDOR_ID_G00D=required
traits1:CUSTOM_POWERVM_IOSLOT_DEVICE_ID_F00D=required
resources2:CUSTOM_GPU=1
traits2:CUSTOM_POWERVM_IOSLOT_DRC_INDEX_1C0FFEE1=required
During spawn, we will query placement to retrieve the resource provider records listed in the allocations parameter. Any provider names which are prefixed with PowerVM IOSlot will be parsed to extract the DRC index (the last eight characters of the provider name). The corresponding slots will be extracted from the ManagedSystem payload and added to the LogicalPartition payload for the instance as it is being created.
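A small sketch of that name parsing (the provider names themselves come from looking up, in placement, the resource provider UUIDs listed in the allocations parameter; that lookup is elided here):

    def drc_indexes(provider_names):
        """Extract DRC indexes from 'PowerVM IOSlot XXXXXXXX' provider names."""
        prefix = 'PowerVM IOSlot '
        return [int(name[-8:], 16)                 # last eight hex characters
                for name in provider_names
                if name.startswith(prefix)]

    # drc_indexes(['PowerVM IOSlot 1C0FFEE1']) == [0x1C0FFEE1]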
IOSlots are detached automatically when we DELETE the LogicalPartition, so no changes should be required here.
Since we can’t migrate the state of an active GPU, we will block live migration of a VM with an attached IOSlot.
We should get these for free, but need to make sure they’re tested.
This is not in the scope of the current effort. For now, attaching/detaching devices to/from existing VMs can only be accomplished via resize (Cold Migration).
Use Nova’s PCI passthrough subsystem. We’ve all agreed this sucks and is not the way forward.
Use oslo.config instead of a YAML file. Experience with [pci]passthrough_whitelist has led us to conclude that its config format is too restrictive/awkward. The direction for Nova (as discussed at the Queens PTG in Denver) will be toward some kind of YAML format; we're going to be the pioneers on this front.
It is the operator’s responsibility to ensure that the passthrough YAML config file has appropriate permissions, and lists only devices which do not themselves pose a security risk if attached to a malicious VM.
Users get acceleration for their workloads o/
For the update_provider_tree flow, we're adding the step of loading and parsing the passthrough YAML config file. This should be negligible compared to e.g. retrieving the ManagedSystem object (which we're already doing, so no impact there).
There’s no impact from the community side. It may take longer to create or destroy a LogicalPartition with attached IOSlots.
None.
None.
None.
See Proposed change.
os-traits 0.9.0, to pick up the normalize_name method.
Testing this in the CI will be challenging, given that we are not likely to score GPUs for all of our nodes.
We will likely need to rely on manual testing and PowerVC to cover the code paths described under PowerVMDriver with a handful of various device configurations.
None.