Welcome to nova-powervm’s documentation!

This project provides a Nova-compatible compute driver for PowerVM systems.

The project aims to integrate into OpenStack’s Nova project. Initial development is occurring in a separate project until it has matured and met the Nova core team’s requirements. As such, all development practices should mirror those of the Nova project.

Documentation on Nova can be found at the Nova Devref.

Overview

PowerVM Nova Driver

The IBM PowerVM hypervisor provides virtualization on POWER hardware. PowerVM admins can see benefits in their environments by making use of OpenStack. This driver (along with a Neutron ML2 compatible agent and Ceilometer agent) provides the capability for operators of PowerVM to use OpenStack natively.

Problem Description

As ecosystems continue to evolve around the POWER platform, a single OpenStack driver does not meet all of the needs for the various hypervisors. The standard libvirt driver provides support for KVM on POWER systems. This Nova driver provides PowerVM support to the OpenStack environment.

This driver meets the following:

  • Built within the community
  • Fits the OpenStack model
  • Utilizes automated functional and unit tests
  • Enables use of PowerVM systems through the OpenStack APIs
  • Allows attachment of volumes from Cinder over supported protocols

This driver makes the following use cases available for PowerVM:

  • As a deployer, all of the standard lifecycle operations (start, stop, reboot, migrate, destroy, etc.) should be supported on a PowerVM based instance.
  • As a deployer, I should be able to capture an instance to an image.
  • As a deployer, I should be able to access a VNC console for deployed instances.

Usage

To use the driver, install the nova-powervm project on your NovaLink-based PowerVM system. The nova-powervm project requires only minimal configuration. See the configuration options section of the dev-ref for more information.

It is recommended that operators also make use of the networking-powervm project. The project ensures that the network bridge supports the VLAN-based networks required for the workloads.

There is also a ceilometer-powervm project that can be included.

Future work will be done to include PowerVM into the various OpenStack deployment models.

Overview of Architecture

The driver enables the following:

  • Provide deployments that work with the OpenStack model.
  • Driver is implemented using a new version of the PowerVM REST API.
  • Ephemeral disks are supported either with Virtual I/O Server (VIOS) hosted local disks or via Shared Storage Pools (a PowerVM cluster file system).
  • Volume support is provided via Cinder through supported protocols for the Hypervisor (virtual SCSI and N-Port ID Virtualization).
  • Live migration support is available when using Shared Storage Pools or boot from volume.
  • Network integration is supported via the ML2 compatible Neutron Agent. This is the openstack/networking-powervm project.
  • Automated Functional Testing is provided to validate changes from the broader OpenStack community against the PowerVM driver.
  • Thorough unit, syntax, and style testing is provided and enforced for the driver.

The intention is that this driver follows the OpenStack Nova model.

The driver is being promoted into the nova core project in stages, the first of which is represented by blueprint powervm-nova-compute-driver. The coexistence of these two incarnations of the driver raises some Upgrade Considerations.

Data Model Impact
  • The evacuate API is supported as part of the PowerVM driver. It optionally allows the NVRAM data to be stored to a Swift database. However, this does not impact the data model itself. It simply provides a location to optionally store the VM’s NVRAM metadata in the event of a rebuild, evacuate, shelve, migration, or resize.
REST API Impact

No REST API impacts.

Security Impact

No known security impacts.

Notifications Impact

No new notifications. The driver does expect that the Neutron agent will return an event when the VIF plug has occurred, assuming that Neutron is the network service.

Other End User Impact

The administrator may notice new logging messages in the nova compute logs.

Performance Impact

The driver has a similar deployment speed and agility to other hypervisors. It has been tested with up to 10 concurrent deploys with several hundred VMs on a given server.

Most operations are comparable in speed. Deployment, volume attach/detach, and lifecycle operations are quick.

Due to the nature of the project, any performance impacts are limited to the Compute Driver. The API processes, for instance, are not impacted.

Other Deployer Impact

The cloud administrator will need to refer to documentation on how to configure OpenStack for use with a PowerVM hypervisor.

A ‘powervm’ configuration group is used to contain all the PowerVM specific configuration settings. Existing configuration file attributes will be reused as much as possible (e.g. vif_plugging_timeout). This reduces the number of PowerVM specific items that will be needed.

It is the goal of the project to only require minimal additional attributes. The deployer may specify additional attributes to fit their configuration.

Developer Impact

The code for this driver is currently contained within the nova-powervm project. The driver is within the /nova/virt/powervm_ext/ package and extends the nova.virt.driver.ComputeDriver class.

The code interacts with PowerVM through the pypowervm library. This Python binding is a wrapper around the PowerVM REST API. All hypervisor operations interact with the PowerVM REST API via this binding. The driver is maintained to support future revisions of the PowerVM REST API as needed.
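
The following is a minimal sketch of how a caller reaches the platform through pypowervm, assuming a NovaLink management VM where a local, credential-less Session is permitted; the exact wrapper and attribute names may vary between pypowervm versions.

# Sketch only: query the managed system through pypowervm.
from pypowervm import adapter as pvm_adpt
from pypowervm.wrappers import managed_system as pvm_ms

# The Session/Adapter pair is the handle that all REST calls go through.
sess = pvm_adpt.Session()
adpt = pvm_adpt.Adapter(sess)

# Read the ManagedSystem entry and inspect a few host attributes.
host = pvm_ms.System.get(adpt)[0]
print(host.system_name, host.proc_units_avail, host.memory_free)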

For ephemeral disk support, either a Virtual I/O Server hosted local disk or a Shared Storage Pool (a PowerVM clustered file system) is supported. For volume attachments, the driver supports Cinder-based attachments via protocols supported by the hypervisor (e.g. Fibre Channel).

For networking, the networking-powervm project provides Neutron ML2 Agents. The agents provide the necessary configuration on the Virtual I/O Server for networking. The PowerVM Nova driver code creates the VIF for the client VM, but the Neutron agent creates the VIF for VLANs.

Automated functional testing is provided through a third party continuous integration system. It monitors for incoming Nova change sets, runs a set of functional tests (lifecycle operations) against the incoming change, and provides a non-gating vote (+1 or -1).

Developers should not be impacted by these changes unless they wish to try the driver.

Community Impact

The intent of this project is to bring another driver to OpenStack that aligns with the ideals and vision of the community. The intention is to promote this to core Nova.

Alternatives

No alternatives appear viable to bring PowerVM support into the OpenStack community.

Implementation

Assignee(s)
Primary assignees:
adreznec efried kyleh thorst
Other contributors:
multiple

Dependencies

  • Utilizes the PowerVM REST API specification for management. Future versions of this specification will be utilized as they become available: http://ibm.co/1lThV9R
  • Builds on top of the pypowervm library. This is a prerequisite to utilizing the driver.

Upgrade Considerations

Prior to Ocata, only the out-of-tree nova_powervm driver existed. The in-tree driver is introduced in Ocata.

Namespaces

In Liberty and Mitaka, the namespace of the out-of-tree driver is nova_powervm.virt.powervm. In Newton, it was moved to nova.virt.powervm. In Ocata, the new in-tree driver occupies the nova.virt.powervm namespace, and the out-of-tree driver is moved to nova.virt.powervm_ext. Ocata consumers have the option of using the in-tree driver, which will provide limited functionality until it is fully integrated; or the out-of-tree driver, which provides full functionality. Refer to the documentation for the nova.conf settings required to load the desired driver.
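
For reference, the driver selection is made via the compute_driver option in nova.conf. The following sketch shows the two choices described above; confirm the exact class paths against the release being deployed.

[DEFAULT]
# Out-of-tree nova-powervm driver (full functionality):
compute_driver = powervm_ext.driver.PowerVMDriver
# Or, in Ocata and later, the in-tree driver (limited functionality):
# compute_driver = powervm.driver.PowerVMDriver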

Live Migrate Data Object

In order to use live migration prior to Ocata, it was necessary to run the customized nova_powervm conductor to bring in the PowerVMLiveMigrateData object. In Ocata, this object is included in core nova, so no custom conductor is necessary.

Testing

Tempest Tests

Since the Tempest tests should be implementation agnostic, the existing Tempest tests should be able to run against the PowerVM driver without issue.

Tempest tests that require functionality that the platform does not yet support (e.g. iSCSI or Floating IPs) will not pass. These should be omitted from the Tempest test suite.

A sample Tempest test configuration for the PowerVM driver has been provided.

Thorough unit tests exist within the project to validate specific functions within this implementation.

Functional Tests

A third party functional test environment has been created. It monitors for incoming nova change sets. Once it detects a new change set, it will execute the existing lifecycle API tests. A non-gating vote (+1 or -1), along with supporting information (logs), will be provided based on the result.

API Tests

The existing APIs remain valid. All testing is planned within the functional testing system and via unit tests.

Documentation Impact

User Documentation

See the dev-ref for documentation on how to configure, use, and contribute to this driver implementation.

Developer Documentation

The existing Nova developer documentation should typically suffice. However, until the merge into Nova, we will maintain a subset of dev-ref documentation.

References

Feature Support Matrix

Warning

Please note: while this document is still being maintained, it is slowly being updated to re-group and classify features.

When considering which capabilities should be marked as mandatory, the following general guiding principles were applied:

  • Inclusivity - people have shown the ability to make effective use of a wide range of virtualization technologies with broadly varying feature sets. Aiming to keep the requirements as inclusive as possible avoids second-guessing what a user may wish to use the cloud compute service for.
  • Bootstrapping - a practical use case test is to consider that the starting point for the compute deploy is an empty data center with new machines and network connectivity. Then look at the minimum features required of a compute service in order to get user instances running and processing work over the network.
  • Competition - an early leader in the cloud compute service space was Amazon EC2. A sanity check for whether a feature should be mandatory is to consider whether it was available in the first public release of EC2. This had quite a narrow feature set, but nonetheless found very high usage in many use cases. So it serves to illustrate that many features need not be considered mandatory in order to get useful work done.
  • Reality - there are many virt drivers currently shipped with Nova, each with their own supported feature set. Any feature which is missing in at least one in-tree virt driver must by inference be considered optional until all in-tree drivers support it. This does not rule out the possibility of a currently optional feature becoming mandatory at a later date, based on the other principles above.

Summary

Feature | Status | PowerVM
Attach block volume to instance | optional | complete
Attach tagged block device to instance | optional | missing
Detach block volume from instance | optional | complete
Extend block volume attached to instance | optional | partial
Attach virtual network interface to instance | optional | complete
Attach tagged virtual network interface to instance | optional | missing
Detach virtual network interface from instance | optional | complete
Set the host in a maintenance mode | optional | complete
Evacuate instances from a host | optional | complete
Rebuild instance | optional | complete
Guest instance status | mandatory | complete
Guest host uptime | optional | complete
Guest host ip | optional | complete
Live migrate instance across hosts | optional | complete
Force live migration to complete | optional | missing
Launch instance | mandatory | complete
Stop instance CPUs (pause) | optional | missing
Reboot instance | optional | complete
Rescue instance | optional | complete
Resize instance | optional | complete
Restore instance | optional | missing
Set instance admin password | optional | missing
Save snapshot of instance disk | optional | complete
Suspend instance | optional | missing
Swap block volumes | optional | missing
Shutdown instance | mandatory | complete
Trigger crash dump | optional | missing
Resume instance CPUs (unpause) | optional | missing
uefi boot | optional | missing
Device tags | optional | missing
quiesce | optional | missing
unquiesce | optional | missing
Attach block volume to multiple instances | optional | missing

Details

  • Attach block volume to instance

    Status: optional.

    CLI commands:

    • nova volume-attach <server> <volume>

    Notes: The attach volume operation provides a means to hotplug additional block storage to a running instance. This allows storage capabilities to be expanded without interruption of service. In a cloud model it would be more typical to just spin up a new instance with large storage, so the ability to hotplug extra storage is for those cases where the instance is considered to be more of a pet than cattle. Therefore this operation is not considered to be mandatory to support.

    Driver Support:

    • PowerVM: complete

  • Attach tagged block device to instance

    Status: optional.

    CLI commands:

    • nova volume-attach <server> <volume> [--tag <tag>]

    Notes: Attach a block device with a tag to an existing server instance. See “Device tags” for more information.

    Driver Support:

    • PowerVM: missing

  • Detach block volume from instance

    Status: optional.

    CLI commands:

    • nova volume-detach <server> <volume>

    Notes: See notes for attach volume operation.

    Driver Support:

    • PowerVM: complete

  • Extend block volume attached to instance

    Status: optional.

    CLI commands:

    • cinder extend <volume> <new_size>

    Notes: The extend volume operation provides a means to extend the size of an attached volume. This allows volume size to be expanded without interruption of service. In a cloud model it would be more typical to just spin up a new instance with large storage, so the ability to extend the size of an attached volume is for those cases where the instance is considered to be more of a pet than cattle. Therefore this operation is not considered to be mandatory to support.

    Driver Support:

    • PowerVM: partial

      Notes: Not supported for rbd volumes.

  • Attach virtual network interface to instance

    Status: optional.

    CLI commands:

    • nova interface-attach <server>

    Notes: The attach interface operation provides a means to hotplug additional interfaces to a running instance. Hotplug support varies between guest OSes and some guests require a reboot for new interfaces to be detected. This operation allows interface capabilities to be expanded without interruption of service. In a cloud model it would be more typical to just spin up a new instance with more interfaces.

    Driver Support:

    • PowerVM: complete

  • Attach tagged virtual network interface to instance

    Status: optional.

    CLI commands:

    • nova interface-attach <server> [--tag <tag>]

    Notes: Attach a virtual network interface with a tag to an existing server instance. See “Device tags” for more information.

    Driver Support:

    • PowerVM: missing

  • Detach virtual network interface from instance

    Status: optional.

    CLI commands:

    • nova interface-detach <server> <port_id>

    Notes: See notes for attach-interface operation.

    Driver Support:

    • PowerVM: complete

  • Set the host in a maintenance mode

    Status: optional.

    CLI commands:

    • nova host-update <host>

    Notes: This operation allows a host to be placed into maintenance mode, automatically triggering migration of any running instances to an alternative host and preventing new instances from being launched. This is not considered to be a mandatory operation to support. The driver methods to implement are “host_maintenance_mode” and “set_host_enabled”.

    Driver Support:

    • PowerVM: complete

  • Evacuate instances from a host

    Status: optional.

    CLI commands:

    • nova evacuate <server>
    • nova host-evacuate <host>

    Notes: A possible failure scenario in a cloud environment is the outage of one of the compute nodes. In such a case the instances of the down host can be evacuated to another host. It is assumed that the old host is unlikely ever to be powered back on, otherwise the evacuation attempt will be rejected. When the instances get moved to the new host, their volumes get re-attached and the locally stored data is dropped. That happens in the same way as a rebuild. This is not considered to be a mandatory operation to support.

    Driver Support:

    • PowerVM: complete

  • Rebuild instance

    Status: optional.

    CLI commands:

    • nova rebuild <server> <image>

    Notes: A possible use case is when additional attributes need to be set on the instance; nova will purge all existing data from the system and remake the VM with the given information, such as ‘metadata’ and ‘personalities’. This is not considered to be a mandatory operation to support.

    Driver Support:

    • PowerVM: complete

  • Guest instance status

    Status: mandatory.

    Notes: Provides realtime information about the power state of the guest instance. Since the power state is used by the compute manager for tracking changes in guests, this operation is considered mandatory to support.

    Driver Support:

    • PowerVM: complete

  • Guest host uptime

    Status: optional.

    Notes: Returns the host uptime since power on; it is used to report hypervisor status.

    Driver Support:

    • PowerVM: complete

  • Guest host ip

    Status: optional.

    Notes: Returns the IP of this host; it is used when doing resize and migration.

    Driver Support:

    • PowerVM: complete

  • Live migrate instance across hosts

    Status: optional.

    CLI commands:

    • nova live-migration <server>
    • nova host-evacuate-live <host>

    Notes: Live migration provides a way to move an instance off one compute host, to another compute host. Administrators may use this to evacuate instances from a host that needs to undergo maintenance tasks, though of course this may not help if the host is already suffering a failure. In general instances are considered cattle rather than pets, so it is expected that an instance is liable to be killed if host maintenance is required. It is technically challenging for some hypervisors to provide support for the live migration operation, particularly those built on the container based virtualization. Therefore this operation is not considered mandatory to support.

    Driver Support:

    • PowerVM: complete

  • Force live migration to complete

    Status: optional.

    CLI commands:

    • nova live-migration-force-complete <server> <migration>

    Notes: Live migration provides a way to move a running instance to another compute host. But it can sometimes fail to complete if an instance has a high rate of memory or disk page access. This operation provides the user with an option to assist the progress of the live migration. The mechanism used to complete the live migration depends on the underlying virtualization subsystem capabilities. If libvirt/qemu is used and the post-copy feature is available and enabled then the force complete operation will cause a switch to post-copy mode. Otherwise the instance will be suspended until the migration is completed or aborted.

    Driver Support:

    • PowerVM: missing

  • Launch instance

    Status: mandatory.

    Notes: Importing pre-existing running virtual machines on a host is considered out of scope of the cloud paradigm. Therefore this operation is mandatory to support in drivers.

    Driver Support:

    • PowerVM: complete

  • Stop instance CPUs (pause)

    Status: optional.

    CLI commands:

    • nova pause <server>

    Notes: Stopping an instance's CPUs can be thought of as roughly equivalent to suspend-to-RAM. The instance is still present in memory, but execution has stopped. The problem, however, is that there is no mechanism to inform the guest OS that this takes place, so upon unpausing, its clocks will no longer report correct time. For this reason hypervisor vendors generally discourage use of this feature and some do not even implement it. Therefore this operation is considered optional to support in drivers.

    Driver Support:

    • PowerVM: missing

  • Reboot instance

    Status: optional.

    CLI commands:

    • nova reboot <server>

    Notes: It is reasonable for a guest OS administrator to trigger a graceful reboot from inside the instance. A host initiated graceful reboot requires guest co-operation and a non-graceful reboot can be achieved by a combination of stop+start. Therefore this operation is considered optional.

    Driver Support:

    • PowerVM: complete

  • Rescue instance

    Status: optional.

    CLI commands:

    • nova rescue <server>

    Notes: The rescue operation starts an instance in a special configuration whereby it is booted from a special root disk image. The goal is to allow an administrator to recover the state of a broken virtual machine. In general the cloud model considers instances to be cattle, so if an instance breaks the general expectation is that it be thrown away and a new instance created. Therefore this operation is considered optional to support in drivers.

    Driver Support:

    • PowerVM: complete

  • Resize instance

    Status: optional.

    CLI commands:

    • nova resize <server> <flavor>

    Notes: The resize operation allows the user to change a running instance to match the size of a different flavor from the one it was initially launched with. There are many different flavor attributes that potentially need to be updated. In general it is technically challenging for a hypervisor to support the alteration of all relevant config settings for a running instance. Therefore this operation is considered optional to support in drivers.

    Driver Support:

    • PowerVM: complete

  • Restore instance

    Status: optional.

    CLI commands:

    • nova resume <server>

    Notes: See notes for the suspend operation

    Driver Support:

    • PowerVM: missing

  • Set instance admin password

    Status: optional.

    CLI commands:

    • nova set-password <server>

    Notes: Provides a mechanism to (re)set the password of the administrator account inside the instance operating system. This requires that the hypervisor has a way to communicate with the running guest operating system. Given the wide range of operating systems in existence it is unreasonable to expect this to be practical in the general case. The configdrive and metadata service both provide a mechanism for setting the administrator password at initial boot time. In the case where this operation were not available, the administrator would simply have to login to the guest and change the password in the normal manner, so this is just a convenient optimization. Therefore this operation is not considered mandatory for drivers to support.

    Driver Support:

    • PowerVM: missing

  • Save snapshot of instance disk

    Status: optional.

    CLI commands:

    • nova image-create <server> <name>

    Notes: The snapshot operation allows the current state of the instance root disk to be saved and uploaded back into the glance image repository. The instance can later be booted again using this saved image. This is in effect making the ephemeral instance root disk into a semi-persistent storage, in so much as it is preserved even though the guest is no longer running. In general though, the expectation is that the root disks are ephemeral so the ability to take a snapshot cannot be assumed. Therefore this operation is not considered mandatory to support.

    Driver Support:

    • PowerVM: complete

  • Suspend instance

    Status: optional.

    CLI commands:

    • nova suspend <server>

    Notes: Suspending an instance can be thought of as roughly equivalent to suspend-to-disk. The instance no longer consumes any RAM or CPUs, with its live running state having been preserved in a file on disk. It can later be restored, at which point it should continue execution where it left off. As with stopping instance CPUs, it suffers from the fact that the guest OS will typically be left with a clock that is no longer telling correct time. For container based virtualization solutions, this operation is particularly technically challenging to implement and is an area of active research. This operation tends to make more sense when thinking of instances as pets, rather than cattle, since with cattle it would be simpler to just terminate the instance instead of suspending. Therefore this operation is considered optional to support.

    Driver Support:

    • PowerVM: missing

  • Swap block volumes

    Status: optional.

    CLI commands:

    • nova volume-update <server> <attachment> <volume>

    Notes: The swap volume operation is a mechanism for changing a running instance so that its attached volume(s) are backed by different storage in the host. An alternative to this would be to simply terminate the existing instance and spawn a new instance with the new storage. In other words this operation is primarily targeted towards the pet use case rather than cattle, however, it is required for volume migration to work in the volume service. This is considered optional to support.

    Driver Support:

    • PowerVM: missing

  • Shutdown instance

    Status: mandatory.

    CLI commands:

    • nova delete <server>

    Notes: The ability to terminate a virtual machine is required in order for a cloud user to stop utilizing resources and thus avoid indefinitely ongoing billing. Therefore this operation is mandatory to support in drivers.

    Driver Support:

    • PowerVM: complete

  • Trigger crash dump

    Status: optional.

    CLI commands:

    • nova trigger-crash-dump <server>

    Notes: The trigger crash dump operation is a mechanism for triggering a crash dump in an instance. The feature is typically implemented by injecting an NMI (Non-maskable Interrupt) into the instance. It provides a means to dump the production memory image as a dump file which is useful for users. Therefore this operation is considered optional to support.

    Driver Support:

    • PowerVM: missing

  • Resume instance CPUs (unpause)

    Status: optional.

    CLI commands:

    • nova unpause <server>

    Notes: See notes for the “Stop instance CPUs” operation

    Driver Support:

    • PowerVM: missing

  • uefi boot

    Status: optional.

    Notes: This allows users to boot a guest with uefi firmware.

    Driver Support:

    • PowerVM: missing

  • Device tags

    Status: optional.

    CLI commands:

    • nova boot

    Notes: This allows users to set tags on virtual devices when creating a server instance. Device tags are used to identify virtual device metadata, as exposed in the metadata API and on the config drive. For example, a network interface tagged with “nic1” will appear in the metadata along with its bus (ex: PCI), bus address (ex: 0000:00:02.0), MAC address, and tag (nic1). If multiple networks are defined, the order in which they appear in the guest operating system will not necessarily reflect the order in which they are given in the server boot request. Guests should therefore not depend on device order to deduce any information about their network devices. Instead, device role tags should be used. Device tags can be applied to virtual network interfaces and block devices.

    Driver Support:

    • PowerVM: missing

  • quiesce

    Status: optional.

    Notes: Quiesce the specified instance to prepare for snapshots. For libvirt, guest filesystems will be frozen through qemu agent.

    Driver Support:

    • PowerVM: missing

  • unquiesce

    Status: optional.

    Notes: See notes for the quiesce operation

    Driver Support:

    • PowerVM: missing

  • Attach block volume to multiple instances

    Status: optional.

    CLI commands:

    • nova volume-attach <server> <volume>

    Notes: The multiattach volume operation is an extension to the attach volume operation. It allows a single volume to be attached to multiple instances. This operation is not considered to be mandatory to support. Note that for the libvirt driver, this is only supported if qemu<2.10 or libvirt>=3.10.

    Driver Support:

    • PowerVM: missing

Notes:

  • This document is a continuous work in progress

Policies

Nova-PowerVM Policies

In the Policies Guide, you will find documented policies for developing with Nova-PowerVM. This includes the processes we use for blueprints and specs, bugs, contributor onboarding, and other procedural items.

Policies

Nova-PowerVM Bugs

Nova-PowerVM maintains all of its bugs in Launchpad. All of the current open Nova-PowerVM bugs can be found in that link.

Bug Triage Process

The process of bug triaging consists of the following steps:

  1. Check if a bug was filed for a correct component (project). If not, either change the project or mark it as “Invalid”.
  2. Add appropriate tags. Even if the bug is not valid or is a duplicate of another one, it still may help bug submitters and corresponding sub-teams.
  3. Check if a similar bug was filed before. If so, mark it as a duplicate of the previous bug.
  4. Check if the bug description is consistent, e.g. it has enough information for developers to reproduce it. If it’s not consistent, ask the submitter to provide more information and mark the bug as “Incomplete”.
  5. Depending on ease of reproduction (or if the issue can be spotted in the code), mark it as “Confirmed”.
  6. Assign the importance. Bugs that obviously break core and widely used functionality should get assigned as “High” or “Critical” importance. The same applies to bugs that were filed for gate failures.
  7. (Optional). Add comments explaining the issue and possible strategy of fixing/working around the bug.
Contributing to Nova-PowerVM

If you would like to contribute to the development of OpenStack, you must follow the steps in the “If you’re a developer” section of this page:

Once those steps have been completed, changes to OpenStack should be submitted for review via the Gerrit tool, following the workflow documented at:

Pull requests submitted through GitHub will be ignored.

Bugs should be filed on Launchpad, not GitHub:

Code Reviews

Code reviews are a critical component of all OpenStack projects. Code reviews provide a way to enforce a level of consistency across the project, and also allow for the careful onboarding of contributions from new contributors.

Code Review Practices

Nova-PowerVM follows the code review guidelines as set forth for all OpenStack projects. It is expected that all reviewers are following the guidelines set forth on that page.

Indices and tables

Devref

Developer Guide

In the Developer Guide, you will find information on how to develop for Nova-PowerVM and how it interacts with Nova compute. You will also find information on the setup and usage of Nova-PowerVM.

Internals and Programming

Source Code Structure

Since nova-powervm strives to be integrated into the upstream Nova project, the source code structure matches a standard driver.

nova_powervm/
  virt/
    powervm/
      disk/
      tasks/
      volume/
      ...
  tests/
    virt/
      powervm/
        disk/
        tasks/
        volume/
        ...
nova_powervm/virt/powervm

The main directory for the overall driver. Provides the driver implementation, image support, and some high-level classes to interact with the PowerVM system (e.g. host, vios, vm).

The driver attempts to utilize TaskFlow for major actions such as spawn. This allows the driver to create atomic elements (within the tasks) to drive operations against the system (with revert capabilities).

nova_powervm/virt/powervm/disk

The disk folder contains the various ‘nova ephemeral’ disk implementations. These are basic images that do not involve Cinder.

Two disk implementations exist currently.

  • localdisk - supports Virtual I/O Server Volume Groups. This configuration uses any Volume Group on the system, allowing operators to make use of the physical disks local to their system. Images will be cached on the same volume group as the VMs. The cached images will be periodically cleaned up by the Nova imagecache manager, at a rate determined by the nova.conf setting: image_cache_manager_interval. Also supports file-backed ephemeral storage, which is specified by using the QCOW VG - default volume group. Note: Resizing instances with file-backed ephemeral is not currently supported.
  • Shared Storage Pool - utilizes PowerVM’s distributed storage. As such this implementation allows operators to make use of live migration capabilities.

The standard interface between these two implementations is defined in driver.py. This ensures that the nova-powervm compute driver does not need to know the specifics of which disk implementation it is using.
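
As a rough sketch of what that common interface looks like, the outline below uses illustrative method names (create_disk_from_image, connect_disk, delete_disks) to convey the shape of the abstraction; the authoritative definitions live in the disk package's driver.py.

import abc


class DiskAdapter(metaclass=abc.ABCMeta):
    """Sketch of the common ephemeral disk interface (names illustrative)."""

    @abc.abstractmethod
    def create_disk_from_image(self, context, instance, image_meta):
        """Build the boot disk for an instance from a Glance image."""

    @abc.abstractmethod
    def connect_disk(self, instance, disk_info):
        """Map the disk to the instance."""

    @abc.abstractmethod
    def delete_disks(self, storage_elems):
        """Remove the disks backing an instance."""


# A localdisk (VIOS volume group) implementation and a Shared Storage Pool
# implementation would each subclass DiskAdapter; the compute driver only
# codes against the base class.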

nova_powervm/virt/powervm/tasks

The task folder contains TaskFlow classes. These implementations simply wrap around other methods, providing logical units that the compute driver can use when building a string of actions.

For instance, spawning an instance may require several atomic tasks:
  • Create VM
  • Plug Networking
  • Create Disk from Glance
  • Attach Disk to VM
  • Power On

The tasks in this directory encapsulate this. If anything fails, they have corresponding reverts. The logic to perform these operations is contained elsewhere; these are simple wrappers that enable embedding into Taskflow.
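
The snippet below is a minimal TaskFlow sketch of that pattern. The task and flow names are illustrative rather than the driver's actual classes; the point is the execute/revert pairing and the ordered flow the driver builds for spawn.

from taskflow import engines, task
from taskflow.patterns import linear_flow


class CreateVM(task.Task):
    def execute(self):
        print('creating LPAR')      # the real task would call pypowervm here

    def revert(self, *args, **kwargs):
        print('deleting LPAR')      # undo if a later task in the flow fails


class PlugNetworking(task.Task):
    def execute(self):
        print('plugging VIFs')

    def revert(self, *args, **kwargs):
        print('unplugging VIFs')


# Chain the atomic tasks into a spawn flow and run it; a failure in any
# task triggers the reverts of the tasks that already completed.
flow = linear_flow.Flow('spawn')
flow.add(CreateVM(), PlugNetworking())
engines.run(flow)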

nova_powervm/virt/powervm/volume

The volume folder contains the Cinder volume connectors. A volume connector is the code that connects a Cinder volume (which is visible to the host) to the Virtual Machine.

The PowerVM Compute Driver has an interface for the volume connectors defined in this folder’s driver.py.

The PowerVM Compute Driver provides two implementations for Fibre Channel attached disks.

  • Virtual SCSI (vSCSI): The disk is presented to a Virtual I/O Server and the data is passed through to the VM through a virtualized SCSI connection.
  • N-Port ID Virtualization (NPIV): The disk is presented directly to the VM. The VM will have virtual Fibre Channel connections to the disk, and the Virtual I/O Server will not have the disk visible to it.
Setting Up a Development Environment

This page describes how to setup a working Python development environment that can be used in developing Nova-PowerVM.

These instructions assume you’re already familiar with Git and Gerrit, which is a code repository mirror and code review toolset. However, if you aren’t, please see this Git tutorial for an introduction to using Git, and this guide for a tutorial on using Gerrit and Git for code contribution to OpenStack projects.

Getting the code

Grab the code:

git clone https://git.openstack.org/openstack/nova-powervm
cd nova-powervm
Setting up your environment

The purpose of this project is to provide the ‘glue’ between OpenStack Compute (Nova) and PowerVM. The pypowervm project is used to control PowerVM systems.

It is recommended that you clone down the OpenStack Nova project along with pypowervm into your respective development environment.

Running the tox python targets for tests will automatically clone these down via the requirements.

Additional project requirements may be found in the requirements.txt file.

Usage

To make use of the PowerVM drivers, a PowerVM system set up with NovaLink is required. The nova-powervm driver should be installed on the management VM.

Note: Installing the NovaLink software creates the pvm_admin group. In order to function properly, the user executing the Nova compute service must be a member of this group. Use the usermod command to add the user. For example, to add the user stacker to the pvm_admin group, execute:

sudo usermod -a -G pvm_admin stacker

The user must re-login for the change to take effect.
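
To verify the membership afterwards (stacker is the example user from above):

id -nG stacker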

The NovaLink architecture is such that the compute driver runs directly on the PowerVM system. No external management element (e.g. Hardware Management Console or PowerVC) is needed. Management of the virtualization is driven through a thin virtual machine running on the PowerVM system.

Configuration of the PowerVM system and NovaLink is required ahead of time. If the operator is using volumes or Shared Storage Pools, they are required to be configured ahead of time.

Configuration File Options

After nova-powervm has been installed the user must enable PowerVM as the compute driver. To do so, set the compute_driver value in the nova.conf file to compute_driver = powervm_ext.driver.PowerVMDriver.

The standard nova configuration options are supported. In particular, to use PowerVM SR-IOV vNIC for networking, the pci_passthrough_whitelist option must be set. See the networking-powervm usage devref for details.

Additionally, a [powervm] section is used to provide additional customization to the driver.

By default, no additional inputs are needed. The base configuration allows the Nova driver to support ephemeral disks on a local volume group (only one volume group can be on the system in the default config). Connecting Fibre Channel hosted disks via Cinder will use the Virtual SCSI connections through the Virtual I/O Servers.

Operators may change the disk driver (nova based disks - NOT Cinder) via the disk_driver property.

All of these values are under the [powervm] section. The tables are broken out into logical sections.

To generate a sample config file for [powervm] run:

oslo-config-generator --namespace nova_powervm > nova_powervm_sample.conf

The [powervm] section of the sample can then be edited and pasted into the full nova.conf file.
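
For illustration, a minimal nova.conf excerpt enabling the driver with Shared Storage Pool ephemeral disks and vSCSI volume attachment might look like the following; the cluster name is a placeholder and the values should be adjusted to the deployment.

[DEFAULT]
compute_driver = powervm_ext.driver.PowerVMDriver

[powervm]
# Shared Storage Pool backed ephemeral disks (enables live migration).
disk_driver = ssp
# Only needed if more than one SSP cluster is visible to the host.
cluster_name = my_ssp_cluster
# Attach FC Cinder volumes through the Virtual I/O Servers via vSCSI.
fc_attach_strategy = vscsi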

VM Processor Options
  • proc_units_factor = 0.1
    (FloatOpt) Factor used to calculate the processor units per vcpu. Valid values are: 0.05 - 1.0.

  • uncapped_proc_weight = 64
    (IntOpt) The processor weight to assign to newly created VMs. Value should be between 1 and 255. Represents the relative share of the uncapped processor cycles the Virtual Machine will receive when unused processor cycles are available.

Disk Options
  • disk_driver = localdisk
    (StrOpt) The disk driver to use for PowerVM disks. Valid options are: localdisk, ssp.

    If localdisk is specified and only one non-rootvg Volume Group exists on one of the Virtual I/O Servers, then no further config is needed. If multiple volume groups exist, then further specification can be done via the volume_group_name option.

    Live migration is not supported with a localdisk config.

    If ssp is specified, then a Shared Storage Pool will be used. If only one SSP exists on the system, no further configuration is needed. If multiple SSPs exist, then the cluster_name property must be specified. Live migration can be done within an SSP cluster.

  • cluster_name = None
    (StrOpt) Cluster hosting the Shared Storage Pool to use for storage operations. If none is specified, the host is queried; if a single Cluster is found, it is used. Not used unless the disk_driver option is set to ssp.

  • volume_group_name = None
    (StrOpt) Volume Group to use for block device operations. Must not be rootvg. If disk_driver is localdisk and more than one non-rootvg volume group exists across the Virtual I/O Servers, then this attribute must be specified.

Volume Options
  • fc_attach_strategy = vscsi
    (StrOpt) The Fibre Channel Volume Strategy defines how FC Cinder volumes should be attached to the Virtual Machine. The options are: npiv or vscsi.

    It should be noted that if NPIV is chosen, the WWPNs will not be active on the backing fabric during the deploy. Some Cinder drivers will operate without issue. Others may query the fabric and thus will fail attachment. It is advised that if an issue occurs using NPIV, the operator fall back to vscsi based deploys.

  • vscsi_vios_connections_required = 1
    (IntOpt) Indicates the minimum number of Virtual I/O Servers that are required to support a Cinder volume attach with the vSCSI volume connector.

  • ports_per_fabric = 1
    (IntOpt) (NPIV only) The number of physical ports that should be connected directly to the Virtual Machine, per fabric.

    Example: 2 fabrics and ports_per_fabric set to 2 will result in 4 NPIV ports being created, two per fabric. If multiple Virtual I/O Servers are available, the driver will attempt to span ports across I/O Servers.

  • fabrics = A
    (StrOpt) (NPIV only) Unique identifier for each physical FC fabric that is available. This is a comma separated list. If there are two fabrics for multi-pathing, then this could be set to A,B.

    The fabric identifiers are used for the ‘fabric_<identifier>_port_wwpns’ key.

  • fabric_<name>_port_wwpns
    (StrOpt) (NPIV only) A comma delimited list of all the physical FC port WWPNs that support the specified fabric. Tied to the NPIV ‘fabrics’ key.
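
For example, an NPIV configuration with two fabrics might look like the following; the WWPN values are placeholders.

[powervm]
fc_attach_strategy = npiv
fabrics = A,B
fabric_A_port_wwpns = 10000090FA45B123,10000090FA45B124
fabric_B_port_wwpns = 10000090FA45C123,10000090FA45C124
ports_per_fabric = 2
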
Config Drive Options
  • vopt_media_volume_group = root_vg
    (StrOpt) The volume group on the system that should be used to store the config drive metadata that will be attached to the VMs.

  • vopt_media_rep_size = 1
    (IntOpt) The size of the media repository (in GB) for the config drive metadata. Only used if the media repository needs to be created.

  • image_meta_local_path = /tmp/cfgdrv/
    (StrOpt) The location where the config drive ISO files should be built.

LPAR Detailed Settings

Fine grained control over LPAR settings can be achieved by setting PowerVM specific properties (extra-specs) on the flavors being used to instantiate a VM. For the complete list of PowerVM properties see IBM PowerVC documentation.

For example, to create a VM with one VCPU and 0.7 entitlement (0.7 of the physical CPU resource), a user could use a flavor created as follows:

openstack flavor create --vcpus 1 --ram 6144 --property \
  powervm:proc_units=0.7 pvm-6-1-0.7

In the example above, the powervm:proc_units property was used to specify the CPU entitlement for the VM.
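
A follow-on usage example, booting a server with that flavor (the image and network names are placeholders):

openstack server create --flavor pvm-6-1-0.7 --image rhel75 \
  --network prod_net pvm-vm-1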

Remarks For IBM i Users

By default all VMs are created as AIX/Linux type LPARs. In order to create an IBM i VM (LPAR type OS400), the user must add the os_distro property with a value of ibmi to the Glance image being used to create the instance. For example, to add the property to the sample image i5OSR730, execute:

openstack image set --property os_distro=ibmi i5OSR730

Testing

Running Nova-PowerVM Tests

This page describes how to run the Nova-PowerVM tests. This page assumes you have already set up a working Python environment for Nova-PowerVM development.

With tox

Nova-PowerVM, like other OpenStack projects, uses tox for managing the virtual environments for running test cases. It uses Testr for managing the running of the test cases.

Tox handles the creation of a series of virtualenvs that target specific versions of Python.

Testr handles the parallel execution of series of test cases as well as the tracking of long-running tests and other things.

For more information on the standard tox-based test infrastructure used by OpenStack and how to do some common test/debugging procedures with Testr, see this wiki page:

PEP8 and Unit Tests

Running pep8 and unit tests is as easy as executing this in the root directory of the Nova-PowerVM source code:

tox

To run only pep8:

tox -e pep8

To restrict the pylint check to only the files altered by the latest patch changes:

tox -e pep8 HEAD~1

To run only the unit tests:

tox -e py27,py34

Indices and tables

Specifications

Example Spec - The title of your blueprint

Include the URL of your launchpad blueprint:

https://blueprints.launchpad.net/nova-powervm/+spec/example

Introduction paragraph – why are we doing anything? A single paragraph of prose that operators can understand. The title and this first paragraph should be used as the subject line and body of the commit message respectively.

Some notes about the nova-powervm spec and blueprint process:

  • Not all blueprints need a spec. For more information see https://docs.openstack.org/nova/latest/contributor/blueprints.html#specs
  • The aim of this document is first to define the problem we need to solve, and second to agree on the overall approach to solve that problem.
  • This is not intended to be extensive documentation for a new feature. For example, there is no need to specify the exact configuration changes, nor the exact details of any DB model changes. But you should still define that such changes are required, and be clear on how that will affect upgrades.
  • You should aim to get your spec approved before writing your code. While you are free to write prototypes and code before getting your spec approved, it's possible that the outcome of the spec review process leads you towards a fundamentally different solution than you first envisaged.
  • But, API changes are held to a much higher level of scrutiny. As soon as an API change merges, we must assume it could be in production somewhere, and as such, we then need to support that API change forever. To avoid getting that wrong, we do want lots of details about API changes upfront.

Some notes about using this template:

  • Your spec should be in ReSTructured text, like this template.

  • Please wrap text at 79 columns.

  • The filename in the git repository should match the launchpad URL, for example: https://blueprints.launchpad.net/nova-powervm/+spec/awesome-thing should be named awesome-thing.rst

  • Please do not delete any of the sections in this template. If you have nothing to say for a whole section, just write: None

  • For help with syntax, see http://sphinx-doc.org/rest.html

  • To test out your formatting, build the docs using tox and see the generated HTML file in doc/build/html/specs/<path_of_your_file>

  • If you would like to provide a diagram with your spec, ascii diagrams are required. http://asciiflow.com/ is a very nice tool to assist with making ascii diagrams. The reason for this is that the tool used to review specs is based purely on plain text. Plain text will allow review to proceed without having to look at additional files which can not be viewed in gerrit. It will also allow inline feedback on the diagram itself.

  • If your specification proposes any changes to the Nova REST API such as changing parameters which can be returned or accepted, or even the semantics of what happens when a client calls into the API, then you should add the APIImpact flag to the commit message. Specifications with the APIImpact flag can be found with the following query:

    https://review.openstack.org/#/q/status:open+project:openstack/nova-powervm+message:apiimpact,n,z

Problem description

A detailed description of the problem. What problem is this blueprint addressing?

Use Cases

What use cases does this address? What impact on actors does this change have? Ensure you are clear about the actors in each use case: Developer, End User, Deployer etc.

Proposed change

Here is where you cover the change you propose to make in detail. How do you propose to solve this problem?

If this is one part of a larger effort make it clear where this piece ends. In other words, what’s the scope of this effort?

At this point, if you would like to just get feedback on if the problem and proposed change fit in nova-powervm, you can stop here and post this for review to get preliminary feedback. If so please say: Posting to get preliminary feedback on the scope of this spec.

Alternatives

What other ways could we do this thing? Why aren’t we using those? This doesn’t have to be a full literature review, but it should demonstrate that thought has been put into why the proposed solution is an appropriate one.

Security impact

Describe any potential security impact on the system. Some of the items to consider include:

  • Does this change touch sensitive data such as tokens, keys, or user data?
  • Does this change alter the API in a way that may impact security, such as a new way to access sensitive information or a new way to login?
  • Does this change involve cryptography or hashing?
  • Does this change require the use of sudo or any elevated privileges?
  • Does this change involve using or parsing user-provided data? This could be directly at the API level or indirectly such as changes to a cache layer.
  • Can this change enable a resource exhaustion attack, such as allowing a single API interaction to consume significant server resources? Some examples of this include launching subprocesses for each connection, or entity expansion attacks in XML.

For more detailed guidance, please see the OpenStack Security Guidelines as a reference (https://wiki.openstack.org/wiki/Security/Guidelines). These guidelines are a work in progress and are designed to help you identify security best practices. For further information, feel free to reach out to the OpenStack Security Group at openstack-security@lists.openstack.org.

End user impact

How would the end user be impacted by this change? The “End User” is defined as the users of the deployed cloud.

Performance Impact

Describe any potential performance impact on the system, for example how often will new code be called, and is there a major change to the calling pattern of existing code.

Examples of things to consider here include:

  • A small change in a utility function or a commonly used decorator can have a large impact on performance.
  • Calls which result in database queries (whether direct or via conductor) can have a profound impact on performance when called in critical sections of the code.
  • Will the change include any locking, and if so what considerations are there on holding the lock?
Deployer impact

Discuss things that will affect how you deploy and configure OpenStack that have not already been mentioned, such as:

  • What config options are being added? Are the default values ones which will work well in real deployments?
  • Is this a change that takes immediate effect after it's merged, or is it something that has to be explicitly enabled?
  • If this change is a new binary, how would it be deployed?
  • Please state anything that those doing continuous deployment, or those upgrading from the previous release, need to be aware of. Also describe any plans to deprecate configuration values or features.
Developer impact

Discuss things that will affect other developers working on the driver or OpenStack in general.

Upgrade impact

Describe any potential upgrade impact on the system, such as:

  • If this change adds a new feature to the compute host that the controller services rely on, the controller services may need to check the minimum compute service version in the deployment before using the new feature. For example, in Ocata, the FilterScheduler did not use the Placement API until all compute services were upgraded to at least Ocata.
  • Nova supports N-1 version nova-compute services for rolling upgrades. Does the proposed change need to consider older code running that may impact how the new change functions, for example, by changing or overwriting global state in the database? This is generally most problematic when making changes that involve multiple compute hosts, like move operations such as migrate, resize, unshelve and evacuate.

Implementation

Assignee(s)

Who is leading the writing of the code? Or is this a blueprint where you’re throwing it out there to see who picks it up?

If more than one person is working on the implementation, please designate the primary author and contact.

Primary assignee:
<launchpad-id or None>
Other contributors:
<launchpad-id or None>
Work Items

Work items or tasks – break the feature up into the things that need to be done to implement it. Those parts might end up being done by different people, but we’re mostly trying to understand the timeline for implementation.

Dependencies

  • Include specific references to specs and/or blueprints in nova-powervm, or in other projects, that this one either depends on or is related to. For example, a dependency on pypowervm changes should be documented here.
  • If this requires functionality of another project that is not currently used by nova-powervm document that fact.
  • Does this feature require any new library dependencies or code otherwise not included in OpenStack? Or does it depend on a specific version of library?

Testing

Please discuss the important scenarios needed to test here, as well as specific edge cases we should be ensuring work correctly. For each scenario please specify if this requires specialized hardware, a full openstack environment, or can be simulated inside the nova-powervm tree.

Please discuss how the change will be tested. We especially want to know what tempest tests will be added. It is assumed that unit test coverage will be added so that doesn’t need to be mentioned explicitly, but discussion of why you think unit tests are sufficient and we don’t need to add more tempest tests would need to be included.

Is this untestable in gate given current limitations (specific hardware / software configurations available)? If so, are there mitigation plans (3rd party testing, gate enhancements, etc.)?

Documentation Impact

Which audiences are affected most by this change, and which documentation titles on nova-powervm.readthedocs.io should be updated because of this change? Don’t repeat details discussed above, but reference them here in the context of documentation for multiple audiences. For example, the Operations Guide targets cloud operators, and the End User Guide would need to be updated if the change offers a new feature available through the CLI or dashboard. If a config option changes or is deprecated, note here that the documentation needs to be updated to reflect this specification’s change.

References

Please add any useful references here. You are not required to have any reference. Moreover, this specification should still make sense when your references are unavailable. Examples of what you could include are:

  • Links to mailing list or IRC discussions
  • Links to notes from a summit session
  • Links to relevant research, if appropriate
  • Related specifications as appropriate (e.g. if it’s an EC2 thing, link the EC2 docs)
  • Anything else you feel it is worthwhile to refer to

History

Optional section intended to be used each time the spec is updated to describe new design, API, or any database schema updates. Useful to let the reader understand what has happened over time.

Revisions
Release Name | Description
Rocky | Introduced

Nova-PowerVM Specifications

Contents:

Newton Specifications

Linux Bridge and OVS VIF Support

Launchpad BluePrint

Currently the PowerVM driver requires a PowerVM specific Neutron agent. This blueprint will add support for additional agent types - specifically the Open vSwitch and Linux Bridge agents provided by Neutron.

Problem description

PowerVM has support for virtualizing an Ethernet port using the Virtual I/O Server and Shared Ethernet. This is provided using the networking-powervm Shared Ethernet Agent. This agent provides key PowerVM use cases such as I/O redundancy.

There is a subset of operators that have asked for VIF support in line with other hypervisors. This would be support for the Neutron Linux Bridge agent and Open vSwitch agent. While these agents do not provide use cases such as I/O redundancy, they do enable operators to utilize common upstream networking solutions when deploying PowerVM with OpenStack.

Use Cases

An operator should be able to deploy an environment using Linux Bridge or Open vSwitch Neutron agents. In order to do this, the physical I/O must be assigned to the NovaLink partition on the PowerVM system (the partition with virtualization admin authority).

A user should be able to do the standard VIF use cases with either of these agents:

  • Add NIC
  • Remove NIC
  • Security Groups
  • Multiple Network Types (Flat, VLAN, vxlan)
  • Bandwidth limiting

The existing Neutron agents should be used without any changes from PowerVM. All of the changes that should occur will be in nova-powervm. Any limitations of the agents themselves will be limitations to the PowerVM implementation.

There is one exception to the use case support. The Open vSwitch support will enable live migration. There is no plan for Linux Bridge live migration support.

Proposed change
  • Create a parent VIF driver for NovaLink based I/O. This will hold the code that is common between the Linux Bridge VIFs and OVS VIFs. There will be common code due to both needing to run on the NovaLink management VM.
  • The VIF drivers should create a Trunk VEA on the NovaLink partition for each VIF. It will be given a unique channel of communication to the VM. The device will be named according to the Neutron device name.
  • The OVS VIF driver will use the nova linux_net code to set the metadata on the trunk adapter.
  • Live migration will suspend the VIF on the target host until it has been treated. Treating means ensuring that the communication to the VM is on a unique channel (its own VLAN on a vSwitch).
  • A private PowerVM virtual switch named ‘NovaLinkVEABridge’ will be created to support the private communication between the trunk adapters and the VMs.
  • Live migration on the source will need to clean up the remaining trunk adapter for Open vSwitch that is left around on the management VM.

It should be noted that Hybrid VIF plugging will not be supported. Instead, PowerVM will use the conntrack integration in Ubuntu 16.04/OVS 2.5 to support the OVSFirewallDriver. As of OVS 2.5, that allows the firewall function without needing Hybrid VIF Plugging.
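To illustrate the driver layering described above, a minimal sketch follows. The class names and method signatures are illustrative only, not the final implementation:

# Minimal sketch of the proposed VIF driver layering; names are illustrative.

class PvmNovaLinkVifDriver(object):
    """Common logic for VIFs plugged on the NovaLink management VM."""

    def plug(self, instance, vif):
        # Create a trunk VEA on the NovaLink partition, attached to the
        # private 'NovaLinkVEABridge' virtual switch so the VM gets a
        # unique communication channel (its own VLAN), and name the
        # device after the Neutron device name.
        raise NotImplementedError("illustrative sketch only")


class PvmLBVifDriver(PvmNovaLinkVifDriver):
    """Linux Bridge specifics (no live migration support is planned)."""


class PvmOvsVifDriver(PvmNovaLinkVifDriver):
    """Open vSwitch specifics: sets the OVS metadata on the trunk adapter
    and cleans up the leftover trunk adapter on the source host after a
    live migration."""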

Alternatives

None.

Security impact

None.

End user impact

None.

Performance Impact

Performance will not be impacted for the deployment of VMs. However, the end user performance may change as it is a new networking technology. Both the Linux Bridge and Open vSwitch support should operate with similar performance characteristics as other platforms that support these technologies.

Deployer impact

The deployer will need to do the following:

  • Attach an Ethernet I/O Card to the NovaLink partition. Configure the ports in accordance with the Open vSwitch or Linux Bridge Neutron Agent’s requirements.
  • Run the agent on their NovaLink management VM.

No major changes are anticipated outside of this. The Shared Ethernet Adapter Neutron agent will not work in conjunction with this on the same system.

Developer impact

None

Implementation
Assignee(s)
Primary assignee:
thorst
Other contributors:
kriskend, tjakobs
Work Items

See Proposed Change

Dependencies
  • NovaLink core changes will be needed with regard to the live migration flows. This requires NovaLink 1.0.0.3 or later.
Testing

Testing will be done on live systems. Future work will be done to integrate into the PowerVM Third-Party CI; however, this will not be done initially because the Linux Bridge and OVS agents are already heavily tested. The SEA agent continues to need to be tested.

Documentation Impact

Deployer documentation will be built around how to configure this.

Nova support for SR-IOV VIF Types

https://blueprints.launchpad.net/nova-powervm/+spec/powervm-sriov-nova

This blueprint addresses support in nova-powervm for SR-IOV, with SR-IOV VFs attached to VMs via PowerVM vNIC. SR-IOV support was added in the Juno release of OpenStack; this blueprint fits the PowerVM scenario into that implementation.

A separate blueprint for networking-powervm has been made available for design elements regarding networking-powervm.

These blueprints will be implemented during the Newton cycle of OpenStack development. Per the Newton schedule, development should be completed during newton-3.

Refer to the glossary section for an explanation of terms.

Problem Description

The OpenStack PowerVM drivers currently support the networking aspect of PowerVM virtualization using Shared Ethernet Adapter, Open vSwitch, and Linux Bridge. There is a need to support SR-IOV ports with redundancy/failover and migration. It is possible to associate an SR-IOV VF with a VM directly, but that path will not be supported by this design; such a setup would not provide migration support anyway, and it does not utilize the advantages of hardware-level virtualization offered by the SR-IOV architecture. Support for that configuration will be added in the future.

Users should be able to manage a VM with an SR-IOV vNIC as a network interface. This management should include migration of a VM with an SR-IOV vNIC attached to it.

PowerVM has a feature called vNIC which is tied in with SR-IOV. By using vNIC, the following use cases are supported:

  • Fail over I/O to a different I/O Server and physical function
  • Live Migration with SR-IOV, without significant intervention

The vNIC is exposed to the VM, and the MAC address of the client vNIC will match the neutron port.

In summary, this blueprint will solve support of SR-IOV in nova-powervm for these scenarios:

  1. Ability to attach/detach a SR-IOV VF to a VM as a network interface using vNIC intermediary during and after deployment, including migration.
  2. Ability to provide redundancy/failover support across VFs from Physical Ports within or across SR-IOV cards using vNIC intermediary.
  3. Ability to associate a VLAN with vNIC backed by SR-IOV VF.

The ability to associate an SR-IOV VF directly with a VM will be added in the future.

Refer to the separate networking-powervm blueprint for changes in the networking-powervm component. This blueprint focuses on changes to nova-powervm only.

Use Cases
  1. Attach vNIC backed by SR-IOV VF(s) to a VM during boot time
  2. Attach vNIC backed by SR-IOV VF(s) to a VM after it is deployed
  3. Detach vNIC backed by SR-IOV VF(s) from a VM
  4. When a VM with vNIC backed by SR-IOV is deleted, perform detach and cleanup
  5. Live migrate a VM if using vNIC backed SR-IOV support
  6. Provide redundancy/failover support of vNIC backed by SR-IOV VF attached to a VM during both deploy and post deploy scenarios.
Proposed changes

The changes will be made in two areas:

1. Compute virt driver. The PowerVM compute driver is in nova_powervm.virt.powervm.driver.PowerVMDriver and it will be enhanced for SR-IOV vNIC support. A dictionary is maintained in the virt driver vif code to map vif type to vif driver class. Based on the vif type of the vif object that needs to be plugged, the appropriate vif driver will be invoked. This dictionary will be modified to include a new vif driver class and its vif type (pvm_sriov).

The PCI Claims process expects to be able to “claim” a VF from the pci_passthrough_devices list each time a vNIC is plugged, and return it to the pool on unplug. Thus the get_available_resource API will be enhanced to populate this device list with a suitable number of VFs.

2. VIF driver. The PowerVM VIF driver is in nova_powervm.virt.powervm.vif.PvmVifDriver. A VIF driver to attach a network interface via vNIC (PvmVnicSriovVifDriver), with plug/unplug methods, will be implemented. The plug and unplug methods will use pypowervm code to create VF/vNIC server/vNIC clients and attach/detach them. A Neutron port carries binding:vif_type and binding:vnic_type attributes. The vif type for this implementation will be pvm_sriov. The vnic_type will be ‘direct’.
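As a rough illustration of how the new driver slots into the mapping described above, consider the following sketch; the dictionary name, method signatures, and internal steps are illustrative only:

# Minimal sketch only; the real names in nova_powervm.virt.powervm.vif may differ.

class PvmVnicSriovVifDriver(object):
    """vNIC-backed SR-IOV VIF driver described above (sketch)."""

    def plug(self, instance, vif):
        # 1. Read physical_network from the Neutron network behind the VIF.
        # 2. Match it against SR-IOV physical port labels to choose the
        #    backing physical ports and hosting VIOS(es).
        # 3. Call pypowervm to build the vNIC with the configured
        #    redundancy (vnic_required_vfs) and capacity.
        raise NotImplementedError("illustrative sketch only")

    def unplug(self, instance, vif):
        # Tear down the vNIC; the PCI claims machinery returns the VF to
        # the pool tracked by get_available_resource.
        raise NotImplementedError("illustrative sketch only")


# Hypothetical vif-type-to-driver mapping, extended with the new entry.
VIF_MAPPING = {
    'pvm_sriov': PvmVnicSriovVifDriver,
}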

A VIF driver (PvmVFSriovVifDriver) for VFs attached directly to the VM will be implemented in the future.

Deployment of a VM with an SR-IOV vNIC will involve picking Physical Port(s), VIOS(es), and a VM, and invoking the pypowervm library. Similarly, attachment of the same to an existing VM will be implemented. RMC will be required. Evacuation and migration of a VM will be supported with changes to the compute virt driver and VIF driver via the pypowervm library.

Physical Port information will be derived from the port label attribute of the physical ports on the SR-IOV adapters. The port label attribute of the physical ports will have to be updated with ‘physical network’ names during configuration of the environment. During attachment of an SR-IOV-backed vNIC to a VM, the physical network attribute of the neutron network will be matched against the port labels of the physical ports to gather a list of physical ports.

Failover/redundancy: The VIF plug during deploy (or attach of a network interface to a VM) will pass more than one Physical Port and VIOS(es) (as stated above in the deploy scenario) to the pypowervm library to create the vNIC on the VIOS with redundancy. It should be noted that failover is handled automatically by the platform when a vNIC is backed by multiple VFs. The redundancy level will be controlled by an AGENT option vnic_required_vfs in the ML2 configuration file (see the blueprint for networking-powervm). It will have a default of 2.

Quality of Service: Each VF backing a vNIC can be configured with a capacity value, dictating the minimum percentage of the physical port’s total bandwidth that will be available to that VF. The ML2 configuration file allows a vnic_vf_capacity option in the AGENT section to set the capacity for all vNIC-backing VFs. If omitted, the platform defaults to the capacity granularity for each physical port. See the blueprint for networking-powervm for details of the configuration option; and see section 1.3.3 of the IBM Power Systems SR-IOV Technical Overview and Introduction for details on VF capacity.
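For reference, the relevant ML2 agent options described above might appear in the configuration file as follows. The option names come from the networking-powervm blueprint; the capacity value format is documented there and is not shown here:

[AGENT]
# Number of backing VFs per vNIC (redundancy level); defaults to 2.
vnic_required_vfs = 2
# Optional minimum capacity per backing VF; if omitted, the platform uses
# the physical port's capacity granularity. See the networking-powervm
# blueprint for the value format.
# vnic_vf_capacity = ...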

For the future implementation of directly attaching an SR-IOV VF to a VM, the request will include the physical network name. PvmVFSriovVifDriver can look up the devname(s) associated with it from the port label, get the physical port information, and create an SR-IOV logical port on the corresponding VM. A configuration option may also be included to allow the user to dictate how many ports to attach. Using the NIB technique, users can set up redundancy.

For the VF - vNIC - VM attach of an SR-IOV port to a VM, the corresponding neutron network object will include the physical network name. PvmVnicSriovVifDriver can look up the devname(s) associated with it from the port label and get the physical port information. Along with the adapter ID and physical port ID, VIOS information will be added, and a vNIC dedicated port will be created on the corresponding VM.

For the migration scenario, physical network names should match on the source and destination compute nodes, and the physical port labels must be set accordingly. On the destination, vNICs will be rebuilt based on the SR-IOV port configuration. The platform decides how to reconstruct the vNIC on the destination in terms of the number and distribution of backing VFs, etc.

Alternatives

None

Security impact

None

Other end user impact

None

Performance impact

Since the number of VMs deployed on the host will depend on number of VFs offered by SR-IOV cards in the environment, scale tests will be limited in density of VMs.

Deployer impact
  1. SR-IOV cards must be configured in Sriov mode. This can be done via the pvmctl command, e.g.:
pvmctl sriov update -i phys_loc=U78C7.001.RCH0004-P1-C1 -s mode=Sriov
  2. SR-IOV physical ports must be labeled with the name of the neutron physical network to which they are cabled. This can be done via the pvmctl command, e.g.:
pvmctl sriov update --port-loc U78C7.001.RCH0004-P1-C1-T1 -s label=prod_net
  3. The pci_passthrough_whitelist option in the nova configuration file must include entries for each neutron physical network to be enabled for vNIC. Only the physical_network key is required. For example:
pci_passthrough_whitelist = [{"physical_network": "default"}, {"physical_network": "prod_net"}]

Configuration is also required on the networking side - see the blueprint for networking-powervm for details.

To deploy a vNIC to a VM, the neutron port(s) must be pre-created with vnic type direct, e.g.:

neutron port-create <network> --vnic-type direct
Developer impact

None

Dependencies
  1. SR-IOV cards and SR-IOV-capable hardware
  2. Updated levels of system firmware and the Virtual I/O Server operating system
  3. An updated version of the NovaLink PowerVM feature
  4. pypowervm library - https://github.com/powervm/pypowervm
Implementation
Assignee(s)
  • Eric Fried (efried)
  • Sridhar Venkat (svenkat)
  • Eric Larese (erlarese)
  • Esha Seth (eshaseth)
  • Drew Thorstensen (thorst)
Work Items

nova-powervm changes:

  • Updates to PowerVM compute driver to support attachment of SR-IOV VF via vNIC.
  • VIF driver for SR-IOV VF connected to VM via vNIC.
  • Migration of VM with SR-IOV VF connected to VM via vNIC. This involves live migration, cold migration and evacuation.
  • Failover/redundancy support for SR-IOV VF(s) connected to VM via vNIC(s).

VIF driver for SR-IOV VF connected to VM directly will be a future work item.

Testing

1. Unit test: All developed code will be accompanied by structured unit tests. These tests validate granular function logic.

2. Function test: Function testing will be performed using the CI infrastructure. Changes implemented for this blueprint will be tested via the existing CI framework used by the IBM team. The CI framework needs to be enhanced with SR-IOV hardware. The tests can be executed in batch mode, probably as nightly jobs.

Documentation impact

All use-cases need to be documented in developer docs that accompany nova-powervm.

References
  1. This blog describes how to work with SR-IOV and vNIC (without redundancy/failover) using the HMC interface: http://chmod666.org/index.php/a-first-look-at-sriov-vnic-adapters/
  2. These describe vNIC and its usage with SR-IOV.
  3. These describe SR-IOV in OpenStack.
  4. This blueprint addresses SR-IOV attach/detach function in nova: https://review.openstack.org/#/c/139910/
  5. networking-powervm blueprint for same work: https://review.openstack.org/#/c/322210/
  6. This is a detailed description of SR-IOV implementation in PowerVM: https://www.redbooks.ibm.com/redpapers/pdfs/redp5065.pdf
  7. This provides an overall view of SR-IOV support in nova: https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov
  8. Attach/detach of SR-IOV ports to VM with respect to libvirt. Provided here for comparison purposes: https://review.openstack.org/#/c/139910/
  9. SR-IOV PCI passthrough reference: https://wiki.openstack.org/wiki/SR-IOV-Passthrough-For-Networking
  10. pypowervm: https://github.com/powervm/pypowervm
Glossary
SR-IOV:

Single Root I/O Virtualization, used for virtual environments where VMs need direct access to network interface without any hypervisor overheads.

Physical Port:

Represents a physical port on an SR-IOV adapter. This is not the same as a Physical Function; a Physical Port can have many Physical Functions associated with it. To clarify further, if a Physical Port supports RCoE, then it will have two Physical Functions; in other words, one Physical Function per protocol that the port supports.

Virtual Function (VF):
 

Represents a virtual port belonging to a Physical Port (PF). Either directly or indirectly (using vNIC), a Virtual Function (VF) is connected to a VM. This is otherwise called an SR-IOV logical port.

Dedicated SR-IOV:
 

This is equivalent to any regular Ethernet card, and it can be used with SEA. A logical port of a physical port can be assigned as a backing device for the SEA.

Shared SR-IOV:

Attaching a VF directly to a VM is not supported in the Newton release, but an SR-IOV card in Sriov mode is what will be used for vNIC as described in this blueprint. Also, an SR-IOV card in Sriov mode can have a promiscuous VF assigned to the VIOS and configured for SEA (said configuration to be done outside the auspices of OpenStack), which can then be used just like any other SEA configuration, and is supported (as described in the next item below).

Shared Ethernet Adapter:
 

An alternate technique to provide a network interface to a VM.

This involves attachment to a physical interface on the PowerVM host and one or many virtual interfaces that are connected to VMs. A VF of a PF in an SR-IOV based environment can be the physical interface for a Shared Ethernet Adapter. Existing support for this configuration in nova-powervm and networking-powervm will continue.

vNIC:

A vNIC is an intermediary between a VF of a PF and a VM. It resides on the VIOS and connects to a VF on one end and to a vNIC client adapter inside the VM on the other. This is mainly to support migration of VMs across hosts.

vNIC failover/redundancy:
 

Multiple vNIC servers (connected to as many VFs, belonging to PFs either on the same SR-IOV card or across cards) connected to the same VM as one network interface. Failure of one vNIC/VF/PF path will result in activation of another such path.

VIOS:

A partition in PowerVM systems dedicated to I/O operations. In the context of this blueprint, the vNIC server will be created on the VIOS. For redundancy management purposes, a specific PowerVM system may employ more than one VIOS partition.

VM migration types:
 
  • Live Migration: migration of VM while both host and VM are alive.
  • Cold Migration: migration of VM while host is alive and VM is down.
  • Evacuation: migration of a VM while the host is down (the VM is down as well).
  • Rebuild: recreation of a VM.
pypowervm:

A python library that runs on the PowerVM management VM and allows virtualization control of the system. This is similar to the python library for libvirt.

History
Release Name Description
Newton Introduced

Ocata Specifications

Image Cache Support for localdisk driver

https://blueprints.launchpad.net/nova-powervm/+spec/image-cache-powervm

The image cache allows for a nova driver to pull an image from glance once, then use a local copy of that image for future VM creation. This saves bandwidth between the compute host and glance. It also improves VM deployment speed and reduces the stress on the overall infrastructure.

Problem description

Deploy times on PowerVM can be high when using the localdisk driver. This is partially due to not having linked clones. The image cache offers a way to reduce those deploy times by transferring the image to the host once, and then subsequent deploys will reuse that image rather than streaming from glance.

There are complexities with this of course. The cached images take up disk space, but the overall image cache from core Nova takes that into account. The value of using the nova image cache design is that it has hooks in the code to help solve these problems.

Use Cases
  • As an end user, subsequent deploys of the same image should go faster
Proposed change

Create a subclass of nova.virt.imagecache.ImageCacheManager in the nova-powervm project. It should implement the necessary methods of the cache:

  • _scan_base_images
  • _age_and_verify_cached_images
  • _get_base
  • update

The nova-powervm driver will need to be updated to utilize the cache. This includes:

  • Implementing the manage_image_cache method
  • Adding the has_imagecache capability

The localdisk driver within nova-powervm will be updated to have the following logic. It will check the volume group backing the instance. If the volume group has a disk with the name ‘i_<partial uuid of image>’, it will simply copy that disk into a new disk named after the UUID of the instance. Otherwise, it will create a disk with the name ‘i_<partial uuid of image>’ that contains the image.
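A minimal sketch of that lookup-or-copy flow follows, with hypothetical helper callables standing in for the pypowervm storage operations; the cache-name truncation length is illustrative:

def _get_instance_disk(existing_disk_names, image_id, create_from_glance, copy_disk):
    """Return the instance boot disk, using the image cache when possible."""
    cache_name = 'i_%s' % image_id[:8]   # 'i_<partial uuid of image>'
    if cache_name not in existing_disk_names:
        # Cache miss: stream the glance image into the cached disk once.
        create_from_glance(cache_name)
    # The instance's boot disk is a copy of the cached disk, named after
    # the instance UUID (handled inside copy_disk in this sketch).
    return copy_disk(cache_name)

The driver would then advertise the has_imagecache capability and implement manage_image_cache to delegate aging of unused ‘i_*’ disks to the cache manager.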

The image cache manager’s purpose is simply to clean out old images that are not needed by any instances anymore.

Further extension, not part of this blueprint, can be done to manage overall disk space in the volume group to make sure that the image cache is not overwhelming the backing disks.

Alternatives
  • Leave as is; all deploys remain potentially slow
  • Implement support for linked clones. This is an eventual goal, but the image cache is still needed in this case as it will also manage the root disk image.
Security impact

None

End user impact

None

Performance Impact

Performance of subsequent deploys of the same image should be faster. The deploys will have improved image copy times and reduced network bandwidth requirements.

Performance of single deploys using different images will be slower.

Deployer impact

This change will take effect without any deployer impact immediately after merging. The deployer will not need to take any specific upgrade actions to make use of it; however the deployer may need to tune the image cache to make sure it is not using too much disk space.

A conf option may be added to force the image cache off if deemed necessary. This will be based on operator feedback, in the event that we need a way to reduce disk usage.

Developer impact

None

Implementation
Assignee(s)
Primary assignee:
tjakobs
Other contributors:
None
Work Items
  • Implement the image cache code for the PowerVM driver
  • Include support for the image cache in the PowerVM driver. Tolerate it for other disk drivers, such as SSP.
Dependencies

None

Testing
  • Unit tests for all code
  • Deployment tests in local environments to verify speed increases
Documentation Impact

The deployer docs will be updated to reflect this.

References

None

History

Optional section intended to be used each time the spec is updated to describe new design.

Revisions
Release Name Description
Newton Introduced

Pike Specifications

File I/O Cinder Connector

https://blueprints.launchpad.net/nova-powervm/+spec/file-io-cinder-connector

There are several Cinder drivers that support having the file system mounted locally and then connecting files into the VM as volumes (e.g. GPFS, NFS, etc.). PowerVM is able to support this type of volume if the user has mounted the file system on the NovaLink partition. This blueprint adds support for such Cinder volumes to the PowerVM driver.

Problem description

The PowerVM driver supports Fibre Channel and iSCSI based volumes. It does not currently support volumes that are presented on a file system as files.

The recent release of PowerVM NovaLink has added support for this in the REST API. This blueprint looks to take advantage of that support.

Use Cases
  • As a user, I want to attach a volume that is backed by a file based Cinder volume (ex. NFS or GPFS).
  • As a user, I want to detach a volume that is backed by a file based Cinder volume (ex. NFS or GPFS).
Proposed change

Add nova_powervm/virt/powervm/volume/fileio.py. This would extend the existing volume drivers. It would store the LUN ID on the SCSI bus.

This does not support traditional VIOS. Like the iSCSI change, it would require running through the NovaLink partition.
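A rough sketch of the shape of the new module follows; the class and method names below are hypothetical, and the real adapter would hook into the existing nova-powervm volume adapter framework:

class FileIOVolumeAdapter(object):
    """Attach a file-backed Cinder volume through the NovaLink partition."""

    def connect_volume(self, connection_info):
        # The Cinder driver's connection info carries the path of the
        # backing file; map it to the VM over vSCSI and record the LUN ID
        # used on the SCSI bus so it can be found again on detach.
        raise NotImplementedError("illustrative sketch only")

    def disconnect_volume(self, connection_info):
        # Look up the mapping by the stored LUN ID and remove it.
        raise NotImplementedError("illustrative sketch only")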

Alternatives

None

Security impact

None.

One may consider the permissions of the file presented by Cinder. The Cinder driver’s BDM will provide a path to a file. The hypervisor will map that file as the root user, so the file permissions of the volume should not be a concern. This seems consistent with the other hypervisors utilizing these types of Cinder drivers.

End user impact

None

Performance Impact

None

Deployer impact

Deployer must set up the backing Cinder driver and connect the file systems to the NovaLink partition in their environment.

Developer impact

None

Implementation
Assignee(s)
Primary assignee:
thorst
Other contributors:
shyama
Work Items
  • Create a nova-powervm fileio cinder volume connector. Create associated UT.
  • Validate with the GPFS cinder backend.
Dependencies
  • pypowervm 1.0.0.4 or higher
Testing

Unit testing will be provided for all new code.

Manual testing will be driven via connecting to a GPFS back-end.

CI environments will be evaluated to determine if there is a way to add this to the current CI infrastructure.

Documentation Impact

Minimal. The nova-powervm dev-ref will be updated to reflect that ‘file I/O drivers’ are supported; the support matrix does not go into detail about which Cinder drivers work with which Nova drivers.

File I/O Driver

https://blueprints.launchpad.net/nova-powervm/+spec/file-io-driver

The PowerVM driver currently uses logical volumes for localdisk ephemeral storage. This blueprint will add support for using file-backed disks as a localdisk ephemeral storage option.

Problem description

The PowerVM driver only supports logical volumes for localdisk ephemeral storage. It does not currently support storage that is presented as a file.

Use Cases
  • As a user, I want to have the instance ephemeral storage backed by a file.
Proposed change

Add nova_powervm/virt/powervm/disk/fileio.py. This would extend the existing disk driver. Use the DISK_DRIVER powervm conf option to select file I/O. It will utilize the nova.conf option instances_path.
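For illustration, assuming the option names referenced above, the relevant configuration might look like the following (the path value is a placeholder):

[powervm]
# Select the file-backed ephemeral disk driver described in this blueprint.
disk_driver = fileio

[DEFAULT]
# Location of the file-backed ephemeral disks.
instances_path = <path on the locally mounted file system>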

Alternatives

None

Security impact

None

End user impact

None

Performance Impact

Performance may change as the backing storage methods of VMs will be different.

Deployer impact

The deployer must set the DISK_DRIVER conf option to fileio and ensure that the instances_path conf option is set in order to utilize the changes described in the blueprint.

Developer impact

None

Implementation
Assignee(s)
Primary assignee:
tjakobs
Other contributors:
None
Work Items
  • Create a nova-powervm fileio driver. Create associated UT.
Dependencies

NovaLink 1.0.0.5

Testing
  • Unit tests for all code
  • Manual test will be driven using a File I/O ephemeral disk.
Documentation Impact

Will update the nova-powervm dev-ref to include File I/O as an additional ephemeral disk option.

References

None

Allow dynamic enable/disable of SRR capability


https://blueprints.launchpad.net/nova-powervm/+spec/srr-capability-dynamic-toggle

Currently, to enable or disable the SRR capability on a VM, the VM must be in the shut-off state. We should be able to toggle this field dynamically so that shutting down the VM is not needed.

Problem description

The simplified remote restart (SRR) capability governs whether a VM can be rebuilt (remote restarted) on a different host when the host on which the VM resides is down. Currently this attribute can be changed only when the VM is in the shut-off state. This blueprint addresses that by enabling the simplified remote restart capability to be toggled dynamically (while the VM is still active).

Use Cases

The end user would like to:

  • Enable the SRR capability on a VM without shutting it down, so that any workloads on the VM are unaffected.
  • Disable the SRR capability for a VM that need not be rebuilt on another host, while the VM is still up and running.

Proposed change

The SRR capability is a VM-level attribute and can be changed using the resize operation. In the case of a resize operation for an active VM:

  • Check if the hypervisor supports dynamic toggling of the SRR capability.
  • If it is supported, proceed with updating the SRR capability if it has been changed.
  • Issue a warning if updating the SRR capability is not supported.
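A minimal sketch of that resize-time check follows, assuming a hypothetical host capability flag and an apply callback (neither name comes from the actual implementation):

import logging

LOG = logging.getLogger(__name__)


def _maybe_toggle_srr(host_capabilities, current_srr, requested_srr, apply_srr):
    """Toggle SRR on an active VM only when the platform supports it."""
    if current_srr == requested_srr:
        return  # nothing to change
    if host_capabilities.get('dynamic_srr_toggle'):
        # Update the simplified remote restart attribute in place.
        apply_srr(requested_srr)
    else:
        LOG.warning("The hypervisor does not support toggling the "
                    "simplified remote restart capability while the VM "
                    "is active; the requested change was not applied.")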

Alternatives

None

Security impact

None

End user impact

None

Performance Impact

A change to the SRR capability is not likely to happen very frequently, so this should not have a major impact. When the change happens, the impact on the performance of any other component (the VM, the compute service, the REST service, etc.) should be negligible.

Deployer impact

The end user will be able to dynamically toggle the SRR capability for the VM. The changes can be utilized immediately once they are deployed.

Developer impact

None

Implementation
Assignee(s)
Primary assignee:
manasmandlekar
Other contributors:
shyvenug
Work Items

NA

Dependencies

Need to work with PowerVM platform team to ensure that the srr toggle capability is exposed for the Compute driver to consume.

Testing

Testing the change requires a full OpenStack environment with compute resources configured.

  • Ensure the SRR state of a VM can be toggled when it is up and running.
  • Ensure the SRR state of a VM can be toggled when it is shut off.
  • Perform rebuild operations to ensure that the capability is indeed being utilized.

Documentation Impact

None

References

None

History
Revisions
Release Name Description
Pike Introduced

Rocky Specifications

Device Passthrough

https://blueprints.launchpad.net/nova-powervm/+spec/device-passthrough

Provide a generic way to identify hardware devices such as GPUs and attach them to VMs.

Problem description

Deployers want to be able to attach accelerators and other adapters to their VMs. Today in Nova this is possible only in very restricted circumstances. The goal of this blueprint is to enable generic passthrough of devices for consumers of the nova-powervm driver.

While these efforts may enable more, and should be extensible going forward, the primary goal for the current release is to pass through entire physical GPUs. That is, we are not attempting to pass through:

  • Physical functions, virtual functions, regions, etc. I.e. granularity smaller than “whole adapter”. This requires device type-specific support at the platform level to perform operations such as discovery/inventorying, configuration, and attach/detach.
  • Devices with “a wire out the back” - i.e. those which are physically connected to anything (networks, storage, etc.) external to the host. These will require the operator to understand and be able to specify/select specific connection parameters for proper placement.
Use Cases

As an admin, I wish to be able to configure my host and flavors to allow passthrough of whole physical GPUs to VMs.

As a user, I wish to make use of appropriate flavors to create VMs with GPUs attached.

Proposed change
Device Identification and Whitelisting

The administrator can identify and allow (explicitly) or deny (by omission) passthrough of devices by way of a YAML file per compute host.

Note

Future: We may someday figure out a way to support a config file on the controller. This would allow e.g. cloud-wide whitelisting and specification for particular device types by vendor/product ID, which could then be overridden (or not) by the files on the compute nodes.

The path to the config will be hardcoded as /etc/nova/inventory.yaml.

The file shall contain paragraphs, each of which will:

  • Identify zero or more devices based on information available on the IOSlot NovaLink REST object. In pypowervm, given a ManagedSystem wrapper sys_w, a list of IOSlot wrappers is available via sys_w.asio_config.io_slots. See identification. Any device not identified by any paragraph in the file is denied for passthrough. But see the allow section for future plans around supporting explicit denials.
  • Name the resource class to associate with the resource provider inventory unit by which the device will be exposed in the driver. If not specified, CUSTOM_IOSLOT is used. See resource_class.
  • List traits to include on the resource provider in addition to those generated automatically. See traits.

A formal schema is proposed for review.

Here is a summary description of each section.

Name

Each paragraph will be introduced by a key which is a human-readable name for the paragraph. The name has no programmatic significance other than to separate paragraphs. Each paragraph’s name must be unique within the file.

identification

Each paragraph will have an identification section, which is an object containing one or more keys corresponding to IOSlot properties, as follows:

YAML key          IOSlot property       Description
vendor_id         pci_vendor_id         X{4} (four uppercase hex digits)
device_id         pci_dev_id            X{4}
subsys_vendor_id  pci_subsys_vendor_id  X{4}
subsys_device_id  pci_subsys_dev_id     X{4}
class             pci_class             X{4}
revision_id       pci_rev_id            X{2} (two uppercase hex digits)
drc_index         drc_index             X{8} (eight uppercase hex digits)
drc_name          drc_name              String (physical location code)

The values are expected to match those produced by pvmctl ioslot list -d <property> for a given property.

The identification section is required, and must contain at least one of the above keys.

When multiple keys are provided in a paragraph, they are matched with AND logic.

Note

It is a stretch goal of this blueprint to allow wildcards in (some of) the values. E.g. drc_name: U78CB.001.WZS0JZB-P1-* would allow everything on the P1 planar of the U78CB.001.WZS0JZB enclosure. If we get that far, a spec amendment will be proposed with the specifics (what syntax, which fields, etc.).

allow

Note

The allow section will not be supported initially, but is documented here because we thought through what it should look like. In the initial implementation, any device encompassed by a paragraph is allowed for passthrough.

Each paragraph will support a boolean allow keyword.

If omitted, the default is true - i.e. devices identified by this paragraph’s identification section are permitted for passthrough. (Note, however, that devices not encompassed by the union of all the identification paragraphs in the file are denied for passthrough.)

If allow is false, the only other section allowed is identification, since the rest don’t make sense.

A given device can only be represented once across all allow=true paragraphs (implicit or explicit); an “allowed” device found more than once will result in an error.

A given device can be represented zero or more times across all allow=false paragraphs.

We will first apply the allow=true paragraphs to construct a preliminary list of devices; and then apply each allow=false paragraph and remove explicitly denied devices from that list.

Note

Again, we’re not going to support the allow section at all initially. It will be a stretch goal to add it as part of this release, or it may be added in a subsequent release.

resource_class

If allow is omitted or true, an optional resource_class key is supported. Its string value allows the author to designate the resource class to be used for the inventory unit representing the device on the resource provider. If omitted, CUSTOM_IOSLOT will be used as the default.

Note

Future: We may be able to get smarter about dynamically defaulting the resource class based on inspecting the device metadata. For now, we have to rely on the author of the config file to tell us what kind of device we’re looking at.

traits

If allow is omitted or true, an optional traits subsection is supported. Its value is an array of strings, each of which is the name of a trait to be added to the resource providers of each device represented by this paragraph. If the traits section is included, it must have at least one value in the list. (If no additional traits are desired, omit the section.)

The values must be valid trait names (either standard from os-traits or custom, matching CUSTOM_[A-Z0-9_]*). These will be in addition to the traits automatically added by the driver - see Generated Traits below. Traits which conflict with automatically-generated traits will result in an error: the driver must be the single source of truth for the traits it generates.

Traits may be used to indicate any static attribute of a device - for example, a capability (CUSTOM_CAPABILITY_WHIZBANG) not otherwise indicated by Generated Traits.
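Putting the sections together, a single paragraph of /etc/nova/inventory.yaml might look like the following; the values are purely illustrative, and the formal schema is still under review:

my_gpu_slots:
  identification:
    # Match every slot with this PCI vendor and class (values illustrative).
    vendor_id: "10DE"
    class: "0302"
  # Expose each matched slot as one unit of this resource class.
  resource_class: CUSTOM_GPU
  # Extra traits added on top of the automatically generated ones.
  traits:
    - CUSTOM_CAPABILITY_WHIZBANG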

Resource Providers

The driver shall create nested resource providers, one per device (slot), as children of the compute node provider generated by Nova.

The provider name shall be generated as PowerVM IOSlot %(drc_index)08X e.g. PowerVM IOSlot 1C0FFEE1. We shall let the placement service generate the UUID. This naming scheme allows us to identify the full set of providers we “own”. This includes identifying providers we may have created on a previous iteration (potentially in a different process) which now need to be purged (e.g. because the slot no longer exists on the system). It also helps us provide a clear migration path in the future, if, for example, Cyborg takes over generating these providers. It also paves the way for providers corresponding to things smaller than a slot; e.g. PFs might be namespaced PowerVM PF %(drc_index)08X.

Inventory

Each device RP shall have an inventory of:

total: 1
reserved: 0
min_unit: 1
max_unit: 1
step_size: 1
allocation_ratio: 1.0

of the resource_class specified in the config file for the paragraph matching this device (CUSTOM_IOSLOT by default).

Note

Future: Some day we will provide SR-IOV VFs, vGPUs, FPGA regions/functions, etc. At that point we will conceivably have inventory of multiple units of multiple resource classes, etc.

Generated Traits

The provider for a device shall be decorated with the following automatically-generated traits:

  • CUSTOM_POWERVM_IOSLOT_VENDOR_ID_%(vendor_id)04X
  • CUSTOM_POWERVM_IOSLOT_DEVICE_ID_%(device_id)04X
  • CUSTOM_POWERVM_IOSLOT_SUBSYS_VENDOR_ID_%(subsys_vendor_id)04X
  • CUSTOM_POWERVM_IOSLOT_SUBSYS_DEVICE_ID_%(subsys_device_id)04X
  • CUSTOM_POWERVM_IOSLOT_CLASS_%(class)04X
  • CUSTOM_POWERVM_IOSLOT_REVISION_ID_%(revision_id)02X
  • CUSTOM_POWERVM_IOSLOT_DRC_INDEX_%(drc_index)08X
  • CUSTOM_POWERVM_IOSLOT_DRC_NAME_%(drc_name)s where drc_name is normalized via os_traits.normalize_name.

In addition, the driver shall decorate the provider with any traits specified in the config file paragraph identifying this device. If that paragraph specifies any of the above generated traits, an exception shall be raised (we’ll blow up the compute service).
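As a sketch, the hex-formatted traits could be assembled from the corresponding IOSlot property values (assumed here to be available as integers); only a subset is shown, and the DRC name trait is additionally normalized via os_traits.normalize_name:

def _generated_traits(vendor_id, device_id, drc_index):
    """Build a subset of the automatic traits from IOSlot property values."""
    return {
        'CUSTOM_POWERVM_IOSLOT_VENDOR_ID_%04X' % vendor_id,
        'CUSTOM_POWERVM_IOSLOT_DEVICE_ID_%04X' % device_id,
        'CUSTOM_POWERVM_IOSLOT_DRC_INDEX_%08X' % drc_index,
    }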

update_provider_tree

The above provider tree structure/data shall be provided to Nova by overriding the ComputeDriver.update_provider_tree method. The algorithm shall be as follows:

  • Parse the config file.
  • Discover devices (GET /ManagedSystem, pull out .asio_config.io_slots).
  • Merge the config data with the discovered devices to produce a list of devices to pass through, along with inventory of the appropriate resource class name, and traits (generated and specified).
  • Ensure the tree contains entries according to this calculated passthrough list, with appropriate inventory and traits.
  • Set-subtract the names of the providers in the calculated passthrough list from those in the provider tree whose names are prefixed with PowerVM IOSlot and delete the resulting “orphans”.

This is in addition to the standard update_provider_tree contract of ensuring appropriate VCPU, MEMORY_MB, and DISK_GB resources on the compute node provider.

Note

It is a stretch goal of this blueprint to implement caching and/or other enhancements to the above algorithm to optimize performance by minimizing the need to call PowerVM REST and/or process whitelist files every time.
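A skeleton of the algorithm is sketched below; _passthrough_devices is a hypothetical helper standing in for the whitelist parsing and ManagedSystem retrieval, and the device attributes (drc_index, resource_class, traits) mirror the merged data described above, while the provider_tree operations follow nova's ProviderTree interface:

def update_provider_tree(self, provider_tree, nodename, allocations=None):
    # Standard resources (VCPU, MEMORY_MB, DISK_GB) on the compute node
    # provider are handled as before (not shown here).
    devices = self._passthrough_devices()  # hypothetical: config merged with IOSlots
    wanted = set()
    for dev in devices:
        name = 'PowerVM IOSlot %08X' % dev.drc_index
        wanted.add(name)
        if not provider_tree.exists(name):
            provider_tree.new_child(name, nodename)
        provider_tree.update_inventory(
            name, {dev.resource_class: {'total': 1, 'max_unit': 1}})
        provider_tree.update_traits(name, dev.traits)
    # Any provider previously created whose name starts with
    # 'PowerVM IOSlot' but is not in 'wanted' is an orphan and is removed.
    ...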

Flavor Support

Existing Nova support for generic resource specification via flavor extra specs should “just work”. For example, a flavor requesting two GPUs might look like:

resources:VCPU=1
resources:MEMORY_MB=2048
resources:DISK_GB=100
resources1:CUSTOM_GPU=1
traits1:CUSTOM_POWERVM_IOSLOT_VENDOR_ID_G00D=required
traits1:CUSTOM_POWERVM_IOSLOT_PRODUCT_ID_F00D=required
resources2:CUSTOM_GPU=1
traits2:CUSTOM_POWERVM_IOSLOT_DRC_INDEX_1C0FFEE1=required
PowerVMDriver
spawn

During spawn, we will query placement to retrieve the resource provider records listed in the allocations parameter. Any provider names which are prefixed with PowerVM IOSlot will be parsed to extract the DRC index (the last eight characters of the provider name). The corresponding slots will be extracted from the ManagedSystem payload and added to the LogicalPartition payload for the instance as it is being created.
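For illustration, the DRC index can be recovered from such a provider name as follows:

def _drc_index_from_provider(name):
    """Extract the DRC index from a 'PowerVM IOSlot XXXXXXXX' provider name."""
    prefix = 'PowerVM IOSlot '
    if not name.startswith(prefix):
        return None
    # The last eight characters are the DRC index in uppercase hex.
    return int(name[-8:], 16)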

destroy

IOSlots are detached automatically when we DELETE the LogicalPartition, so no changes should be required here.

Live Migration

Since we can’t migrate the state of an active GPU, we will block live migration of a VM with an attached IOSlot.

Cold Migration, Rebuild, Remote Restart

We should get these for free, but need to make sure they’re tested.

Hot plug/unplug

This is not in the scope of the current effort. For now, attaching/detaching devices to/from existing VMs can only be accomplished via resize (Cold Migration).

Alternatives

Use Nova’s PCI passthrough subsystem. We’ve all agreed this sucks and is not the way forward.

Use oslo.config instead of a YAML file. Experience with the [pci]passthrough_whitelist has led us to conclude that config format is too restrictive/awkward. The direction for Nova (as discussed in the Queens PTG in Denver) will be toward some kind of YAML format; we’re going to be the pioneers on this front.

Security impact

It is the operator’s responsibility to ensure that the passthrough YAML config file has appropriate permissions, and lists only devices which do not themselves pose a security risk if attached to a malicious VM.

End user impact

Users get acceleration for their workloads o/

Performance Impact
Discovery

For the update_provider_tree flow, we’re adding the step of loading and parsing the passthrough YAML config file. This should be negligible compared to e.g. retrieving the ManagedSystem object (which we’re already doing, so no impact there).

spawn/destroy

There’s no impact from the community side. It may take longer to create or destroy a LogicalPartition with attached IOSlots.

Deployer impact

None.

Developer impact

None.

Upgrade impact

None.

Implementation
Assignee(s)
Primary assignee:
efried
Other contributors:
edmondsw, mdrabe
Work Items

See Proposed change.

Dependencies

os-traits 0.9.0 to pick up the normalize_name method.

Testing

Testing this in the CI will be challenging, given that we are not likely to score GPUs for all of our nodes.

We will likely need to rely on manual testing and PowerVC to cover the code paths described under PowerVMDriver with a handful of various device configurations.

Documentation Impact
  • Add a section to our support matrix for generic device passthrough.
  • User documentation for:
      • How to build the passthrough YAML file.
      • How to construct flavors accordingly.
References

None.

History
Revisions
Release Name Description
Rocky Introduced