Oracle EPM Products Have vCPU Restrictions in Windows Server 2019 or Earlier

As more companies move their workloads to the cloud, virtual servers with 64 or more vCPUs running Windows will become more common. Some customers may try to cut costs by hosting multiple products on one Windows server, or may not want to split their workloads among multiple application servers. When planning your Oracle EPM builds, be aware that Windows Server 2019 and earlier may not allow your EPM processes to address all the vCPUs on your virtual server. If you need that many vCPUs, consider moving to Linux-based servers or rethinking how you distribute applications among multiple Windows-based servers.

In the course of troubleshooting poor performance, a client with a very large Planning application bumped up the number of vCPUs on their Essbase server from a starting point of 32, up to 64, encountering CPU bottlenecks at every point, before finally trying 104 vCPUs. Performance grew worse with this latest test, and Microsoft was engaged to address the issue.

Without getting bogged down in the technical details, we could see that Windows Server 2019 was making only half the vCPUs available to the workload of a single process, in our case the Essbase Planning application. No matter how many calcs we threw at it, the Essbase application would not touch any vCPUs outside the first 52. We ran a CPU stress tool and found the same behavior: a single instance of the stress tool never used more than the first 52 vCPUs. When we ran two instances, one ran against the first 52 vCPUs and the other ran against the second 52.

A Windows processor group can hold at most 64 logical processors. When a server has more than 64 vCPUs, Windows Server 2019 and earlier sets the number of sockets to two or more and splits the virtual processors between them. The operating system assigns the vCPUs to two or more Processor Groups. In our example, that meant two groups of 52 vCPUs, addressed as Socket0 and Socket1.
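The arithmetic behind that split can be sketched in a few lines of Python. This is a simplified model, not a description of how Windows actually builds its groups: it assumes the only rule is the 64-logical-processor cap per group and that vCPUs are divided evenly. The function name `split_into_processor_groups` is made up for illustration, and the real layout also depends on the topology the hypervisor presents.

```python
import math

MAX_PER_GROUP = 64  # a Windows processor group holds at most 64 logical processors


def split_into_processor_groups(vcpus):
    """Return the size of each group, ASSUMING an even split (simplified model)."""
    if vcpus < 1:
        raise ValueError("vcpus must be positive")
    groups = math.ceil(vcpus / MAX_PER_GROUP)
    base, extra = divmod(vcpus, groups)
    # If the split is uneven, the first `extra` groups get one more vCPU.
    return [base + 1 if i < extra else base for i in range(groups)]


for n in (32, 64, 104):
    print(n, "vCPUs ->", split_into_processor_groups(n))
```

Under these assumptions, 32 and 64 vCPUs each fit in a single group, while 104 vCPUs become two groups of 52, which matches what we saw on the client's server.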

Each running Essbase application is a process, and each process is assigned a Processor Group in round-robin fashion at startup. The first thread of a process initially runs in the group to which the process is assigned, and each newly created thread is assigned to the same group as the thread that created it. In our example, the Essbase application process for our Planning app is assigned to Processor Group 0 (52 vCPUs available) and shares those vCPUs with any other application the OS assigns to that Processor Group. In Server 2019 and earlier, the OS will ONLY schedule our app's threads on Processor Group 0 unless the application explicitly requests where to run its threads. Essbase and the other EPM suite software do not make that request, so they run only on vCPUs in the OS-assigned group, which is Processor Group 0 in our example.
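The placement rules above can be modeled with a small Python sketch. It is only an illustration of the behavior described here, not Windows scheduler code: the helper names `assign_groups` and `usable_vcpus` are invented, and `other_service.exe` is a placeholder for whatever else happens to start on the server.

```python
from itertools import cycle


def assign_groups(process_names, group_count):
    """Hand out processor groups to processes in startup order, round-robin."""
    groups = cycle(range(group_count))
    return {name: next(groups) for name in process_names}


def usable_vcpus(group_sizes, group, group_aware=False):
    """vCPUs a process can reach: only its own group, unless the
    application manages processor-group affinity itself."""
    return sum(group_sizes) if group_aware else group_sizes[group]


group_sizes = [52, 52]  # the 104-vCPU server from the example
# Hypothetical startup order; the second name is a placeholder.
placement = assign_groups(["ESSSVR.exe", "other_service.exe"], len(group_sizes))

for name, group in placement.items():
    reach = usable_vcpus(group_sizes, group)
    print(f"{name}: group {group}, can use {reach} of {sum(group_sizes)} vCPUs")
```

Every thread inherits its creator's group, so the per-process figure is also the ceiling for all of that process's threads. Because every process that starts on the server takes a turn in the rotation, which group a particular EPM process lands in depends on startup order.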

See linked Microsoft documentation for a more detailed explanation.
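To check how a particular Windows server has grouped its vCPUs, one option is to ask the OS directly. The sketch below calls the Win32 functions `GetActiveProcessorGroupCount` and `GetActiveProcessorCount` through Python's `ctypes`. The wrapper name `processor_group_sizes` is our own, and the script simply returns nothing useful on non-Windows systems, so treat it as a starting point rather than a finished diagnostic.

```python
import ctypes
import sys


def processor_group_sizes():
    """Return active logical processors per group, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.windll.kernel32
    # WORD GetActiveProcessorGroupCount(void)
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
    # DWORD GetActiveProcessorCount(WORD GroupNumber)
    kernel32.GetActiveProcessorCount.argtypes = [ctypes.c_ushort]
    kernel32.GetActiveProcessorCount.restype = ctypes.c_uint32
    count = kernel32.GetActiveProcessorGroupCount()
    return [kernel32.GetActiveProcessorCount(group) for group in range(count)]


sizes = processor_group_sizes()
if sizes is None:
    print("Processor groups are a Windows concept; nothing to report here.")
else:
    for group, size in enumerate(sizes):
        print(f"Processor Group {group}: {size} logical processors")
```

If the script reports more than one group, a single process that does not manage its own group affinity will be limited to one of them on Windows Server 2019 and earlier.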

This situation is resolved in Windows Server 2022, where applications are no longer constrained by default to a single processor group. If Essbase requests more resources than a single processor group can provide, the OS assigns vCPUs from other processor groups to handle the load.

Oracle Support and Development were both consulted at this point. We asked about Essbase specifically. The answer, paraphrased, was that it only sends thread requests to the OS and does not request an affinity for a particular processor group. Essentially, however many vCPUs reside on the server, you will never see a single EPM process take advantage of vCPUs outside the processor group initially assigned to it. I asked about support for Windows Server 2022 as well. The response was, “Version certification includes testing and certifying against new platforms. Current matrices are here: Resolving Certification Issues for EPM System Products ( Doc ID 2024421.1 ). Please raise an enhancement request for this if you have not already done so.”

In which situations would this ever be a concern? Let’s walk through two test cases. Case #1 is a typical but large Windows EPM install (Essbase, Planning, Financial Management) with 1 socket and 64 vCPUs on Windows Server 2019. Case #2 is a similar install but with 2 sockets and 80 vCPUs total on Windows Server 2019.

In Case #1, you have a very large Planning application and an HFM application, and you want to run consolidations and calculations on both at the same time. The OS will split the load across all existing vCPUs, so it is up to you to tune your application CPU usage. Let’s say you set each app to use 30 vCPUs. You start both activities: your single process for xfmdatasource.exe (your HFM app) gets 30 vCPUs and your single process for ESSSVR.exe (your Essbase Planning app) gets 30 vCPUs, leaving 4 ‘idle’ vCPUs. Assuming other load on the server is minimal, this will not cause a problem.

In Case #2, your very large Planning application could be set to use 30 vCPUs here too, as could your HFM application. You have two processor groups, each with 40 vCPUs, so it looks as though you could easily assign additional vCPUs and take advantage of the greater processing power. Unfortunately, suppose the operating system's round-robin assignment has placed both applications in the same processor group. Both applications will then be using the same 40 vCPUs. Your application activities will be slower than on your Case #1 system because they are contending for the same CPU resources. Meanwhile, the other processor group sits at nearly zero utilization.
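A quick back-of-the-envelope comparison of the two cases, written as Python, makes the contention visible. The function name `threads_per_vcpu` is ours, and the model is deliberately crude: it assumes each app keeps exactly 30 threads busy and ignores every other process on the server.

```python
def threads_per_vcpu(requested_threads, reachable_vcpus):
    """A ratio above 1.0 means the workloads are competing for vCPUs."""
    return sum(requested_threads) / reachable_vcpus


requested = [30, 30]  # HFM and Essbase, each tuned to use 30 vCPUs

case1 = threads_per_vcpu(requested, 64)  # Case #1: one group of 64 vCPUs
case2 = threads_per_vcpu(requested, 40)  # Case #2: both apps in one 40-vCPU group

print(f"Case #1: {case1:.2f} busy threads per vCPU")
print(f"Case #2: {case2:.2f} busy threads per vCPU")
```

In this model Case #1 stays just under one busy thread per vCPU, while Case #2 asks 40 vCPUs to serve 60 threads, even though the server has 80 vCPUs in total.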

Until Oracle releases a version of the EPM product suite that is certified to run on Windows Server 2022, outlier cases like these will remain an issue for implementations. Thank you for your time, and please leave a comment below!
