Ten years ago, in 2012, Burak Yenier and I organized our first 50 “Engineering Simulation Experiments” in the cloud. Together with engineers, High-Performance Computing (HPC) cloud providers, and independent software vendors (ISVs such as Ansys), we analyzed in particular the 26 experiments that failed at the time, and we published a list of seven major roadblocks, lessons learned, and recommendations. A few years later, in 2017, after more than 170 HPC cloud experiments, the success rate had increased from 50% to nearly 90%. Based on this breakthrough, we published our lessons learned in an HPC Today article about Dispelling the 7 Myths of Cloud Computing. One of the myths we challenged was, “Cloud computing is more expensive than on-premise computing”.

Today, after another 50+ increasingly complex cloud experiments (224 in total so far), we have gained enough experience to provide a list of seven good reasons why cloud computing for industry applications doesn’t have to be more expensive than on-premise computing. On the contrary, it can save you a lot of unnecessary cost and yield high additional returns, beyond what you are used to achieving with HPC on-premises. Here’s the list:

  • Make your engineers more productive, often by 10X and more
  • Increase CAE license efficiency by 100% and save up to 50% CAE license cost
  • Reserved or spot instances can be 30% to 80% cheaper than on-demand instances
  • Avoid laborious Do-It-Yourself cloud on-boarding
  • Discover potential design failures early
  • Replace high on-prem CAPEX with flexible cloud OPEX
  • Strengthen competitiveness and increase ROI by better products faster to market

In the following, we briefly dive into each of these cost savings (and corresponding ROI gains) in the HPC cloud.

1.  Make your engineers more productive, often by 10X and more

On-premises, engineers are constrained by the computing resources available in their corporate data center or on their desks. Therefore, they have to limit their designs, for example, the granularity of geometries (e.g., the number of finite elements), the physics (e.g., in the boundary layer, and turbulence), other properties (e.g., the number of different materials), and so on. In the cloud, on the other hand, they can explore many more variations of these parameters and run 10+ times more simulations. In the cloud, one engineer can do the same work that 10 engineers would be able to do on-premises, so the company saves the expense of 9 additional engineers, resulting in cost savings in the order of $2 million per year (in this example). And even if the company is not interested in doing more in the cloud (e.g., because it might already have the best product in the world), the engineer can now run the same number of simulation jobs much faster, thereby shortening the design and development cycle and thus the time to market. Getting your next-generation product to market faster increases your company’s competitiveness and its opportunity to innovate.
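The productivity arithmetic above can be sketched in a few lines of Python. The fully loaded engineer cost is an illustrative assumption (not a sourced figure), chosen so that nine avoided hires land near the $2 million mentioned in the text:

```python
# Back-of-the-envelope productivity savings. All figures are illustrative.
ENGINEER_COST_PER_YEAR = 220_000  # assumed fully loaded annual cost, USD


def productivity_savings(speedup: int, cost_per_engineer: int = ENGINEER_COST_PER_YEAR) -> int:
    """With a cloud speedup of N, one engineer does the work of N on-prem
    engineers, so the company avoids paying for N - 1 of them."""
    return (speedup - 1) * cost_per_engineer


print(productivity_savings(10))  # 1980000, i.e. roughly the $2M/year in the text
```

With a more conservative 2x speedup, the same sketch still avoids one engineer’s annual cost, which is why the argument scales with whatever speedup your workload actually achieves.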

2.  Increase CAE license efficiency by 100%, saving up to 50% CAE license cost

Simple math shows that, by using the cloud with the latest hardware technologies (CPUs, etc.), your CAE license efficiency can increase by a factor of two or more, simply because the cloud hardware is faster (and more plentiful). While on-premises an engineer might occupy an expensive CAE license with one simulation job for two hours, in the cloud the same engineer can use that license for two jobs in the same two hours, consecutively, which is very useful for DoE, machine learning, parameter studies, etc. In addition, this engineer becomes twice as productive (see the productivity argument above). Alternatively, two engineers can each use a floating license for one hour, consecutively.
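In code, the two-hour license window works out as follows; the 2x cloud speedup is the assumption from the text, not a measured value:

```python
# One floating CAE license over a two-hour window (numbers from the text).
onprem_job_hours = 2.0    # one job occupies the license for 2 hours on-prem
cloud_speedup = 2.0       # assumed: cloud hardware finishes the job 2x faster

cloud_job_hours = onprem_job_hours / cloud_speedup        # 1.0 hour per job
jobs_in_window = int(onprem_job_hours / cloud_job_hours)  # 2 consecutive jobs

print(jobs_in_window)  # 2: twice the work per license, i.e. +100% efficiency
```

The same window either doubles one engineer’s throughput or serves two engineers with one floating license, which is where the “up to 50% license cost” saving comes from.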

3.  Use reserved or spot instances that can save you 30% – 80%

In the early days, many cloud-to-on-prem comparisons were based on list prices for on-demand compute instances in the cloud. A typical example today for one day of computing: 120 cores at $4.00 per hour, times 24 hours, results in $96 per day, or about 3 cents per core per hour. For a new on-prem HPC system, one core-hour can cost well under one cent, for the hardware alone, provided the system is fully utilized! But in industry there are times when such a system is only 50% (or less) utilized, and over time the system gets older and, compared to the cloud, slower, with limited memory, etc., which then requires an expensive upgrade.

On the other hand, we often forget that cloud providers offer resource reservation plans (e.g., an enterprise agreement) for so-called reserved instances, where you get a discount in exchange for committing to a certain amount of resource consumption. The discount can be in the order of 20 – 40%, depending on the total consumption committed in advance. These reserved instances don’t have to run 24/7; you can still switch them on and off as needed, on demand.

But you can get an even higher discount from so-called spot instances, which let you request unused cloud capacity at steep discounts, often 50% – 80% off on-demand prices. The downside is that your time on a spot instance is limited: the longer you use one, the smaller the effective discount and the higher the risk of losing it when someone else offers a higher price. Therefore, the ideal use of spot instances is short-running single-node compute jobs, for example as part of a Design of Experiments (DoE) with dozens of jobs running on dozens of servers. In such a scenario, if one of the servers is taken away, the job can simply be repeated within the next sequence of DoE jobs. The same applies to simulations used for machine learning, when you produce dozens or even hundreds of simulation results to train (and verify) a neural network.
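Using the article’s own 120-core example, a quick sketch compares the three pricing models. The on-demand rate is from the text; the reserved and spot discounts are illustrative midpoints of the quoted ranges, not actual provider quotes:

```python
# Daily cost of the 120-core example cluster under three pricing models.
# On-demand rate is from the text; the 30% and 65% discounts are assumed
# midpoints of the quoted 20-40% (reserved) and 50-80% (spot) ranges.
ON_DEMAND_PER_HOUR = 4.00  # USD/hour for 120 cores
HOURS_PER_DAY = 24


def daily_cost(discount: float) -> float:
    return ON_DEMAND_PER_HOUR * HOURS_PER_DAY * (1.0 - discount)


print(round(daily_cost(0.00), 2))  # on-demand:          96.0 USD/day
print(round(daily_cost(0.30), 2))  # reserved, ~30% off: 67.2 USD/day
print(round(daily_cost(0.65), 2))  # spot, ~65% off:     33.6 USD/day
```

For interruption-tolerant DoE or training workloads, the spot row is the realistic price point, which is why the gap to on-prem core-hour costs is much narrower than list-price comparisons suggest.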

4.  Avoid laborious Do-It-Yourself cloud on-boarding

Moving complex engineering application software to the cloud is not a trivial task. It requires combined expertise in engineering software, in HPC resources and other cloud infrastructure, in the IT challenges of moving data and code through firewalls over the Internet into a highly secure cloud data center, and finally in the company’s legal and compliance requirements. This combined expertise is often not available in, e.g., the manufacturing industry, and experts with such knowledge are rare. Trying to move complex engineering workflows and data to an HPC cloud yourself is therefore high risk: results are often sub-optimal, and failure rates in industry are known to be very high. Avoiding do-it-yourself can thus save a company a lot of money and a lot of (wasted) time.

5.  Discover potential design failures early

The more simulations an engineer is able to run, the more thoroughly she can analyze the product design. More computing resources can therefore mean more geometry variations, finer granularity (meshes), more accurate physics, more materials, etc., and thus discovering inaccuracies and potential failures early in the design phase. Discovering such failures late, in the worst case only when the product is already on the market, can cost a company many millions of dollars. Thus, using cloud computing resources during design and development can dramatically reduce your risk of failure and of losing your good reputation in the market.

6.  Replace high on-prem CAPEX with flexible cloud OPEX

Selecting, buying, and implementing an HPC system is a tedious procedure. Inviting and evaluating offers from different hardware vendors, perhaps benchmarking and comparing different systems beforehand, selecting the final candidates, negotiating all kinds of specifics, handling a lot of bureaucracy and documents, and much more can lead to an acquisition time of many months, while the old on-premise system gets older and (relatively) slower, and the new system takes a few more months until it is fully operational. The new HPC system also comes with a high capital expenditure before it is even used, plus additional costs for electricity, space, cooling, training, operation, software, downtime, less-than-100% utilization, etc., all of which add up to the Total Cost of Ownership.

Cloud usage, on the other hand, is pure operational expenditure. It is HPC as a Service and pay-per-use, and many of the above-mentioned time- and money-consuming processes don’t exist or are at least included (hidden) in the core-per-hour price. Additional costs may come from data transfer, storage, and some of the intelligent tools a cloud provider offers.
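A minimal CAPEX-vs-OPEX sketch shows why utilization is the deciding variable. The 3-cent cloud rate echoes the earlier on-demand example; the 2-cent on-prem TCO rate is a hypothetical illustration, not a measured figure:

```python
# CAPEX vs. OPEX at varying utilization. Per-core-hour rates are assumed.
ONPREM_TCO_PER_CORE_HOUR = 0.02  # hypothetical: hardware, power, cooling, staff
CLOUD_PER_CORE_HOUR = 0.03       # on-demand rate from the earlier example
CORES = 120
HOURS_PER_YEAR = 24 * 365


def onprem_cost_per_year() -> float:
    # Full capacity is paid for whether the cluster is busy or idle.
    return ONPREM_TCO_PER_CORE_HOUR * CORES * HOURS_PER_YEAR


def cloud_cost_per_year(utilization: float) -> float:
    # Pay-per-use: only the core-hours actually consumed are billed.
    return CLOUD_PER_CORE_HOUR * CORES * HOURS_PER_YEAR * utilization


for u in (1.0, 0.5, 0.25):
    print(f"{u:.0%}: on-prem ${onprem_cost_per_year():,.0f}, "
          f"cloud ${cloud_cost_per_year(u):,.0f}")
```

Under these assumptions, a fully utilized on-prem system is cheaper per core-hour, but as soon as utilization drops below roughly two-thirds, the pay-per-use cloud wins, because the idle on-prem capacity is still being paid for.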

7.  Strengthen competitiveness and increase ROI by better products and faster time to market

To summarize what has been said several times above: having more computing resources available can mean performing more engineering simulations, with more parameters, e.g., different geometries and finer meshes, more physics variations, more and different materials, more detailed parameter studies (e.g., in Design of Experiments, DoE), and more intelligent neural networks. All this can result in a higher-quality next-generation product that enters the market faster, thus strengthening a company’s competitiveness, its ability to innovate (fast), its stamina and longevity in a fast-changing world, and its branding. How much ROI will result from all of this, and what dollar value will it have? That is very difficult to measure, but it is definitely a huge amount, if not invaluable. Time will tell . . .

Disclaimer: I am the author at PLM ECOSYSTEM, focusing on developing digital-thread platforms with capabilities across CAD, CAM, CAE, PLM, ERP, and IT systems to manage the product data lifecycle and connect various industry networks. My opinions may be biased. Articles and thoughts on PLMES represent solely the author's views and not necessarily those of the company. Reviews and mentions do not imply endorsement or recommendations for purchase.
