5 Steps to Building a Modern Data Foundation

Organizations that once stored gigabytes of data now find themselves having to manage petabytes or even exabytes across their IT infrastructures. The borderless Internet, which allows applications to reach nearly anywhere, is one reason for the unprecedented data growth. Perhaps an even bigger driver is the increasing use of the public cloud for highly accessible and cost-effective compute and storage services.

To harness the value of their massive data volumes, companies are building modern, cloud-based data infrastructures that help them create a universal version of “truth.” These foundations unify siloed data pockets for a holistic view, empowering everyone in an organization to make better-informed decisions and act with confidence.

“Exponential data growth has been happening for a while, but now there’s so much more that can be done with it,” says Herain Oberoi, Director of Product Marketing, Databases, Analytics, and Blockchain at Amazon Web Services (AWS). “Cloud economics has removed the constraints of having to decide what data to store and what to discard. Now, you can keep and process it all in real-time and take immediate action on it.”

Creating a ‘flywheel’ framework

AWS outlines five fundamental steps to building a modern, cloud-based data foundation. The steps are intended to help you get the most from your data by guiding you toward better decisions about which products to develop, how to find new revenue streams, where to automate manual processes, and how to win and retain customers. The framework uses a flywheel concept popularized by author Jim Collins, in which each component feeds the others to build momentum and continually capture maximum value from your data.

The five steps are not linear, which gives organizations flexibility depending on their current level of data maturity. “You can start anywhere, and they build on each other,” says Oberoi.

  1. Break free from legacy databases. This step represents the “low-hanging fruit,” Oberoi says. Many organizations still run legacy, proprietary databases, which are expensive, create lock-in, and carry punitive licensing terms, he says. These issues can be resolved by moving to open-source databases. Oberoi cautions, however, that open-source databases are not guaranteed to match the performance of commercial-grade databases. He advises making sure your open-source database delivers the cost efficiencies you seek without causing a performance or availability hit.
  2. Move to managed services in the cloud. As open-source and other database platforms begin to scale, IT time and administrative costs can grow. Many organizations still self-manage their databases, focusing on operational tasks such as hardware and software installation, configuration, patching, backups, performance tuning, and configuring for high availability, security, and compliance. “All that time spent administering means less time analyzing data or innovating on an application,” Oberoi says. Cloud-based, managed database services reduce time spent on this “undifferentiated heavy lifting” so teams can focus on higher-value activities, he says.
  3. Modernize your data warehouse. Traditional data warehouses don’t have the ability to effectively store and analyze the growing volume and variety of data, which leads to data being stored in multiple silos. Giving your data flywheel the push it needs for self-sustaining momentum requires a modern data warehouse approach, including a data lake, which can store unlimited data volumes in various structured and unstructured formats. This “lake house” approach makes it much easier to catalog data, make it accessible, and analyze it across the business.
  4. Build modern apps with purpose-built databases. The days of developers building a monolithic application with a single relational database are fading quickly. Instead, developers break complex applications into smaller pieces with a microservices architecture, then pick the best purpose-built database to solve each problem. This method frees the application from having to use a single, overburdened database for every use case and “delivers the high performance, scale, and agility that allows organizations to innovate faster,” says Shawn Bice, Vice President of Databases, AWS.
  5. Turn data into insights. A data lake provides a central repository for storing all types of data, as-is, at scale. Oberoi advises creating and maintaining an online data catalog to avoid the data lake devolving into the dreaded “data swamp.” “You can analyze real-time streaming data, determine operational health, and quickly diagnose and fix problems. You can also predict what might be coming instead of analyzing only what’s happened in the past,” he says.
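
To make the data catalog idea from the last step more concrete, here is a minimal, hypothetical sketch in Python. It is not an AWS API or any product's implementation; the class names, fields, and paths are all assumptions made for illustration. It shows only the core idea: every dataset in the lake gets a metadata entry, so people can find data by tag and spot locations that nobody has cataloged (the early signs of a "data swamp").

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata for one dataset in the lake (all fields are illustrative)."""
    name: str
    location: str      # hypothetical object-store path
    data_format: str   # e.g. "parquet", "json", "csv"
    owner: str
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """A toy in-memory catalog; a real one would be a managed, persistent service."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Registering under an existing name overwrites the old entry.
        self._entries[entry.name] = entry

    def find_by_tag(self, tag: str) -> list[str]:
        # Return the names of all datasets carrying the given tag, sorted.
        return sorted(e.name for e in self._entries.values() if tag in e.tags)

    def uncataloged(self, lake_locations: list[str]) -> list[str]:
        # Return lake locations that have no catalog entry pointing at them.
        known = {e.location for e in self._entries.values()}
        return [loc for loc in lake_locations if loc not in known]


catalog = DataCatalog()
catalog.register(
    CatalogEntry("orders", "lake/raw/orders/", "parquet", "sales-team", {"sales", "daily"})
)
catalog.register(
    CatalogEntry("clickstream", "lake/raw/clicks/", "json", "web-team", {"web", "streaming"})
)

print(catalog.find_by_tag("sales"))
# ['orders']
print(catalog.uncataloged(["lake/raw/orders/", "lake/raw/clicks/", "lake/tmp/export_final_v2/"]))
# ['lake/tmp/export_final_v2/']
```

The `uncataloged` check is the part that matters for the "data swamp" concern: data that lands in the lake without an owner, format, or tags is exactly what becomes hard to find and trust later.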

A data lake has been a game-changer for Amazon.com. “Five years ago, we were limited in our ability to grow and analyze our business,” says Jeff Carter, Amazon’s VP of Data, Analytics, and Machine Learning. Amazon made the strategic decision to move all its data off a traditional Oracle database and into an Amazon S3 data lake. “By migrating to the [cloud], we have been able to scale to meet our business needs” while lowering the cost to maintain the architecture by 30% to 50%, Carter says.

Data is one of the most valuable assets of any organization. Unlocking its value is a catalyst for positive business outcomes, from improving operational efficiencies to delighting customers. A modern, cloud-based data infrastructure provides a foundation for smarter, data-driven decisions.

Disclaimer: I am the author at PLM ECOSYSTEM, focusing on developing digital-thread platforms with capabilities across CAD, CAM, CAE, PLM, ERP, and IT systems to manage the product data lifecycle and connect various industry networks. My opinions may be biased. Articles and thoughts on PLMES represent solely the author's views and not necessarily those of the company. Reviews and mentions do not imply endorsement or recommendations for purchase.
