Deutsche Bahn has developed a scalable, secure, and efficient AI platform. Its dedicated AI platform team focused on rapid project initiation, reduced maintenance costs, and effective management of AI and machine learning tools across diverse data analytics teams. Amazon SageMaker helped them achieve these goals.
For an in-depth understanding, the full article is available here. Read on for the TL;DR version.
TL;DR
Here is the basic concept behind such AI platforms.
Let's check out the different parts.
1. A self-service portal
This is a frontend that offers a self-service, aka “shopping cart”, approach to discovering data, requesting an ML domain, and handling data governance. Here is how this works.
- A team working on a new ML use case, e.g. engine maintenance, wants a new environment to train their models and deploy inference endpoints.
- The team uses the portal to discover the data they need and request a new domain. This request creates a ticket that is assigned to the data governance team.
- The data governance team approves the request.
- A new domain is deployed automatically with all the necessary resources, security policies, and guardrails. Lake Formation permissions and the other plumbing needed for data access are set up immediately. The team can log in and start building!
This is easier said than done. Most enterprises have data spread across different plants and geographic regions, along with sensitive data and data governance requirements. Data Mesh is the way to go here, and the platform account takes care of this heavy lifting. More on that in another post.
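The request, approval, and provisioning steps listed above can be sketched as a tiny state machine. This is only an illustration of the gating logic, not the platform's actual code: `DomainRequest`, `approve`, `provision`, and the team and dataset names are all hypothetical, and the real deployment work is reduced to a comment.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING_APPROVAL = "pending_approval"
    APPROVED = "approved"
    PROVISIONED = "provisioned"


@dataclass
class DomainRequest:
    """Hypothetical ticket created when a team requests a domain in the portal."""
    team: str
    use_case: str
    datasets: list[str]
    status: Status = Status.PENDING_APPROVAL
    granted: list[str] = field(default_factory=list)


def approve(request: DomainRequest) -> None:
    """Data governance signs off on the ticket."""
    if request.status is not Status.PENDING_APPROVAL:
        raise ValueError("only pending requests can be approved")
    request.status = Status.APPROVED


def provision(request: DomainRequest) -> str:
    """Stand-in for the automated deployment; returns a made-up domain name."""
    if request.status is not Status.APPROVED:
        raise PermissionError("request has not been approved by data governance")
    # A real platform would create the domain, security policies, and
    # data-access permissions here. This sketch only records the grant.
    request.granted = list(request.datasets)
    request.status = Status.PROVISIONED
    return f"{request.team}-{request.use_case}-domain"


request = DomainRequest("team-a", "engine-maintenance", ["sensor_readings"])
approve(request)
domain_name = provision(request)
print(domain_name, request.status.value, request.granted)
```

The point of the design is that provisioning is unreachable without approval: calling `provision` on a request that is still pending raises an error instead of deploying anything.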
2. The SageMaker Service Account
Amazon SageMaker is a fully managed ML service that uses the concept of domains. Think of a SageMaker domain as a secure environment that a specific team can use for all kinds of ML tasks, like training and inference.
For ML engineers, a SageMaker domain provides a fully managed and scalable environment where they can experiment with and deploy complex models. They can focus on optimizing models rather than managing infrastructure.
Data scientists often require an environment that supports exploratory data analysis and visualization. SageMaker’s integration with Jupyter Notebooks and VS Code Editor offers an interactive interface where they can write, run, and share code seamlessly.
3. The AI Platform Account
This is a central account that you use to manage infrastructure and operations for SageMaker Studio. For example, when a team requests a SageMaker domain, the platform account uses the AWS CDK to automatically provision everything necessary behind the scenes. This step is flexible enough to be tailored to the exact needs of any organisation.
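A CDK app ultimately synthesizes a CloudFormation template, so one way to picture the provisioning step is the template fragment for a single team domain. The sketch below builds such a fragment with plain Python and no CDK dependency. The function name `domain_template`, the logical ID `TeamDomain`, the tag, and every ID and ARN are placeholders chosen for illustration, and a real deployment would add roles, user profiles, and guardrails around it.

```python
import json


def domain_template(team: str, vpc_id: str, subnet_ids: list[str],
                    execution_role_arn: str) -> dict:
    """Return a minimal CloudFormation template for one SageMaker domain."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "TeamDomain": {  # hypothetical logical ID
                "Type": "AWS::SageMaker::Domain",
                "Properties": {
                    "DomainName": f"{team}-domain",
                    "AuthMode": "IAM",
                    # Keep Studio traffic inside the VPC instead of the internet.
                    "AppNetworkAccessType": "VpcOnly",
                    "VpcId": vpc_id,
                    "SubnetIds": subnet_ids,
                    "DefaultUserSettings": {"ExecutionRole": execution_role_arn},
                    "Tags": [{"Key": "team", "Value": team}],
                },
            }
        },
    }


# All identifiers below are placeholders, not real resources.
template = domain_template(
    team="team-a",
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-0123456789abcdef0"],
    execution_role_arn="arn:aws:iam::111122223333:role/team-a-sagemaker-execution",
)
print(json.dumps(template, indent=2))
```

Parameterising the template by team is what lets one approved request turn into one isolated domain without hand-written infrastructure per team.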
VPCs (virtual private clouds) and VPC endpoints ensure that traffic between your AWS account and the SageMaker service does not leave the Amazon network. For example, when users work in their SageMaker notebooks, the traffic is routed over the AWS network and not the public internet.
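As a rough illustration of the networking piece, the sketch below generates CloudFormation resources for interface VPC endpoints to the SageMaker API and runtime services. It assumes the `eu-central-1` region and placeholder VPC, subnet, and security group IDs, and the helper name `sagemaker_endpoint_resources` is made up. A Studio setup typically needs further endpoints (for example for S3 or logging), so treat the service list as a starting point and check the AWS documentation for the full set.

```python
import json


def sagemaker_endpoint_resources(region: str, vpc_id: str, subnet_ids: list[str],
                                 security_group_id: str) -> dict:
    """Build AWS::EC2::VPCEndpoint resources for the SageMaker API and runtime."""
    services = ["sagemaker.api", "sagemaker.runtime"]
    resources = {}
    for service in services:
        # "sagemaker.api" becomes the logical ID "SagemakerApiEndpoint".
        logical_id = "".join(p.capitalize() for p in service.split(".")) + "Endpoint"
        resources[logical_id] = {
            "Type": "AWS::EC2::VPCEndpoint",
            "Properties": {
                "ServiceName": f"com.amazonaws.{region}.{service}",
                "VpcEndpointType": "Interface",
                "VpcId": vpc_id,
                "SubnetIds": subnet_ids,
                "SecurityGroupIds": [security_group_id],
                # Resolve the public service hostname to the private endpoint.
                "PrivateDnsEnabled": True,
            },
        }
    return resources


# Placeholder IDs for illustration only.
endpoints = sagemaker_endpoint_resources(
    "eu-central-1", "vpc-0123456789abcdef0",
    ["subnet-0123456789abcdef0"], "sg-0123456789abcdef0",
)
print(json.dumps(endpoints, indent=2))
```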
👐
So that’s the basic concept behind a scalable AI platform. This post was intentionally high level, and there are other ways to build it. For example, you can also use AWS Service Catalog and product portfolios to deploy domains. For an in-depth look, the full article is available here.
🚀 Try it
Here is an AWS CDK application that demonstrates integrating SageMaker with Amazon Cognito, featuring classes for domain provisioning, user authentication, and resource deployment with minimal privileges. GitHub repo