AWS open source news and updates #119
July 1st, 2022 - Instalment #119
Welcome to regular and new readers alike, to the AWS open source newsletter episode #119.
This week we feature more new open source projects: “cdk-bill-bot”, a tool that can help you reduce AWS bill surprises; “steampipe-mod-aws-perimeter”, which helps you look for resources that are publicly accessible; “aws-cloudformation-diagrams”, a nice visualisation tool for CloudFormation users; “aws-swagger-ui”, a project to help you set up Swagger UI for API Gateway; “kinesis-hot-shard-advisor”, a handy tool that helps you identify whether you have hot key or hot shard issues on your Kinesis data streams; and many more.
We also have blog posts, tutorials and videos on topics that include RStudio, Amazon Corretto, Terraform, Linux, Hugging Face, Syne Tune, Apache Airflow, OpenSearch, Apache Iceberg, Pulumi, Lustre, Apache Spark, PostgreSQL and many more. Finally, make sure you check out the events section for the latest open source events. There are some great ones coming up over the next week.
At Amazon we work backwards from our customers, and one of the ways we do that is by collecting data to help us know what is important and what we should focus on. Please could you complete this simple, anonymous survey? The first 25 respondents will get a $25 AWS credit code.
Celebrating open source contributors
The articles and projects shared in this newsletter are only possible thanks to the many contributors in open source. I would like to shout out and thank those folks who really do power open source and enable us all to learn and build on top of what they have created.
So thank you to the following open source heroes: David Tippett, Steven Hillon, Adam Bien, Lerna Ekmekcioglu, Jack Iu, Arun Km, Rohit Bhosale, Abhishek Ray, Jeremy Cowan, Akshit Khanna, Arun Thangaraj, Chaitanya Varma Mudundi, Sekar Srinivasan and Christian Bonzelet.
Latest open source projects
The great thing about open source projects is that you can review the source code. If you like the look of these projects, make sure that you take a look at the code, and if it is useful to you, get in touch with the maintainer to provide feedback, suggestions or even submit a contribution.
cdk-bill-bot this project from Christian Bonzelet enables AWS customers to proactively monitor their infrastructure costs and identify unforeseen expenses in a timely manner. Bill wants to prevent AWS customers from receiving bad surprises in their monthly bill. He therefore addresses two primary problem areas: first, cost history is not monitored on a regular basis, and second, basic cost optimisation best practices are not set up.
As Christian points out in his README, this is an alpha stage project so use at your own risk and beware of potentially breaking changes.
steampipe-mod-aws-perimeter the latest open source project from the fine folk at Turbot, this project provides an AWS perimeter checking tool that can be used to look for resources that are publicly accessible, shared with untrusted accounts, have insecure network configurations, and more.
aws-cloudformation-diagrams another visualisation project, this time allowing you to generate dynamic diagrams of AWS CloudFormation templates from YAML files. Using Python, you will be able to create PDF architecture diagrams from your CloudFormation templates in no time at all.
aws-icons-for-plantuml PlantUML is an open-source tool allowing users to create diagrams from a plain text language. This repository contains images, sprites, macros, and other includes for Amazon Web Services (AWS) services and resources. You can use this repository to create PlantUML diagrams with AWS components. All elements are generated from the official AWS Architecture Icons and when combined with PlantUML and the C4 model, are a great way to communicate your design, deployment, and topology as code.
aws-swagger-ui this repo shows how to set up Swagger UI for API Gateway. It uses Lambda for serving Swagger UI.
kinesis-hot-shard-advisor The Amazon Kinesis Hot Shard Advisor is a CLI tool that simplifies identifying whether you have hot key or hot shard issues on your Kinesis data streams. The tool can also identify whether you are hitting the shard-level throughput limit on a per-second basis.
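To make the idea of a hot shard concrete: Kinesis Data Streams routes each record by taking the MD5 hash of its partition key as a 128-bit integer and finding the shard whose hash key range contains that value. The sketch below is not how kinesis-hot-shard-advisor works internally. It is a minimal illustration that assumes a hypothetical stream whose shards split the hash space into equal ranges, and every function name in it is made up for this example.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # Kinesis hash keys are 128-bit integers


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard a partition key would land on.

    Assumption for illustration only: the shards divide the 128-bit
    hash key space into equal, contiguous ranges.
    """
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * shard_count // HASH_SPACE


def records_per_shard(partition_keys, shard_count: int) -> Counter:
    """Count how many records each shard would receive."""
    return Counter(shard_for_key(key, shard_count) for key in partition_keys)


def hottest_shard(partition_keys, shard_count: int):
    """Return (shard_index, share_of_traffic) for the busiest shard."""
    counts = records_per_shard(partition_keys, shard_count)
    if not counts:
        raise ValueError("no partition keys supplied")
    shard, count = counts.most_common(1)[0]
    return shard, count / sum(counts.values())
```

If one partition key dominates the traffic, for example 90 records keyed "device-1" out of 100, hottest_shard reports that key's shard with a share of at least 0.9, which is the kind of skew the advisor is designed to surface on a real stream.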
simpleiot-cli last week I shared the SimpleIOT SDK. This week we have the command-line-interface (CLI) for the SimpleIOT framework. SimpleIOT abstracts out IoT device connectivity and hides the underlying details so you can focus on your application’s unique features.
Demos, Samples, Solutions and Workshops
eventbridge-events-to-vpc shows you how to send EventBridge events to a private endpoint in a VPC using a Lambda function to relay events. This solution deploys the Lambda function connected to the VPC and uses IAM permissions to enable EventBridge to invoke the Lambda function. In this solution, you set up an example application with an EventBridge event bus, a Lambda function to relay events, a Flask application running in an EKS cluster to receive events behind an Application Load Balancer (ALB), and a secret stored in Secrets Manager for authenticating requests. This application uses EKS and Secrets Manager to demonstrate sending and authenticating requests to a containerized workload, but the same pattern applies for other container orchestration services like ECS and your preferred secret management solution.
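The relay pattern described above can be sketched in a few lines. This is a minimal illustration rather than the code from the repository: the function names, the bearer-token header, and the ENDPOINT_URL and AUTH_TOKEN environment variables are all assumptions made for this example (the solution itself keeps its secret in Secrets Manager, which is omitted here to keep the sketch self-contained).

```python
import json
import os
import urllib.request


def build_relay_request(event: dict, endpoint_url: str, token: str) -> urllib.request.Request:
    """Wrap an EventBridge event in an authenticated HTTP POST request."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth scheme; use whatever your endpoint expects.
            "Authorization": f"Bearer {token}",
        },
    )


def relay(event: dict, endpoint_url: str, token: str, timeout: float = 5.0) -> int:
    """Send the event to the private endpoint and return the HTTP status."""
    request = build_relay_request(event, endpoint_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def handler(event, context):
    """Lambda entry point: forward the incoming event unchanged."""
    # Environment variables stand in for the Secrets Manager lookup.
    status = relay(event, os.environ["ENDPOINT_URL"], os.environ["AUTH_TOKEN"])
    return {"statusCode": status}
```

Because the Lambda function is attached to the VPC, the endpoint URL can point at an internal load balancer that is not reachable from the public internet.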
step-functions-workflows-collection this repo contains Step Functions workflows that show how to orchestrate multiple services into business-critical workflows with minimal code. You can use these workflows to help develop your own projects quickly.
aws-test-automation-for-devops-using-aws-cdk this sample code shares methods and patterns for applying test automation to CI/CD on AWS. With this sample, you can learn test automation by putting it into practice.
aws-sso-configuration-automation this project accelerates AWS Single Sign-On (SSO) implementation using AWS CDK. This CDK program allows you to conveniently define your own permission sets and assignments without the need to tediously create your own AWS CloudFormation templates for your AWS SSO deployment, minimising the risk of human misconfiguration.
AWS and Community blog posts
Gary Stafford has put together his latest blog post, Developing Spring Boot Applications for Querying Data Lakes on AWS. In this post Gary explores how to build an example Java Spring Boot RESTful Web Service that allows end-users to query data stored in a data lake on AWS. The RESTful Web Service will access data stored as Apache Parquet in Amazon S3 through an AWS Glue Data Catalog using Amazon Athena. The service will use Spring Boot and the AWS SDK for Java to expose a secure, RESTful Application Programming Interface (API). As always, a must-read post this week. [hands on]
Check out Using Amazon Corretto (OpenJDK) for lean, fast, and efficient AWS Lambda Applications where Adam Bien discusses how you can launch large, monolithic applications on top of AWS Lambda, showing that they perform well and are cost effective. He also introduces running Amazon Corretto on ARM64 using Lambda’s Graviton2-based offering. Finally, Adam demonstrates how you can use Amazon Corretto with Quarkus to build microservices quickly and at low cost. A must-read for Java developers this week. [hands on]
Pulumi is an open-source Infrastructure as Code (IaC) tool, similar to Terraform. A key difference between the two is that Pulumi lets you choose from a number of general-purpose programming languages, whereas Terraform has a domain-specific language called HashiCorp Configuration Language (HCL). Abhishek Ray has put together this blog post, How to use Pulumi and Python to create an EC2 instance, to show you the basics and help you get started. [hands on]
Apache Iceberg is an open table format, originally designed at Netflix in order to overcome the challenges faced when using existing data lake formats. It is getting a lot of interest, and this week we have a couple of blog posts that will help you get to know this open source technology in more detail.
First up we have Chaitanya Varma Mudundi from Rackspace, who provides a nice overview of Apache Iceberg and some ideas of how you can run this on AWS in his post, Apache Iceberg: An Introduction from Rackspace on Running the New Open Table Format on AWS.
Following that we have the post, Build a high-performance, ACID compliant, evolving data lake using Apache Iceberg on Amazon EMR from Sekar Srinivasan. In this post, Sekar discusses modern data lake requirements and the challenges that come with them (including support for ACID transactions, concurrent writers, and partition and schema evolution), before showing how Apache Iceberg solves these challenges. There is a sample notebook that you can use to follow along, to see how you can run Apache Iceberg on Amazon EMR using the AWS Glue Data Catalog as the metastore, and query the data using Athena. [hands on]
Very nice post, Multi-Region Terraform Deployments with AWS CodePipeline using Terraform Built CI/CD from Lerna Ekmekcioglu and Jack Iu, where they demonstrate the best practice for multi-Region deployments using HashiCorp Terraform as infrastructure as code (IaC), and AWS CodeBuild and CodePipeline as continuous integration and continuous delivery (CI/CD), for consistency and repeatability of deployments into multiple AWS Regions and AWS Accounts. [hands on]
A couple of posts this week for those of you who are interested in or follow Kubernetes. First up we have Jeremy Cowan who writes, Amazon EKS improves control plane scaling and update speed by up to 4x and shares some of the insights around how AWS has been able to optimise and speed up the performance of the Amazon EKS control plane, with the aim of making the EKS user experience even better.
The second post is a hands-on tutorial from Akshit Khanna and Arun Thangaraj, where you will leverage several AWS services, including AWS App Mesh, Amazon Route 53, and Amazon EKS, to run a resilient, highly available application in two different regions. Read on if this sounds like something you want to try, in Run an active-active multi-region Kubernetes application with AppMesh and EKS.
To round this section off, we have Leverage AWS secrets stores from EKS Fargate with External Secrets Operator penned by Ryan Stebich. In this post, Ryan walks through using the External Secrets Operator on an Amazon EKS Fargate cluster to consume secrets stored in AWS Secrets Manager. [hands on]
Lustre is an open source parallel distributed file system, generally used for large-scale cluster computing. Arun Km and Rohit Bhosale show you how you can safeguard your data using FSx for Lustre’s encryption feature, in their post, Protecting your high-performance file systems with Amazon FSx for Lustre. [hands on]
Other posts you might like from the past week
- Hyperparameter optimization for fine-tuning pre-trained transformer models from Hugging Face discusses how to use Syne Tune for hyperparameter optimisation when fine-tuning pre-trained transformer models from Hugging Face [hands on]
- How to integrate Linux instances with AWS Gateway Load Balancer presents a sample handler that implements Linux virtual Layer 3 interfaces (using Linux’s TUN support, explained here) to handle the AWS Gateway Load Balancer connectivity [hands on]
- Use AWS Nitro Enclaves to perform computation of multiple sensitive datasets provides a walk through of how to build the Proof of Concept (POC) bidding service application I shared in #118 [hands on]
- Archive and Purge Data for Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL Compatibility using pg_partman and Amazon S3 shares how you can efficiently use PostgreSQL’s native range partition to partition current (hot) data with pg_partman and archive historical (cold) data in Amazon S3
- Automate Amazon RDS for PostgreSQL horizontal scaling and system integration with Amazon EventBridge and AWS Lambda provides a solution to automate horizontal scaling of your Amazon RDS for PostgreSQL environment using an event-driven architecture [hands on]
- Disaster recovery considerations with Amazon EMR on Amazon EC2 for Spark workloads shows you how to architect your Amazon EMR environment for disaster recovery to maintain business continuity with minimum Recovery Time Objective (RTO) during Availability Zone failure or when your EMR cluster is inoperable [hands on]
- Jenkins high availability and disaster recovery on AWS shares the challenges of scaling Jenkins for high availability (HA) and disaster recovery (DR) and how you can address these using AWS services [hands on]
Transport Layer Security (TLS)
To respond to evolving technology and regulatory standards for Transport Layer Security (TLS), we will be updating the TLS configuration for all AWS service API endpoints to a minimum of version TLS 1.2. This update means that, by June 28, 2023, you will no longer be able to use TLS versions 1.0 and 1.1 with any AWS API in any AWS Region.
To find out more, read the post TLS 1.2 to become the minimum TLS protocol level for all AWS API endpoints.
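If you maintain your own clients, one way to get ahead of this change is to make them refuse anything older than TLS 1.2 and to flag any connection that negotiated an older version. The sketch below uses only Python's standard ssl module. The helper names are hypothetical, and it is a generic client-side check under those assumptions, not guidance taken from the AWS post.

```python
import ssl

# Version strings as reported by ssl.SSLSocket.version()
DEPRECATED_TLS = {"TLSv1", "TLSv1.1"}


def tls12_client_context() -> ssl.SSLContext:
    """Build a client context that will not negotiate below TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def is_deprecated(negotiated_version: str) -> bool:
    """Return True if a negotiated protocol version is TLS 1.0 or 1.1."""
    return negotiated_version in DEPRECATED_TLS
```

To use it, wrap a connected socket with tls12_client_context().wrap_socket(sock, server_hostname=host) and pass the result of its version() method to is_deprecated. With the minimum version set, a handshake with a server that only offers TLS 1.0 or 1.1 fails instead of silently downgrading.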
Amazon MemoryDB for Redis
Amazon MemoryDB for Redis is now a Payment Card Industry Data Security Standard (PCI DSS) compliant service. MemoryDB is a fully managed, Redis-compatible, in-memory database that provides low latency, high throughput, and durability at any scale. You can now use MemoryDB to store sensitive payment card data with low latency and high throughput for use cases such as payment processing, mobile wallet, and payment fraud prevention that are subject to PCI DSS.
RStudio on Amazon SageMaker
You can now bring your own development environment in a custom image to RStudio on Amazon SageMaker. RStudio on SageMaker is the industry’s first fully managed RStudio Workbench in the cloud. You can quickly launch the familiar RStudio Integrated Development Environment (IDE), dial up and down the underlying compute resources without interrupting your work, and even switch to programming using Python on Amazon SageMaker Studio Notebooks. All your work, including code, datasets, repositories, and other artefacts, is synchronised between the two environments. You can bring your current RStudio license to Amazon SageMaker at no additional charge to quickly get started.
RStudio on SageMaker already comes with a built-in image pre-configured with R programming and data science tools including the SageMaker SDK, AWS CLI, AWS SDK, and the Reticulate package for integration with Python-based interfaces. Starting today, you can register your own custom image with packages and tools of your choice, and make it available to all the users sharing the RStudio on SageMaker domain. Bringing your own custom image has several benefits. You can standardise and simplify the getting started experience for data scientists and developers by providing a starter base image, pre-configure the drivers required for connecting to data stores, or pre-install specialised data science software for your business domain.
AWS Toolkit for Visual Studio
The AWS Toolkit for Visual Studio is an open source extension for Microsoft Visual Studio running on Microsoft Windows that makes it easier for developers to develop, debug, and deploy applications using Amazon Web Services, allowing you to get started faster and be more productive. Developers can now access Amazon CloudWatch Logs within Visual Studio using the AWS Toolkit for Visual Studio. Directly from the IDE, it is now possible to search and filter log groups, log streams, and events. Additionally, log groups can be accessed from their associated resources, and log events can be downloaded to a file.
The latest release of the toolkit includes several convenient CloudWatch Logs features. Visual Studio users can list CloudWatch Logs log groups from the CloudWatch Logs node in the AWS Explorer. Individual log groups can be opened in a document tab, where you can view the log group’s streams, as well as export stream events to a local file. While viewing a log stream, you can search and filter log messages using keywords or phrases, such as “Exception” or “Error”. You can also search using a time range, to see events that led up to and resulted from the error you were searching for.
Videos of the week
Join my colleague in a recording of his user group meetup last week, where you will learn how to implement Infrastructure as Code (IaC) and Continuous Integration and Continuous Deployment (CI/CD) using the AWS Cloud Development Kit. Donnie walks you through step-by-step demos, and afterwards you will understand the overall concepts of IaC and CI/CD, the AWS services you need to use, and the developer tools for seamless integration with your development workflow.
Great video from the re:Mars event from the lovely folks at Astronomer, one of the leading groups behind engineering efforts on Apache Airflow. In this session, Steven Hillon, VP of Data, shows you how data science teams use Airflow to run models in production.
David Tippett, Developer Advocate in the OpenSearch team, joins the HPE Technology developer community to share news about OpenSearch and how you can use it to help you search through all your data.
Events for your diary
Build On Live: Data Analytics July 7th, 9am PDT
Don’t miss this live streamed event from my colleagues in AWS, where they cover all things data on AWS. There are some fabulous sessions covering some of your favourite open source data technologies (Apache Airflow, Apache Hudi, Amazon EMR and more).
This is being live streamed on Twitch, so you do not need to register. Make sure you put it in your diary so you remember, and find out more info on the Build On Live 2022 Schedule page.
BOSC 2022 July 13-14, Madison, Wisconsin, USA
The Bioinformatics Open Source Conference (BOSC) has been held annually since 2000, and this year AWS is proud to be a platinum sponsor for this event. BOSC covers all aspects of open source bioinformatics software and open science, including (but not limited to) these topics: Open Science and Reproducible Research, Open Biomedical Data, Citizen/Participatory Science, Standards and Interoperability, Data Science Workflows, Open Approaches to Translational Bioinformatics, Developer Tools and Libraries, Inclusion, and Outreach and Training. This is a hybrid event (in person/virtual) and you can find out more by checking out the event page, BOSC 2022.
OpenSearch Every other Tuesday, 3pm GMT
This regular meet-up is for anyone interested in OpenSearch & Open Distro. All skill levels are welcome, and talks are welcome on topics including search, logging, log analytics, and data visualisation.
Sign up to the next session, OpenSearch Community Meeting.
OpenSearchCon 2022 Sept 21st, 2022 Seattle
Come to the first annual OpenSearchCon!
This day-long conference will be packed with presenters who build and innovate with OpenSearch. It doesn’t matter if you’re just getting started on your OpenSearch journey, running giant clusters, or contributing tons of code; the event is for everyone. Join us to celebrate the progress and look into the future of the project. Admission is free, and registration will be open in the next few weeks. All you will need to do is sign up, and get to Seattle!
Check out the full details, including signing up and location, at the meetup page here.