AWS open source news and updates #139
December 18th, 2022 - Instalment #139
Welcome to the last AWS open source newsletter of 2022, edition #139. I am planning on taking a few weeks off to recharge, and wish readers of this newsletter a fabulous Christmas and New Year. Over 100K of you have read this newsletter, so I want to thank you all for your continued support. This newsletter is only possible because of the passion and enthusiasm of open source Builders, and I look forward to seeing what 2023 will bring.
I hope some of you managed to catch the last episode of Build on Open Source for 2022. After a quick look back at the best of 2022, we had another awesome guest, AWS Community Builder Ran Isenberg. If you missed the session, don’t worry, you can catch up by checking out the Video section below.
This week we feature more great projects, including “jupyter-scheduler”, a JupyterLab extension for running notebook jobs, “aiac”, ChatGPT but for creating Terraform code, “hardeneks”, a tool to help you baseline your Amazon EKS environments, “aws-lambda-snapstart-java-rules”, which makes sure your Java-based Lambda functions will work with SnapStart, “aws-tf-prowler-fargate”, probably the easiest way to run Prowler on your AWS environments, “aws-iot-greengrass-v2-painless-installer”, a great way to simplify how you install AWS IoT Greengrass v2, and many more. If reading is more your thing, then why not check out the tutorials, blog posts, and hands-on deep dives covering many of your favourite open source projects, including this week Terraform, AWS IoT Greengrass v2, Karpenter, PostgreSQL, MySQL, AWS Amplify, Apache Skywalking, Micronaut, Kubernetes, Grafana, Porting Assistant for .NET, Bottlerocket, Amazon EMR, AWS CDK, and Apache Ranger.
Make sure you don’t skip the Videos section as we have some top notch videos this week, and we wrap up with some events to check out for early 2023, including the State of Open Con 23 - the CFP is open and closes this Sunday so get busy and submit a talk for this event if you can.
Build on Open Source season two
Finally, before I leave you I want to share that we are currently planning the second season of Build on Open Source and we are looking for open source Builders, Maintainers, and enthusiasts to come on as guests and talk about their favourite projects or walk us through what they are currently working on. Could that be you? Please get in touch, either via email, commenting below, or contact Derek or myself via social media.
Celebrating open source contributors
The articles and projects shared in this newsletter are only possible thanks to the many contributors in open source. I would like to shout out and thank those folks who really do power open source and enable us all to learn and build on top of what they have created.
So thank you to the following open source heroes: Jerome Van Der Linden, Laurence Geng, Medha Shree, Ayush Kumar, Swaminathan Jayaraman, Andreas Wittig, Sebastian Bille, Sai Vennam, Isaac Levin, Marcio Morales, Bruno Gabriel da Silva, Robert Northard, Elamaran Shanmugam, Premal Tailor, Salman Sali, Vadym Kazulkin, Yue Guo, Jimmy Ray and Sriram Ranganathan.
A look back at the most popular projects of 2022
As this is the last newsletter of 2022, I thought I would share what have been the most popular projects this year (measured by folks that have clicked on the links). These are just too good to miss, so check this list out and let me know what you think. What were your favourite projects of 2022? What is missing that you are surprised about?
2022 most popular projects
- querypal the most viewed project, provides a nice WebUI for Amazon Athena
- cfn-diagram is a CLI tool to visualise CloudFormation/SAM/CDK stacks as visjs networks, draw.io or ascii-art diagrams
- driftctl helps you detect, track and alert on infrastructure drift
- infracost shows cloud cost estimates for Terraform
- cfn_nag is a linting tool for CloudFormation templates
- steampipe lets you use SQL to instantly query AWS resources across regions and accounts
- keycloak is an open source Identity and Access Management solution
- gnuradio is a free & open-source software development toolkit that provides signal processing blocks to implement software radios
- eventcatalog helps you discover, explore and document your Event Driven Architectures powered by Markdown.
- ddb_local provides a Python wrapper for DynamoDB Local
- cloudquery-policies/aws is an open-source cloud asset inventory powered by SQL
- kronicle is an open source tool and dashboard for documenting and visualising a tech stack
- memq is an efficient, scalable cloud native PubSub system from Pinterest
Latest open source projects
The great thing about open source projects is that you can review the source code. If you like the look of these projects, make sure that you take a look at the code, and if it is useful to you, get in touch with the maintainer to provide feedback, suggestions or even submit a contribution.
hardeneks this tool runs checks to see if an Amazon EKS cluster follows EKS Best Practices.
aws-lambda-snapstart-java-rules introduced during re:Invent, AWS Lambda SnapStart can make big improvements to your cold start times. There are some things you need to be aware of as a developer (check the docs), and this tool can help. The SnapStart Bug Scanner is a plugin for SpotBugs (an open source static analysis tool that looks for bugs in Java code) that helps AWS Lambda customers inspect their functions for potential bugs unique to the AWS Lambda SnapStart environment. Using it will help you determine whether your code can run in AWS Lambda SnapStart environments.
aiac is a command line tool to generate IaC (Infrastructure as Code) templates via OpenAI’s API. The CLI allows you to ask the model to generate templates for different scenarios (e.g. “get terraform for AWS EC2”). It will make the request, and store the resulting code to a file, or simply print it to standard output. With the excitement around ChatGPT, this project looks super interesting. I have already tried asking ChatGPT to write basic code (and been impressed), so let's see how this does.
jupyter-scheduler is a JupyterLab extension for running notebook jobs. This extension is composed of a Python package named jupyter_scheduler for the server extension and an NPM package named @jupyterlab/scheduler for the frontend extension. Installation of this extension provides a REST API to run, query, stop and delete notebook jobs; the UI provides an interface to create, list and view job details. Read the post Introducing Jupyter Scheduler, where Jason Weill introduces the project and how you can get started.
aws-tf-prowler-fargate is a Terraform module that helps you assess your multi-account environment in AWS Organizations using the open source Prowler security assessment tool deployed on AWS Fargate. It assesses all accounts using a time-based schedule expression in Amazon CloudWatch, creates assessment reports in CSV format, and stores them in an Amazon Simple Storage Service (S3) bucket.
aws-iot-greengrass-v2-painless-installer this solution allows installing Greengrass V2 on an edge gateway without requiring access to the AWS account where the device will connect. The User supervising the installation only needs to be able to authenticate with Amazon Cognito in order to initiate the installation process. No specific knowledge of the AWS Cloud nor Greengrass is necessary. The solution can be further customised for instance to federate user authentication to your enterprise identity management system (e.g. Active Directory).
anti-malware-scanning Hardik Singh Behl has put together this reference solution showing how you can use the open source antivirus tool ClamAV, together with the event-driven capabilities of Amazon S3, to execute a Lambda function that scans files for bad things. Very nice.
cdk-s3-upload-presignedurl-api Jerome Van Der Linden has created this new AWS CDK construct to make your lives easier. If you want your users to be able to upload a specific object to an Amazon S3 bucket, but you don’t want them to have AWS security credentials or permissions, you can use pre-signed URLs. It’s a common pattern but before Jerome created this construct, there was no IaC component that you could leverage directly. Great stuff Jerome.
Demos, Samples, Solutions and Workshops
Lucene.Net-AWS-Lambda-EFS this project from Salman Sali enables you to implement a Serverless Search for .NET hosted on AWS Lambda with Amazon EFS Storage.
amazon-keyspaces-with-apache-kafka this repository contains hands-on content on how to build a data pipeline to ingest real time data using managed open-source compatible services such as Amazon Elastic Kubernetes Service (EKS), Amazon Managed Streaming for Apache Kafka (MSK), and Amazon Keyspaces (for Apache Cassandra). Apache Kafka and Cassandra share distributed core capabilities like high availability, scalability, and throughput that make them a good solution for large scale processing applications like IoT data, user metadata, trade monitoring, and route optimisation. This data pipeline can consume a sample stream from the Twitter API, which streams 1% of all the tweets in real time, as a data source, parse the tweets and their metadata, and publish the parsed data to a Kafka topic. Kafka works as a distributed queue as well as a buffer layer to transport messages. MSK Connect consumes these messages from the Kafka topic and writes them to Amazon Keyspaces tables.
The solution uses EKS to deploy a containerised Twitter event source application. The containerised application consumes a stream of tweets from the Twitter API, parses the tweets (discarding tweets that don’t have a hashtag), extracts tweet metadata (created at, lang etc.), and publishes these messages to the Kafka topic twitter_input with the desired fields using the Kafka producer API. You will use MSK Connect to ingest data from the twitter_input topic to Amazon Keyspaces.
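The filter-and-extract step described above can be sketched in a few lines of Python. This is a minimal illustration and not the repository's code: the tweet shape (an `entities.hashtags` list plus `created_at` and `lang` fields) and the function name `to_kafka_message` are assumptions made for the example, and the Kafka producer call itself is left out.

```python
import json
from typing import Optional

TOPIC = "twitter_input"  # topic name taken from the project description


def to_kafka_message(tweet: dict) -> Optional[bytes]:
    """Return a JSON-encoded message for a tweet, or None if it has no hashtag.

    The tweet layout used here is a hypothetical, simplified payload.
    """
    hashtags = [tag["text"] for tag in tweet.get("entities", {}).get("hashtags", [])]
    if not hashtags:
        return None  # discard tweets that do not carry a hashtag
    record = {
        "id": tweet["id"],
        "text": tweet["text"],
        "created_at": tweet.get("created_at"),
        "lang": tweet.get("lang"),
        "hashtags": hashtags,
    }
    return json.dumps(record, sort_keys=True).encode("utf-8")


# Two made-up tweets: only the first one has a hashtag.
sample_tweets = [
    {
        "id": 1,
        "text": "Loving #opensource",
        "created_at": "2022-12-18T10:00:00Z",
        "lang": "en",
        "entities": {"hashtags": [{"text": "opensource"}]},
    },
    {"id": 2, "text": "No tags here", "lang": "en", "entities": {"hashtags": []}},
]

messages = [m for m in map(to_kafka_message, sample_tweets) if m is not None]
```

Each entry in `messages` is a byte string that a Kafka producer could send to the `twitter_input` topic.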
iot-analytics-athena-ddb provides a solution for a reference architecture to analyse electricity data from smart meters, for maintenance purposes of a hypothetical Energy company, and for self-service analysis of customers, to understand how much electricity they consume.
amazon-sagemaker-clip-search this repository aims at building a machine learning (ML) powered search engine prototype to retrieve and recommend products based on text or image queries. This is a step-by-step guide on how to create SageMaker Models with Contrastive Language-Image Pre-Training (CLIP), use the models to encode images and text into embeddings, ingest embeddings into Amazon OpenSearch Service index, and query the index using OpenSearch Service k-nearest neighbors (KNN) functionality.
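To give a feel for what a k-nearest neighbours lookup over embeddings does, here is a small pure-Python sketch. It is not the OpenSearch Service API or the repository's code: the three-dimensional vectors and product names are invented for illustration (real CLIP embeddings have hundreds of dimensions), cosine similarity is just one possible distance choice, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn(query, index, k=2):
    """Return the names of the k items whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query, vector), name) for name, vector in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy "index" of product embeddings (made-up values).
index = {
    "red shoe": [0.9, 0.1, 0.0],
    "blue shirt": [0.1, 0.9, 0.1],
    "red boot": [0.8, 0.2, 0.1],
}

query_embedding = [1.0, 0.0, 0.0]  # stands in for an encoded text or image query
nearest = knn(query_embedding, index, k=2)
print(nearest)  # ['red shoe', 'red boot']
```

A search service performs the same ranking, but with approximate algorithms that scale to millions of vectors.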
ack-rds-gitops-workshop roll your sleeves up for this great workshop, where you will learn to deploy a continuous integration and delivery (CI/CD) workflow using GitOps and the AWS Controllers for Kubernetes (ACK) service controller for Amazon RDS on Amazon EKS to create and manage Amazon Aurora Serverless v2 databases effectively. GitOps relies on Git as the single source of truth for declaratively managing containerised infrastructure and application components. With Git at the centre of CI/CD pipelines, developers can automate and simplify application deployments and operations to Kubernetes.
2022 most popular posts
As this is the last newsletter of 2022, I wanted to highlight the most viewed blog posts of 2022 just in case some of you missed these. These are the top ten, so I am hoping that some of these will be new to you.
- Presto® on Apache Kafka® At Uber Scale
- Dashboards as Code with HCL + SQL
- The Art of Building Open Data Lakes with Apache Hudi, Kafka, Hive, and Debezium
- Monitor AWS resources created by Terraform in Amazon DevOps Guru using tfdevops
- Progressive Delivery using AWS App Mesh and Flagger
- Parallel CDK stack deployments with GitHub Actions
- Use CDK8S To Create AWS Controllers for Kubernetes Custom Resources
- How Prime Video updates its app for more than 8,000 device types
- First Look at Lambda Powertools TypeScript
- AWS CDK v2 Tutorial – How to Create a Three-Tier Serverless Application
Congratulations to the writers of these posts, you have obviously put together something of interest to the broader open source and AWS community.
AWS and Community blog posts
From a retrospective look to the latest blog posts for you to enjoy.
Ran Tao, Cloud architect @Jina AI, shares a detailed guide on how to lower the cost of using GPUs by using time-slicing, and how you can achieve that using Karpenter. In the post, Time-Slicing GPUs with Karpenter, find out more about how you can use Karpenter and NVIDIA’s k8s plugin to achieve time-slicing on GPUs, which allows users to share GPUs between pods and hence save on costs.
AWS Community Builder Vadym Kazulkin has written his latest blog post, a two part series on Measuring Java 11 Lambda cold starts with SnapStart. In the first, Measuring Java 11 Lambda cold starts with SnapStart - Part 1 First Impressions he provides a quick introduction and overview of this new capability within AWS Lambda, and in the follow up post, Measuring Java 11 Lambda cold starts with SnapStart - Part 2 Using Micronaut Framework revisits the first post but this time comparing what happens when using the Micronaut framework.
Apache SkyWalking is an open source Application Performance Monitoring (APM) tool for monitoring and troubleshooting distributed systems, especially designed for micro services, cloud native and container-based (Docker, Kubernetes, Mesos) architectures. My colleague Yue Guo has put together a blog post, How to run Apache SkyWalking on AWS EKS and RDS/Aurora, that shows how to quickly set up Apache SkyWalking on AWS EKS and RDS/Aurora, along with a couple of sample services and monitoring services to observe SkyWalking itself.
Apache Ranger™ is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform. Amazon EMR enables fine-grained access control with Apache Ranger through a number of components (a Secret Agent and EMR record server). In the tutorial Apache Ranger and AWS EMR Automated Installation and Integration Series (4): OpenLDAP + Open-Source Ranger, Laurence Geng shows you how to use OpenLDAP as both the authentication provider and the user account data store, with Ranger acting as the authorisation controller. [hands on]
A few posts this week, starting off with Blue/Green Kubernetes upgrades for Amazon EKS Anywhere using Flux, where Robert Northard, Elamaran Shanmugam, and Premal Tailor detail a solution on how you can achieve blue/green Kubernetes platform deployments on Amazon EKS-A deployed on vSphere. [hands on]
Following that Marcio Morales and Bruno Gabriel da Silva have written Windows Authentication on Amazon EKS Windows pods, an end-to-end guide on how to configure an Amazon EKS cluster to exchange Kerberos tickets with an Active Directory Domain, allowing Windows pods to use Windows Authentication. [hands on]
Jimmy Ray and Sriram Ranganathan shared an overview of the newly released advanced configuration support for Amazon EKS add-ons in their post, Amazon EKS add-ons: Advanced configuration. Advanced configuration support for Amazon EKS add-ons allows you to set your configuration directly through the Amazon EKS add-ons API, so you can install and configure your operational software during cluster creation in a single step. [hands on]
At re:Invent we announced the availability of AWS Marketplace add-ons for Amazon EKS. This feature extends the add-ons experience to include operational software for security, storage, observability, and networking available in AWS Marketplace. Swaminathan Jayaraman and Sai Vennam walk you through this in the post, Deploy third-party software add-ons from AWS Marketplace to Amazon EKS clusters. [hands on]
Finally we have the post Expose Amazon EKS pods through cross-account load balancer from Medha Shree and Ayush Kumar who show you how to expose Amazon EKS pods through cross-account load balancing. [hands on]
Other posts and quick reads
- Deploy a Next.js 13 app with authentication to AWS Amplify explains how to create and deploy a Next.js 13 app with user authentication to Amplify Hosting in five steps [hands on]
- Cloud Brigade Accelerates Full-Stack App Development with AWS Amplify is a case study from Cloud Brigade and how they leverage AWS Amplify to accelerate development of solutions for their customers
- Partition existing tables using native commands in Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL shares how to use PostgreSQL native SQL commands to convert an existing non-partitioned table to a partitioned one [hands on]
- Datavail: Migrating and modernizing commercial databases to open-source database engines on AWS an interview with Datavail on how they are helping to migrate customers to open source databases
Version 2.55 of AWS CDK was released, containing the usual new features, bug fixes and changes, which include:
- autoscaling: support default instance warmup for Auto Scaling groups
- cfnspec: cloudformation spec v101.0.0
- cognito: add new AdvancedSecurityMode property
- core: add volumes-from option to docker run command for bundling
- s3: update runtime of notifications-handler to python3.9
- s3-deployment: add additional sources with addSource
Amazon EMR on EKS now supports accelerated computing over graphics processing unit (GPU) instance types using Nvidia RAPIDS Accelerator for Apache Spark. The growing adoption of artificial intelligence (AI) and machine learning (ML) in analytics has increased the need for processing data quickly and cost efficiently with GPUs. Nvidia RAPIDS Accelerator for Apache Spark helps customers leverage the benefit of GPU performance while saving infrastructure costs. With this release, EMR on EKS customers can use the RAPIDS accelerator by simply specifying the Spark-RAPIDS release label when calling the EMR on EKS API.
Until now, EMR on EKS customers had to create a custom image to use Nvidia RAPIDS Accelerator. This required engineering and test effort. In addition, with every new Nvidia RAPIDS release, bug fix or security update, customers had to rebuild the custom image and go through the testing again. Starting with EMR 6.9, EMR on EKS is introducing a new Nvidia RAPIDS Accelerator for Spark image. Customers can use the same StartJobRun API to run their Spark jobs, and simply specify a new Spark-RAPIDS release label to leverage RAPIDS Accelerator on an EKS cluster with a GPU supported instance type.
Prowler, the handy cloud security tool, has a new version 3.0 that has been fully rewritten in Python and can scan your AWS account in minutes across all regions, covering more than 250 checks for the most popular AWS services. Prowler v3 also comes with a new check architecture, better compliance support and consolidated reporting formats.
Amazon Relational Database Service (RDS) Proxy now supports Amazon Aurora PostgreSQL-Compatible Edition and Amazon RDS for PostgreSQL running major version 14. PostgreSQL 14 includes performance improvements for parallel queries, heavily-concurrent workloads, partitioned tables, logical replication, and vacuuming. PostgreSQL 14 also improves functionality with new capabilities. For example, you can cancel long-running queries if a client disconnects and you can close idle sessions if they time out. With this launch, you can enforce SCRAM (Salted Challenge Response Authentication Mechanism) password-based authentication for proxy, making connections from your applications more secure.
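As a rough sketch of what "specifying the release label" looks like, the snippet below only builds the request parameters for a StartJobRun call and does not contact AWS. The release label value, the job name, the virtual cluster ID, the role ARN and the S3 path are all assumptions or placeholders for illustration; check the EMR on EKS documentation for the exact Spark-RAPIDS label available in your region.

```python
# Assumed label for the EMR 6.9 Spark-RAPIDS image; verify against the EMR on EKS docs.
SPARK_RAPIDS_RELEASE_LABEL = "emr-6.9.0-spark-rapids-latest"


def build_start_job_run_request(virtual_cluster_id, execution_role_arn, entry_point,
                                release_label=SPARK_RAPIDS_RELEASE_LABEL):
    """Build keyword arguments for the EMR on EKS StartJobRun API."""
    return {
        "virtualClusterId": virtual_cluster_id,
        "name": "spark-rapids-example",  # hypothetical job name
        "executionRoleArn": execution_role_arn,
        "releaseLabel": release_label,  # the only change needed to pick the RAPIDS image
        "jobDriver": {
            "sparkSubmitJobDriver": {
                "entryPoint": entry_point,
            }
        },
    }


# Placeholder identifiers, not real resources.
request = build_start_job_run_request(
    virtual_cluster_id="example-virtual-cluster-id",
    execution_role_arn="arn:aws:iam::111122223333:role/example-job-role",
    entry_point="s3://example-bucket/jobs/etl.py",
)

# With boto3 installed and credentials configured, the request could be sent with:
#   boto3.client("emr-containers").start_job_run(**request)
```

Keeping the label as a parameter makes it easy to switch between a standard release and the RAPIDS one without touching the rest of the job definition.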
Amazon Managed Grafana now supports AWS CloudFormation. You can use AWS CloudFormation templates to create, update, and delete your Amazon Managed Grafana workspaces, as well as manage or update workspace SAML authentication settings.
A couple of quick updates for AWS Amplify users.
Last week was the announcement of the general availability of v2.0 Amplify Library for Android. Amplify Library for Android allows developers building apps for the Android platform to easily include features like authentication, storage, maps, and more. This version of the library has been re-written to improve Android developers’ experience when using Auth and Storage features.
MySQL and PostgreSQL
Amazon DevOps Guru for RDS now detects if your Amazon Aurora database is receiving a significantly larger number of SQL queries and if those queries are reading more data than usual. This new functionality will help you to discover, in the event of degraded database performance, if an application traffic change is the likely cause of the performance degradation. Amazon DevOps Guru for RDS currently supports Amazon Aurora MySQL-Compatible Edition and Amazon Aurora PostgreSQL-Compatible Edition.
Videos of the week
Join Andreas Wittig as he speaks with Sebastian Bille, who shows us an open source project he created called IAM Legend. IAM Legend is an extension for Visual Studio Code. When editing IAM policies, the tool provides auto-completion and documentation for IAM actions. IAM Legend speeds up the process of writing IAM policies following the least-privilege principle.
Securing EKS Clusters Using Bottlerocket gives viewers an overview of Bottlerocket, current security challenges and related features, and a live demo showing how easy it is to set up.
Karpenter is a compute provisioning and management solution, which also acts as a cluster autoscaler. In this video, Sai Vennam shows you how open-source Karpenter works, and how it differs from Kubernetes Cluster Autoscaler.
Porting Assistant for .NET
Are you a .NET developer and you are still maintaining a few .NET Framework apps? Are you hearing about the latest in .NET and want to look into modernising? If so, there is an open source tool you should look into. The Porting Assistant for .NET from AWS is an open source tool that you can use today to help you in your journey to the latest versions of .NET. My colleague Isaac Levin is your host as he walks you through the Porting Assistant and shows how you can start using it today!
Build on Open Source
This newsletter was reviewed in the latest Build on Open Source show, S01E08. If you missed the show, you can still watch it over on YouTube.
For those unfamiliar with this show, Build on Open Source is where we go over this newsletter and then invite special guests to dive deep into their open source project. Expect plenty of code, demos and hopefully laughs. We have put together a playlist so that you can easily access all (seven) of the other episodes of the Build on Open Source show. Build on Open Source playlist
Events for your diary
If you are planning any events in 2023, either virtual, in person, or hybrid, get in touch as I would love to share details of your event with readers.
FOSDEM Feb 4-5th, 2023 in Brussels
FOSDEM is a free event for software developers to meet, share ideas and collaborate. Every year, thousands of developers of free and open source software from all over the world gather at the event in Brussels, this time on 4 & 5 February 2023. A must attend event for all open source fans, check out and register via this link.
State of Open Con 23 Feb 7-8th, 2023 in London
OpenUK will be hosting a 1000 person plus two day conference in Central London, “State of Open Con 23” in association with IEEE, the headline sponsor. Check out more info and sign up here.
Everything Open March 14-15th Melbourne, Australia
A new event for the fine folks in Australia. Everything Open is running for the first time, and the organisers (Linux Australia) have decided to run this event to provide a space for a cross-section of the open technologies communities to come together in person. Check out the event details here. The CFP is currently open, so why not take a look and submit something if you can.
OpenSearch Every other Tuesday, 3pm GMT
This regular meet-up is for anyone interested in OpenSearch & Open Distro. All skill levels are welcome and they cover and welcome talks on topics including: search, logging, log analytics, and data visualisation.
Sign up to the next session, OpenSearch Community Meeting