AWS open source newsletter #196
Edition #196
Welcome to issue #196 of the AWS open source newsletter, the newsletter where we try and provide you the best open source on AWS content.
As always, more great new projects are featured in this edition of the newsletter, including a link to the Valkey repo, a nice GUI based project to help you build orchestration workflows that uses Apache Airflow under the covers, a tool to help you find signals through the noise of your security logs, a project to help you run serverless tasks in a cron like fashion, a command line runner for Amazon CodeCatalyst, a tool to help you simplify the deployment of Cruise Control on Amazon MSK, a nice Mac client for experimenting with Amazon Bedrock, and some really cool demo apps, the pick of which (for me) is a nice way of surfacing up your Amazon Bedrock models in a way that existing applications that expect an API key can use.
Also featured in this weeks edition is content on Valkey, Apache Airflow, OpenSearch, LangChain, PostgreSQL, WordPress, RAGmap, RAGxplorer, Cedar, AWS CDK, Lambda Web Adapter, Postfix, Spring Boot, Amazon Corretto, Kubernetes, Karpenter, KEDA, Prometheus, OPA, Amazon EMR, PySpark, MySQL, Open JD, AWS Amplify, GraphQL, AWS PDK, Apache Livy, and Nodestream. Make sure you check out the events section this week, I have added more events, and hopefully will see some of you at the AWS Security meet-up in the AWS Thailand office this week, where I will be talking about Cedar.
Latest open source projects
The great thing about open source projects is that you can review the source code. If you like the look of these projects, make sure you that take a look at the code, and if it is useful to you, get in touch with the maintainer to provide feedback, suggestions or even submit a contribution. The projects mentioned here do not represent any formal recommendation or endorsement, I am just sharing for greater awareness as I think they look useful and interesting!
Tools
valkey
valkey was featured in the last edition of this newsletter, and is the home of development on the formerly open-source Redis project. The repo provides instructions on how to build against a number of different targets, but this community project is still in progress. Kyle Davis announced last week that Valkey 7.2.5 GA is out!, so if building from source is not your thing, head over to the public container repositories where all the work has been done for you.
domino
domino is a new open source workflow management platform that provides a very nice GUI and drag and drop experience for creating workflows. Now regular readers of this newsletter will know I am a big fan of the Node Red open source project, and I got very strong Node Red vibes about the GUI, which is a good thing. Under the covers, we have another favourite project of mine, Apache Airflow. If you head over to the main documentation site https://www.domino-workflows.io/, there is a public demo of it running in Amazon EKS, so you can try it out before installing it yourself. If you love this project, get in touch with the maintainers, they would love your feedback!
CloudConsoleCartographer
CloudConsoleCartographer is a project that was released at Black Hat Asia on April 18, 2024, Cloud Console Cartographer is a framework for condensing groupings of cloud events (e.g. CloudTrail logs) and mapping them to the original user input actions in the management console UI for simplified analysis and explainability. It helps you detect signals from the noise more efficiently, which is always important when you are dealing with security incidents. If you want to find out more, I recommend checking out Daniel Bohannon’s post, Introducing Cloud Console Cartographer: An Open-Source Tool To Help Security Teams Easily Understand Log Events Generated by AWS Console Activity (unofficial winner of this weeks longest blog post title!) which provides more background as well as several examples to explain how you can use this. Another bonus is the lovely rainbow ascii graphics that this tool seems to adopt - yay!
serverless-lambda-cron-cdk
serverless-lambda-cron-cdk This repository provides a starter kit for setting up cron jobs using AWS Lambda. It includes the necessary AWS Cloud Development Kit (CDK) deployment code, a CI/CD pipeline, as well as the source code for the Lambda function. The kit is designed to be easily configurable and deployable, allowing for quick setup and iteration. It’s ideal for developers looking to automate tasks on a schedule using AWS Lambda.
codecatalyst-runner-cli
codecatalyst-runner-cli This repository contains a command line tool that will allow you to run Amazon CodeCatalyst workflows locally. The README provides the instructions for quickly installing and getting started, so if you have been using Amazon CodeCatalyst and looking for this, look no more.
cruise-control-for-msk
cruise-control-for-msk is a repo that provides AWS CloudFormation templates that simplifies the deployment and management of Cruise Control and Prometheus for monitoring and rebalancing Amazon MSK clusters. Amazon MSK is a fully managed service that makes it easy to build and run applications that use Apache Kafka to process streaming data. With this new CloudFormation template, you can now integrate Cruise Control and Prometheus to gain deeper insights into your Amazon MSK cluster’s performance and optimise resource utilisation. By automating the deployment and configuration of Cruise Control and Prometheus, you can improve operational efficiency, reduce the time and effort required for manual setup and maintenance, and allow you to focus on higher-value tasks. Check out the README for more details.
amazon-gamelift-agent
amazon-gamelift-agent is a Java application that is used to launch game server processes on Amazon GameLift fleets. This application registers a compute resource for an existing Amazon GameLift fleet using the RegisterCompute API. The application also calls the GetComputeAuthToken API to fetch an authorisation token for the compute resource, using it to make a web socket connection to the Amazon GameLift service. One for the game developers.
aws-emr-advisor
aws-emr-advisor started as fork of Qubole SparkLens, this tool can be used to analyse Spark Event Logs to generate insights and costs recommendations using different deployment options for Amazon EMR. The tool generates an HTML report that can be stored locally or on Amazon S3 bucket for a quick review.
amazon-bedrock-client-for-mac
amazon-bedrock-client-for-mac this repo provides the code for the Amazon Bedrock Client for Mac is a macOS demo application built with SwiftUI. It serves as a client interface for AWS Bedrock, allowing users to interact with AWS Bedrock models.
Demos, Samples, Solutions and Workshops
aws-aurora-db-vertical-autoscaler
aws-aurora-db-vertical-autoscaler is a project that I heard about from Dmitry Shurupov (thanks for reaching out!) that helps you implement vertical autoscaling for Aurora for Postgres using Lambda functions. Oleg Mironov put together a blog post to go into more details, including a nice detailed flow diagram of how this code works. So if this is something that looks interesting to you, go check out Implementing vertical autoscaling for Aurora databases using Lambda functions in AWS, and thanks again Dmitry.
bedrock-access-gateway
bedrock-access-gateway provides an OpenAI-compatible RESTful APIs for Amazon Bedrock. Amazon Bedrock offers a wide range of foundation models (such as Claude 3 Opus/Sonnet/Haiku, Llama 2/3, Mistral/Mixtral, etc.) and a broad set of capabilities for you to build generative AI applications. Check the Amazon Bedrock landing page for additional information. Sometimes, you might have applications developed using OpenAI APIs or SDKs, and you want to experiment with Amazon Bedrock without modifying your codebase. Or you may simply wish to evaluate the capabilities of these foundation models in tools like AutoGen etc. Well, this repository allows you to access Amazon Bedrock models seamlessly through OpenAI APIs and SDKs, enabling you to test these models without code changes.
opensearch-for-gophers
opensearch-for-gophers This project contains an example that showcases different features from the official Go Client for OpenSearch that you can use as a reference about how to get started with OpenSearch in your Go apps. It is not intended to provide the full spectrum of what the client is capable of—but it certainly puts you on the right track. You can run this code with an OpenSearch instance running locally, to which you can leverage the Docker Compose code available in the project. Alternatively, you can also run this code with Amazon OpenSearch that can be easily created using the Terraform code also available in the project. Nice README that provides useful examples to get you going.
svdxt-sagemaker-huggingface
svdxt-sagemaker-huggingface is the latest demo repo from regular contributor Gary Stafford, that showcases some of the cool stuff Gary has been writing about in the generative AI space. This time he takes a look at the emerging field of generating videos through Stability AI’s Stable Video Diffusion XT (SVT-XT). This foundation model is a diffusion model that takes in a still image as a conditioning frame and generates a video from it. The repo provides everything you need to get started, and showcases some of the videos that Gary has created - it is pretty cool stuff!
AWS and Community blog posts
Each week I spent a lot of time reading posts from across the AWS community on open source topics. In this section I share what personally caught my eye and interest, and I hope that many of you will also find them interesting.
The best from around the Community
We had posts that cover a lot of open source technologies this week, but I am certainly seeing a growing trend for content on all things generative AI. First up we have my colleague Ricardo Ferreira who put together this post about how he used Amazon Q Developer, to refactor and modernise an application in Go to use OpenSearch in the post Goodbye Elasticsearch, Hello OpenSearch: A Golang Developer’s Journey with Amazon Q (Lessons Learned). Abishek Gupta put together two must read posts in this area, starting with Vector Databases for generative AI applications where he shares how to can overcome limitations using Vector databases and RAG, and then How to use Retrieval Augmented Generation (RAG) for Go applications showing you how you can implement RAG with LangChain and PostgreSQL using Go. João Galego takes a look at two open source projects (RAGmap and RAGxplorer) that help you explore those embeddings that we are increasingly creating in his post, Mapping embeddings: from meaning to vectors and back. The last post in this generative AI round up is from my colleague and long time WordPress fan Rio Astamal, who shares how you can use generative AI to build a Wordpress plugin that integrates with Amazon Bedrock, in the post I built a WordPress AI plugin to make authors more productive. Here’s how. Grab a cup of your favourite beverage when reading that one, good stuff.
From all things generative AI we move on to other open source technologies. AWS Hero Daniel Aniszkiewicz has been creating some amazing content on all things Cedar and Amazon Verified Permission, and he is back again with a really great post that will help you get started with understanding how you can use and create policies and entities in Cedar with his post, Authorization and Amazon Verified Permissions - A New Way to Manage Permissions Part XIV: AVP Getting Started. AWS Community Builder Johannes Konings is back again with more great AWS CDK content, this time he shares his thoughts about using Tags in CDK in his post, Consideration about cdk-notifier and Tags. Lambda Web Adapter (LWA) is a project I featured many moons ago in this newsletter, and AWS Community Builder Zied Ben Tahar has put together, Adding flexibility to your deployments with Lambda Web Adapter, to explore how to use LWA with CDK to simplify the deployment of your Web apps in Lambda and how to easily transition to ECS Fargate. A good read this one. If you are using Amazon Simple Email Service (SES), then Vivek Gite has put together this post How to configure AWS SES with Postfix MTA on Debian Linux, provides a tutorial that walks you through configuring Amazon SES to work with Postfix. To conclude our community round up this week, we have AWS Community Builder Vadym Kazulkin who shares the third part of his Spring Boot series, Spring Boot 3 application on AWS Lambda - Part 3 Develop application with AWS Serverless Java Container where he shows you how you can use Spring Boot 3.2 as part of your Lambda functions.
Apache Airflow
As a regulat contributor to content on Apache Airflow, I am always looking out for interesting posts and this week we have a great post from Jayesh Shinde and Harshad Yeola, Dynamic DAG generation with YAML and DAG Factory in Amazon MWAA, that explore the process of creating Dynamic DAGs with YAML files, using the DAG Factory library. The post takes you through the background of what DAG Factories are, and then walks you through an example. I actually tried this out and it worked great, and helped me answer a question that I had about these. Go read and try this out for yourselves. [hands on]
LangChain
If you are doing any kind of work with foundational large language models, then you are likely to be using LangChain. If you were not aware, the LangChain documentation now has a great reference section on how to integrate with Amazon Bedrock, which you can check out here. It covers everything from foundation models, Embeddings, Document Loaders, Vector data stores and more.
Cloud Native round up
- Scale AI training and inference for drug discovery through Amazon EKS and Karpenter provides a case study from Iambic Therapeutics that shows how they use Karpenter on Amazon Elastic Kubernetes Service (Amazon EKS) to scale AI training and inference [hands on]
- Open source observability for AWS Inferentia nodes within Amazon EKS clusters walks you through the Open Source Observability pattern for AWS Inferentia, which shows you how to monitor the performance of ML chips, used in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster, with data plane nodes based on Amazon Elastic Compute Cloud (Amazon EC2) instances of type Inf1 and Inf2 [hands on]
- Using OPA to validate Amazon EKS Blueprint Templates explores the benefits of using OPA to scan your Amazon EKS Blueprints for Terraform as code, and how it can help you maintain a secure and compliant environment [hands on]
- Autoscaling Kubernetes workloads with KEDA using Amazon Managed Service for Prometheus metrics provides a hands on guide that shows you how to autoscale an application on Amazon EKS utilising KEDA and Amazon Managed Service for Prometheus [hands on]
Other posts and quick reads
- Run interactive workloads on Amazon EMR Serverless from Amazon EMR Studio looks at how to run interactive PySpark workloads in EMR Studio using EMR Serverless as the compute [hands on]
- Monitor query plans for Amazon Aurora PostgreSQL demonstrates how you can monitor query plans to maintain optimal database performance, and discuss some key use cases of the monitoring query plan feature [hands on]
- Reduce Amazon Aurora MySQL backup costs using MySQL Shell and Amazon S3 shares how to reduce the Aurora backup cost for long retention periods by using MySQL Shell integrated with Amazon Simple Storage Service (Amazon S3) [hands on]
- How to add Open Job Description in your render pipeline describes show how toe use the openjd-model-for-python and openjd-sessions-for-python libraries to accept and process OpenJD jobs [hands on]
- Building a Secure GraphQL API with AWS Amplify and AWS AppSync helps you get to grips with the integration of Amazon CloudFront and AWS AppSync to enforce domain-specific access on GraphQL APIs, addressing CORS challenge [hands on]
- Amazon OpenSearch Service Under the Hood : OpenSearch Optimized Instances(OR1) dives into the OpenSearch Optimised Instance family (OR1) instances, and how it can provide high indexing throughput and durability using a new physical replication protocol, and explores some of the challenges we solved to maintain correctness and data integrity [hands on]
Quick updates
Amazon Corretto
On April 16, 2024 Amazon announced quarterly security and critical updates for Amazon Corretto Long-Term Supported (LTS) and Feature Release (FR) versions of OpenJDK. Corretto 22.0.1, 21.0.3, 17.0.11, 11.0.23, 8u412 are now available for download. Amazon Corretto is a no-cost, multi-platform, production-ready distribution of OpenJDK. Grab them at the downloads page
AWS PDK blueprints
AWS Project Development Kit (AWS PDK) is an open-source tool to help bootstrap and maintain cloud projects. It provides building blocks for common patterns together with development tools to manage and build your projects. The AWS PDK lets you define your projects programatically via the expressive power of type safe constructs available in one of 3 languages (typescript, python or java). Under the covers, AWS PDK is built on top of Projen, and I mentioned this project back in edition #184 of this newsletter. You can now use AWS PDK blueprints in Amazon CodeCatalyst. You can now use the AWS PDK in CodeCatalyst through the PDK blueprints, enabling you to compose one or more such blueprints together to create an application comprising of a React website, Smithy API, and the supporting CDK infrastructure to deploy the application to AWS. Very nice.
Apache Airflow
Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud. Amazon MWAA now offers larger environment sizes, giving customers of the managed service the ability to define a greater number of workflows in each Apache Airflow environment, supporting more complex tasks that can utilise increased resources. With Amazon MWAA larger Environment sizes, customers can now create, or upgrade to, extra-large (XL) and 2XL Amazon MWAA environment sizes, in addition to the small, medium, and large sizes available previously, with double the resources in XL environments, and four times the resources in 2XL environments, compared to large, across Airflow workers, schedulers, web servers, and metadatabase. You can create or update to larger Amazon MWAA environments with just a few clicks in the AWS Management Console in all currently supported Amazon MWAA regions.
You can dive deeper into this update by checking out the post, Introducing Amazon MWAA larger environment sizes, where Hernan Garcia, Jeetendra Vaidya, and Sriharsh Adar explore the scenarios they are well suited for, and how you can set up or upgrade your existing Amazon MWAA environment to take advantage of the increased resources.
Apache Livy and Amazon EMR
You can now use Apache Livy to submit your Apache Spark jobs to Amazon EMR on EKS, in addition to using StartJobRun API, Spark Operator, Spark Submit and Interactive Endpoints. With this launch, customers will be able to use a REST interface to easily submit Spark jobs or snippets of Spark code, retrieve results synchronously or asynchronously while continuing to get all of the Amazon EMR on EKS benefits such as EMR optimised Spark runtime, SSL secured Livy endpoint, programmatic set-up experience etc. Apache Livy is a service that enables easy interactions with the Spark cluster over the REST interface. Prior to today’s launch, customers when running their batch or interactive workloads from their local environment over a REST interface were required to either use the StartJobRun API or Interactive endpoints. This resulted in customers required to make changes to their applications. Now, with today’s launch, we have simplified the experience where customers can easily create a Livy endpoint and use that endpoint to submit Apache Spark jobs to their Amazon EMR on EKS clusters over a REST interface. Additionally, for enhanced security, customers can also restrict access to the Livy endpoint by configuring authentication through one of the supported protocols such as Kerberos, Custom Auth.
PostgreSQL
Trusted Language Extensions for PostgreSQL (pg_tle) now supports client authentication hook that lets you run additional checks over the existing authentication process, allowing you to enhance the security posture of your databases. A hook is an internal callback mechanism available to developers for extending PostgreSQL’s core functionality. By using hooks, developers can implement their own functions or procedures for use during various database operations. This release also includes additional updates for pg_tle such as password checks across all the databases on a RDS DB instance and enhanced custom datatype to support TOAST. Support for client authentication hook, password checks, and enhanced custom datatype is available on database instances in Amazon RDS running PostgreSQL 16.2-R2 and higher, 15.6-R2 and higher, 14.11-R2 and higher, and 13.14-R2 and higher in all applicable AWS Regions.
Nodestream
The Amazon Neptune connector for Nodestream, the Parquet input file format for Nodestream, and the Nodestream Security Bill Of Material (SBOM) plug-in for CycloneDX and SPDX file formats, was announced last week. Nodestream is an open source project for ETL (Extract Transform Load), designed to be flexible and extensible, allowing to define how data is collected and modelled as a graph. It uses a pipeline-based approach to define how data is collected and processed, and it provides a way to define how the graph should be updated when the schema changes. SBOMs help organisations improve the transparency, security, and reliability of their software applications. The Nodestream SBOM plug-in offers an opinionated graph data model for SBOM data analysis. It imports SBOMs from CycloneDX, a full-stack Bill of Materials (BOM) standard that provides advanced supply chain capabilities for cyber risk reduction, and SPDX the System Package Data Exchange for Linux. It also imports SBOMs from GitHub and from Amazon Inspector.
Kubernetes
You can now receive granular cost visibility for Amazon Elastic Kubernetes Service (Amazon EKS) in the AWS Cost and Usage Reports (CUR), enabling you to analyse, optimise, and chargeback cost and usage for your Kubernetes applications. With AWS Split Cost Allocation Data for Amazon EKS, customers can now allocate application costs to individual business units and teams based on how Kubernetes applications consume shared EC2 CPU and memory resources. Using Amazon EKS, customers create applications that automatically scale up and down and run in a highly available configuration across multiple Availability Zones. With Split Cost Allocation Data for Amazon EKS, customers get granular visibility into pod-level costs based on compute and memory utilisation. Customers can aggregate these costs by cluster, namespace and other Kubernetes primitives, allowing them to allocate costs to individual business units or teams. Customers can also identify unused CPU or memory resources, enabling opportunities to optimise their cluster configurations to minimise inefficiencies. After opt-in, these cost data will be available in the AWS CUR within 24 hours. Customers can use the Containers Cost Allocation dashboard to visualise the costs in Amazon QuickSight and CUR query library to query the costs using Amazon Athena.
Check out this post, Improve cost visibility of Amazon EKS with AWS Split Cost Allocation Data, where Shubir Kapoor and Mihir Surani show you how to set this up, and show you how it works.
Videos of the week
Why AWS backs Valkey, an open source alternative to Redis
David Nalley joins Swapnil Bhartiya to discuss the transition of Redis to Valkey at the Linux Foundation. David also talks about AWS’ involvement in open source and why it plays such a crucial role for the company. He says, “At the end of the day, open source is about collaboratively solving problems and not having to reinvent the wheel every time we need to accomplish something.”
Migrating from Serverless Framework to CDK
Join AWS Hero Rehan van der Merwe and Amo Moloko, a JavaScript engineer based in Cape Town, South Africa, as they explore how mono-repos enable serverless event-driven architectures for modern teams, through the power of AWS CDK.
Events for your diary
If you are planning any events in 2024, either virtual, in person, or hybrid, get in touch as I would love to share details of your event with readers.
AWS Security Meetup Group TH - Session #4 May 2nd, AWS Thailand office (Bangkok)
I will be joining the AWS Security Meetup later this week to talk about Cedar, the open source domain language for authorisation. Check out the link for locations and to reserve your spot.
OpenSearchCon Europe May 6th-7th, Berlin Germany
I am happy to share news of the launch of a European edition of OpenSearchCon, so make sure you mark these dates in your diary. OpenSearchCon Europe has now joined OpenSearchCon North America on our 2024 conference schedule. Read more about the event in the post, Announcing OpenSearchCon Europe 2024
Devoxx UK May 8th-10th, Business Design Centre in Islington
Devoxx UK is one of the most important developer events in the UK, and this year I will be joining my colleagues on the AWS booth to showcase Amazon Corretto, and other open source goodies for folks attending. The event is at the Business Design Centre in Islington, and there is still time to get tickets for this event. Three days of amazing content, including a talk on Lambda SnapStart - Under the hood which is going to be epic.
PyCon Italia May 22nd-25th, Florence Italy
I will be speaking at PyCon Italia in the wonderful city of Florence in May, talking and showing you how you can use Cedar within your Python applications. This is one of my most favourite events, with an amazing community that comes together over a few days to share their passion of all things open source. If you are coming, then I would love to meet you so get in touch.
BSides Exeter July 27th, Exeter University, UK
Looking forward to joining the community at BSides Exeter to talk about one of my favourite open source projects, Cedar. Check out the event page and if you are in the area, come along and learn about Cedar and more!
Cortex Every other Thursday, next one 16th February
The Cortex community call happens every two weeks on Thursday, alternating at 1200 UTC and 1700 UTC. You can check out the GitHub project for more details, go to the Community Meetings section. The community calls keep a rolling doc of previous meetings, so you can catch up on the previous discussions. Check the Cortex Community Meetings Notes for more info.
OpenSearch Every other Tuesday, 3pm GMT
This regular meet-up is for anyone interested in OpenSearch & Open Distro. All skill levels are welcome and they cover and welcome talks on topics including: search, logging, log analytics, and data visualisation.
Sign up to the next session, OpenSearch Community Meeting
Celebrating open source contributors
The articles and projects shared in this newsletter are only possible thanks to the many contributors in open source. I would like to shout out and thank those folks who really do power open source and enable us all to learn and build on top of what they have created.
So thank you to the following open source heroes: Zied Ben Tahar, Vadym Kazulkin, Vivek Gite, Johannes Konings, Daniel Aniszkiewicz, Rio Astamal, João Galego, Ricardo Ferreira, Abishek Gupta, Rehan van der Merwe, David Nalley, Swapnil Bhartiya, Amo Moloko, Hernan Garcia, Jeetendra Vaidya, Sriharsh Adar, Mark Stephens, Mark Wiebe, Daniel Neilson, Pooja Singh, Rekha Reddy Anupati, Aditya Samant, Jayesh Shinde, Harshad Yeola, Sameer Malik, Baji Shaik, Anil Maktala, Arundeep Nagaraj, Bukhtawar Khan, Gaurav Bafna, Sachin Kale, Ranjith Ramachandra, Rohin Bhargava, Matthew Welborn, Paul Whittemore, Alex Iankoulski, Riccardo Freschi, Shubir Kapoor, Mihir Surani, Piyush Mattoo, Hans Nesbitt, Siva Guruvareddiar, Imaya Kumar Jagannathan, Oleg Mironov, Dmitry Shurupov, and Gary Stafford.
Feedback
Please please please take 1 minute to complete this short survey.
Stay in touch with open source at AWS
Remember to check out the Open Source homepage for more open source goodness.
One of the pieces of feedback I received in 2023 was to create a repo where all the projects featured in this newsletter are listed. Where I can hear you all ask? Well as you ask so nicely, you can meander over to newsletter-oss-projects.
Made with ♥ from DevRel