AWS open source news and updates No. 34
September 7th - Instalment #34
Week No.34, and another packed issue with some great posts, projects and events covering all your favourite open source projects. This week there has been a lot of focus on Bottlerocket, which went GA last week, so plenty of stuff to read as well as several videos to watch. There are a couple of events happening later this week, so check the events just a little below and save the date in your diary.
Before you get on with the rest of this newsletter, I just want to share this event happening in a couple of weeks that I have helped shape. I would urge you to set aside some time if you can, as each of these speakers is a subject matter expert in their topic.
Dev Day: Open source edition - if you haven’t already done so, this is a must attend event, with four excellent sessions covering open source topics from Kubernetes networking, looking at different options on running Apache Flink, building machine learning workflows with Kubernetes and Kubeflow Pipelines and finally full-stack TypeSafety with React, GraphQL, and TypeScript. Use the link on that page to sign up and then sit back with the glow of satisfaction.
Your feedback matters!
I have put together a short feedback survey, which I would ask you to take - it will take no more than 2 minutes. You can access it here. Many thanks!
Celebrate open source contributors
The articles posted in this series are only possible thanks to contributors and project maintainers and so I would like to shout out and thank those folks who really do power open source and enable us all to build on top of what they have created.
So thank you to Suthan Phillips, Chao Gao, Adam Youngberg, Ran Ribenzaft, James Saryerwinnie, Ben Smith, Keith Gregory, Steve Bryen, Sebastian Crossa, Mark Birch, trek10inc, Moheeb Zara, Viraj Phanse, Samartha Chandrashekar, Chaim Rand, Kushal Koolwal, Jingzhao Ni, Matt Klein, Lizan Zhou, Drew Wright, Alex Williams, Curtis Rissi, Peder Ulander, Karan Jariwala, Chaitanya Bapat, Joaquin Menchaca, Brandon Kimberly, Siva Ramani, Naveen Balaraman, Ankit Bhargava, and Hudson Humphries.
Make sure you find and follow these builders and keep up to date with their open source projects and contributions.
Events for your diary
Check out these events happening over the coming weeks. If you have an event you want me to share with readers, please drop me a message.
State of the Source 9th September, 3PM BST
The State of the Source Summit invites open source communities of practice from around the world to organise and contribute to a global conversation on the current state of open source software: non-technical issues that foster development and community, the licenses that enable collaboration, the practices that promote contribution, and the issues confronting cooperation.
Machine Learning workshop September 10th, 2PM BST
Unifying Data Pipelines and Machine Learning with Apache Spark™ and Amazon SageMaker: a Databricks and Immuta workshop where you can learn how Unified Data Analytics can bring data science, business analytics and engineering together to accelerate your data and ML efforts.
Dev Day: Open Source September 17th, 10:00AM - 4:00PM BST
Dev Day: Open source edition - our first Dev Day featuring open source topics from Kubernetes networking, looking at different options on running Apache Flink, building machine learning workflows with Kubernetes and Kubeflow Pipelines and finally full-stack TypeSafety with React, GraphQL, and TypeScript.
MLops Best Practices with Amazon SageMaker and KubeFlow September 15th, 9am Singapore TZ
This webinar will present best practice deployment of Machine Learning workloads using Amazon SageMaker and Kubeflow (the open source toolkit for Kubernetes). Details and registration can be found here.
CDK Day September 30th, 3PM BST
Now this is an event you should put in your diaries. It has been put together by the AWS CDK community (including a lot of familiar faces from posts I have shared in this weekly newsletter), has a fantastic line up of speakers, and promises to be unmissable if you are using or thinking about using AWS CDK. Find more details, including speaker line up and registration, at https://www.cdkday.com/
DebConf
DebConf happened at the end of August and AWS was a platinum level sponsor of this event. You can check out the sessions here and many have uploaded links to videos/slides which you can reference, and many of the sessions have shared notes too. My favourite session was Doing things together - very thought provoking, and certainly not what I expected. Nice work Enrico Zini and Ulrike Uhlig.
Latest from open source projects
AWS Power Tools for Java
aws-lambda-powertools-java: this hot Python project, a must-have when creating your serverless applications, is now available for Java developers. Also, in case you missed it, version 1.5 of the Python version of AWS Lambda Power Tools dropped. Check it out here.
AWSets
AWSets, from Jeff and the folks at trek10inc, is an open source utility for crawling an AWS account and exporting all its resources for further analysis. Simple to use and get up and running, this little tool will be very handy for your governance/auditing jobs or just trying to take stock of what you have in your AWS account. Not everything is covered, but if you like it why not get involved?
aws-serverless-document-scanner
aws-serverless-document-scanner is another awesome project from Moheeb Zara, walking you through how to use AWS Amplify to build a serverless document scanner. He has written this up in an easy to follow post, Building a serverless document scanner using Amazon Textract and AWS Amplify, which you can follow to build this project and perhaps use as a base for your own experiments.
Red Commander
Red Commander: this project from Alex Williams at GuidePoint Security, and the supporting blog post, Introducing Red Commander: A Guidepoint Security Open Source Project, help you provision Red Team C2 infrastructure using Ansible automation scripts.
kibana-notebooks
kibana-notebooks is an interesting project from the Open Distro team. Kibana Notebooks enable data-driven, interactive data analytics and collaborative documents to be created and used as live notes in Kibana. Check out the RFC here. As a side note, I found this via Shenoy Pratik Gurudatt, who has just finished an internship at AWS (read about that here) - good luck for the future Shenoy.
bucket-brigade
bucket-brigade is an open source tool from Databricks that can help you monitor your Amazon S3 buckets to see if you have left any publicly open. Make sure you read the detailed post from Adam Youngberg, Bucket Brigade — Securing Public S3 Buckets, which provides some background on the prior art.
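If you are wondering what "publicly open" can look like in practice, here is a minimal sketch in Python of one such check: scanning a bucket policy document for statements that allow access to any principal. This is not how Bucket Brigade is implemented (read the post for that); the function names, the example policy and the decision to skip conditional statements are all my own assumptions for illustration, and a real tool would also need to look at ACLs and public access block settings.

```python
import json


def is_public_statement(statement):
    """Return True if a policy statement allows any principal, unconditionally."""
    if statement.get("Effect") != "Allow":
        return False
    if statement.get("Condition"):
        # Simplification (assumption): conditional statements are skipped here.
        # A real tool would need to evaluate the condition instead.
        return False
    principal = statement.get("Principal")
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        values = aws if isinstance(aws, list) else [aws]
        return "*" in values
    return False


def find_public_statements(policy_json):
    """Return the Sid of every statement in the policy that looks public."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may not be wrapped in a list
        statements = [statements]
    return [s.get("Sid", "<no Sid>") for s in statements if is_public_statement(s)]


# Hypothetical policy, made up for this example.
EXAMPLE_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "TeamWrite", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
})

if __name__ == "__main__":
    print(find_public_statements(EXAMPLE_POLICY))
```

Running this against the example policy flags only the `PublicRead` statement, which is the one with a wildcard principal.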
infracost
infracost is a neat open source project that I just love the idea of, and is probably going to be a must check out if you are using Terraform. This project creates AWS cost estimates for your Terraform projects, broken down by hour/month costs. Brilliant stuff from the folks at Infracost.
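To show the kind of hour/month breakdown I mean, here is a rough sketch of the arithmetic in Python. It is not Infracost's code or output: the resource names and hourly prices are made up, and the 730 hours per month figure is an assumption (8,760 hours in a year divided by 12) that I am using as the averaging convention.

```python
# Assumption: average hours in a month (8,760 hours per year / 12).
HOURS_PER_MONTH = 730

# Hypothetical resources: (name, hourly price in USD, quantity).
# The names and prices are illustrative only, not real AWS pricing.
RESOURCES = [
    ("aws_instance.web", 0.0416, 2),
    ("aws_nat_gateway.main", 0.045, 1),
]


def estimate(resources):
    """Return one row per resource with its hourly and monthly cost."""
    rows = []
    for name, hourly_price, quantity in resources:
        hourly = hourly_price * quantity
        rows.append({
            "name": name,
            "hourly": round(hourly, 4),
            "monthly": round(hourly * HOURS_PER_MONTH, 2),
        })
    return rows


def total_monthly(rows):
    """Sum the monthly cost across all rows."""
    return round(sum(row["monthly"] for row in rows), 2)


if __name__ == "__main__":
    rows = estimate(RESOURCES)
    for row in rows:
        print(f'{row["name"]:<24} {row["hourly"]:>8.4f}/hr {row["monthly"]:>9.2f}/mo')
    print(f'{"TOTAL":<24} {"":>11} {total_monthly(rows):>9.2f}/mo')
```

The real tool works out the resources and prices from your Terraform project for you; the sketch only illustrates how an hourly price rolls up into a monthly figure.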
Fresh blog posts for your reading pleasure
Envoy on AWS Graviton2
CNCF Project Envoy enables Arm64 CI using Azure Pipelines on AWS Graviton2: this post from the CNCF Envoy contributors Kushal Koolwal (Arm), Jingzhao Ni (Arm), Matt Klein (Envoy/CNCF), and Lizan Zhou (Tetrate) shows how the project is extending the Envoy build system to enable multi-architecture builds for Envoy, allowing customers to deploy on AWS Graviton2 instances.
TensorFlow
TensorFlow Performance Analysis: this nice deep dive from Chaim Rand walks you through how to approach optimising your training sessions when working with TensorFlow. The ability to analyse and optimise the performance of your training sessions can lead to meaningful savings in time and cost, and you will take away some great tips on how to approach this after reading this post.
AdaptDL
Introducing AdaptDL, an Open Source resource adaptive deep-learning framework: this launch announcement from Petuum introduces AdaptDL, an open source resource-adaptive deep learning (DL) training and scheduling framework. The goal of AdaptDL is to make distributed DL easy and efficient in dynamic-resource environments such as shared clusters and the cloud, and the post shows some pretty impressive cost savings you might be able to achieve. The blog post provides an introduction on how you can get going. Check out the GitHub repository for more info.
Zim
Introducing Zim: A caching build system for teams using monorepos: this post from Drew Wright talks about Zim, an open source caching build system for software development teams using monorepos that contain many components and dependencies, which provides fast incremental, parallel builds across a team. The post shows you how they leveraged AWS services to scale, and then takes you through a simple walkthrough of using Zim. If you favour a monorepo then this might be a useful project to look at.
Mocking AWS
Mocking AWS with Jest (and TypeScript): in this fantastic post, Matt Morgan dives deep into mocking AWS using a number of open source tools, many of which I have featured in earlier editions of this newsletter.
SAM on Linux
Running the SAM CLI on Linux: Keith Gregory, Chariot’s AWS Practice Lead, shares a quick tip on how to get the AWS SAM CLI running on Linux, so if Linux is your development platform of choice, then you need to read this post.
Transformative power of an open source value chain - podcast
The transformative power of an Open Source value chain: this podcast from Heretechs features host Justin Arbuckle, colleague Mark Birch, and Adam Jacob (CoFounder of Chef). It starts off with the question, why on earth would I trust open source? If you want to know more, you need to listen on.
As a bonus, if you have not read Mark’s excellent post, From Developer Experience to Enablement, then check it out. It has some great insights on developer experience and enablement, and on why, in an age where every company is a software company, this is relevant not just to tech companies.
AWS open source posts
Bottlerocket
Announcing the General Availability of Bottlerocket, an open source Linux distribution built to run containers, is the launch post from Samartha Chandrashekar.
Next up we have some videos that covered Bottlerocket. First up, check out Peder Ulander talking with Nicole Hemsoth on Next Platform TV about Bottlerocket.
{% youtube dpEjl20zTNw %}
There was more from Peder in this post, Five Things To Know About Bottlerocket, AWS’ New Container-Optimized Linux and the post shares some nuggets that make this a must read.
If you missed the Bottlerocket session from KubeCon AWS Container Day, then here it is. Justin Haynes walks you through this project: what it is, why we created it and how to get started. Twenty minutes well worth your time.
{% youtube L33l7Yd8oZM %}
The final video is from Containers from the Couch (no, it’s not a new container orchestration system) that features Justin Garrison, Brent Langston and Adam Keller also talking about Bottlerocket. Check them out here.
{% youtube 6NM5V3lH0tc %}
We then had How to Get Started with Bottlerocket OS from AWS Partner Epsagon, with Ran Ribenzaft writing a nice overview post of Bottlerocket covering some use cases where he sees it being useful.
Getting Started with Bottlerocket and Certified AWS Partners is the final Bottlerocket post, where Curtis Rissi talks about the opportunities that exist for AWS Partners to use Bottlerocket for the benefit of their own customers. So read on to find out more about what those opportunities are and how to get started.
Apache Hive
Amazon EMR supports Apache Hive ACID transactions - Apache Hive is an open-source data warehouse package that runs on top of an Apache Hadoop cluster. You can use Hive for batch processing and large-scale data analysis, and it uses Hive Query Language (HiveQL), which is similar to SQL. In this post, Suthan Phillips and Chao Gao introduce the Hive ACID feature and explain how it works and its concepts with a straightforward use case. They go on to describe the default behaviour of Hive ACID on Amazon EMR, and offer some best practices.
Security
Fresh from the Cloud Builders YouTube channel are a couple of videos from Steve Bryen covering two open source security projects. If you want to know more about git-secrets and Policy Sentry, projects I have spoken about in the past, then these short videos are a great way to get started.
{% youtube cSQ4PpbObR4 %} {% youtube UzpQnELZ8AA %}
AWS Amplify CLI and tagging
Organizing your AWS resources using Tags with the Amplify CLI is a guest post from Sebastian Crossa, an open source contributor from the MLH Fellowship, that will walk you through how to tag your resources if you are using the AWS Amplify CLI tool. AWS Amplify enables mobile & web developers to build full stack serverless apps, and the AWS Amplify CLI helps developers to create backend resources through a guided workflow.
So who are MLH? A separate blog post, Meet our MLH Fellows, introduces the Major League Hacking team, a group of students who got together to work on Amplify. It is another good example of the advantages of building Amplify as open source software. This is a must read post, so I don’t want to give too much away.
AWS Amplify, Android and RxJava
Using RxJava with AWS’ Amplify Android Library: Jameson Williams takes a look at Reactive Extensions (RxJava), and how you can use them with AWS’ Amplify Framework. RxJava is an open source library for composing asynchronous and event-based programs using observable sequences for the Java VM. It has become one of the most popular Android libraries, something of a de facto standard for Android codebases written in Java.
Setting up your developer environment
Jump-starting your serverless development environment: another cracking post from one of my favourite AWS advocates, Ben Smith, who this time walks you through how to set up your developer environment to supercharge your serverless development. No surprise that this involves plenty of great open source tooling.
AWS Chalice
Automatically deploy a Serverless REST API from GitHub with AWS Chalice: another post from James ‘Chalice’ Saryerwinnie, this time showing you how to move beyond ‘chalice deploy’ and set up a CI/CD pipeline so that you can work on your Chalice projects collaboratively. If you use AWS Chalice, then you will need to check out and bookmark this post.
.NET Core on Amazon EKS
Build and Deploy .Net Core WebAPI Container to Amazon EKS using CDK & cdk8s: Siva Ramani and Naveen Balaraman pack a lot into this post, showing you how you can deploy an ASP.NET Core Web API application that uses various AWS Services, creating infrastructure as code using cdk8s to simplify the process and deploying onto Amazon EKS. This is a good recipe for application modernisation and for moving those .NET workloads onto containers.
Horovod and Apache MXNet
How to run distributed training using Horovod and MXNet on AWS DL Containers and AWS Deep Learning AMIs: Karan Jariwala and Chaitanya Bapat from the AWS Deep Learning team share how to run distributed training using Horovod and MXNet on Amazon EC2 and Amazon EKS using AWS Deep Learning Containers and AWS Deep Learning AMIs. Horovod is an open source framework that provides distributed training support to Apache MXNet, PyTorch, and TensorFlow. Why would you want to do this? Well, using this approach model training can be distributed across a cluster of instances, providing a significant increase in performance with only minimal changes to your training script.
Deep Learning book
Amazon team adds key programming frameworks to Dive into Deep Learning book: I have previously shared details about this book on deep learning that has been put together by data scientists from Amazon. This post talks about how the examples have now been extended to include PyTorch and TensorFlow, something that students and other readers of the book had been asking about. Read the post for more details.
Dgraph on AWS
Dgraph on AWS: Setting up a horizontally scalable graph database, a post from Joaquin Menchaca, will show you how to set up a resilient, highly available Dgraph cluster on AWS. Dgraph is an open source, distributed graph database, built for production environments and written entirely in Go. It has official clients in Go, Java, Python, JavaScript, and C#, and community-supported clients in Dart, Rust, and Elixir. Dgraph users can also use any of the tools and libraries that work with GraphQL.
OpenTelemetry
AWS adds observability metrics to the OpenTelemetry C++ library is a post brought to you by AWS interns Brandon Kimberly, Ankit Bhargava, and Hudson Humphries and talks about their first engineering contributions to the popular open source observability project OpenTelemetry. OpenTelemetry is a complete solution that solves the problem of collecting telemetry metrics, and its mission is to develop an open, industry-wide standard for telemetry data, as well as providing reference implementations with universal tools that support metrics, tracing, and logs. Recently we made contributions to OpenTelemetry that included the metrics collection and processing functionality for the C++ library. These metrics are collected from instrumented applications and infrastructure. They allow users to monitor the health of their services, improve performance, and detect anomalies.
Open Distro for Elasticsearch
Power data analytics, monitoring, and search use cases with the Open Distro for Elasticsearch SQL Engine on Amazon ES: Viraj Phanse dives deep into Open Distro for Elasticsearch’s SQL Engine and how it provides a comprehensive, flexible, and user-friendly set of features to obtain search results using SQL. When I have run Open Distro for Elasticsearch workshops, this has always been one of the parts that gets the most engagement and interest. Learn about the SQL workbench, the SQL CLI, query support, JDBC drivers and more.
Case Study
NatureServe
Promoting biodiversity conservation with open data and the cloud: in this guest post, Lori Scott, chief information officer (CIO) at NatureServe, and Sean O’Brien, president and chief executive officer (CEO) at NatureServe, share how their organization is using Amazon Web Services (AWS) and open data to promote biodiversity conservation.
Quick updates
Amazon Corretto 15
Amazon Corretto 15 Release Candidate (RC) - Amazon now supports the latest Java Feature Release JDK 15 by introducing Amazon Corretto 15 RC (Release Candidate). Corretto 15 is available on Linux, Windows and macOS. Download Corretto 15 RC directly here.
Amazon RDS PostgreSQL
Amazon Aurora PostgreSQL supports rdkit extension. The RDKit extension allows cheminformatics to deal with manipulation of chemical structures, fingerprinting search functions and molecular structure matching. Using PostgreSQL as a datastore, you can interact with several built-in functions to compare, manipulate and identify molecular structures via standard SQL. RDKit is only supported for version 11.8 or higher. If you create the RDKit extension on a version prior to 11.8, you will need to drop and recreate it when you convert to 11.8.
AWS Systems Manager and Ubuntu
Patch Manager, a capability of AWS Systems Manager, now allows you to deploy patches automatically to instances running any current versions of Ubuntu. Until now, Patch Manager supported current versions of Ubuntu up to 18.04. With this release, Patch Manager now supports Ubuntu’s latest version - Ubuntu 20.04. You can use Patch Manager to automatically patch Linux instances running Red Hat Enterprise Linux (RHEL), Ubuntu Server, Amazon Linux, Amazon Linux 2, CentOS, Oracle Linux, Debian, and SUSE Linux Enterprise Server (SLES).
Digital training course: Amazon FSx for Lustre Primer
This week we announced a free new digital course from AWS Training and Certification: Amazon FSx for Lustre Primer. This course explains how to get started with Amazon FSx for Lustre, a fully managed file system service for machine learning, high performance computing, and other workloads. This intermediate, 100-minute course offers self-paced reading modules, video demonstrations, and a quiz to check your knowledge. The course is designed for storage engineers, systems administrators, and cloud architects. In this course you’ll learn about the key benefits of and common use cases for Amazon FSx for Lustre, along with the cost model associated with the service. You’ll also explore how to set up and monitor your file system more effectively. In addition, you’ll learn how to view the key metrics collected by Amazon CloudWatch.
Enrol via this link.
Share your open source projects
Do you have some content you want to share with a broader audience? We are always looking for guest content for the AWS Open blog. Please get in touch (via comments below) and I would love to speak with you about what you are doing in open source. We are always looking for interesting new content.
The best submissions will get some AWS Credit codes as a thank you.
Stay in touch with open source at AWS
I hope this summary has been useful. Remember to check out the Open Source homepage, and keep up to date with all our activity in open source by following us on @AWSOpen.