September 27th, 2021 - Instalment # 83
Newsletter # 83.
Welcome to issue # 83 of this newsletter, and more great new open source projects to check out. For infrastructure as code practitioners we have several projects for both CDK and Terraform, a CI/CD project to help you scale GitHub Actions runners, a simple hosting project with some nice features you can use as a baseline for your own project, a reference architecture for data analytics on AWS with some comprehensive CDK stacks you can inspect and borrow for your own, a project to help you visualise some of your key cloud metrics and more.
On top of that we have new posts covering Apache Airflow, OpenTelemetry, Cortex, Kubernetes, Suricata, Spring Boot, moto, GraphQL and more. This week's videos cover AWS SAM and OpenTelemetry, and to finish off we have the events section with events happening later this week.
Zappa makes it super easy to build and deploy serverless, event-driven Python applications. One of my first attempts in building serverless applications many moons ago was via this project, and the great tutorials and documentation that supported it.
Celebrating open source contributors
The articles posted in this series are only possible thanks to contributors and project maintainers and so I would like to shout out and thank those folks who really do power open source and enable us all to build on top of what they have created.
So thank you to the following open source heroes: Kenneth Winner, Jimmy Dahlqvist, Gokul Chandra, Michael Hausenblas, QP Hou, Björn Wilmsmann, Philip Riecks, Tom Hombergs, Adam Palmer, Jesper Eneberg, Talia Nassi, Ilan Gofman, Krzysztof Lis, Wojciech Matuszewski, Danilo Poccia, Jonas Birmé, Ian Mckay, Oscar Nord, Eddy Lin, Yin Song, Josiah Davis, Eden Duthie, Chen Wu, Nisha Notani, Timur Tulyaganov, and Yuriy Prykhodko.
Make sure you find and follow these builders and keep up to date with their open source projects and contributions.
Latest open source projects
permissions.cloud is an open source project from AWS Hero Ian Mckay that provides a handy IAM reference guide which I used this week when I was putting together my blog post (see below). You can view this online by heading over to https://permissions.cloud/
self-hosted-runners-on-aws check out this project from Jimmy Dahlqvist that helps you set up and run GitHub Actions CI/CD runners on AWS in an auto scaled way. It reacts to GitHub webhooks to trigger auto scaling of EC2 spot instances.
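To give a feel for the webhook-driven approach, here is a minimal sketch in Python. It is not taken from the project itself. It assumes the handler receives GitHub `workflow_job` webhook events and checks the `X-Hub-Signature-256` header before deciding whether another runner is needed. The function names and the one-runner-per-queued-job rule are hypothetical, made up for illustration.

```python
import hashlib
import hmac
import json


def signature_is_valid(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Compare GitHub's X-Hub-Signature-256 header with our own HMAC of the body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the secret
    return hmac.compare_digest(expected, signature_header)


def runners_to_add(body: bytes) -> int:
    """Hypothetical scaling rule: one extra runner for each queued workflow job."""
    event = json.loads(body)
    if "workflow_job" in event and event.get("action") == "queued":
        return 1
    return 0


if __name__ == "__main__":
    secret = b"example-secret"  # assumption: the secret configured on the webhook
    body = json.dumps({"action": "queued", "workflow_job": {"id": 1}}).encode()
    header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    if signature_is_valid(secret, body, header):
        print("runners to add:", runners_to_add(body))
```

In a real deployment the number returned would feed whatever adjusts your EC2 capacity; that part is deliberately left out here.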
cdk-lambdaless-apigw-websockets an interesting “experimental” project from Wojciech Matuszewski that explores how to build a websocket API without resorting to writing Lambda functions.
cdk-appsync-transformer Kenneth Winner has put together this CDK construct following on from a blog post he wrote on using the AWS Cloud Development Kit with AppSync. Kenneth has written this transformer in order to emulate AWS Amplify’s method of using GraphQL directives in order to template a lot of the Schema Definition Language.
authenticated-static-site I love these kinds of projects, this one is an example CDK project that shows how you can set up an authenticated static site. Using Amazon Cognito for authentication and Amazon S3 for hosting your site, it integrates a simple pipeline to push out your site content stored in GitHub, via AWS CodePipeline. I can see this being useful for a number of my demo projects, might just have to convert these CDK stacks into Python…
The Cloud Intelligence Dashboards are a collection of Amazon QuickSight dashboards, and include The Cost Intelligence Dashboard, CUDOS Dashboard, Trusted Advisor Organization (TAO) Dashboard, and Trends Dashboard. This repository has everything you need to get you up and running to set those up. To help provide more background, check out the blog post Visualize and gain insights into your AWS cost and usage with Cloud Intelligence Dashboards and CUDOS using Amazon QuickSight from Nisha Notani, Timur Tulyaganov, and Yuriy Prykhodko.
amazon-sagemaker-tsp-deep-rl The Travelling Salesperson Problem (TSP) is summarised as “Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?”, and this project provides code that demonstrates how to train, deploy, and make inferences using deep reinforcement learning to solve the Travelling Salesperson Problem. To walk you through it, Yin Song, Josiah Davis, Eden Duthie, and Chen Wu have come together to write, Solving the Traveling Salesperson Problem with deep reinforcement learning on Amazon SageMaker.
aws-analytics-reference-architecture if you are looking to put together a data analytics solution, you should check out this reference architecture first. It combines a number of current good practices for the design, implementation and operation of an analytics platform. The documentation (AWS Analytics Reference Architecture) is great, providing an easy way to navigate and understand the different components. The code contains CDK reference examples, which you can borrow in your own projects as well.
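If the problem statement feels abstract, this small Python sketch makes it concrete. It is a plain brute-force search over every possible route, not the deep reinforcement learning approach the project uses, and it assumes cities are simple (x, y) points with straight-line distances. It only works for a handful of cities, which is exactly why smarter approaches are interesting.

```python
from itertools import permutations
from math import dist


def tour_length(tour, cities):
    """Total distance of visiting cities in `tour` order and returning to the start."""
    return sum(dist(cities[a], cities[b]) for a, b in zip(tour, tour[1:] + tour[:1]))


def shortest_tour(cities):
    """Brute force: fix city 0 as the origin and try every ordering of the rest."""
    rest = range(1, len(cities))
    candidates = ((0,) + order for order in permutations(rest))
    best = min(candidates, key=lambda tour: tour_length(tour, cities))
    return list(best), tour_length(best, cities)


if __name__ == "__main__":
    # Four corners of a unit square, listed in a deliberately unhelpful order
    cities = [(0, 0), (1, 1), (0, 1), (1, 0)]
    tour, length = shortest_tour(cities)
    print(tour, length)
```

The number of routes grows factorially with the number of cities, so this stops being practical very quickly.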
amazon-gamelift-plugin-unity this project is a plugin for Unity that contains libraries and native UI that makes it easier to access GameLift resources and integrate GameLift into your Unity game. You can use the GameLift Unity Plugin to access GameLift APIs and deploy AWS CloudFormation templates for common gaming scenarios.
terraform-aws-appconfig a new Terraform module from clowd.haus which creates AWS AppConfig resources within your Terraform code.
amazon-ecs-fullstack-app-terraform this repository contains Terraform code to deploy a complete demo application, including the infrastructure as well as a CI/CD pipeline that you can use as a baseline for your own projects.
Gist of the week
This very handy snippet, ingest-aws.js, from Jonas Birmé allows you to watch for a video being uploaded to an Amazon S3 bucket and kick off a transcode when one arrives. It uses the open source ingest-application-framework, which was new to me, and provides a ready to go framework for building video on demand ingest applications.
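The gist itself is JavaScript and builds on the ingest-application-framework, so the following is not a port of it. It is a rough Python sketch of the general "upload triggers transcode" idea, assuming you receive standard S3 event notification records. The list of video suffixes and the function name are hypothetical.

```python
from urllib.parse import unquote_plus

# Assumption: which file extensions count as video is entirely up to you
VIDEO_SUFFIXES = (".mp4", ".mov", ".mxf")


def uploaded_videos(event):
    """Return (bucket, key) pairs for newly created objects that look like videos."""
    videos = []
    for record in event.get("Records", []):
        # Only react to object creation, not deletes or other events
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(VIDEO_SUFFIXES):
            videos.append((bucket, key))
    return videos


if __name__ == "__main__":
    sample = {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {"bucket": {"name": "uploads"}, "object": {"key": "my+clip.mp4"}},
            }
        ]
    }
    for bucket, key in uploaded_videos(sample):
        print(f"would start a transcode for s3://{bucket}/{key}")
```

Each pair returned would then be handed to whatever transcoding service you use.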
AWS and Community blog posts
A couple of posts this week.
First up is a post I somehow missed last year which is worth checking out. In Breaking up the Airflow DAG monorepo, QP Hou explores how to enable multi-repo DAG development using a newly open sourced project called objinsync (https://aws-oss.beachgeek.co.uk/xq), a stateless DAG sync daemon that is deployed as a sidecar container. [hands on]
Following that, a new blog post/tutorial from myself, Integrating Amazon Timestream in your Amazon Managed Workflows for Apache Airflow v2.x, where I show you how you can orchestrate Amazon Timestream data using Apache Airflow to help you with a number of different use cases. [hands on]
Damon Cortesi put together this post a few months back, Athena SQLite, which shows you how to query SQLite databases in S3 using Athena’s Federated Query functionality. It is implemented within the Serverless Application Repository (SAR) and you can check out the details of how to deploy it here.
Cortex is a distributed system that allows for a horizontally scalable, highly available, and long-term storage solution for Prometheus metrics. In this post, Building a series deletion API in Cortex from Ilan Gofman, he shares his experience of designing and implementing the series deletion feature inside of the Cortex open source project.
More OpenTelemetry goodness this week, starting off with this post from Danilo Poccia, who wrote last week about the general availability of tracing in AWS Distro for OpenTelemetry in, New for AWS Distro for OpenTelemetry – Tracing Support is Now Generally Available.
Following that we have Eddy Lin, who discusses his experience building the Go MultiMod tool, an open source solution that automates most of the tedious work of releasing new versions with Golang. Read more in his post, Simplifying OpenTelemetry Collector and Go library releases with the Go MultiMod tool
Using PostgreSQL with Spring Boot on AWS — Part 1 and Using PostgreSQL with Spring Boot on AWS — Part 2 is a follow up post from Björn Wilmsmann, Philip Riecks, and Tom Hombergs, authors of the book Stratospheric: From Zero to Production with Spring Boot and AWS. The original post was very well received, so they are back with more, this time exploring how you can use PostgreSQL from a Spring Boot web application. [hands on]
If you ever wondered about the origins of the name of the open source IDS/IPS Suricata, then you need to read this post (it kind of made sense after I knew, but still amazed me). Adam Palmer and Jesper Eneberg show you how to create an open-source IDS/IPS service running in Docker containers, using Amazon Elastic Container Service (ECS) and Amazon Linux 2 (AL2) in the post, Building an Open Source IDS IPS service for Gateway Load Balancer. [hands on]
moto is an open source library that allows you to easily mock out tests based on AWS infrastructure. In the post Getting started with testing serverless applications, Talia Nassi shows you how you can use this as part of your testing approach when building serverless applications. [hands on]
A series of two posts covering migration from a monolithic REST API and monolithic frontend to a modern federated GraphQL system. In Building federated GraphQL on AWS Lambda Krzysztof Lis from IMDb shares their experience and what they learned, including how open source tools helped with the learning curve as they migrated. Following that, in Managing federated schema with AWS Lambda and Amazon S3 he looks at one of the biggest challenges they faced, GraphQL schema management.
Amazon EKS Anywhere & EKS Connector Gokul Chandra provides a very comprehensive walk through of Amazon EKS Anywhere and EKS Connector, showing you how you can put these together as part of your hybrid cloud architecture. This post will require your focus, so grab your preferred hot beverage before reading.
Eyevinn Ingest Application Framework
Eyevinn Ingest Application Framework (IAF) allows you to build plugins in a modular way that can interact with different storage or transcoding solutions. Oscar Nord dives deeper into this in his post, Building plugins for the Eyevinn Ingest Application Framework. He looks at two of the IAF plugins developed to easily transcode and make on demand video content (VOD) available on multiple devices using AWS Elemental MediaConvert and AWS Elemental MediaPackage. You can see in the projects section above for more details.
Videos of the week
Mehmet Nuri Deveci and Eric Johnson bring you the latest Serverless Office Hours, this week featuring AWS Serverless Application Model (SAM) and demonstrating how SAM Accelerate helps developers build serverless applications quickly. They demo how SAM Accelerate helps developers test in the cloud by speeding up the deployment process and bringing all logging to a central location.
OpenTelemetry and FluentBit
Michael Hausenblas discusses good practices and current developments around CNCF open source projects and specifications including OpenTelemetry and FluentBit. This has been in the events section for a few weeks, so I am hoping some of you will have seen this already. If not, check out his presentation, Embracing Observability in Distributed Systems.
Distributed load testing
This is something to keep an eye out for, a new series from Open Source-AWS on YouTube, walking through open source projects on AWS. The first one takes a look at distributed load testing using AWS Fargate.
EMR Studio is an integrated development environment (IDE) that makes it easy for data scientists and data engineers to develop, visualise, and debug big data and analytics applications written in R, Python, Scala, and PySpark. From EMR 6.4.0 onwards, you can use Python, Scala, SparkSQL, and R within the same Jupyter notebook in EMR Studio, providing flexibility to use different programming languages for Spark workloads. Previously, you could only write code in one language within the same notebook for Spark workloads. With this enhancement, you can switch between languages from cell to cell and share data between cells via temporary tables. You can also use this feature from EMR Notebooks or from Jupyter notebooks talking to Jupyter Enterprise Gateway (JEG) on EMR 6.4.0 and later.
Events for your diary
Coming up later this week we have…
Secure Coding Virtual Summit September 29, 2021
The Secure Coding Virtual Summit is your source for everything you need to build secure code from the ground up. There are many interesting sessions, but check out the ones covering how to secure and protect yourself when using open source.
Full details, including speaker line up and how to register, here.
GraphQL API security best practices with AWS AppSync and AWS Amplify 14th October, 11am AEST
As a developer, the most important parts of managing your applications should always include enhancing performance while strengthening security. In this webinar, we take you through security best practices for your GraphQL APIs with AWS AppSync and AWS Amplify, providing you with an understanding of how these can be applied to your applications. In this session, you will learn about:
- GraphQL Protocol and how to configure a schema
- Possible ways to authenticate and authorise access to GraphQL APIs
- How to configure network security for your API
- How to enable observability for your API with logging, tracing or auditing
Amazon SageMaker and Open-Source Tools for ML: Better Together October 7 | 11 AM PT | 2 PM ET
Many organisations rely on open-source tools to support the Machine Learning lifecycle. Amazon SageMaker has been rapidly evolving by introducing support and compatibility for various open-source frameworks. In this session, you will learn how to build a customisable ML Infrastructure based on Amazon SageMaker and open-source components. We will discuss pros and cons, the limitations of different tools that support specific stages of the ML workflow, and best practices for MLOps, to automate these stages into repeatable pipelines.
To read more and register for this event, click here.
Flink Forward Global 2021 October 26th/27th
Flink Forward Global 2021 is a 2-day virtual conference for the Apache Flink and stream processing communities. Apache Flink is an open-source distributed engine for processing data streams that can support both streaming and batch workloads. Flink Forward has keynote presentations and talks on production Flink use cases, technical deep dive sessions, and the growth of the Flink ecosystem. You can meet core Flink committers, new and experienced users, and thought leaders who share experiences and best practices in stream processing, real-time analytics, and the management of mission-critical Flink deployments in production.