Apache Airflow
-
AWS open source news and updates #110
Apr 29, 2022 | 14 minute read
April 29th, 2022 - Instalment #110 Newsletter #110. Welcome to edition #110 of the AWS open source newsletter. It has been a busy week, with the AWS Summit London happening this week (where I was lucky enough to do a session on Apache Airflow) meaning I am publishing this a little later than I had planned. We have more great new projects this week, including a project that helps make it easier to deploy your static and dynamic applications, a tool that provides help in managing the long term health of your AWS Data Lake, a cool project to help you replicate data from a Kinesis Data Stream across regions, a nice CloudWatch dashboard widget that summarises your CloudFormation stacks, and many more - so check them out.
-
AWS open source news and updates #107
Apr 4, 2022 | 17 minute read
April 4th, 2022 - Instalment #107 Newsletter #107. Welcome to edition #107 of the AWS open source newsletter, and we have a bumper edition this week packed with more great new open source projects and content for you to consume. Topics featured this week include optimising open source big data tools, developer tooling, case studies and we even have some great open source content for .NET Core developers. This week's projects include a really handy browser plugin called “aws-search-extension” that lets you search and find developer information from the AWS docs, a tool that will help you detect whether you have configured or are using dockershim in your Kubernetes clusters, a library to help you integrate Amazon Cognito in your Laravel PHP applications, and plenty more developer tools and sample projects.
-
AWS open source news and updates #105
Mar 20, 2022 | 15 minute read
March 21st, 2022 - Instalment #105 Newsletter #105. Welcome to edition #105 of the AWS open source news and updates, where we bring you the latest open source projects, posts, events, and much more. This week's new projects include the latest work in progress from AWS Hero Ian Mckay: “iamfast” is an AWS IAM policy generation tool that is in its early stages but promises to be very useful indeed. We also have “iasql-engine”, a tool that models cloud infrastructure as data, “ssm-patch-portal”, which provides a nice GUI front end to simplify patching with AWS Systems Manager, a new crowdsourced guide that contains learning resources for AWS, a business intelligence platform built using open source technologies from the NHS, and many more.
-
AWS open source news and updates #104
Mar 14, 2022 | 17 minute read
March 14th, 2022 - Instalment #104 Newsletter #104. Welcome to #104 of the AWS open source news and updates newsletter, bringing you the latest updates from around AWS and the community. This week we have yet more great new open source projects, including a Deno runtime for your Lambda functions, data lineage and data testing tools, a performance testing tool for Apache Kafka, an ELT tool for Amazon Redshift, an Amazon S3 archive tool, and many more.
-
Contributing to the Apache Airflow project - Part Two
Mar 11, 2022 | 11 minute read
This is the second and concluding post providing an overview of the experience and journey contributing to the Apache Airflow project. You can catch Part One here. Contributing to Apache Airflow - Part Deux In Part One of this series, we took our first steps in contributing to the Apache Airflow project. With a little bit more knowledge and experience, and our first interactions with the Airflow community behind us, we are ready to start exploring how the code works and see how we might go about fixing this.
-
Orchestrating hybrid workflows using Amazon Managed Workflows for Apache Airflow (MWAA)
Mar 7, 2022 | 46 minute read
Using Apache Airflow to orchestrate hybrid workflows In some recent discussions with customers, a recurring topic was how open source is increasingly being used as a common mechanism to help build re-usable solutions that protect investments in engineering and development time and skills, and that work across on-premises and Cloud environments. In 2021 my most viewed blog post talked about how you can build and deploy containerised applications, anywhere (Cloud, your data centre, other Clouds) and on anything (Intel and Arm).
-
AWS open source news and updates #103
Mar 7, 2022 | 15 minute read
March 7th, 2022 - Instalment #103 Newsletter #103. Welcome to edition #103 of the AWS open source news and updates. This week's featured new open source projects include botocove (a decorator that helps you run your functions across your AWS accounts easily), functionless (a TypeScript plugin that transforms TypeScript code into Service-to-Service integrations), replibyte (a tool to replicate your PostgreSQL data), aws-security-bulletin-alert (notifies you of new AWS Security Bulletins and sends out email notifications via Amazon SES), and many more.
-
AWS open source news and updates #102
Feb 28, 2022 | 13 minute read
Feb 28th, 2022 - Instalment #102 Newsletter #102. Welcome to edition #102 of the AWS open source news and updates newsletter, and this week we have a super collection of new open source projects that I am really excited to share. First up we have the AWS DataOps Development Kit, which uses AWS CDK under the covers, and is an open source development framework to help you build data workflows. Threatmapper is an open source cloud native security observability platform, which looks easy to use and has some good visualisations.
-
AWS open source news and updates #101
Feb 21, 2022 | 12 minute read
Feb 21st, 2022 - Instalment #101 Newsletter #101. There is nothing basic or fundamental about edition 101 of the AWS open source newsletter, with another great round up of new open source projects including eks-creation-engine from the folks at Lightspin, a handy tool you should check out that helps you all to stay safer, idp-scim-sync to help users of AWS SSO who want to synchronise with their Google Workspace Directory, typecart, an analysis tool for proof evolution, and many other great projects and sample code.
-
AWS open source news and updates #100
Feb 14, 2022 | 12 minute read
Feb 14th, 2022 - Instalment #100 Newsletter #100. Happy Valentine's Day everyone, and welcome to this landmark 100th edition of this newsletter. This week we celebrate the love that many builders have for open source with more great new open source projects and content. Cuddle up to new projects that will help you build scalable systems, simplify your work with Amazon DynamoDB, integrate your .NET applications with OpenSearch, keep on top of your VPC networks, and more.
-
Contributing to the Apache Airflow project - Part One
Feb 10, 2022 | 18 minute read
Contributing to Apache Airflow Introduction In this series of posts, I am going to share what I learn as I embark on my first upstream contribution to the Apache Airflow project. The purpose is to show you how typical open source projects like Apache Airflow work, how you engage with the community to orchestrate change and hopefully inspire more people to contribute to this open source project. I will post regular updates as a series of posts, as the journey unfolds.
-
AWS open source news and updates #99
Feb 7, 2022 | 12 minute read
Feb 7th, 2022 - Instalment #99 Newsletter #99. While Nena gave you 99 red balloons, I give you the latest version of the AWS open source newsletter. This week we feature more great new open source projects including a project to help you with drift detection in your CloudFormation stacks, new Terraform modules, an open source Prometheus exporter, some AWS CDK resources and sample projects and more. This week's AWS and Community posts cover PostgreSQL, Apache Airflow, AWS CDK, Redis, GraphQL, Apollo GraphQL, Kubernetes, Amazon EKS and more.
-
AWS open source news and updates #98
Jan 29, 2022 | 14 minute read
Jan 31st, 2022 - Instalment #98 Newsletter #98. Welcome to another edition of AWS open source news and updates, featuring more new open source projects. This week, these include eventbridge-assistant (a VS Code plugin to help you whilst you are developing with Amazon EventBridge), stratus-red-team (a tool you can use to emulate offensive attack techniques), critter (AWS Config rule integration testing), syne-tune-s3-transfer (an example of how to apply the distributed parameter search library to optimise download performance), karpenter-terraform (a Terraform module to help you automate deployment of Karpenter), and a couple of super interesting open source solutions covering last mile delivery and software defined radio.
-
AWS open source news and updates #97
Jan 22, 2022 | 12 minute read
Jan 22nd, 2022 - Instalment #97 Newsletter #97. Welcome to another edition of the AWS open source newsletter, packed with more great new open source projects, content, and events. This week, we have new projects that help you improve security by de-obfuscating strings, a library to help you automate the configuration of your build pipelines, a new Terraform module, a nice new VSCode plugin that will help you when working with IAM, and several more.
-
Setting up MWAA to use a KMS key
Dec 14, 2021 | 6 minute read
Introduction In a previous post, I shared how you can use AWS CDK to provision your Apache Airflow environments using the Managed Workflows for Apache Airflow service (MWAA). I was contacted this week by Michael Grabenstein, who flagged an issue with the code in that post. The post used code that configured a KMS key for the MWAA environment, but when trying to deploy the app it would fail with the following error:
-
Integrating Amazon Timestream in your Amazon Managed Workflows for Apache Airflow v2.x
Sep 23, 2021 | 28 minute read
Integrating with Amazon Timestream in your Apache Airflow DAGs Amazon Timestream is a fast, scalable, and serverless time series database service perfect for use cases that generate huge amounts of events per day, optimised to make it faster and more cost effective than using relational databases. I have been playing around with Amazon Timestream to prepare for a talk I am doing with some colleagues, and wanted to see how I could integrate it with other AWS services in the context of leveraging some of the key capabilities of Amazon Timestream.
-
Reading and writing data across different AWS accounts with Amazon Managed Workflows for Apache Airflow v2.x
Sep 7, 2021 | 13 minute read
Reading and writing data across different AWS accounts in your Apache Airflow DAGs As regular readers will know, I sometimes lurk in the Apache Airflow Slack channel to see what is going on. If you are new to Apache Airflow, or want to get a deeper understanding, then I highly recommend spending some time here. The community is super welcoming and eager to help new participants. It was during a recent session that I came across an interesting problem that one of the builders was having, which was how to access (read/write) data in an S3 bucket which was in a different account to the one hosting Amazon Managed Workflows for Apache Airflow (MWAA).
-
Working with parameters and variables in Amazon Managed Workflows for Apache Airflow
Jul 27, 2021 | 36 minute read
Maximising the re-use of your DAGs in MWAA During some recent conversations with customers, one of the topics that they were interested in was how to create re-usable, parameterised Apache Airflow workflows (DAGs) that could be executed dynamically through the use of variables and/or parameters (either submitted via the UI or the command line). This makes a lot of sense, as you may find that you repeat similar tasks in your workflows, and so this approach allows you to maximise the re-use of that work.
-
Working with Amazon EKS and Amazon Managed Workflows for Apache Airflow v2.x
Jun 10, 2021 | 11 minute read
Introduction The Apache Airflow Slack channel is a vibrant community of open source builders that is a great source of feedback, knowledge and answers to problems and use cases you might have when working with Apache Airflow. This week I picked up on someone seeing errors with Amazon EKS, and so I thought what better time to try out the new Apache Airflow 2.x version that was recently launched in Amazon Managed Workflows for Apache Airflow (MWAA).
-
Working with the RedshiftToS3Transfer operator and Amazon Managed Workflows for Apache Airflow
May 15, 2021 | 18 minute read
Introduction Inspired by a recent conversation within the Apache Airflow open source slack community, I decided to channel the inner terrier within me to tackle this particular issue, around getting an Apache Airflow operator (the protagonist for this post) to work. I found the perfect catalyst in the way of the original launch post of Amazon Managed Workflows for Apache Airflow (MWAA). As is often the way, diving into that post (creating a workflow to take some source files, transform them and then move them into Amazon Redshift) led me down some unexpected paths to here, this post.
-
Using AWS CDK to deploy your Amazon Managed Workflows for Apache Airflow environment
Apr 28, 2021 | 11 minute read
update I am grateful to Michael Grabenstein for spotting some mistakes in the original post/code. I hope these have now been rectified in this post. Using AWS CDK to deploy your Amazon Managed Workflows for Apache Airflow environment What better way to celebrate CDK Day than to return to a previous blog where I wrote about automating the installation and configuration of Amazon Managed Workflows for Apache Airflow (MWAA), and take a look at doing the same thing but this time using AWS CDK.
-
Automating your ELT Workflows with Managed Workflows for Apache Airflow - Part Two
Apr 21, 2021 | 17 minute read
Part Two - Automating Amazon EMR In Part One, we automated an example ELT workflow on Amazon Athena using Apache Airflow. In this post, Part Two, we will do the same thing but automate the same example ELT workflow using Amazon EMR. Make sure you recap the setup from Part One. All the code so you can reproduce this yourself can be found in the GitHub repository here. Automating Amazon EMR
-
Automating your ELT Workflows with Managed Workflows for Apache Airflow - Part One
Apr 21, 2021 | 18 minute read
update: I have changed the post to use standard Apache Airflow variables rather than using AWS Secrets Manager. Part One - Automating Amazon Athena As part of an upcoming DevDay event, I have been working on how you can use Apache Airflow to help automate your Extract, Load and Transform (ELT) Workflows. Amazon Athena and Amazon EMR are two AWS services that help customers who have existing SQL skills/expertise and are looking at tools such as Presto or Apache Hive when undertaking those transformations.
-
Monitoring and logging with Amazon Managed Workflows for Apache Airflow
Feb 9, 2021 | 12 minute read
Part of a series of posts to support an up-coming online event, the Innovate AI/ML on February 24th, from 9:00am GMT - you can sign up here Part 1 - Installation and configuration of Managed Workflows for Apache Airflow Part 2 - Working with Permissions Part 3 - Accessing Amazon Managed Workflows for Apache Airflow Part 4 - Interacting with Amazon Managed Workflows for Apache Airflow via the command line Part 5 - A simple CI/CD system for your development workflow Part 6 - Monitoring and logging <- this post Part 7 - Automating a simple AI/ML pipeline with Apache Airflow In this post I will be covering Part 6, where to find logs to help you understand and troubleshoot your Apache Airflow workflows, and how you can monitor your Apache Airflow environments.
-
A simple CI/CD system for your Amazon Managed Workflows for Apache Airflow development workflow
Feb 3, 2021 | 16 minute read
updated Feb 19th Part of a series of posts to support an up-coming online event, the Innovate AI/ML on February 24th, from 9:00am GMT - you can sign up here Part 1 - Installation and configuration of Managed Workflows for Apache Airflow Part 2 - Working with Permissions Part 3 - Accessing Amazon Managed Workflows for Apache Airflow Part 4 - Interacting with Amazon Managed Workflows for Apache Airflow via the command line Part 5 - A simple CI/CD system for your development workflow <- this post Part 6 - Monitoring and logging Part 7 - Automating a simple AI/ML pipeline with Apache Airflow In this post I will be covering Part 5, how you can set up a very simple CI/CD system to enable faster development of your Apache Airflow DAGs.
-
Interacting with Amazon Managed Workflows for Apache Airflow via the command line
Feb 1, 2021 | 12 minute read
Part of a series of posts to support an up-coming online event, the Innovate AI/ML on February 24th, from 9:00am GMT - you can sign up here Part 1 - Installation and configuration of Managed Workflows for Apache Airflow Part 2 - Working with Permissions Part 3 - Accessing Amazon Managed Workflows for Apache Airflow environments Part 4 - Interacting with Amazon Managed Workflows for Apache Airflow via the command line <- this post Part 5 - A simple CI/CD system for your development workflow Part 6 - Monitoring and logging Part 7 - Automating a simple AI/ML pipeline with Apache Airflow In this post I will be covering Part 4, how you can access and interact with Apache Airflow via the command line.
-
Accessing your Amazon Managed Workflows for Apache Airflow environments
Jan 28, 2021 | 8 minute read
Part of a series of posts to support an up-coming online event, the Innovate AI/ML on February 24th, from 9:00am GMT - you can sign up here Part 1 - Installation and configuration of Managed Workflows for Apache Airflow Part 2 - Working with Permissions Part 3 - Accessing Amazon Managed Workflows for Apache Airflow environments <- this post Part 4 - Interacting with Amazon Managed Workflows for Apache Airflow via the command line Part 5 - A simple CI/CD system for your development workflow Part 6 - Monitoring and logging Part 7 - Automating a simple AI/ML pipeline with Apache Airflow In this post I will be covering Part 3, how you can access and interact with your Apache Airflow environments.
-
Working with permissions in Amazon Managed Workflows for Apache Airflow
Jan 27, 2021 | 10 minute read
Part of a series of posts to support an up-coming online event, the Innovate AI/ML on February 24th, from 9:00am GMT - you can sign up here Part 1 - Installation and configuration of Managed Workflows for Apache Airflow Part 2 - Working with Permissions <- this post Part 3 - Accessing Amazon Managed Workflows for Apache Airflow environments Part 4 - Interacting with Amazon Managed Workflows for Apache Airflow via the command line Part 5 - A simple CI/CD system for your development workflow Part 6 - Monitoring and logging Part 7 - Automating a simple AI/ML pipeline with Apache Airflow In this post I will be covering Part 2, how to ensure that you control access to Apache Airflow following best practices such as default no access/least privilege.
-
Automating the installation and configuration of Amazon Managed Workflows for Apache Airflow
Jan 26, 2021 | 15 minute read
updated, August 25th Thanks to Philip T for spotting a typo in the CloudFormation code below - it is ok in the GitHub repo, but I have fixed it now below. Part of a series of posts to support an up-coming online event, the Innovate AI/ML on February 24th, from 9:00am GMT - you can sign up here Part 1 - Installation and configuration of Managed Workflows for Apache Airflow <- this post Part 2 - Working with Permissions Part 3 - Accessing Amazon Managed Workflows for Apache Airflow environments Part 4 - Interacting with Amazon Managed Workflows for Apache Airflow via the command line Part 5 - A simple CI/CD system for your development workflow Part 6 - Monitoring and logging Part 7 - Automating a simple AI/ML pipeline with Apache Airflow In this post I will be covering Part 1, automating the installation and configuration of Managed Workflows for Apache Airflow (MWAA).