Posts
-
Experimenting with digital lanyards - introducing the Badger2040
May 9, 2022 | 6 minute read
Experimenting with digital lanyards As someone who attends events on a regular basis, I have spent a fair bit of time over the years looking at interesting ways to engage with attendees. One of the problems I was looking to solve was how do I share useful information with attendees without having to interrupt the conversations (something that typically happens as I try and find those links on my mobile phone).
-
Contributing to the Apache Airflow project - Part Two
Mar 11, 2022 | 11 minute read
This is the second and concluding post providing an overview of the experience and journey contributing to the Apache Airflow project. You can catch Part One here. Contributing to Apache Airflow - Part Deux In Part One of this series, we took our first steps in contributing to the Apache Airflow project. With a little more knowledge and experience, and our first interactions with the Airflow community behind us, we are ready to start exploring how the code works and see how we might go about fixing this.
-
Orchestrating hybrid workflows using Amazon Managed Workflows for Apache Airflow (MWAA)
Mar 7, 2022 | 46 minute read
Using Apache Airflow to orchestrate hybrid workflows In some recent discussions with customers, one topic that came up was how open source is increasingly being used as a common mechanism to help build re-usable solutions that protect investments in engineering and development time and skills, and that work across on-premises and Cloud environments. In 2021 my most viewed blog post talked about how you can build and deploy containerised applications, anywhere (Cloud, your data centre, other Clouds) and on anything (Intel and Arm).
-
Contributing to the Apache Airflow project - Part One
Feb 10, 2022 | 18 minute read
Contributing to Apache Airflow Introduction In this series of posts, I am going to share what I learn as I embark on my first upstream contribution to the Apache Airflow project. The purpose is to show you how typical open source projects like Apache Airflow work, how you engage with the community to orchestrate change and hopefully inspire more people to contribute to this open source project. I will post regular updates as a series of posts, as the journey unfolds.
-
Running my dev.to blog using Hugo on Netlify
Jan 7, 2022 | 6 minute read
Running my dev.to blog using Hugo on Netlify I am a big fan of dev.to, and the work that the team do to foster a great community of builders is something that keeps me there. I have always maintained another blog (running on Netlify, which is also super awesome), kind of like a mirror. Up until last year, I was able to publish to dev.to and it would take care of publishing to that mirror.
-
Setting up MWAA to use a KMS key
Dec 14, 2021 | 6 minute read
Introduction In a previous post, I shared how you can use AWS CDK to provision your Apache Airflow environments using the Managed Workflows for Apache Airflow service (MWAA). I was contacted this week by Michael Grabenstein, who flagged an issue with the code in that post. The post used code that configured a KMS key for the MWAA environment, but when trying to deploy the app it would fail with the following error:
-
Integrating Amazon Timestream in your Amazon Managed Workflows for Apache Airflow v2.x
Sep 23, 2021 | 28 minute read
Integrating with Amazon Timestream in your Apache Airflow DAGs Amazon Timestream is a fast, scalable, and serverless time series database service perfect for use cases that generate huge amounts of events per day, optimised to make it faster and more cost effective than using relational databases. I have been playing around with Amazon Timestream to prepare for a talk I am doing with some colleagues, and wanted to see how I could integrate it with other AWS services in the context of leveraging some of the key capabilities of Amazon Timestream.
-
Reading and writing data across different AWS accounts with Amazon Managed Workflows for Apache Airflow v2.x
Sep 7, 2021 | 13 minute read
Reading and writing data across different AWS accounts in your Apache Airflow DAGs As regular readers will know, I sometimes lurk in the Apache Airflow Slack channel to see what is going on. If you are new to Apache Airflow, or want to get a deeper understanding, then I highly recommend spending some time here. The community is super welcoming and eager to help new participants. It was during a recent session that I came across an interesting problem that one of the builders was having, which was how to access (read/write) data in an S3 bucket which was in a different account to the one hosting Amazon Managed Workflows for Apache Airflow (MWAA).
-
Working with parameters and variables in Amazon Managed Workflows for Apache Airflow
Jul 27, 2021 | 36 minute read
Maximising the re-use of your DAGs in MWAA During some recent conversations with customers, one of the topics that they were interested in was how to create re-usable, parameterised Apache Airflow workflows (DAGs) that could be executed dynamically through the use of variables and/or parameters (either submitted via the UI or the command line). This makes a lot of sense, as you may find that you repeat similar tasks in your workflows, and so this approach allows you to maximise the re-use of that work.
-
Creating a multi architecture CI/CD solution with Amazon ECS and ECS Anywhere
Jul 16, 2021 | 39 minute read
Please let me know how I can improve posts such as this one, by completing this very short survey. $25 AWS credits will be provided for the first 20 completed - take the survey Organisations are moving their workloads to the cloud as quickly as they can. While most applications can be easily migrated to the cloud, some applications need to remain on-premises due to low-latency or data sovereignty requirements.
-
Working with Amazon EKS and Amazon Managed Workflows for Apache Airflow v2.x
Jun 10, 2021 | 11 minute read
Introduction The Apache Airflow slack channel is a vibrant community of open source builders that is a great source of feedback, knowledge and answers to problems and use cases you might have when trying to do stuff with Apache Airflow. This week I picked up on someone seeing errors with Amazon EKS, and so I thought what better time to try out the new Apache Airflow 2.x version that was recently launched in Amazon Managed Workflows for Apache Airflow (MWAA).
-
Working with the RedshiftToS3Transfer operator and Amazon Managed Workflows for Apache Airflow
May 15, 2021 | 18 minute read
Introduction Inspired by a recent conversation within the Apache Airflow open source slack community, I decided to channel the inner terrier within me to tackle this particular issue, around getting an Apache Airflow operator (the protagonist for this post) to work. I found the perfect catalyst in the way of the original launch post of Amazon Managed Workflows for Apache Airflow (MWAA). As is often the way, diving into that post (creating a workflow to take some source files, transform them and then move them into Amazon Redshift) led me down some unexpected paths to here, this post.
-
Using AWS CDK to deploy your Amazon Managed Workflows for Apache Airflow environment
Apr 28, 2021 | 11 minute read
update I am grateful to Michael Grabenstein for spotting some mistakes in the original post/code. I hope these have now been rectified in this post. Using AWS CDK to deploy your Amazon Managed Workflows for Apache Airflow environment What better way to celebrate CDK Day than to return to a previous blog where I wrote about automating the installation and configuration of Amazon Managed Workflows for Apache Airflow (MWAA), and take a look at doing the same thing but this time using AWS CDK.
-
Automating your ELT Workflows with Managed Workflows for Apache Airflow - Part Two
Apr 21, 2021 | 17 minute read
Part Two - Automating Amazon EMR In Part One, we automated an example ELT workflow on Amazon Athena using Apache Airflow. In this post, Part Two, we will do the same thing but automate the same example ELT workflow using Amazon EMR. Make sure you recap the setup from Part One. All the code so you can reproduce this yourself can be found in the GitHub repository here. Automating Amazon EMR
-
Automating your ELT Workflows with Managed Workflows for Apache Airflow - Part One
Apr 21, 2021 | 18 minute read
update: I have changed the post to use standard Apache Airflow variables rather than using AWS Secrets Manager. Part One - Automating Amazon Athena As part of an upcoming DevDay event, I have been working on how you can use Apache Airflow to help automate your Extract, Load and Transform (ELT) Workflows. Amazon Athena and Amazon EMR are two AWS services that help customers who have existing SQL skills/expertise and are looking at tools such as Presto or Apache Hive when undertaking those transformations.
-
Monitoring and logging with Amazon Managed Workflows for Apache Airflow
Feb 9, 2021 | 12 minute read
Part of a series of posts to support an up-coming online event, the Innovate AI/ML on February 24th, from 9:00am GMT - you can sign up here Part 1 - Installation and configuration of Managed Workflows for Apache Airflow Part 2 - Working with Permissions Part 3 - Accessing Amazon Managed Workflows for Apache Airflow Part 4 - Interacting with Amazon Managed Workflows for Apache Airflow via the command line Part 5 - A simple CI/CD system for your development workflow Part 6 - Monitoring and logging <- this post Part 7 - Automating a simple AI/ML pipeline with Apache Airflow In this post I will be covering Part 6, where to find logs to help you understand and troubleshoot your Apache Airflow workflows, and how you can monitor your Apache Airflow environments.
-
A simple CI/CD system for your Amazon Managed Workflows for Apache Airflow development workflow
Feb 3, 2021 | 16 minute read
updated Feb 19th Part of a series of posts to support an up-coming online event, the Innovate AI/ML on February 24th, from 9:00am GMT - you can sign up here Part 1 - Installation and configuration of Managed Workflows for Apache Airflow Part 2 - Working with Permissions Part 3 - Accessing Amazon Managed Workflows for Apache Airflow Part 4 - Interacting with Amazon Managed Workflows for Apache Airflow via the command line Part 5 - A simple CI/CD system for your development workflow <- this post Part 6 - Monitoring and logging Part 7 - Automating a simple AI/ML pipeline with Apache Airflow In this post I will be covering Part 5, how you can set up a very simple CI/CD system to enable faster development of your Apache Airflow DAGs.
-
Interacting with Amazon Managed Workflows for Apache Airflow via the command line
Feb 1, 2021 | 12 minute read
Part of a series of posts to support an up-coming online event, the Innovate AI/ML on February 24th, from 9:00am GMT - you can sign up here Part 1 - Installation and configuration of Managed Workflows for Apache Airflow Part 2 - Working with Permissions Part 3 - Accessing Amazon Managed Workflows for Apache Airflow environments Part 4 - Interacting with Amazon Managed Workflows for Apache Airflow via the command line <- this post Part 5 - A simple CI/CD system for your development workflow Part 6 - Monitoring and logging Part 7 - Automating a simple AI/ML pipeline with Apache Airflow In this post I will be covering Part 4, how you can interact with and access Apache Airflow via the command line.
-
Accessing your Amazon Managed Workflows for Apache Airflow environments
Jan 28, 2021 | 8 minute read
Part of a series of posts to support an up-coming online event, the Innovate AI/ML on February 24th, from 9:00am GMT - you can sign up here Part 1 - Installation and configuration of Managed Workflows for Apache Airflow Part 2 - Working with Permissions Part 3 - Accessing Amazon Managed Workflows for Apache Airflow environments <- this post Part 4 - Interacting with Amazon Managed Workflows for Apache Airflow via the command line Part 5 - A simple CI/CD system for your development workflow Part 6 - Monitoring and logging Part 7 - Automating a simple AI/ML pipeline with Apache Airflow In this post I will be covering Part 3, how you can interact with and access your Apache Airflow environments.
-
Working with permissions in Amazon Managed Workflows for Apache Airflow
Jan 27, 2021 | 10 minute read
Part of a series of posts to support an up-coming online event, the Innovate AI/ML on February 24th, from 9:00am GMT - you can sign up here Part 1 - Installation and configuration of Managed Workflows for Apache Airflow Part 2 - Working with Permissions <- this post Part 3 - Accessing Amazon Managed Workflows for Apache Airflow environments Part 4 - Interacting with Amazon Managed Workflows for Apache Airflow via the command line Part 5 - A simple CI/CD system for your development workflow Part 6 - Monitoring and logging Part 7 - Automating a simple AI/ML pipeline with Apache Airflow In this post I will be covering Part 2, how to ensure that you control access to Apache Airflow following best practices such as default no access/least privilege.
-
Automating the installation and configuration of Amazon Managed Workflows for Apache Airflow
Jan 26, 2021 | 15 minute read
updated, August 25th Thanks to Philip T for spotting a typo in the CloudFormation code below - it is ok in the GitHub repo, but I have fixed it now below. Part of a series of posts to support an up-coming online event, the Innovate AI/ML on February 24th, from 9:00am GMT - you can sign up here Part 1 - Installation and configuration of Managed Workflows for Apache Airflow <- this post Part 2 - Working with Permissions Part 3 - Accessing Amazon Managed Workflows for Apache Airflow environments Part 4 - Interacting with Amazon Managed Workflows for Apache Airflow via the command line Part 5 - A simple CI/CD system for your development workflow Part 6 - Monitoring and logging Part 7 - Automating a simple AI/ML pipeline with Apache Airflow In this post I will be covering Part 1, automating the installation and configuration of Managed Workflows for Apache Airflow (MWAA).
-
TIL: Testing an Amazon CloudWatch alarm
Jan 7, 2021 | 2 minute read
Today I was setting up an application load balancer that sits in front of a test application I have put together. Setting this up was super easy, and very quickly I had my domain pointing to the alias and serving requests. As part of the setup, I wanted to monitor the application load balancer to let me know when requests to the downstream application were failing (anything other than an HTTP 200), and so I set this up super easily in Amazon CloudWatch.
-
Amazon Aurora - setting up and configuration, four ways
Oct 15, 2020 | 8 minute read
In this post I want to share four different approaches to installing and configuring your Amazon Aurora database clusters. Everything in this post is covered in detail in the embedded video, but I wanted to share some additional information that I did not include in the video that was easier done in this blog. Why four ways? The approach in the video was to look at the journey you might take when learning a new technology and then how you move to productise that technology.
-
Long running data import jobs with AWS Session Manager
Sep 10, 2020 | 4 minute read
Yesterday I was looking to import the TPC-H dataset (some 600 or so million rows) into Amazon Aurora from a workstation that I connect to using AWS Session Manager. AWS Session Manager is a great way to simplify your life by allowing you to connect to a machine via the AWS console and not worry about having to manage ssh keys or remembering to lock down external public access from the net.
-
Building a culture of security in open source software development
Jul 15, 2020 | 9 minute read
Updated on Jan 18th to remove broken link to report. According to a number of recent studies, the use and adoption of open source software continues to rise. From studies such as the State of Enterprise Open Source by Red Hat (in which nearly 70% of respondents stated that open source software is either extremely or very important) or Tidelift’s April 2019 survey report (that found more than 90% of professional developers use open source in building their applications) it is clear that developers from startups to highly regulated enterprises have embraced open source solutions.
-
Automating AWS SSO and G-Suite synchronisation with SSO Sync
Jun 3, 2020 | 5 minute read
update-July 28th The ssosync tool has had a lot of interest and the community has updated the tool. This means that you should refer to the project home page https://github.com/awslabs/ssosync and check out the README.md for what changes you might need to make to get this tool working. Next level ssosync In a previous post, I talked about setting up AWS Single Sign On (AWS SSO) with G-Suite, and then using an open source project called ssosync to synchronise users and groups from G-Suite into AWS SSO.
-
Setting up G-Suite, AWS SSO and ssosync
May 27, 2020 | 17 minute read
update-July 28th The ssosync tool has had a lot of interest and the community has updated the tool. This means that you should refer to the project home page https://github.com/awslabs/ssosync and check out the README.md for what changes you might need to make to get this tool working. Enabling AWS SSO with Google G-Suite Many customers have existing directory technologies where they manage their users, and then use this central identity store as a way to simplify the way they authenticate and provide access to applications and other resources.
-
Making the most of mentoring
May 8, 2020 | 5 minute read
Some recent experiences mentoring have provided the motivation for this piece. It is not intended to be right or wrong, but just my personal opinion and experience and I hope it is read that way. I have put this together to share what I think are the critical things that make a mentoring relationship work for both the mentor and mentee. So with that out of the way, I invite you to read on…
-
Mentoring and reverse mentoring
Dec 29, 2019 | 4 minute read
As I reflect on 2019, one of the common themes whilst engaging with builders at the start of their career has been how those of us who have deep experience working in the IT industry and technology can help bring along those who are just starting out. Some common questions that have come up include: How do I get started on Cloud or AWS? What tools and languages should I learn?
-
re:Invent 2019 workshop list
Dec 9, 2019 | 1 minute read
So here is a list, scraped from Twitter and from following various other folk, of just a small taster of the workshops that ran during re:Invent. As I find more I will update, and feel free to add yours in the comments (oh, and let me know if any of these are dead links) Serverless - https://github.com/aws-samples/aws-serverless-workshop-innovator-island/ Serverless image process workshop - https://image-processing.serverlessworkshops.io/ Amplify predictions workshop - https://github.com/mlabieniec/IonicPredictions Full stack serverless Amplify lab - https://github.
-
Make your business more resilient in the digital age
Oct 17, 2019 | 1 minute read
Very humbled to write a guest post on Adrian Hornsby's excellent blog, where he provides guidance to help customers build resilient architecture and champions operational excellence. In this post I talk about what you need to think about to build a more resilient business fit for the digital age. Here is the link: https://medium.com/@adhorn/make-your-business-more-resilient-in-the-digital-age-888da3f5deaf
-
Innovate Machine Learning and AI - learn how to kick start your journey
Oct 4, 2019 | 1 minute read
On October 17th we have a free, online event covering several tracks on Machine Learning. Whether you are a complete beginner or seasoned data scientist, we have everything from gentle introductions to deep dives. What I am most excited about, however, is that we will have an AWS DeepRacer racing challenge. You will learn how to create your first reinforcement learning model that will race a virtual car, with prizes for the fastest times.