<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="rss.xsl"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Insecure By Design</title>
        <link>https://misczak.github.io/blog</link>
        <description>A blog about cloud security, home labbing, and various other tech topics.</description>
        <lastBuildDate>Fri, 23 Jan 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[Building Out The HomeLab: Photos with Immich]]></title>
            <link>https://misczak.github.io/blog/building-out-the-homelab-photos-with-immich</link>
            <guid>https://misczak.github.io/blog/building-out-the-homelab-photos-with-immich</guid>
            <pubDate>Fri, 23 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[A write-up about my new self-hosted photo service built on Immich.]]></description>
            <content:encoded><![CDATA[<p>In my post last month about <a href="https://www.misczak.com/posts/rethinking-my-personal-tech-stack/" target="_blank" rel="noopener noreferrer" class="">rethinking my personal tech stack</a>, I had mentioned that I had been using <a href="https://ente.io/" target="_blank" rel="noopener noreferrer" class="">Ente Photos</a> as an alternative to Google Photos, but had also been evaluating Immich. I also mentioned that one day in the future I might move over to Immich after figuring out a better backup strategy for my HomeLab.</p>
<p>That day came this month as I had some time to figure out the last few items I needed to feel secure in using Immich as my primary photo platform, running entirely on my Beelink S12 Pro running Proxmox.</p>
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="requirements">Requirements<a href="https://misczak.github.io/blog/building-out-the-homelab-photos-with-immich#requirements" class="hash-link" aria-label="Direct link to Requirements" title="Direct link to Requirements" translate="no">​</a></h2>
<p>Before I had kids, my photo library would fit into the free tier of basically any cloud photo service out there. Afterwards, it's an entirely different story - I now have thousands of photos and videos of my kids through their first few years of life. I consider this collection to be one of the most critical pieces of data I have, far surpassing old college papers and the blog posts I cobble together for this site. Therefore, making sure I am careful to protect against data corruption, data loss, and accidental public exposure is of paramount importance. The last one in particular is critical, as keeping photos and videos of my kids from being used to train LLMs is one of the reasons I was looking to get away from Google Photos to begin with!</p>
<p>When looking to fully self-host my photos, I had three main requirements I felt like I had to satisfy before I felt comfortable moving away from Ente:</p>
<ol>
<li class="">My photo collection cannot, under any circumstance, be exposed to the public Internet or used for model training/advertising/tracking by a company. For the purposes of facial detection, a model hosted entirely on my own devices is acceptable.</li>
<li class="">The solution needs to be able to scale to meet my increasing storage needs in a cost effective manner, as I only see myself taking more photos over the years. I really don't like how most cloud photo services automatically send you up to the 1 or 2 terabyte tiers once you exceed 200 GB.</li>
<li class="">I want to be able to share individual photos/videos as well as certain albums with people who do not use the same app as me and do not have access to my home network.</li>
</ol>
<p>Of these, requirement 2 is easily met by any service that I can self-host at home. It's requirements 1 and 3 that are much more difficult to meet while attempting to match the functionality that more popular services offer.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="setting-up-immich">Setting Up Immich<a href="https://misczak.github.io/blog/building-out-the-homelab-photos-with-immich#setting-up-immich" class="hash-link" aria-label="Direct link to Setting Up Immich" title="Direct link to Setting Up Immich" translate="no">​</a></h2>
<p>I looked up the installation instructions in Immich's excellent <a href="https://docs.immich.app/install/requirements" target="_blank" rel="noopener noreferrer" class="">documentation</a> and saw that the recommended approach was through Docker Compose. I could have set it up in a Linux Container (LXC) in Proxmox, but I decided to go for the well-supported path and instead provisioned an Ubuntu VM with a static IP address, 2 vCPUs, 4 GB of RAM, and 40 GB of storage space. My goal is to use my Synology NAS as the storage for all of my photos and just mount the photos share to the VM, so I only need some storage in the VM itself for container images and other files. Once the VM was created, I authorized its static IP to connect to my currently empty NFS photos share on the Synology and mounted it via <code>/etc/fstab</code>.</p>
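<p>For reference, the <code>/etc/fstab</code> entry looks something like the following - the NAS address and share path are placeholders, not my actual values:</p>

```
# NFS photos share from the Synology NAS (address and export path assumed)
192.168.1.50:/volume1/photos  /mnt/photos  nfs  defaults,_netdev  0  0
```

<p>The <code>_netdev</code> option tells the system to wait for the network before attempting the mount, which avoids boot-time failures on a VM that depends on the NAS being reachable.</p>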
<p>I then built my Docker Compose file using the <a href="https://docs.immich.app/install/docker-compose" target="_blank" rel="noopener noreferrer" class="">instructions</a> in the Immich documentation, taking care to point my photo upload location to the <code>/mnt/photos</code> directory that serves as the mountpoint for my NFS photos share, then ran <code>docker compose up</code> to bring the service up on my network.</p>
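<p>Concretely, pointing the upload location at the mount just means a one-line change in the <code>.env</code> file that accompanies Immich's stock Compose file (variable names per the Immich install docs; the database path here is an assumption matching the defaults):</p>

```
# .env for the Immich Docker Compose stack
UPLOAD_LOCATION=/mnt/photos
DB_DATA_LOCATION=./postgres
```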
<p>Once it was running, I made an account for myself and installed the mobile app on my phone. However, I didn't log in on mobile yet; instead, I wanted to do a bulk import of all of my photos before connecting my phone.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="importing-existing-photos">Importing Existing Photos<a href="https://misczak.github.io/blog/building-out-the-homelab-photos-with-immich#importing-existing-photos" class="hash-link" aria-label="Direct link to Importing Existing Photos" title="Direct link to Importing Existing Photos" translate="no">​</a></h2>
<p>Immich is a little different than most photo services in that it supports two ways of adding photos to your collection - as uploads through your own account, or attached as an <a href="https://docs.immich.app/features/libraries#import-paths" target="_blank" rel="noopener noreferrer" class="">external library</a>. It would have been easy to just include my existing photos as an external library, but this would create an awkward split between photos taken before installing Immich and those taken after, which would be uploaded through the mobile app and added to my regular library.</p>
<p>Instead, I wanted to do a full import of all of my existing photos through my actual user account, as if Immich were ingesting them naturally. However, downloading all of my photos to my phone and then uploading them through Immich would take forever. Fortunately, there is a great utility called <a href="https://github.com/simulot/immich-go" target="_blank" rel="noopener noreferrer" class="">immich-go</a> that can take one or more Google Takeout files and import them into your Immich library using an API key you generate in Immich's administration page. This turned out to be an incredibly fast and effective way of uploading nearly 200 GB of photos and videos into Immich without any errors or corruption issues. As part of the upload, it wrote my photos to the photos share on my Synology NAS, which is exactly where I wanted them.</p>
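<p>A minimal sketch of the invocation, assuming the command syntax of recent immich-go releases - the server address, API key, and archive names are placeholders:</p>

```shell
# Import Google Takeout archives into Immich via the API
# (server URL, key, and file names are placeholders)
immich-go upload from-google-photos \
  --server=http://192.168.1.60:2283 \
  --api-key=YOUR_API_KEY \
  takeout-*.zip
```

<p>Check <code>immich-go --help</code> against the version you download, as the CLI has changed between releases.</p>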
<p>Once that was complete, I logged in through the mobile app and enabled mobile backup from the Recents album on my phone. Since all of my photos had already been imported into Immich, it didn't have to upload anything from my phone, bringing both endpoints in sync.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="immichs-jobs">Immich's Jobs<a href="https://misczak.github.io/blog/building-out-the-homelab-photos-with-immich#immichs-jobs" class="hash-link" aria-label="Direct link to Immich's Jobs" title="Direct link to Immich's Jobs" translate="no">​</a></h2>
<p>Inside the Immich administration page are a number of jobs that get run when adding photos to your library or at regular intervals (such as overnight, when most people wouldn't be using the system). These jobs include grouping faces into people for easier identification across photos, analyzing text in images, and transcoding videos for better device compatibility. After the initial import, some of these jobs took a while to run, so I didn't ask too much of Immich for the next day or so.</p>
<p><a href="https://misczak.github.io/assets/files/immichjobs-682c26d992656d9e9953b87d044cf20d.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="A few of Immich&amp;#39;s Jobs " src="https://misczak.github.io/assets/images/immichjobs-682c26d992656d9e9953b87d044cf20d.png" width="905" height="1120" class="img_ev3q"></a></p>
<p>If you have a GPU in the machine that you are running Immich on, you can use that for the machine learning container and chew through these jobs much quicker. I initially passed the GPU from my Beelink through to this VM to handle the initial import, then removed it so my other LXCs (like Plex) could use it instead. In the future, I may move the machine learning container to my desktop PC so it can use my fully powered Nvidia GPU there instead.</p>
<p>Once these jobs were done, I turned on Immich's Storage Template feature to better sort my photos on my NFS share by date. You can get pretty deep into the customization here, but the default format worked for me - I just wanted to be able to easily drill down into a given year or month on my NAS if need be. After configuring that, I ran the Storage Template Migration job to allow Immich to re-organize the photos I had already uploaded.</p>
<p><a href="https://misczak.github.io/assets/files/storagetemplate-64267d0d01087d60c2019340900bc0d7.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="Storage Template Configuration " src="https://misczak.github.io/assets/images/storagetemplate-64267d0d01087d60c2019340900bc0d7.png" width="1009" height="496" class="img_ev3q"></a></p>
<p>I also tested that my preexisting <a href="https://www.misczak.com/posts/building-out-the-homelab-proxmox-and-tailscale/#tailscale" target="_blank" rel="noopener noreferrer" class="">Tailscale LXC setup</a> allowed me access to my server when outside the house without exposing it to the public Internet. I turned off wifi on my phone, connected to my tailnet, and tried to access Immich through the mobile app on my phone. It worked just as planned!</p>
<p>At this point, I felt like I had met requirement 1 - Immich was up and running locally, with a local facial recognition model sorting photos into people.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="public-sharing">Public Sharing<a href="https://misczak.github.io/blog/building-out-the-homelab-photos-with-immich#public-sharing" class="hash-link" aria-label="Direct link to Public Sharing" title="Direct link to Public Sharing" translate="no">​</a></h2>
<p>With my first two requirements met, I set about figuring out how to share photos, videos, and albums from my Immich library with external/public users. There is a whole focused discussion about this on the Immich discord, where users debate the best way of exposing Immich without jeopardizing the privacy of their photos.</p>
<p>Immich has the ability to generate links for selected photos/videos or an entire album, which I would then distribute to people so they are able to view my media. This link needs to be publicly routable for them to be able to view the content; if my Immich server is running off an internal IP or DNS record (something like 192.168.x.x), it's not going to work.</p>
<p>One option is that I could just expose Immich behind a reverse proxy to the entire Internet, but that opens up the potential for a single security flaw in Immich to expose my library. Fortunately, somebody had already recognized this problem and created something called <a href="https://github.com/alangrainger/immich-public-proxy" target="_blank" rel="noopener noreferrer" class="">Immich Public Proxy</a>, which acts as an intermediary that I can expose to the public internet since it has no credentials or sensitive paths itself. Instead, all it can do is map external requests to internal API calls in Immich.</p>
<p>This looks great, but now I needed a way to expose Immich Public Proxy to the public internet. I could open ports on my router, as it only requires a single one - but I prefer not to do this if at all possible. Instead, I opted to use a feature from Cloudflare called <a href="https://developers.cloudflare.com/cloudflare-one/networks/connectors/cloudflare-tunnel/" target="_blank" rel="noopener noreferrer" class="">Cloudflare Tunnels</a>, which lets me place a daemon in my internal infrastructure that creates outbound connections to Cloudflare's network and routes traffic as mapped in my Cloudflare DNS records.</p>
<p>Both Immich Public Proxy and the Cloudflare Tunnels daemon (cloudflared) can be added to a Docker network, so I added a bit more to my Docker Compose file for Immich:</p>
<pre><code class="language-yaml">  immich-public-proxy:
    container_name: immich-public-proxy
    image: alangrainger/immich-public-proxy:latest
    restart: always
    ports:
      - "3000:3000"
    environment:
      PUBLIC_BASE_URL: ${IMMICH_SHARE_URL}
      IMMICH_URL: http://immich-server:2283
    healthcheck:
      test: curl -s http://localhost:3000/share/healthcheck -o /dev/null || exit 1
      start_period: 10s
      timeout: 5s

  tunnel:
    container_name: cloudflared
    image: cloudflare/cloudflared
    restart: unless-stopped
    command: tunnel run
    env_file:
      - .env
    environment:
      - TUNNEL_TOKEN=${TUNNEL_TOKEN}
</code></pre>
<p>First, I'm passing in some references that are used to configure these services - the public share base URL I'll be using in my Cloudflare DNS records for shareable links, and the token generated by Cloudflare when creating a tunnel. Then it's a pretty straightforward configuration that points Immich Public Proxy at my existing Immich server. On the Cloudflare side, I configured the tunnel so that a CNAME record for a subdomain gets routed to the Immich Public Proxy container and port, which now runs on the same Docker network as cloudflared.</p>
<p><a href="https://misczak.github.io/assets/files/cloudflaretunnel-50608154f3a21e45a9dd32885a3b7d72.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="Configuring the Cloudflare Tunnel " src="https://misczak.github.io/assets/images/cloudflaretunnel-50608154f3a21e45a9dd32885a3b7d72.png" width="950" height="560" class="img_ev3q"></a></p>
<p>There's one last step I had to do inside Immich itself - in Administration &gt; Settings &gt; Server Settings, I configured the external domain with the CNAME entry I had just hooked up to the tunnel route in Cloudflare. Once that's done, any shareable link I generate in Immich can be routed via the Cloudflare tunnel to my internal Immich instance, without exposing any ports on my router or making any part of the Immich server itself publicly reachable.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="backups">Backups<a href="https://misczak.github.io/blog/building-out-the-homelab-photos-with-immich#backups" class="hash-link" aria-label="Direct link to Backups" title="Direct link to Backups" translate="no">​</a></h2>
<p>The last thing I had to resolve was how to back all of this up in a way that made me feel confident I could use it as my sole photo service moving forward. Immich has a great <a href="https://docs.immich.app/administration/backup-and-restore" target="_blank" rel="noopener noreferrer" class="">documentation page about backups</a>, and recently added the ability to do restores through its web UI. In essence, there are two main things you need to worry about when backing up Immich:</p>
<ol>
<li class="">The database that Immich uses to underpin the server</li>
<li class="">All of your photos and videos themselves</li>
</ol>
<p>Immich has a built-in job to create database backups, storing them in <code>$UPLOAD_LOCATION/backups</code>. Since the backups are pretty small, it keeps several of them, rotating over a ten-day window. Combined with your actual photos and videos, this is everything you would need to restore Immich should disaster strike.</p>
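<p>If you want a dump outside of that job's schedule, the Immich backup documentation describes a manual one along these lines - the container name is an assumption matching the default Compose file:</p>

```shell
# Manual dump of the Immich Postgres database (container name assumed
# to be immich_postgres, as in the stock docker-compose.yml)
docker exec -t immich_postgres pg_dumpall --clean --if-exists --username=postgres \
  | gzip > immich-db-backup.sql.gz
```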
<p>Since Immich is just another VM on my Proxmox host, I configured Proxmox to back up my Immich VM every day to a secondary disk in the mini PC itself. Then, once a week I make a backup to my Synology NAS in a backups NFS share as well. This means that I always have a semi-recent database backup on my NAS, alongside my photos and videos.</p>
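<p>For reference, the same kind of backup the scheduled jobs produce can be triggered ad hoc from the Proxmox shell with <code>vzdump</code> - the VM ID and storage name below are placeholders for my actual setup:</p>

```shell
# One-off snapshot-mode backup of the Immich VM to NAS-backed storage
# (VM ID and storage entry name are placeholders)
vzdump 100 --storage synology-backups --mode snapshot --compress zstd
```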
<p>From there, I ended up signing up for Synology C2 to encrypt and backup these database backups as well as my actual photos and videos to the cloud. I had compared Synology C2 to Backblaze and other backup providers, and while Synology is a little more expensive for my current amount of storage needs, I really liked how well integrated it was and how easy it is to roll back to previous versions of a file if I need to.</p>
<p><a href="https://misczak.github.io/assets/files/synologyc2restore-777dffd4b46506506b9517cede99ad02.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="Synology C2 Restore Wizard in Hyper Backup " src="https://misczak.github.io/assets/images/synologyc2restore-777dffd4b46506506b9517cede99ad02.png" width="1118" height="703" class="img_ev3q"></a></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://misczak.github.io/blog/building-out-the-homelab-photos-with-immich#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>I've written a lot of words about how I got this running, without actually mentioning how I find the Immich software - but it's quite good! They have put in a ton of work over 2025 and now 2026 to get it to a state that I think is pretty close to Google Photos. Most importantly, I've stopped paying a subscription fee for my photos and have plenty of runway in my NAS as my storage needs grow. The tradeoff is that I'm now paying for a backup of my backup with Synology C2, but that also allows me to back up a lot more than just photos for the same cost.</p>]]></content:encoded>
            <category>Homelab</category>
        </item>
        <item>
            <title><![CDATA[Building Out The HomeLab: Proxmox and Tailscale]]></title>
            <link>https://misczak.github.io/blog/building-out-the-homelab-proxmox-and-tailscale</link>
            <guid>https://misczak.github.io/blog/building-out-the-homelab-proxmox-and-tailscale</guid>
            <pubDate>Mon, 19 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Talking about how I setup Proxmox and Tailscale to form the foundational pieces of my new homelab.]]></description>
            <content:encoded><![CDATA[<p>Several years ago, I bought a Synology DS920+ to use as a network attached storage (NAS) appliance in the house. I don't work as a photographer or video editor so my work doesn't really have any massive storage needs; rather, I wanted someplace to host media for the family that didn't require an ongoing subscription service to access. For a few years, it served that purpose almost exclusively, both as a drive I could mount and as a Plex server.</p>
<p>Recently, I had begun tinkering with a few other services and seeing how far I could push my Synology. It has a Container Manager service that allows you to run Docker containers, although the interface is very GUI-driven and there are always a few gotchas. For example, Synology reserves ports 80 and 443 for its own web station and reverse proxy services, which makes it difficult to run your own reverse proxy on the device.</p>
<p>For these reasons, I started to consider buying a separate device to act as a home server and repository for all my containers that I would have full control over, and relegate the Synology back to its role as just dumb storage.</p>
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="choosing-a-mini-pc">Choosing a Mini PC<a href="https://misczak.github.io/blog/building-out-the-homelab-proxmox-and-tailscale#choosing-a-mini-pc" class="hash-link" aria-label="Direct link to Choosing a Mini PC" title="Direct link to Choosing a Mini PC" translate="no">​</a></h2>
<p>I had used Raspberry Pis <a href="https://www.misczak.com/posts/tracking-temperature-and-humidity-at-home-with-time-series-data/" target="_blank" rel="noopener noreferrer" class="">here and there</a> over the years for little hobbyist projects, but the prices on newer models have increased to the point where they are getting pretty close to proper PCs. I ended up going with a <a href="https://www.amazon.com/dp/B0G62PNXZB" target="_blank" rel="noopener noreferrer" class="">Beelink Mini S12 Pro</a>, which packs an Intel Alder Lake N100 with 16 GB of memory and a 500 GB NVMe SSD. I wanted the Intel CPU to support hardware transcoding for Plex, and the idle power draw for the entire device is really low (somewhere between 6 and 10 watts).</p>
<p>Having read some reviews and impressions from previous owners of the S12 Pro, it seems like the quality of the SSD that it comes with is questionable. I was probably going to want to upgrade the storage anyway, so I also ordered a <a href="https://www.amazon.com/dp/B0DBR3DZWG" target="_blank" rel="noopener noreferrer" class="">Kingston 1TB NVMe SSD</a> to replace it with. The S12 Pro also has an additional connector for a 2.5" SATA drive, so I grabbed a <a href="https://www.amazon.com/dp/B07YD579WM" target="_blank" rel="noopener noreferrer" class="">Crucial 1TB SATA SSD</a> as well.</p>
<p>When all the pieces arrived, I opened up the mini PC, swapped out the NVMe drive with a single screw, connected and screwed in the SATA drive, then put the cover back together and powered it on.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="proxmox">Proxmox<a href="https://misczak.github.io/blog/building-out-the-homelab-proxmox-and-tailscale#proxmox" class="hash-link" aria-label="Direct link to Proxmox" title="Direct link to Proxmox" translate="no">​</a></h2>
<p>Before buying the Beelink, I had put some thought into what I wanted the host OS to be. At one point, I was considering just throwing Ubuntu on there and running something like <a href="https://coolify.io/docs/" target="_blank" rel="noopener noreferrer" class="">Coolify</a> to deploy some things. In the end, I settled on <a href="https://www.proxmox.com/en/" target="_blank" rel="noopener noreferrer" class="">Proxmox</a> for a few reasons:</p>
<ol>
<li class="">I really like the virtualization approach and the flexibility of using both virtual machines and Linux containers (LXCs) on the same platform.</li>
<li class="">There is a great community behind Proxmox that has already developed <a href="https://community-scripts.github.io/ProxmoxVE/" target="_blank" rel="noopener noreferrer" class="">helper scripts</a> for a variety of services. This would save me from having to reinvent the wheel for some commonly used services.</li>
<li class="">I could easily expand this single Proxmox machine into a cluster eventually for high availability, if things get that serious.</li>
<li class="">Proxmox has great built-in support for snapshots and using them to roll back to previous states of machines and containers.</li>
</ol>
<p>I downloaded the Proxmox 9.1 ISO, threw it onto a USB stick, booted the Beelink from it and installed it right onto the SSD, making sure to give it a static IP from my router during installation.</p>
<p><a href="https://misczak.github.io/assets/files/proxmoxdashboard-ff0a6141b2c7d0caeb24069840930e9c.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="My Current Proxmox Dashboard " src="https://misczak.github.io/assets/images/proxmoxdashboard-ff0a6141b2c7d0caeb24069840930e9c.png" width="1918" height="862" class="img_ev3q"></a></p>
<p>There are a couple of tweaks I had to make as a non-Enterprise user of Proxmox after install. These include disabling the enterprise repositories for package updates and adding ones that don't require a subscription. I made these changes myself in the Proxmox UI under <strong>Updates &gt; Repositories</strong>, but I could also have grabbed a <a href="https://community-scripts.github.io/ProxmoxVE/scripts?id=post-pve-install" target="_blank" rel="noopener noreferrer" class="">Helper Script</a> that has been written to do all of this in an automated fashion. I also had to go into <strong>Datacenter &gt; Storage</strong> and set up my second SSD for it to be usable and visible in the Proxmox UI.</p>
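<p>For the curious, the repository change boils down to swapping one APT source for another. A sketch, assuming Proxmox 9 on Debian "trixie" and the classic one-line list format (newer installs may use deb822 <code>.sources</code> files instead):</p>

```
# /etc/apt/sources.list.d/pve-enterprise.list - comment out the enterprise repo
# deb https://enterprise.proxmox.com/debian/pve trixie pve-enterprise

# /etc/apt/sources.list.d/pve-no-subscription.list - add the free repo
deb http://download.proxmox.com/debian/pve trixie pve-no-subscription
```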
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="tailscale">Tailscale<a href="https://misczak.github.io/blog/building-out-the-homelab-proxmox-and-tailscale#tailscale" class="hash-link" aria-label="Direct link to Tailscale" title="Direct link to Tailscale" translate="no">​</a></h2>
<p>I have used <a href="https://tailscale.com/" target="_blank" rel="noopener noreferrer" class="">Tailscale</a> for a little while now to give my computers, phone, and tablet connectivity to my NAS even when outside of the house. Synology has a Tailscale app in their package manager that made installation and setup a breeze, and I hadn't needed to go any deeper than that up until this point. But since I'm moving everything over to the Proxmox node, I wanted to put Tailscale on there as well and start taking advantage of its more powerful features.</p>
<p>My requirements for setting this up were simple - I wanted to have a single Tailscale install grant me connectivity to any services I may be running on the server. But I also wanted to run Tailscale in an unprivileged LXC to avoid giving it more permissions than I have to. Tailscale has put out some great <a href="https://www.youtube.com/watch?v=guHoZ68N3XM" target="_blank" rel="noopener noreferrer" class="">tutorial videos</a> about getting started installing it on Proxmox, but their example has you installing Tailscale directly on the Proxmox host itself as root, which is something I didn't want to do.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="installation">Installation<a href="https://misczak.github.io/blog/building-out-the-homelab-proxmox-and-tailscale#installation" class="hash-link" aria-label="Direct link to Installation" title="Direct link to Installation" translate="no">​</a></h3>
<p>Looking at the Helper Scripts repository, there is a <a href="https://community-scripts.github.io/ProxmoxVE/scripts?id=add-tailscale-lxc" target="_blank" rel="noopener noreferrer" class="">script for Tailscale</a>, but it is an "add-on" - something to run inside a previously created and established LXC, not one that will configure an LXC from scratch for you.</p>
<p>I started by creating a new LXC container in Proxmox based on their Ubuntu 24.04 standard template with 10 GB of storage and 512 MiB of memory. After booting up and logging in via the Proxmox console, I made things easier for myself by temporarily allowing root SSH logins with a password: I edited <code>/etc/ssh/sshd_config</code> and set the <code>PermitRootLogin</code> line (which defaults to <code>prohibit-password</code>) to:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">PermitRootLogin yes</span><br></div></code></pre></div></div>
<p>After running <code>systemctl restart ssh</code>, I could SSH in with the root password from my computer's terminal, which gave me some quality-of-life improvements, like the ability to copy and paste from the terminal.</p>
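<p>As an aside, the container creation above can also be scripted from the Proxmox host shell instead of clicking through the UI. A rough sketch of the equivalent commands - the container ID, storage names, and exact template file name are assumptions from my environment, so adjust them for yours:</p>

```shell
# Refresh the template index and download the Ubuntu 24.04 template
pveam update
pveam download local ubuntu-24.04-standard_24.04-2_amd64.tar.zst

# Unprivileged container: 10 GB root disk, 512 MiB of memory, DHCP networking
pct create 200 local:vztmpl/ubuntu-24.04-standard_24.04-2_amd64.tar.zst \
  --hostname tailscale \
  --unprivileged 1 \
  --memory 512 \
  --rootfs local-lvm:10 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp

# Boot the new container
pct start 200
```

<p>Everything after this point is the same whether the container came from the UI or from <code>pct</code>.</p>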
<p>Next was updating the packages already in the container:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">apt-get update &amp;&amp; apt-get upgrade -y</span><br></div></code></pre></div></div>
<p>Followed by installing curl, which is needed to fetch the Tailscale install script but doesn't come in the LXC template by default:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">apt-get install curl</span><br></div></code></pre></div></div>
<p>Finally I was ready to install Tailscale. The Tailscale website has a handy one-liner that curls their install script and pipes it straight to a shell. In a production environment that's not something you want to do, but in a home lab it comes down to whether you trust the vendor - and you can always download and inspect the script first. I've used this script from Tailscale before, so I went ahead and ran it:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">curl -fsSL https://tailscale.com/install.sh | sh</span><br></div></code></pre></div></div>
<p>Finally, I went back to <code>/etc/ssh/sshd_config</code> and set <code>PermitRootLogin</code> back to <code>prohibit-password</code> to block logging in with just the root password again.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="configuration">Configuration<a href="https://misczak.github.io/blog/building-out-the-homelab-proxmox-and-tailscale#configuration" class="hash-link" aria-label="Direct link to Configuration" title="Direct link to Configuration" translate="no">​</a></h3>
<p>Now Tailscale is installed. I could start it up, but all it would do right now is give me connectivity back into this container. I want Tailscale running inside this container to be my access point to the rest of my home lab, including the other containers that will be running on Proxmox. To do that, we have to do three things: enable IP forwarding, grant the container access to the host's <code>/dev/net/tun</code> device, and advertise the subnet that we want Tailscale to route traffic to.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="ip-forwarding">IP Forwarding<a href="https://misczak.github.io/blog/building-out-the-homelab-proxmox-and-tailscale#ip-forwarding" class="hash-link" aria-label="Direct link to IP Forwarding" title="Direct link to IP Forwarding" translate="no">​</a></h4>
<p>This is probably the most straightforward of the tasks. I just needed to edit <code>/etc/sysctl.conf</code> and make sure the following lines are present and uncommented:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">net.ipv4.ip_forward=1</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">net.ipv6.conf.all.forwarding=1</span><br></div></code></pre></div></div>
<p>I saved the changes and that's it for IP forwarding!</p>
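<p>If you want the settings live immediately without restarting the container, you can reload and verify them. A quick sanity check (note that plain <code>sysctl -p</code> only reads <code>/etc/sysctl.conf</code>, not the files under <code>/etc/sysctl.d/</code>):</p>

```shell
# Reload /etc/sysctl.conf so the change takes effect now
sysctl -p

# Both of these should report a value of 1
sysctl net.ipv4.ip_forward
sysctl net.ipv6.conf.all.forwarding
```
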
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="host-networking">Host Networking<a href="https://misczak.github.io/blog/building-out-the-homelab-proxmox-and-tailscale#host-networking" class="hash-link" aria-label="Direct link to Host Networking" title="Direct link to Host Networking" translate="no">​</a></h4>
<p>Because I'm running Tailscale in an unprivileged container, it does not have access to all of the host's hardware and devices. This is exactly what I want, but I need to make one exception: Tailscale needs the host's <code>/dev/net/tun</code> device to create its network interface and route traffic outside its container. Tailscale has a great knowledge base article written up <a href="https://tailscale.com/kb/1130/lxc" target="_blank" rel="noopener noreferrer" class="">here</a> about this.</p>
<p>The first step is to shut down the LXC container with Tailscale. Then, I opened a shell to the host Proxmox server and edited the file <code>/etc/pve/lxc/[ID].conf</code>, where <code>[ID]</code> is replaced with the ID of my Tailscale container.</p>
<p>Inside, at the very bottom, I added the following two lines:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">lxc.cgroup2.devices.allow: c 10:200 rwm</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">lxc.mount.entry: /dev/net/tun dev/net/tun none bind,create=file</span><br></div></code></pre></div></div>
<p><a href="https://misczak.github.io/assets/files/tailscaletun-228b32dfea2611578812600009a7b605.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="Granting the Tailscale LXC access to the host networking device" src="https://misczak.github.io/assets/images/tailscaletun-228b32dfea2611578812600009a7b605.png" width="1532" height="558" class="img_ev3q"></a></p>
<p>Those two lines allow my Tailscale container to access the host's <code>/dev/net/tun</code> device. I started the container back up, and all that was left was the last step.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="advertise-subnet-routes">Advertise Subnet Routes<a href="https://misczak.github.io/blog/building-out-the-homelab-proxmox-and-tailscale#advertise-subnet-routes" class="hash-link" aria-label="Direct link to Advertise Subnet Routes" title="Direct link to Advertise Subnet Routes" translate="no">​</a></h4>
<p>Since I'd like to be able to reach the Proxmox web UI via Tailscale without installing Tailscale on the host itself, I need to configure Tailscale so that it extends my tailnet to Proxmox itself. Eventually, I will have other containers running here as well, and I don't want to install Tailscale on each and every one just to have remote access. Tailscale has a feature called <a href="https://tailscale.com/kb/1019/subnets" target="_blank" rel="noopener noreferrer" class="">subnet routers</a> designed just for this purpose. All that's needed is to bring Tailscale up with the right flag in the container where it is installed, and then accept the routes in the admin console.</p>
<p>The first step is to finally turn on Tailscale with my subnet advertised:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">tailscale up --advertise-routes=192.168.50.0/24</span><br></div></code></pre></div></div>
<p>Because this was the first time Tailscale started in this container, it generated a link that I needed to follow to authenticate the container to my Tailscale account. Once that was done, I went into the admin console on the Tailscale web UI, clicked the Overflow (...) button on my container's node and selected <strong>Edit Route Settings</strong>. Inside was the subnet I had just advertised, and all I had to do was approve it.</p>
<p><a href="https://misczak.github.io/assets/files/subnetroutes-e039aed3d529aa08ca4d408c27ecf032.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="Accepting the Advertised Subnet Routes in the Tailscale Admin Console" src="https://misczak.github.io/assets/images/subnetroutes-e039aed3d529aa08ca4d408c27ecf032.png" width="990" height="770" class="img_ev3q"></a></p>
<p>Now I was able to use Tailscale to reach my Proxmox console remotely. One last thing I did was go back to the admin console, click the Overflow button again and choose <strong>Disable Key Expiry</strong>. Normally, Tailscale node keys expire after a set period (180 days by default), which isn't an issue on systems that you use and interact with frequently (mobile devices and computers), since re-authenticating is easy there. Since this is a container I'd like to be the main remote entrypoint to my home lab without requiring much intervention, disabling the key expiry prevents it from silently breaking on me.</p>
<p><a href="https://misczak.github.io/assets/files/keyexpiry-f426a972456226c713df653f3b9c379e.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="Disabling the Key Expiry for the Tailscale LXC node" src="https://misczak.github.io/assets/images/keyexpiry-f426a972456226c713df653f3b9c379e.png" width="416" height="632" class="img_ev3q"></a></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="wrap-up">Wrap-Up<a href="https://misczak.github.io/blog/building-out-the-homelab-proxmox-and-tailscale#wrap-up" class="hash-link" aria-label="Direct link to Wrap-Up" title="Direct link to Wrap-Up" translate="no">​</a></h2>
<p>And that's it! I have Tailscale properly configured and ready to be my remote entrypoint into any lab service I plan on installing. This configuration is a good foundation to start building out more services, as I'll write about in some future posts.</p>]]></content:encoded>
            <category>Homelab</category>
        </item>
        <item>
            <title><![CDATA[Rethinking My Personal Tech Stack]]></title>
            <link>https://misczak.github.io/blog/rethinking-my-personal-tech-stack</link>
            <guid>https://misczak.github.io/blog/rethinking-my-personal-tech-stack</guid>
            <pubDate>Thu, 01 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[Going through what my personal computing stack looks like as we enter 2026.]]></description>
            <content:encoded><![CDATA[<p>Like many people, I usually take the start of a new year as an opportunity to reevaluate some choices and habits I have and see if there is room for improvement. One such area up for review is the set of services, apps, and subscriptions I use outside of work.</p>
<p>Much of my current computing habits trace their origins back to the 2000s, when Gmail marked the start of Google’s growth beyond just search and ads and smartphones began to take off. That was a very long time ago, and a lot has changed in both my own habits and the tech industry since then - so I used most of 2025 to research and evaluate alternatives to what I was currently using. I placed a point of emphasis on identifying options that were cross-platform (avoiding lock-in to any one operating system) and gave preference to alternatives that were open source and/or allowed me to either self-host or at least keep a copy of my data synced locally, in case of account lockout. I ended up finding a lot of great options, some of which I’ve switched over to entirely.</p>
<p>This post will highlight some of these alternatives - I plan on making more in-depth posts about each one at some point in the future.</p>
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="email">Email<a href="https://misczak.github.io/blog/rethinking-my-personal-tech-stack#email" class="hash-link" aria-label="Direct link to Email" title="Direct link to Email" translate="no">​</a></h2>
<p>I got my Gmail invite sometime in the middle of 2004, and for over twenty years it has been my main email address. I've used some others over the years for various purposes (most notably my university one), but in the end I forwarded most of those to Gmail. For many years, the benefits Gmail provided were great - a really fast interface built for modern web browsers, ample storage space, and integration with the messaging service Google Talk, which everybody migrated to from AIM in the late 2000's.</p>
<p>Over time, each of these advantages has disappeared. Google fumbled Google Talk, turning it into Hangouts to support the Google+ push - and when that failed, it became Google Chat and Meet. The storage space became shared across other Google products like Drive and Photos, driving you towards buying more of it with a Google One subscription. And the Gmail interface has become pretty bloated - it's impossible to ignore the slow page load times when navigating through emails and organizing my inbox.</p>
<p>I looked around at a bunch of different email providers including Proton, Tuta, and even Mailbox.org. In the end, I went with <a href="https://www.fastmail.com/" target="_blank" rel="noopener noreferrer" class="">Fastmail</a>, a service that has been around since 1999 but that I had never paid much attention to before now. Fastmail earns its name with how quick its web interface is, and it focuses just on email, calendar, and contacts - although it does have some basic support for Files. Unlike Gmail, it has first-class support for using your own domain name and many aliases, giving you flexibility in how you want to set up your mailbox and rules. Its ability to effortlessly migrate my Gmail inbox and calendar cannot be overlooked, and I love how it integrates with my password manager of choice, 1Password, to generate <a href="https://www.fastmail.com/features/masked-email/" target="_blank" rel="noopener noreferrer" class="">masked emails</a> automatically if I have to sign up for something on a website. Fastmail also seems to be the only service that supports push notifications in the Apple Mail app (on both macOS and iOS), if that matters to you - a feature that not even iCloud Mail can claim!</p>
<p>If you end up checking out Fastmail, feel free to use my <a href="https://join.fastmail.com/59d4c1b7" target="_blank" rel="noopener noreferrer" class="">referral link</a> to save some money.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="photos">Photos<a href="https://misczak.github.io/blog/rethinking-my-personal-tech-stack#photos" class="hash-link" aria-label="Direct link to Photos" title="Direct link to Photos" translate="no">​</a></h2>
<p>I was a pretty early user of Picasa, and that eventually morphed into Google Photos, transitioning my photos through to it. Through both Android devices and iPhone, I have been steadily adding to Google Photos over the years, but the number of pictures I store there really exploded once my kids were born. It has been a solid service all this time, but recently they have been shoving AI features to the forefront all over the app, which greatly annoys me. I had two other concerns with continuing to entrust all my photos to Google: being on a subscription treadmill to a company that will most likely raise rates constantly over the years to come, and the lack of support if I was to ever become locked out of my Google account.</p>
<p>So for most of 2025, I've been experimenting with two different services - <a href="https://ente.io/" target="_blank" rel="noopener noreferrer" class="">Ente Photos</a> and <a href="https://immich.app/" target="_blank" rel="noopener noreferrer" class="">Immich</a>. Both are open source and both can be self-hosted, but Ente Photos also lets you pay for them to operate the service and back up your photos. All of Ente's machine learning for facial recognition is done on your device, and the results are then synced across your other devices. Immich relies on a server (more specifically, a machine learning container) to do all of that work. I like Immich's interface a little bit more, but I am not yet at the point where my home lab is robust enough to entrust it with being the primary mechanism to organize and store photos for me and the rest of the family.</p>
<p>So for now, I've moved to Ente Photos, which I quite like. The facial recognition confused my two sons quite a bit initially, but once I spent a little time training it, it's been much better. The iOS app also feels a lot snappier than either Google Photos or Immich, which the less technical members of my family appreciate. One day in the future I may switch over to Immich, once it's a little more mature and my home lab is built out a bit more (including a robust backup strategy).</p>
<p>If you want to check out Ente, you can use the referral code MISCZAK to get 10 GB for free once you sign up.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="browser">Browser<a href="https://misczak.github.io/blog/rethinking-my-personal-tech-stack#browser" class="hash-link" aria-label="Direct link to Browser" title="Direct link to Browser" translate="no">​</a></h2>
<p>I've been using Firefox for a number of years due to its <a href="https://support.mozilla.org/en-US/kb/containers" target="_blank" rel="noopener noreferrer" class="">Multi-Account Containers</a> feature to isolate trackers from companies like Facebook, Amazon, and Google, and I doubled down on my usage of it this past year when Chrome disabled Manifest V2, thereby curtailing support for extensions like uBlock Origin. I am not opposed to paying for content and services on the web (as this blog post will illustrate), but I find it hard to tolerate the onslaught of ad tech that appears everywhere. Firefox also added vertical tabs this year, a feature that I've found that I cannot live without - it's just a much smarter way of keeping tabs organized and readable, even when you have a bunch of them open.</p>
<p>One area where I'm not satisfied is my ability to use the same browser on iOS. Firefox is essentially a reskin of Safari there, but it can't use any extensions, which means that I'm once again without ad blocking. Because of this, I have tended to use Safari on my iPhone with some ad blocking extensions installed. If Apple would allow true browser competition in the app store, I would most likely use an enhanced version of Firefox there.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="files">Files<a href="https://misczak.github.io/blog/rethinking-my-personal-tech-stack#files" class="hash-link" aria-label="Direct link to Files" title="Direct link to Files" translate="no">​</a></h2>
<p>Much like I used Gmail and Google Photos previously, Google Drive became my de facto way of keeping files synced across machines. However, I can count on one hand the number of times over the last decade that I've needed access to some documents when I'm not at home - and when that does happen, I usually have time to prepare and bring them with me.</p>
<p>Therefore, I've moved my primary file storage to my Synology NAS, which allows me to keep them synced across my desktop and laptop when at home. I can also use Synology's built in backup and sync features to keep a copy with a cloud provider if I want to - which I do with Dropbox.</p>
<p>I had forgotten that I had about 17 GB of free Dropbox storage, as I was an early adopter and beta tester of their Linux client in 2008 and was able to refer a number of friends in exchange for free space. 17 GB is more than enough to keep an off-site backup of some of my most important files, and their <a href="https://www.fastmail.com/blog/files-and-dropbox/" target="_blank" rel="noopener noreferrer" class="">integration with Fastmail</a> makes it easy to include them as attachments if I do need them one day.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="notes">Notes<a href="https://misczak.github.io/blog/rethinking-my-personal-tech-stack#notes" class="hash-link" aria-label="Direct link to Notes" title="Direct link to Notes" translate="no">​</a></h2>
<p>Finally, a service that isn't Google! While I did try Google Keep about a decade ago when I had an Android phone, Apple Notes has been the go-to for a lot of quick notes over the last few years. I have been using <a href="https://obsidian.md/" target="_blank" rel="noopener noreferrer" class="">Obsidian</a> during that time for more detailed notes, like technical documentation that relies on Markdown, which Apple Notes doesn't support. I like how Obsidian is essentially a nested folder structure of .md files on my local storage, so I have consolidated all my note taking into Obsidian for now.</p>
<p>One thing that held me back from Obsidian in the past was the difficulty of syncing it across all the platforms I use: my iPhone, my laptop (Macbook Pro), and my desktop PC. To keep Obsidian in sync on the iPhone, I could keep it in iCloud storage - but that complicates syncing to Windows/Linux devices. If I use something like Dropbox, then I don't have automatic, consistent sync on the iPhone. In the end, I resubscribed to Obsidian's built-in sync service, where I am grandfathered into an early-bird discount.</p>
<p>I did pick up a trial of <a href="https://notesnook.com/" target="_blank" rel="noopener noreferrer" class="">Notesnook</a> in order to take it for a spin, but I don't like the interface as much as Obsidian.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="search">Search<a href="https://misczak.github.io/blog/rethinking-my-personal-tech-stack#search" class="hash-link" aria-label="Direct link to Search" title="Direct link to Search" translate="no">​</a></h2>
<p>While I have used Google Search for a long time, their results have recently fallen off a cliff. It's become almost impossible to find good results when entering search queries; most times, I'm relieved if there's a Reddit discussion as one of the top results since it will usually have some relevance and quality.</p>
<p>For that reason, I switched to <a href="https://kagi.com/" target="_blank" rel="noopener noreferrer" class="">Kagi</a> as my search engine. The professional plan seems quite steep at first ($10/month for search!) but it cannot be overstated how nice it is to have search results that are actually usable again. Kagi also has a lot of features that make it easy to fine-tune your results and improve your signal-to-noise ratio, such as raising or lowering domains in your search results (and comparing your choices to a global leaderboard) and filtering results through lenses (preconfigured lists of sources), so that you can get meaningful results even for queries that contain common names or phrases.</p>
<p>Kagi is also working on a browser (Orion) and a Maps service, both of which I'll be keeping an eye on.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="operating-system">Operating System<a href="https://misczak.github.io/blog/rethinking-my-personal-tech-stack#operating-system" class="hash-link" aria-label="Direct link to Operating System" title="Direct link to Operating System" translate="no">​</a></h2>
<p>While I continue to use a Macbook Pro for my day job and as a personal laptop, I've maintained a personal desktop for nearly 20 years that has always been Windows-based, as its primary purpose is PC gaming. However, Windows has become so enshittified over the last year or two that I felt my hand was forced to jump to Linux on the PC, even if there will be some casualties (like multiplayer games that use anticheat that is unsupported on Linux).</p>
<p>So before the holidays last month, I installed <a href="https://cachyos.org/" target="_blank" rel="noopener noreferrer" class="">CachyOS</a> on my desktop and haven't looked back. This will definitely get a more in depth post at some point, but I am enjoying how lightweight and snappy it feels. I haven't noticed any adverse impact on game performance in the titles I've tried - which include Star Wars Jedi Survivor, Witcher 3, Dead Space Remake, and Hades 2. It's my first time really using an Arch-based distro as a daily driver, and the fact that most of the stuff I need can be installed right out of the Arch User Repository has made everything really easy.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="refreshing-the-list">Refreshing the List<a href="https://misczak.github.io/blog/rethinking-my-personal-tech-stack#refreshing-the-list" class="hash-link" aria-label="Direct link to Refreshing the List" title="Direct link to Refreshing the List" translate="no">​</a></h2>
<p>I plan on continuing to evaluate additional options and will keep an up-to-date list of what I currently use on my About page. I also plan on making a post every year with any major changes or developments.</p>
            <category>opensource</category>
            <category>productivity</category>
        </item>
        <item>
            <title><![CDATA[About the Ruby Central Security Incident]]></title>
            <link>https://misczak.github.io/blog/about-the-ruby-central-security-incident</link>
            <guid>https://misczak.github.io/blog/about-the-ruby-central-security-incident</guid>
            <pubDate>Fri, 10 Oct 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Thoughts on the Ruby Central AWS security incident.]]></description>
<content:encoded><![CDATA[<p>Recently, Ruby Central, the non-profit responsible for maintaining the RubyGems package manager, suffered a security incident where they temporarily lost control of their AWS account. They have since <a href="https://rubycentral.org/news/rubygems-org-aws-root-access-event-september-2025/" target="_blank" rel="noopener noreferrer" class="">published a post-mortem</a> of this event that is ostensibly aimed at putting their community's minds at ease. Unfortunately, it had the opposite effect on me, causing me to come away with more questions than answers.</p>
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-incident">The Incident<a href="https://misczak.github.io/blog/about-the-ruby-central-security-incident#the-incident" class="hash-link" aria-label="Direct link to The Incident" title="Direct link to The Incident" translate="no">​</a></h2>
<p>Let's begin by looking at the timeline that is included in the post-mortem. All of the events in the initial timeline take place within a few hours on September 30, 2025. Three particular entries stand out:</p>
<blockquote>
<p><strong>17:23 UTC:</strong> A former maintainer, André Arko, emails the Director of Open Source at Ruby Central stating that he still has access to the RubyGems.org production environment and associated monitoring tools.</p>
</blockquote>
<blockquote>
<p><strong>17:30 UTC:</strong> Joel Drapper (unaffiliated with Ruby Central) publishes a <a href="https://web.archive.org/web/20250930213611id_/https://joel.drapper.me/p/ruby-central-security-measures/" target="_blank" rel="noopener noreferrer" class="">public blog post</a> within minutes describing this access with screenshots taken earlier that day showing root account access.</p>
</blockquote>
<blockquote>
<p><strong>18:20 UTC:</strong> Ruby Central begins its emergency review and learns that the existing credentials for the AWS root account in our password vault are no longer valid.</p>
</blockquote>
<p>This is the earliest time that Ruby Central learns something is amiss with their AWS account. However, if you scroll down to their "Analysis of Events" section you'll see a number of events stretching all the way back to September 18. It turns out that this root password change took place over ten days prior on September 19, 2025, without anyone at Ruby Central being any the wiser about it. This is an astonishing admission, most notably because AWS will send an email notification to the root user email address whenever the password is changed. Apparently, nobody at Ruby Central is looking at the emails sent to this address - otherwise, they would have noticed this initial reset on September 19, eleven days before they eventually did.</p>
<p>They are also extremely fortunate that the threat actor did not end up changing the root account email as well; there seems to be a reason for that, which I'll discuss below. Since this email was not changed, they were able to regain control of the account in the same manner that they lost it:</p>
<blockquote>
<p><strong>18:24 UTC:</strong> Ruby Central initiates an AWS password-reset procedure, validates multi-factor authentication, and regains control of the AWS root account.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="an-incomplete-post-mortem">An Incomplete Post-Mortem<a href="https://misczak.github.io/blog/about-the-ruby-central-security-incident#an-incomplete-post-mortem" class="hash-link" aria-label="Direct link to An Incomplete Post-Mortem" title="Direct link to An Incomplete Post-Mortem" translate="no">​</a></h2>
<p>What's even more perplexing is how quickly this post-mortem jumps to a definitive conclusion about the scope of the incident. Writing under the section "Extent of the Incident":</p>
<blockquote>
<p>After a careful review, Ruby Central is relieved to report <strong>that we see no evidence</strong> that this security incident <strong>compromised end user data, accounts, gems, or infrastructure availability</strong>
In addition:</p>
<ul>
<li class="">RubyGems.org remained fully operational throughout.</li>
<li class="">No personally identifiable information (PII) of RubyGems.org users nor Ruby Central financial data was accessed or transferred.</li>
<li class="">The production database, S3 buckets, and CI/CD pipeline were unaffected.</li>
</ul>
</blockquote>
<p>At first glance, this feels like a pretty positive outcome of this event. They reclaimed control of the account and it looks like no irreversible damage occurred. But it's the next section that makes me question just how much we can trust these statements:</p>
<blockquote>
<p>After regaining control of the AWS account, Ruby Central:</p>
<ol>
<li class=""><strong>Revoked all existing root and IAM credentials</strong>, created new MFA-protected access, and moved them to a restricted vault with per-user audit logs.</li>
<li class=""><strong>Rotated all related secrets and tokens</strong>, including DataDog, GitHub Actions, and other external system integrations.</li>
<li class=""><strong>Enabled AWS CloudTrail, GuardDuty, and DataDog alerting</strong> for any root login, password change, or IAM modification.</li>
</ol>
</blockquote>
<p>The third item there is a pretty big red flag. Prior to this incident, they weren't running CloudTrail (audit logs), GuardDuty (threat detection), or even basic alerting for the account. Without this context being collected and (ideally) forwarded to a separate repository outside of AWS that would require separate access to tamper with, it casts a lot of doubt on their earlier conclusions.</p>
<p>As bad as that line sounds, AWS still collects and retains 90 days of CloudTrail event history for your account by default, something that cannot be disabled. Since the whole timeline of this incident is a little less than two weeks, it falls well within the retention period for an investigation.</p>
<p>But (and this is a big but!) the issue is that this default CloudTrail configuration only includes management events. Data events that capture actions like users downloading objects from S3 buckets, users uploading objects to S3 buckets, API activity on an RDS database cluster, and executing Lambda functions are <strong>not</strong> included in that default configuration. Therefore, it is quite possible that Ruby Central does not have any logs that can conclusively say that the actor did <strong>not</strong> download objects from S3 or touch the production database. Of course, they may know this, which is why their statement is structured to say they "see no evidence" - they don't have any to check.</p>
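<p>For teams looking to close this same gap, a rough sketch of what that looks like with boto3 follows. The trail name here is a placeholder, and in practice you'd scope the selectors to the specific buckets you care about rather than all S3 objects:</p>

```python
def s3_data_event_selectors() -> list:
    """Build advanced event selectors that capture S3 object-level (data plane)
    activity - the events missing from CloudTrail's default configuration."""
    return [
        {
            "Name": "Log all S3 object-level events",
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
            ],
        }
    ]

def enable_s3_data_events(trail_name: str) -> None:
    """Attach the selectors to an existing trail."""
    import boto3  # deferred so the helper above can be used without boto3 installed

    cloudtrail = boto3.client("cloudtrail")
    cloudtrail.put_event_selectors(
        TrailName=trail_name,  # hypothetical trail name, e.g. "org-audit-trail"
        AdvancedEventSelectors=s3_data_event_selectors(),
    )
```

With selectors like these in place (and the trail delivering to a bucket in a separate, locked-down account), "we see no evidence" can actually mean "we checked the evidence and found none."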
<p>I would've also hoped for them to provide a full list of actions the actor took that they can see in the CloudTrail management events to further support their timeline of events - but alas, that was not included either.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-other-side-of-the-story">The Other Side of the Story<a href="https://misczak.github.io/blog/about-the-ruby-central-security-incident#the-other-side-of-the-story" class="hash-link" aria-label="Direct link to The Other Side of the Story" title="Direct link to The Other Side of the Story" translate="no">​</a></h2>
<p>The plot thickens, however, with a <a href="https://andre.arko.net/2025/10/09/the-rubygems-security-incident/" target="_blank" rel="noopener noreferrer" class="">blog post published</a> by the alleged actor, André Arko. He acknowledges the rotation of the root user password:</p>
<blockquote>
<p>Given Marty’s claims, the sudden permission deletions made no sense. Worried about the possibility of hacked accounts or some sort of social engineering, I took action as the primary on-call engineer to lock down the AWS account and prevent any actions by possible attackers. I did not change the email addresses on any accounts, leaving them all owned by a team-shared email at rubycentral.org, to ensure the organization retained overall control of the accounts, even if individuals were somehow taking unauthorized actions.</p>
</blockquote>
<p>This sequence of actions is pretty interesting for a few reasons. First, if I am the on-call engineer and a security incident is occurring that forces me to change a credential, I am absolutely notifying the other parties that may need that credential that I am rotating it. This is even more important when it's something as significant as an AWS account's root credential. In any incident response process, I am also updating the credential with the rotated value in the shared password vault/secrets manager that my team would use. If the rest of the team is asleep or it's the weekend, I might send them a note or a Slack message - but as soon as they come back into business hours, I would make sure they acknowledged the change. Instead, André waits for 11 days for any kind of acknowledgement:</p>
<blockquote>
<p>Within a couple of days, Ruby Central made an (unsigned) public statement, and various board members agreed to talk directly to maintainers. At that point, I realized that what I thought might have been a malicious takeover was both legitimate and deliberate</p>
</blockquote>
<p>The fact that André waited so long to acknowledge this change in credentials and his prolonged access initially made me suspicious of his intentions here. However, the fact that he left the root user email address the same (thereby enabling Ruby Central's account recovery days later) suggests that maybe his actions were just misguided. And he seems to call out the pretty immature security posture of Ruby Central in general:</p>
<blockquote>
<p>In addition to ignoring the (huge) question of how Ruby Central failed to secure their AWS Root credentials for almost two weeks, and <strong>appearing to only be aware of it because I reported it to them</strong>, their reply also failed to ask whether any other shared credentials might still be valid. There were more.</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://misczak.github.io/blog/about-the-ruby-central-security-incident#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>All in all, this whole incident and the write-ups that have come out of it do not reflect well on the Ruby community, and would very much shake my confidence in the ability of Ruby Central to be good stewards of RubyGems moving forward.</p>]]></content:encoded>
            <category>AWS</category>
            <category>programming</category>
            <category>ruby</category>
        </item>
        <item>
            <title><![CDATA[Recapping the Biggest Pre:Invent Announcements]]></title>
            <link>https://misczak.github.io/blog/recapping-the-biggest-preinvent-announcements</link>
            <guid>https://misczak.github.io/blog/recapping-the-biggest-preinvent-announcements</guid>
            <pubDate>Wed, 20 Nov 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[A recap of all of the new AWS announcements this week leading up to re:Invent.]]></description>
            <content:encoded><![CDATA[<p>While AWS re<!-- -->:Invent<!-- --> doesn't officially kick off until December 2, we are now officially in the lead up to the event, a time that often seea a flurry of new feature and service announcements. While these announcements are often overshadowed by new products that debut at the conference, they are often just as important (if not more so) to the work I do every day. This week has been the strongest example of this trend to date, with several announcements that made me genuinely excited. In the interest of processing that enthusiasm, I decided to write a recap of what I view to be the biggest ones.</p>
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="centrally-managing-root-access-for-customers-using-aws-organizations">Centrally Managing Root Access for Customers Using AWS Organizations<a href="https://misczak.github.io/blog/recapping-the-biggest-preinvent-announcements#centrally-managing-root-access-for-customers-using-aws-organizations" class="hash-link" aria-label="Direct link to Centrally Managing Root Access for Customers Using AWS Organizations" title="Direct link to Centrally Managing Root Access for Customers Using AWS Organizations" translate="no">​</a></h2>
<p><a href="https://aws.amazon.com/blogs/aws/centrally-managing-root-access-for-customers-using-aws-organizations/" target="_blank" rel="noopener noreferrer" class="">Blog Post</a></p>
<p>If you've gone through the AWS Trusted Advisor journey with your account team, you're most likely familiar with one of the automated findings being that your accounts' root users are not configured with MFA. This finding was an example of AWS' own framework not really keeping up with the practices that most security teams were pursuing; that is, not enabling root user accounts at all for any new accounts added to your AWS Organization. Even in these mature environments, there were some accounts that predated the arrival of the AWS Organizations feature, which meant that root users did exist, and the credentials still existed <em>somewhere</em>, posing some sort of risk. Oftentimes, security teams had to go through an exercise of tracking down these legacy credentials, making sure they were stored securely, rotating them at regular intervals, and adding some form of MFA - a lot of work for something that shouldn't really exist anymore, but was still needed for actions like unlocking S3 bucket policies.</p>
<p><a href="https://misczak.github.io/assets/files/rootaccess-b163837f15bbf0fb65d946aa4c5a72c7.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="The Central Root Access Management Console Page" src="https://misczak.github.io/assets/images/rootaccess-b163837f15bbf0fb65d946aa4c5a72c7.png" width="2890" height="704" class="img_ev3q"></a></p>
<p>This new feature allows a security team to simply remove root user credentials from all accounts in their AWS Organization, block any attempt to 'recover' them in the future (by an authorized or unauthorized party), and use short-lived <strong>root sessions</strong> to still perform the actions that require root access, preserving compatibility. Put together, these feature enhancements support the existing best practices (namely, avoiding IAM users and always choosing short-lived sessions with roles), finally bringing privileged user management in alignment with regular user management. I would expect many teams to begin putting this into practice shortly if they haven't done so already.</p>
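<p>As a rough sketch of what a root session looks like in practice (assuming centralized root access is already enabled for the Organization and the caller is permitted to use <code>sts:AssumeRoot</code>; the account ID and bucket name below are placeholders), unlocking a member account's bucket policy might go something like this:</p>

```python
def assume_root_request(account_id: str, task_policy: str, duration: int = 900) -> dict:
    """Build the parameters for a short-lived, task-scoped root session.
    AWS constrains these sessions to specific managed task policies."""
    return {
        "TargetPrincipal": account_id,
        "TaskPolicyArn": {"arn": f"arn:aws:iam::aws:policy/root-task/{task_policy}"},
        "DurationSeconds": duration,
    }

def unlock_bucket_policy(account_id: str, bucket: str) -> None:
    """Use a root session to delete a lockout bucket policy in a member account."""
    import boto3  # deferred so the helper above stays importable without boto3

    sts = boto3.client("sts")
    creds = sts.assume_root(
        **assume_root_request(account_id, "S3UnlockBucketPolicy")
    )["Credentials"]
    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    s3.delete_bucket_policy(Bucket=bucket)
```

The appeal is that the session is both time-bound and scoped to one task policy, so there is no standing root credential left to track down, rotate, or leak.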
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="block-public-access-for-amazon-virtual-private-cloud">Block Public Access for Amazon Virtual Private Cloud<a href="https://misczak.github.io/blog/recapping-the-biggest-preinvent-announcements#block-public-access-for-amazon-virtual-private-cloud" class="hash-link" aria-label="Direct link to Block Public Access for Amazon Virtual Private Cloud" title="Direct link to Block Public Access for Amazon Virtual Private Cloud" translate="no">​</a></h2>
<p><a href="https://aws.amazon.com/blogs/networking-and-content-delivery/vpc-block-public-access/" target="_blank" rel="noopener noreferrer" class="">Blog Post</a></p>
<p>VPCs are an essential component of any AWS account, allowing for multiple workloads in a single AWS account to maintain network isolation. Traffic in and out of VPCs can be governed in a few different ways, with security group rules typically being the most popular mechanism for teams to use. One of the problems with that approach, however, is that security group rules cannot be controlled proactively with Service Control Policies, unlike many other areas prone to misconfigurations in AWS. This gap forces teams to often be reactive when it comes to dealing with overly permissive security groups.</p>
<p>Block Public Access gives security teams some peace of mind by providing a proactive tool to override any misconfiguration done at the individual VPC level itself. Similar to the S3 Block Public Access setting that acts as a master switch to compensate for bad bucket policies, teams can use Block Public Access for Amazon VPC to limit internet connectivity for just ingress traffic, or both ingress and egress. When blocking just ingress, any egress traffic must go through a NAT Gateway or Egress-Only Internet Gateway (for IPv6). The best part of this feature, however, is the ability to add granular exclusions to allow individual VPCs or subnets to bypass this enforcement - so you can still make exceptions when you need to while keeping the safety net intact for everything else.</p>
<p><a href="https://misczak.github.io/assets/files/vpcblockexclusions-86e03b9a8675f6506b6aee5000263c57.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="Creating Exclusions for Block Public Access" src="https://misczak.github.io/assets/images/vpcblockexclusions-86e03b9a8675f6506b6aee5000263c57.png" width="626" height="461" class="img_ev3q"></a></p>
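<p>A minimal sketch of what enforcement plus an exclusion might look like with boto3 (the subnet ID is a placeholder, and the mode strings reflect my reading of the new API's accepted values):</p>

```python
def block_public_access_settings(mode: str = "block-ingress") -> dict:
    """Validate and build the VPC Block Public Access option. With
    "block-ingress", egress must flow through a NAT Gateway or an
    Egress-Only Internet Gateway (IPv6)."""
    allowed = {"off", "block-ingress", "block-bidirectional"}
    if mode not in allowed:
        raise ValueError(f"mode must be one of {sorted(allowed)}")
    return {"InternetGatewayBlockMode": mode}

def enforce_with_exclusion(subnet_id: str) -> None:
    """Turn on ingress blocking region-wide, then carve out one subnet."""
    import boto3  # deferred so the helper above stays importable without boto3

    ec2 = boto3.client("ec2")
    ec2.modify_vpc_block_public_access_options(**block_public_access_settings())
    ec2.create_vpc_block_public_access_exclusion(
        SubnetId=subnet_id,  # hypothetical subnet hosting a legitimate public workload
        InternetGatewayExclusionMode="allow-bidirectional",
    )
```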
<p>This capability already has me thinking about a multitude of ways we can use this in our organization, ranging from driving compliance with network architecture standards and best practices to playing a critical role in incident response, allowing us to quickly isolate a VPC during an investigation if necessary.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="resource-control-policies">Resource Control Policies<a href="https://misczak.github.io/blog/recapping-the-biggest-preinvent-announcements#resource-control-policies" class="hash-link" aria-label="Direct link to Resource Control Policies" title="Direct link to Resource Control Policies" translate="no">​</a></h2>
<p><a href="https://aws.amazon.com/blogs/aws/introducing-resource-control-policies-rcps-a-new-authorization-policy/" target="_blank" rel="noopener noreferrer" class="">Blog Post</a></p>
<p>At AWS re<!-- -->:Inforce<!-- --> this year, I had a conversation with a few AWS people who seemed to imply that they'd like to introduce more security boundary controls on resources themselves - and that seems to be the case with Resource Control Policies (RCPs). Following the debut of Service Control Policies (SCPs) in 2019, RCPs give your team the ability to set universal access controls right on a resource itself. These can be used to block specific actions to an entire service or on specific resources themselves. A key distinction is that SCPs get evaluated for principals trying to perform an action; RCPs get evaluated when a resource receives an access request.</p>
<p><a href="https://misczak.github.io/assets/files/attachrcp-ab80c0203d6019864fc3156366791735.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="Attaching a Resource Control Policy to OUs" src="https://misczak.github.io/assets/images/attachrcp-ab80c0203d6019864fc3156366791735.png" width="963" height="505" class="img_ev3q"></a></p>
<p>RCPs get attached to OUs and specific AWS accounts in a similar way to SCPs, and therefore should be tested just as much before rollout. However, I foresee the time to develop and implement new RCPs will be significantly quicker than SCPs in organizations, as it will be easier to scope RCPs to just protect specific resources (such as security tooling). Your security team can prevent anyone else from using, modifying or removing a resource that it uses to integrate with a configuration management database or cloud security posture management (CSPM) tool, while still allowing regular users to have full control over other resources in their account.</p>
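<p>As a hypothetical example of the security-tooling protection described above (the organization ID is a placeholder), an RCP that denies S3 access to any principal outside your organization, while still permitting AWS service principals, might look something like this:</p>

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnforceOrgIdentitiesOnly",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "*",
      "Condition": {
        "StringNotEqualsIfExists": {
          "aws:PrincipalOrgID": "o-exampleorgid"
        },
        "BoolIfExists": {
          "aws:PrincipalIsAWSService": "false"
        }
      }
    }
  ]
}
```

Because the deny is evaluated on the resource side, it holds even if an account-level bucket policy is later loosened by mistake.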
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="scaling-to-0-capacity-with-aurora-serverless-v2">Scaling to 0 capacity with Aurora Serverless v2<a href="https://misczak.github.io/blog/recapping-the-biggest-preinvent-announcements#scaling-to-0-capacity-with-aurora-serverless-v2" class="hash-link" aria-label="Direct link to Scaling to 0 capacity with Aurora Serverless v2" title="Direct link to Scaling to 0 capacity with Aurora Serverless v2" translate="no">​</a></h2>
<p><a href="https://aws.amazon.com/blogs/database/introducing-scaling-to-0-capacity-with-amazon-aurora-serverless-v2/" target="_blank" rel="noopener noreferrer" class="">Blog Post</a></p>
<p>Okay, this one may not be as directly related to security as the others, but it fixes a personal pain point I've experienced in the past, so I wanted to highlight it. Several years ago when I stood up the first version of my team's configuration management database, we were using Aurora Serverless v1. One of the reasons we used Aurora Serverless v1 at the time was the fact that it could scale down to zero active nodes, only spinning up when needed. This allowed it to sit 'idle' for the 23 hours of the day that we weren't running the sync job for our configuration management tool, significantly reducing the cost of operating this solution. Aurora Serverless v2 had several benefits we wanted to use, such as IAM authentication, but only scaled down to a minimum of 0.5 Aurora Capacity Units (ACUs). We ended up migrating to a MongoDB Atlas database for this and other reasons.</p>
<p>Therefore, I'm glad to see that Aurora Serverless v2 now pauses after a period of inactivity, with no charges for compute while paused. This change makes it a viable option again for some smaller services and projects I want to build without worrying about throwing money away if it sits idle for prolonged periods of time.</p>]]></content:encoded>
            <category>AWS</category>
        </item>
        <item>
            <title><![CDATA[An Easier Way to Enable IMDS Defaults Across All Regions]]></title>
            <link>https://misczak.github.io/blog/an-easier-way-to-enable-imds-defaults-across-all-regions</link>
            <guid>https://misczak.github.io/blog/an-easier-way-to-enable-imds-defaults-across-all-regions</guid>
            <pubDate>Thu, 28 Mar 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[An update on AWS enabling account-level defaults for the Instance Metadata Service.]]></description>
            <content:encoded><![CDATA[An update on AWS enabling account-level defaults for the Instance Metadata Service.<!-- -->
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://misczak.github.io/blog/an-easier-way-to-enable-imds-defaults-across-all-regions#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>EC2 instances in AWS can have access to something called the instance metadata service, which makes information (namely <em>metadata</em>) about an instance available to applications, services, code, etc. that runs on the instance. For example, a piece of code can query the metadata service to learn what region it is currently running in.</p>
<p>When an EC2 instance is assigned an instance profile that grants it permissions to access other AWS services, the temporary credentials for that instance profile can also be retrieved from the instance metadata service. This setup is extremely useful, as it allows your code to make authenticated calls to AWS APIs without having to use hardcoded credentials, environment variables, or frequent calls to some off-host credential manager.</p>
<p>However, the initial version of the Instance Metadata Service (IMDSv1) had a few issues that opened the door for Server Side Request Forgery attacks against applications running on an EC2 instance to steal AWS credentials from IMDS and use them to take action elsewhere in an account. This flaw was the cause of the <a href="https://krebsonsecurity.com/2019/07/capital-one-data-theft-impacts-106m-people/" target="_blank" rel="noopener noreferrer" class="">Capital One breach back in 2019</a> as well as a number of other security incidents across the industry.</p>
<p>Since then, AWS made <a href="https://aws.amazon.com/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/" target="_blank" rel="noopener noreferrer" class="">several improvements in a second version</a> of IMDS called, obviously, IMDSv2. However, this setting was not turned on by default for fear of breaking compatibility with existing workloads. Users could opt into this newer version of IMDS when launching new instances, putting the onus on security teams to put up guardrails ensuring that teams moved to IMDSv2 and burned down the existing use of IMDSv1.</p>
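<p>The core of IMDSv2's protection is its session-token handshake: a caller must first make a <code>PUT</code> request for a token, then present that token on every metadata read - two steps that typical SSRF primitives (which usually only control a single GET URL) cannot perform. A minimal sketch of the flow in Python, which naturally only works from inside an EC2 instance since 169.254.169.254 is the link-local metadata endpoint:</p>

```python
import urllib.request

METADATA = "http://169.254.169.254"

def token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """IMDSv2 step 1: PUT a request for a session token with a bounded TTL."""
    return urllib.request.Request(
        f"{METADATA}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )

def metadata_request(path: str, token: str) -> urllib.request.Request:
    """IMDSv2 step 2: GET metadata, presenting the session token as a header."""
    return urllib.request.Request(
        f"{METADATA}/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )

def get_region() -> str:
    """Example: look up the region the instance is running in."""
    with urllib.request.urlopen(token_request(), timeout=2) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request("placement/region", token), timeout=2) as resp:
        return resp.read().decode()
```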
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="a-new-default">A New Default<a href="https://misczak.github.io/blog/an-easier-way-to-enable-imds-defaults-across-all-regions#a-new-default" class="hash-link" aria-label="Direct link to A New Default" title="Direct link to A New Default" translate="no">​</a></h2>
<p>Back in November 2023, <a href="https://aws.amazon.com/blogs/aws/amazon-ec2-instance-metadata-service-imdsv2-by-default/" target="_blank" rel="noopener noreferrer" class="">AWS announced</a> that starting in mid-2024, newly released EC2 instance types would use only IMDSv2. At the same time, they made it so that all quick launches through the AWS console would only use IMDSv2. These were welcome changes, but did not provide additional tools to allow admins to begin enforcing defaults before the mid-2024 date.</p>
<p>All of that changed this past week, as <a href="https://aws.amazon.com/about-aws/whats-new/2024/03/set-imdsv2-default-new-instance-launches/" target="_blank" rel="noopener noreferrer" class="">AWS finally rolled out the ability to set defaults</a> for the Instance Metadata Service at the account level. However, these settings are specific to a single region, meaning that if your account(s) use a large number of regions, you would have to go region by region (through either the CLI or console) to change these settings.</p>
<p>That inspired me to throw together a very, very quick Python script that aims to first look at the active regions in an account and then cycle through each region and set these new defaults in an automated fashion. It aims to be the least disruptive to existing workflows by only changing the IMDS required version and hop limit, leaving the default setting for enabling IMDS and the ability to pass tags to it alone, unless otherwise specified. You can find this script <a href="https://github.com/misczak/aws-imds-defaults" target="_blank" rel="noopener noreferrer" class="">here</a>.</p>
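<p>The core loop of that approach (this is a simplified sketch, not the linked script itself) looks something like the following - enumerate the account's enabled regions, then apply the same account-level defaults in each:</p>

```python
def imds_defaults() -> dict:
    """Account-level defaults: require IMDSv2 tokens, and allow one extra
    network hop so containers on an instance can still reach the service."""
    return {"HttpTokens": "required", "HttpPutResponseHopLimit": 2}

def apply_defaults_everywhere() -> None:
    """Set the IMDS defaults in every enabled region of the account."""
    import boto3  # deferred so the helper above stays importable without boto3

    regions = [
        r["RegionName"]
        for r in boto3.client("ec2").describe_regions(AllRegions=False)["Regions"]
    ]
    for region in regions:
        boto3.client("ec2", region_name=region).modify_instance_metadata_defaults(
            **imds_defaults()
        )
```

Note that this deliberately leaves the IMDS on/off setting and instance-tag exposure untouched, matching the least-disruptive posture described above.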
<p>While this script is scoped to a single account, if you administer multiple accounts, you can combine this with tools such as <a href="https://github.com/benkehoe/aws-sso-util" target="_blank" rel="noopener noreferrer" class="">aws-sso-util</a> and <a href="https://github.com/99designs/aws-vault" target="_blank" rel="noopener noreferrer" class="">aws-vault</a> to quickly change this setting across your entire fleet.</p>
<p>I hope this helps others speed up the rollout of IMDSv2 defaults across their organization and allows us to collectively put this issue to bed by the end of 2024.</p>]]></content:encoded>
            <category>AWS</category>
        </item>
        <item>
            <title><![CDATA[Google's Professional Cloud Security Engineer Certification]]></title>
            <link>https://misczak.github.io/blog/googles-professional-cloud-security-engineer-certification</link>
            <guid>https://misczak.github.io/blog/googles-professional-cloud-security-engineer-certification</guid>
            <pubDate>Fri, 18 Aug 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[My experience preparing for and passing the GCP Professional Cloud Security Engineer exam.]]></description>
            <content:encoded><![CDATA[My experience preparing for and passing the GCP Professional Cloud Security Engineer exam.<!-- -->
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="introduction">Introduction<a href="https://misczak.github.io/blog/googles-professional-cloud-security-engineer-certification#introduction" class="hash-link" aria-label="Direct link to Introduction" title="Direct link to Introduction" translate="no">​</a></h2>
<p>Earlier this year, my team was provided access to a training budget for Google Cloud Platform that we could use in various ways to purchase classroom trainings or licenses to Google's on-demand training platform, Cloud Skills Boost, as well as a number of exam vouchers that could be used on various GCP certification exams. While I had previously worked almost exclusively with AWS (only using GCP for some testing of private service connect), I knew that I had some upcoming projects that required me to build services on GCP. For that reason, I jumped at the opportunity to take advantage of this training and hopefully prepare myself enough to write an exam at the end of it.</p>
<p>Unbeknownst to me at the time, Google was also offering us a number of seats in what they call their Certification Journey. This program is essentially a focused, six week schedule to work through the Cloud Skills Boost content for your selected certification, with a few extra bonuses layered in. After doing some of the introductory Cloud Skills Boost content and building on GCP myself for a few months, I decided to enroll in the Certification Journey for the Professional Cloud Security Engineer certification. My cohort was scheduled to start in the middle of May, wrapping up by the end of June.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="preparing-for-the-exam">Preparing for the Exam<a href="https://misczak.github.io/blog/googles-professional-cloud-security-engineer-certification#preparing-for-the-exam" class="hash-link" aria-label="Direct link to Preparing for the Exam" title="Direct link to Preparing for the Exam" translate="no">​</a></h2>
<p>The Certification Journey was an interesting program that lies somewhere between formal classroom SANS training and the open free-for-all that is regular Udemy courses with Discords attached. First, you don't undertake the program alone; you are scheduled into a cohort with others who are aiming for the same certification at that time, and have opted into the program themselves. There's a Google Group set up for discussion among members of the cohort on topics covered in the program, although mine was pretty sparsely used.</p>
<p>The other interesting wrinkle to the Certification Journey program is that your cohort is assigned an instructor, who holds group "office hours" once a week for 90 minutes to go over that week's content, walk you through sample questions similar to those found on the exam, and most importantly explain to you why the correct answer is the right answer. I was skeptical of these sessions initially, but soon came to find them incredibly valuable. Our instructor pointed out pitfalls I had missed when reading documentation or playing around in GCP myself - such as the fact that BigQuery has its data access logs enabled by default, the only service in GCP to do so.</p>
<p>The Cloud Skills Boost content is, for the most part, of a very high quality. This content is accessible <a href="https://www.cloudskillsboost.google/subscriptions" target="_blank" rel="noopener noreferrer" class="">outside the Certification Journey program</a> with just a regular license that runs $29 per month or $300 per year. There are some lectures in there that are less interesting than others, but they're usually kept to bite-sized videos (4-7 minutes in length) that make it easy to fit in between other tasks or meetings in your day. The real value of Cloud Skills Boost, however, are the labs, as they spin up an ephemeral project for you to create resources and mess around in. Once the lab is complete, they tear down the project and any resources within. It's great for peace of mind, as you don't have to worry about making sure you disposed of every resource you used in a lab that could end up billing you.</p>
<p>In addition to all this content, each week there was a series of links to documentation, blog posts, and videos for further reading and watching. I held off on watching these until I got through most of the Cloud Skills Boost content, knowing that each one would be part of a deeper dive into a service or feature. In the last week or so before my exam, I was working through these links non-stop and watching any video I could find on each service. There are some services I would probably never get the chance to play around with hands on (like setting up Cloud Interconnect for the first time), but I felt prepared for everything else.</p>
<p>Interestingly, I didn't feel the need to do a practice exam, as the sample questions from the instructor office hours gave me a good preview of what the questions from the exam itself would be like. These sample questions were scenario based, with each possible answer often being a series of steps and actions you would take in GCP - not just simple recall of a service's name or feature. The questions also leaned away from any "gotchas" where the right answer is surrounded by wrong answers that are just spelled differently, or something similar. This setup made me feel confident that I wouldn't have to memorize the exact form of every IAM predefined role or setting, as just knowing what role types would have what permissions would suffice.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="sitting-the-exam">Sitting the Exam<a href="https://misczak.github.io/blog/googles-professional-cloud-security-engineer-certification#sitting-the-exam" class="hash-link" aria-label="Direct link to Sitting the Exam" title="Direct link to Sitting the Exam" translate="no">​</a></h2>
<p>I've done a couple of certification exams before, and the registration and check-in process was pretty similar to those. The exam itself was 45 questions, and I flew through the first ten questions before finding the next thirty-five much more detailed and challenging. However, I didn't have any questions where I was completely at a loss to answer - instead, if I wasn't 100% sure of my answer, I marked it for review and moved on. At the end of my first pass of the exam, I had about nineteen questions marked for review and did another loop through those, which whittled it down to nine questions I still was 50/50 on between two answers. I spent some additional time on those and then submitted my exam. Total time was about one hour, ten minutes.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="results">Results<a href="https://misczak.github.io/blog/googles-professional-cloud-security-engineer-certification#results" class="hash-link" aria-label="Direct link to Results" title="Direct link to Results" translate="no">​</a></h2>
<p>The test results screen immediately showed that I had achieved a provisional pass, but would have to wait for an email from Google Cloud for confirmation. This took about a day and a half to arrive, with links to set up a profile for my badge on Accredible. There was also a token to use in the swag store for those who passed Professional tier GCP exams, which let me have a certification welcome kit mailed to me. I've heard in years past this used to be a hoodie or something similar, but the only option on the store available to me was the thermal mug seen in the picture below.</p>
<p><a href="https://misczak.github.io/assets/files/gcpwelcomekit-4f1694708f777202a7903ca70fabeb3e.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="The GCP certification welcome kit I received" src="https://misczak.github.io/assets/images/gcpwelcomekit-4f1694708f777202a7903ca70fabeb3e.png" width="3024" height="4032" class="img_ev3q"></a></p>
<p>All in all, I would say my experience of preparing for and achieving this certification was a positive one. The material did not feel endless, instead really focusing in on the best ways to set up secure organizations and projects in GCP. I came to appreciate the format of the questions, which avoided focusing on what I would call trivia and instead emphasized really understanding the sequence of steps and nuance needed for good security.</p>
            <category>GCP</category>
        </item>
        <item>
            <title><![CDATA[Automatically tagging resources in AWS with Owner Information]]></title>
            <link>https://misczak.github.io/blog/automatically-tagging-resources-in-aws-with-owner-information</link>
            <guid>https://misczak.github.io/blog/automatically-tagging-resources-in-aws-with-owner-information</guid>
            <pubDate>Mon, 31 Jul 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[A detailed write-up on extending an AWS Cloud Operations & Migrations Blog post example for automatically tagging EC2 instances in AWS.]]></description>
            <content:encoded><![CDATA[A detailed write-up on extending an AWS Cloud Operations &amp; Migrations Blog post example for automatically tagging EC2 instances in AWS.<!-- -->
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="background">Background<a href="https://misczak.github.io/blog/automatically-tagging-resources-in-aws-with-owner-information#background" class="hash-link" aria-label="Direct link to Background" title="Direct link to Background" translate="no">​</a></h2>
<p>One of AWS' best practices for building and managing infrastructure in the cloud is to use consistent, accurate tags for the purposes of cost management, correct owner attribution (during operations and security incidents), and even attribute-based access control. This type of tag compliance can be easier said than done, however, especially when dealing with an organization that has a number of accounts owned by different teams with very different tooling and working styles.</p>
<p>My team has dealt with a number of security incidents over the last year where finding the correct owner of an EC2 instance took too much time, in our eyes, at the beginning of the incident. No tags were added to the instance besides <code>Name</code>, and while CloudTrail would capture the events that created the resource (and the principal involved), some resources predated our currently retained CloudTrail logs - meaning we did not have a record of the user that created them.</p>
<p>While it is possible to require tags to be added to resources at the time of creation (otherwise blocking the creation of the resource), my team preferred not to pursue this approach for a few reasons. Implementing that type of compliance check would take a very long time to incorporate into the various workflows across the dozens of accounts for which we have oversight. Moreover, restricting resource creation until somebody enters the correct tag information can make the security team look like blockers instead of team players, which is something we are always looking to avoid.</p>
<p>For these reasons, I wanted to see what could be done to leverage information AWS has about our resources in order to attach tags about the owner to the instance's metadata itself, so they would live alongside the instance for the duration of its lifetime. The preferred approach would not require any additional work on behalf of the individual or team creating the resources in the first place, and would not interfere with their existing workflows.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="researching-previous-work">Researching Previous Work<a href="https://misczak.github.io/blog/automatically-tagging-resources-in-aws-with-owner-information#researching-previous-work" class="hash-link" aria-label="Direct link to Researching Previous Work" title="Direct link to Researching Previous Work" translate="no">​</a></h2>
<p>A lot of times when working on projects such as this one, my thought process is that somebody, somewhere, has to have solved this particular problem before. So I began my research process across the Internet, Reddit, and even the Cloud Security Slack.</p>
<p>What became immediately apparent is that AWS already has this information available to you; however, it's just not in a format (or part of a service) that makes it easy to extract. There is a tag called <code>aws:createdBy</code> that can be activated in the Billing Console via the management account, <a href="https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/aws-tags.html" target="_blank" rel="noopener noreferrer" class="">after which AWS will apply it to a subset of resources</a>.</p>
<p>The only issue with this tag is that it is only available in the Billing console and reports - it does not appear alongside the other tags on your resources. There are some <a href="https://github.com/wolfeidau/aws-billing-store/" target="_blank" rel="noopener noreferrer" class="">great projects</a> out there that look to make this report information more easily queryable - but they still didn't feel like the right fit for us, as many of them pushed the tag information into a separate repository, where data about newly created resources could lag behind.</p>
<p>Finally, I found a <a href="https://aws.amazon.com/blogs/mt/auto-tag-aws-resources/" target="_blank" rel="noopener noreferrer" class="">post from the AWS Cloud Operations &amp; Migrations blog</a> from back in November 2020 that was exactly what I was looking for. It was a solution that used CloudWatch Events and a Lambda function to read incoming <code>RunInstances</code> CloudTrail events and use the information from within the event to tag the newly created resource. It even had sample Python code for the Lambda function! There was only one problem with this example that prevented it from being a slam dunk - it was focused on only one account and one region. To make this solution useful in our environment, we would have to extend it to work across all of the accounts in a certain OU, and across all regions.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-auto-tagging-solution-explained">The Auto Tagging Solution, Explained<a href="https://misczak.github.io/blog/automatically-tagging-resources-in-aws-with-owner-information#the-auto-tagging-solution-explained" class="hash-link" aria-label="Direct link to The Auto Tagging Solution, Explained" title="Direct link to The Auto Tagging Solution, Explained" translate="no">​</a></h2>
<p>In case you don't want to click through to the AWS blog post itself, I'll briefly explain the solution. When a principal starts an EC2 instance in AWS, a <code>RunInstances</code> event is generated and logged by CloudTrail. These events contain a variety of information, including the time of the event, the principal that triggered it, the account that the instance was created in, and the region that the instance was created in. All of this information is extremely useful to capture.</p>
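<p>To make the shape of these events concrete, here is a minimal sketch (plain Python, no AWS calls) of pulling those fields out of a <code>RunInstances</code> event as delivered by EventBridge. The field paths follow the standard CloudTrail record layout nested under <code>detail</code>, but the sample event and its values are entirely invented for illustration:</p>

```python
# Hypothetical RunInstances CloudTrail event, trimmed to the fields we care about.
sample_event = {
    "detail": {
        "eventName": "RunInstances",
        "eventTime": "2023-07-31T12:00:00Z",
        "awsRegion": "us-east-1",
        "recipientAccountId": "111122223333",
        "userIdentity": {
            "arn": "arn:aws:sts::111122223333:assumed-role/DevRole/jane.doe"
        },
        "responseElements": {
            "instancesSet": {"items": [{"instanceId": "i-0abcd1234efgh5678"}]}
        },
    }
}


def parse_run_instances(event):
    """Extract the launch details used for tagging from an EventBridge-delivered event."""
    detail = event["detail"]
    return {
        "account_id": detail["recipientAccountId"],
        "region": detail["awsRegion"],
        "principal": detail["userIdentity"]["arn"],
        "instance_ids": [
            item["instanceId"]
            for item in detail["responseElements"]["instancesSet"]["items"]
        ],
    }


fields = parse_run_instances(sample_event)
print(fields["account_id"], fields["region"], fields["instance_ids"])
```

<p>Everything the tagging function needs - who, where, and which instance - is already present in this one record, which is what makes the event-driven approach possible.</p>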
<p><a href="https://misczak.github.io/assets/files/cloudtrailevent-dbb2e6e28a8e9ca65f3860d57bbfbe1e.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="An example of a CloudTrail event for RunInstances" src="https://misczak.github.io/assets/images/cloudtrailevent-dbb2e6e28a8e9ca65f3860d57bbfbe1e.png" width="803" height="709" class="img_ev3q"></a></p>
<p>A service called AWS EventBridge (formerly CloudWatch Events) allows for rules to be created that can monitor CloudTrail for specific events, and then perform some action based on the event. The auto tagging solution includes a rule that monitors for the <code>RunInstances</code> events, and then sends the information from a captured event to an AWS Lambda function. This Lambda function reads the information in the event (mostly the instance ID and principal information), and then uses that information to apply a tag to the instance with the information about the principal.</p>
<p>The chief limitation of this solution, as presented in the blog post, is that CloudTrail in a given region will only capture the <code>RunInstances</code> events that happen in that region. Additionally, a Lambda function would only be able to apply tags to instances in the same account as the function itself.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="modifying-the-aws-example">Modifying the AWS Example<a href="https://misczak.github.io/blog/automatically-tagging-resources-in-aws-with-owner-information#modifying-the-aws-example" class="hash-link" aria-label="Direct link to Modifying the AWS Example" title="Direct link to Modifying the AWS Example" translate="no">​</a></h2>
<p>The first problem - that the example solution only covered a single region - can be solved easily enough. Whereas the example used a single Event Rule, we would have to deploy the same rule in every region (of every account) that we planned on supporting as part of this solution. This requirement could be met by running a CloudFormation StackSet that created the rule in each region that each account uses.</p>
<p>The second problem - that the example focused on just one account - was a little trickier to think about. Event rules can send events to Lambda functions only in the same account as the rule. We needed a way to send events from other accounts in the organization to our account, and then another mechanism to apply the tags back onto the correct instances in those various accounts.</p>
<p>We ended up creating an event bus in our Security account, solely for the purpose of receiving these resource creation events from other accounts. Each event rule would send its events to this event bus instead of directly to a Lambda function. The bus, in turn, would be the doorway to our Lambda function, as it would live in the same account as the function itself.</p>
<p>Next, we created two IAM roles in each managed account. One role has the <code>events:PutEvents</code> permission, used by the Event Rule to send events to the event bus in the Security account. The second role has the <code>ec2:CreateTags</code> and <code>ec2:DescribeVolumes</code> permissions, and its trust policy allows our Lambda's execution role to assume it. CloudFormation is again used to help deploy these to all accounts in scope. It's best to deploy the CloudFormation for creating the IAM roles first, as you can then reference the <code>events:PutEvents</code> role in your CloudFormation for deploying the Event Rule itself.</p>
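<p>To illustrate the trust relationship on the second role, the following sketch generates its trust policy document. The role and account identifiers here are placeholders - in a real deployment the Lambda execution role ARN would be passed in as a CloudFormation parameter:</p>

```python
import json


def auto_tag_trust_policy(lambda_execution_role_arn):
    """Trust policy allowing the central Lambda execution role to assume this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": lambda_execution_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Hypothetical ARN for the central tagging Lambda's execution role.
policy = auto_tag_trust_policy("arn:aws:iam::999999999999:role/auto-tagger-lambda-role")
print(json.dumps(policy, indent=2))
```

<p>Because every managed account gets this identical trust policy, the one Lambda execution role in the Security account can reach any of them.</p>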
<p>Lastly, we added permissions to our Lambda execution role; namely, a policy to assume the auto tagging role in any account:</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"Statement"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token property" style="color:#36acaa">"Action"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token string" style="color:#e3116c">"sts:AssumeRole"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token property" 
style="color:#36acaa">"Effect"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Allow"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token property" style="color:#36acaa">"Resource"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"arn:aws:iam::*:role/auto-tag-role"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token property" style="color:#36acaa">"Sid"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"ResourceAutoTaggerAssumeRole"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"Version"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"2012-10-17"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token 
plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></div></code></pre></div></div>
<p>Once the Lambda function was invoked, it would mostly use the code provided by the AWS blog post example. However, we modified a couple of key areas. We used the source code from the <a href="https://github.com/aws-samples/resource-auto-tagger" target="_blank" rel="noopener noreferrer" class="">blog post example GitHub repo</a> and set about modifying <code>resource-auto-tagger.py</code>. In the <code>lambda_handler</code> function, we added some code to pull the account ID and region from the CloudTrail event:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic"># Parse the passed CloudTrail event and extract pertinent EC2 launch fields</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    event_fields </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> cloudtrail_event_parser</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">event</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    accountId </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> event_fields</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"account_id"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    region </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> event_fields</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">get</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"region"</span><span class="token punctuation" 
style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    log</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">info</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string-interpolation string" style="color:#e3116c">f"Instance created in Account: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">accountId</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c"> in region: </span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">region</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">\n"</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>We then pass <code>accountId</code> and <code>region</code> to the <code>set_ec2_instance_attached_vols_tags</code> function, alongside the <code>ec2_instance_id</code> and <code>resource_tags</code> as seen in the example. Once inside that function, we use the <code>account_id</code> to assume the IAM role that we deployed to every account for the purpose of applying the tags, and build an EC2 client object on the back of that assumed-role session and the region.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">set_ec2_instance_attached_vols_tags</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">ec2_instance_id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> resource_tags</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> account_id</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> region</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">try</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token comment" style="color:#999988;font-style:italic"># First assume the role created in the account that has permission to tag the resources</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            log</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">info</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" 
style="color:#e3116c">"Attempting to assume role in target account\n"</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            assumed_role_response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> sts_client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">assume_role</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                RoleArn</span><span class="token operator" style="color:#393A34">=</span><span class="token string-interpolation string" style="color:#e3116c">f"arn:aws:iam::</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">{</span><span class="token string-interpolation interpolation">account_id</span><span class="token string-interpolation interpolation punctuation" style="color:#393A34">}</span><span class="token string-interpolation string" style="color:#e3116c">:role/auto-tag-role"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                RoleSessionName</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"auto-tag-session"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            assumed_role_session </span><span class="token operator" style="color:#393A34">=</span><span 
class="token plain"> boto3</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">Session</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">aws_access_key_id</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">assumed_role_response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'Credentials'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'AccessKeyId'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                        aws_secret_access_key</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">assumed_role_response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'Credentials'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'SecretAccessKey'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                        aws_session_token</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">assumed_role_response</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'Credentials'</span><span class="token punctuation" 
style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">'SessionToken'</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            assumed_role_ec2_client </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> assumed_role_session</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">client</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"ec2"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> region_name</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">region</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            log</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">info</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Assumed role in target account successfully\n"</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>Because we have used CloudFormation to deploy the same role to every account, it will be waiting for us, no matter which <code>account_id</code> gets passed in here and used as part of the ARN. And because the Lambda's execution role is trusted by each of those roles in the various accounts, it will be able to assume it without issue. The rest of the code is almost entirely unmodified.</p>
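<p>For completeness, here is a simplified, hypothetical sketch of how the tag set itself could be built before being handed to that cross-account client. The blog's actual code derives these values somewhat differently; the point here is just that the <code>owner</code> and <code>dateCreated</code> keys come straight out of the CloudTrail event fields:</p>

```python
def build_owner_tags(principal_arn, event_time):
    """Build the owner/dateCreated tags from CloudTrail event fields.

    For an assumed-role principal such as
    arn:aws:sts::111122223333:assumed-role/DevRole/jane.doe
    the last ARN segment is the role session name, which in our
    environment identifies the user.
    """
    owner = principal_arn.split("/")[-1]
    return [
        {"Key": "owner", "Value": owner},
        {"Key": "dateCreated", "Value": event_time},
    ]


tags = build_owner_tags(
    "arn:aws:sts::111122223333:assumed-role/DevRole/jane.doe",
    "2023-07-31T12:00:00Z",
)
# These tags would then be applied via the cross-account client, e.g.:
# assumed_role_ec2_client.create_tags(Resources=[ec2_instance_id], Tags=tags)
```

<p>Note that how you extract the owner depends on how your organization names role sessions; federated identities and IAM users produce different ARN shapes, so treat the parsing above as an assumption to adapt.</p>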
<p>Stepping back and looking at the complete solution, it can be summarized by this diagram:</p>
<p><a href="https://misczak.github.io/assets/files/autotagdiagram-8a506a9e10a36e91659e195990b4d3ae.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="A simple diagram showing the flow of events through the solution" src="https://misczak.github.io/assets/images/autotagdiagram-8a506a9e10a36e91659e195990b4d3ae.png" width="1700" height="916" class="img_ev3q"></a></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="applying-the-service-control-policy">Applying the Service Control Policy<a href="https://misczak.github.io/blog/automatically-tagging-resources-in-aws-with-owner-information#applying-the-service-control-policy" class="hash-link" aria-label="Direct link to Applying the Service Control Policy" title="Direct link to Applying the Service Control Policy" translate="no">​</a></h2>
<p>Once the tags are created, we want to prevent someone from removing them, either accidentally or in an attempt to cover their tracks. This can be handled by a pretty straightforward service control policy (SCP) that just blocks the tags that you have decided to apply. In this case, we denied the <code>ec2:DeleteTags</code> action for the <code>owner</code> and <code>dateCreated</code> tags, and attached it to the OU of accounts where we would deploy the solution.</p>
<div class="language-terraform codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-terraform codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">data "aws_iam_policy_document" "auto_tag_delete_policy" {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    statement {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        effect          = "Deny"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        actions         = [</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            "ec2:DeleteTags"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        ]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        resources       = ["*"]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        condition {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            test        = "ForAnyValue:StringEquals"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            variable    = "aws:TagKeys"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            values      = [</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                "dateCreated",</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                
"owner"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            ]</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        }</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">}</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">resource "aws_organizations_policy" "prevent_auto_tag_delete" {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    name                = "prevent-auto-tag-modification"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    description         = "Prevents removal of tags automatically applied by InfoSec"</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    content             = data.aws_iam_policy_document.auto_tag_delete_policy.json</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">}</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">resource "aws_organizations_policy_attachment" "managed_ou"{</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    policy_id           = aws_organizations_policy.prevent_auto_tag_delete.id</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    target_id           = var.managed_ou</span><br></div><div class="token-line" style="color:#393A34"><span 
class="token plain">}</span><br></div></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="results-and-additional-considerations">Results and Additional Considerations<a href="https://misczak.github.io/blog/automatically-tagging-resources-in-aws-with-owner-information#results-and-additional-considerations" class="hash-link" aria-label="Direct link to Results and Additional Considerations" title="Direct link to Results and Additional Considerations" translate="no">​</a></h2>
<p>Once everything was in place and enabled, tags were applied automatically to our EC2 instances as they were created.</p>
<p><a href="https://misczak.github.io/assets/files/tagscreated-4ac9ef5d902c1b4c91a1b8e4c4206498.png" target="_blank" class=""><img decoding="async" loading="lazy" alt="An example of the dateCreated and owner tags applied to an instance" src="https://misczak.github.io/assets/images/tagscreated-4ac9ef5d902c1b4c91a1b8e4c4206498.png" width="724" height="235" class="img_ev3q"></a></p>
<p>Through an extensive testing and monitoring process, I learned a few lessons and picked up some additional points worth keeping in mind:</p>
<ul>
<li class="">Regions farther away from where your event bus and Lambda function live will take a few seconds longer to have tags applied, due to round-trip times. Make sure you raise the default Lambda timeout to something like 3 minutes so you're not losing any tagging to a slightly delayed event.</li>
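<li class="">As a minimal sketch of that timeout bump, assuming the tagging function is managed in Terraform (the function name, role ARN, and handler here are all hypothetical):
<pre><code class="language-terraform">resource "aws_lambda_function" "auto_tagger" {
  function_name = "auto-tagger"                                  # hypothetical
  role          = "arn:aws:iam::123456789012:role/auto-tagger"   # hypothetical
  handler       = "index.handler"
  runtime       = "python3.9"
  filename      = "auto_tagger.zip"

  # 3 minutes, up from the 3-second default, so slightly delayed
  # cross-region events still get processed
  timeout       = 180
}
</code></pre>
</li>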
<li class="">While the SCP prevents someone from deleting the tags, it does nothing to stop them from modifying them. Overwriting tags falls under the <code>ec2:CreateTags</code> permission - the same permission used to apply these tags in the first place. Therefore, it would be trivial for someone to edit the <code>owner</code> tag after the fact to cover their own tracks. It is recommended that you combine this solution with a tool or service that takes routine inventory snapshots of your instance fleet, such as <a href="https://www.cloudquery.io/" target="_blank" rel="noopener noreferrer" class="">CloudQuery</a>, so that you can track changes to tags over time.</li>
<li class="">We actually ended up adding an additional tag called "WhatIsThis" whose value was a URL pointing to a page on our internal security wiki explaining what auto-tagging was and why it was being applied to resources. While we sent out communications to the account owners and administrators prior to rolling the solution out, there were still going to be some users confused about why certain tags were being applied to instances that they had created. Depending on your organization's culture, it may be a good idea to include something like this to prevent a torrent of emails or Slack messages inquiring about the tags.</li>
<li class="">If your organization has teams already using infrastructure as code to manage their resources, they may need to add some configuration to ignore these tags as sources of drift. For example, Terraform has an <code>ignore_tags</code> <a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs#ignore_tags" target="_blank" rel="noopener noreferrer" class="">configuration block for the AWS provider</a> that will allow the automatically applied tags to be ignored.</li>
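<li class="">As a rough sketch, the provider-level configuration for ignoring these two tags would look something like this:
<pre><code class="language-terraform"># Keep Terraform from reporting the auto-applied tags as drift
provider "aws" {
  region = "us-east-1"

  ignore_tags {
    keys = ["dateCreated", "owner"]
  }
}
</code></pre>
</li>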
<li class="">For accounts that routinely spin up large numbers of EC2 instances at the same time, make sure you review the current rate limits and quotas for the <code>DescribeInstances</code>, <code>CreateTags</code>, and <code>DescribeVolumes</code> API calls, as you can quickly hit the rate limit when large numbers of resources are created concurrently.</li>
<li class="">Although this solution works great for newly created EC2 instances, it does nothing for instances that already exist. For those use cases, you're probably better off using something like <a href="https://github.com/wolfeidau/aws-billing-store/" target="_blank" rel="noopener noreferrer" class="">Mark Wolfe's excellent project</a> that automates pushing Cost and Usage report information into an Athena table.</li>
</ul>]]></content:encoded>
            <category>AWS</category>
        </item>
        <item>
            <title><![CDATA[Building on GCP]]></title>
            <link>https://misczak.github.io/blog/building-on-gcp</link>
            <guid>https://misczak.github.io/blog/building-on-gcp</guid>
            <pubDate>Mon, 27 Feb 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[Discussing my recent experience using GCP and comparing it to AWS.]]></description>
<content:encoded><![CDATA[<p>Recently, I had to complete a project that involved running the open source tool <a href="https://cloudquery.io/" target="_blank" rel="noopener noreferrer" class="">CloudQuery</a> to create an inventory of resources in a GCP organization. This assignment turned out to be a great introduction to learning how Google Cloud Platform works, as I had almost exclusively used AWS previously (with only minor trials in both Azure and GCP during that time). Throughout the project, I found myself often mapping certain concepts back to what they would be called or how they would be done in AWS. In doing so, I started keeping score of what I liked more than in AWS and what I liked less - and now I'm finally sitting down to organize some of those thoughts.</p>
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-i-like-more-on-gcp">What I Like More on GCP<a href="https://misczak.github.io/blog/building-on-gcp#what-i-like-more-on-gcp" class="hash-link" aria-label="Direct link to What I Like More on GCP" title="Direct link to What I Like More on GCP" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-the-organizaton-project-folder-hierarchy">1. The Organization-Project-Folder Hierarchy<a href="https://misczak.github.io/blog/building-on-gcp#1-the-organizaton-project-folder-hierarchy" class="hash-link" aria-label="Direct link to 1. The Organization-Project-Folder Hierarchy" title="Direct link to 1. The Organization-Project-Folder Hierarchy" translate="no">​</a></h3>
<p>In GCP, the top node in your environment is called an organization; under that, you can have projects, which act like AWS accounts but tend to be even more logically segmented. The reason for that strict segmentation is that you'll often place projects under folders to better organize them by team, department, or business unit. Projects allow you to easily create and control very fine-grained permissions, as you can place just a few resources in a project to limit the blast radius of any permissions granted there - without having to implement lots of IAM policies.</p>
<p>AWS has a concept of an organization for central management, but it is by no means a requirement to get started. You can still see its DNA as a feature that came later on, as you have to invite existing accounts to the organization. And the management account still runs the show in the organization, so you'll still have one account that is quite a bit more powerful than the others.</p>
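<p>As a quick sketch of the GCP hierarchy in Terraform - a folder under the organization, and a project under that folder (the organization ID and project ID here are hypothetical):</p>
<pre><code class="language-terraform">resource "google_folder" "engineering" {
  display_name = "Engineering"
  parent       = "organizations/123456789012"   # hypothetical org ID
}

resource "google_project" "inventory" {
  name       = "inventory"
  project_id = "inventory-prod-1234"            # hypothetical, must be globally unique
  folder_id  = google_folder.engineering.name   # places the project under the folder
}
</code></pre>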
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-almost-every-aspect-of-iam">2. Almost Every Aspect of IAM<a href="https://misczak.github.io/blog/building-on-gcp#2-almost-every-aspect-of-iam" class="hash-link" aria-label="Direct link to 2. Almost Every Aspect of IAM" title="Direct link to 2. Almost Every Aspect of IAM" translate="no">​</a></h3>
<p>Building on the organization-folder-project hierarchy, the IAM structure in GCP immediately seems more logical once you understand how it was constructed. Every principal in GCP is tied to an email address - a byproduct of an organization requiring a domain to get started. That principal's email address can be added to any organization, folder, or project and granted permissions in that entity. These permissions can cascade from the top of the hierarchy downwards; for example, you can grant permissions at the Engineering folder level and have it apply to that principal for every project underneath it.</p>
<p>Where this really becomes powerful (and convenient) is providing different permissions at different levels. For example, I may want to give the service account that my workload will use the 'Security Reviewer' predefined role at the organization level, which grants List permissions for nearly every service as well as Get (Read) permissions for many of them. Those permissions will cascade to every folder and therefore every project in that organization. But then, in a few specific projects, I can add the principal for my service account and grant it more specific, powerful permissions, such as Storage Object Creator to write objects to Cloud Storage Buckets in those projects. Essentially, two CLI commands are needed to set up these different tiers of permissions throughout the organization.</p>
<p><img decoding="async" loading="lazy" alt="The difference in how roles and permissions are assigned - some inherited from the organization and others applied directly in that project" src="https://misczak.github.io/assets/images/iampermissions-50789482b2bd73f5d41e933e178d3682.png" width="2388" height="628" class="img_ev3q"></p>
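<p>The two tiers of grants described above can also be expressed declaratively. A hedged sketch in Terraform - the org ID, project ID, and service account address are all hypothetical:</p>
<pre><code class="language-terraform"># Broad read-only visibility, granted at the organization level and
# inherited by every folder and project underneath it
resource "google_organization_iam_member" "security_reviewer" {
  org_id = "123456789012"
  role   = "roles/iam.securityReviewer"
  member = "serviceAccount:inventory@my-project.iam.gserviceaccount.com"
}

# A more powerful role, granted only in one specific project
resource "google_project_iam_member" "storage_writer" {
  project = "data-sink-project"
  role    = "roles/storage.objectCreator"
  member  = "serviceAccount:inventory@my-project.iam.gserviceaccount.com"
}
</code></pre>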
<p>On the AWS side, I would most likely need to get roles or trust relationships added to every account in my organization for my workload to use to emulate this setup. And even then, the policies for those roles have to probably be tailored to specific resources because each account contains lots of resources my workload shouldn't have access to.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-less-tedious-networking-for-the-most-part">3. Less Tedious Networking (For the Most Part)<a href="https://misczak.github.io/blog/building-on-gcp#3-less-tedious-networking-for-the-most-part" class="hash-link" aria-label="Direct link to 3. Less Tedious Networking (For the Most Part)" title="Direct link to 3. Less Tedious Networking (For the Most Part)" translate="no">​</a></h3>
<p>Some of the biggest differences in comparing GCP to AWS come at the network level; specifically, VPCs in GCP are global and go across regions, while subnets are regional and can go across zones. This allows me to set up a highly available workload spanning multiple regions while using just a single VPC. Additionally, I can spread my subnets across different zones for better fault tolerance. While you can easily emulate this on AWS with multi-region setups, there's some additional effort that goes into designing different VPCs for each region and different subnets for each zone.</p>
<p>The benefits become clearer when you realize that in GCP, you don't attach a CIDR block at the VPC level - instead, it's done at the subnet level. Such a setup creates an interesting tradeoff when it comes to firewall rules for these networks; while you can have a firewall rule apply for an entire global VPC and all of its subnets, that firewall rule is inherently tied to that VPC and cannot be repurposed elsewhere, like you can with security groups.</p>
<p>One last note here - I love how every subnet comes with an option to enable 'Private Google Access', which just allows Compute Engine instances in that subnet to reach the external IP addresses of GCP services even if they only have internal IP addresses. This is a really simple option to toggle without having to worry about setting up a VPC endpoint, as you do in AWS.</p>
<p><img decoding="async" loading="lazy" alt="The toggle for Private Google Access on the subnet creation page" src="https://misczak.github.io/assets/images/privategoogleaccess-2b556367c8236c768879541e8e52f6be.png" width="1132" height="710" class="img_ev3q"></p>
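<p>A rough Terraform sketch of this layout - one global VPC, a regional subnet carrying its own CIDR block, and Private Google Access enabled (names and ranges here are hypothetical):</p>
<pre><code class="language-terraform"># VPC networks are global in GCP and carry no CIDR block of their own
resource "google_compute_network" "workload" {
  name                    = "workload-vpc"
  auto_create_subnetworks = false
}

# Subnets are regional and own the CIDR ranges
resource "google_compute_subnetwork" "us_east" {
  name                     = "workload-us-east1"
  region                   = "us-east1"
  network                  = google_compute_network.workload.id
  ip_cidr_range            = "10.10.0.0/20"
  private_ip_google_access = true   # the 'Private Google Access' toggle
}
</code></pre>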
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-focus-on-cli-commands-even-in-the-console">4. Focus on CLI commands, Even in the Console<a href="https://misczak.github.io/blog/building-on-gcp#4-focus-on-cli-commands-even-in-the-console" class="hash-link" aria-label="Direct link to 4. Focus on CLI commands, Even in the Console" title="Direct link to 4. Focus on CLI commands, Even in the Console" translate="no">​</a></h3>
<p>For many pages that I visited in the console to create a resource or change a configuration, there was an option near the bottom that allowed me to see and copy the equivalent CLI command for the changes I had just made. By putting this front and center during console use (as opposed to burying it in a reference page somewhere), it became really simple for me to just copy those commands to my personal notes, making my configuration repeatable in the future if I wanted to replicate it in another project or environment.</p>
<p><img decoding="async" loading="lazy" alt="The dialog showing the equivalent command line instruction for creating a network and subnet that was just configured in the console" src="https://misczak.github.io/assets/images/clicommand-3cf4d8618520d1476717d692f24a3018.png" width="1518" height="1204" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-i-like-more-on-aws">What I Like More on AWS<a href="https://misczak.github.io/blog/building-on-gcp#what-i-like-more-on-aws" class="hash-link" aria-label="Direct link to What I Like More on AWS" title="Direct link to What I Like More on AWS" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-no-domain-requirement">1. No domain requirement<a href="https://misczak.github.io/blog/building-on-gcp#1-no-domain-requirement" class="hash-link" aria-label="Direct link to 1. No domain requirement" title="Direct link to 1. No domain requirement" translate="no">​</a></h3>
<p>As I mentioned above, the organization in GCP is tightly coupled with a domain, for good reason - principals and resources become entities under that domain. But this can be a pain when trying to spin up a new environment as a sandbox for your team or if you just want to start a new project that can be brought under the management of an organization at a later date. While it's easy to understand that every principal is an email address in GCP, I found myself still preferring to just reference a principal by account number and role name. Some of the email addresses for principals in GCP can get extremely long and be very close in terms of spelling and project IDs - I once granted a permission to the Compute Engine Service <strong>Account</strong> used for my workload instead of the Compute System Service <strong>Agent</strong>.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-apis-are-already-enabled">2. APIs are Already Enabled<a href="https://misczak.github.io/blog/building-on-gcp#2-apis-are-already-enabled" class="hash-link" aria-label="Direct link to 2. APIs are Already Enabled" title="Direct link to 2. APIs are Already Enabled" translate="no">​</a></h3>
<p>The first time you attempt to call the API of a service in a GCP project, you'll have to <em>Enable</em> it, which requires you to accept the Terms of Service and billing responsibility for the API. While this is presumably done to prevent a lower-privileged user from attempting to use and consume valuable (and costly) resources in your project, it can be extremely counterintuitive, especially when you are trying to engineer and debug tools that are reaching across projects and using lots of different APIs.</p>
<p>In the case of using the CloudQuery tool, I was getting lots of inconsistent results - for example, the tool was not returning any data for certain resources, like Cloud Functions, that I knew existed in the organization. You can get around this by either enabling all of the APIs in your project so that they can be used by your workload, or by granting your workload the <code>serviceusage.services.enable</code> permission so it can enable the APIs on its own as it needs to.</p>
<p><img decoding="async" loading="lazy" alt="GCP requiring you to explicitly enable the API for a service before you can begin using it." src="https://misczak.github.io/assets/images/enableapi-31ceea386f091d83d0098a34310c4fcd.png" width="1542" height="726" class="img_ev3q"></p>
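<p>If you go the route of enabling the APIs ahead of time, that can also be captured in code. A minimal sketch, assuming a hypothetical project ID:</p>
<pre><code class="language-terraform">resource "google_project_service" "cloudfunctions" {
  project = "inventory-prod-1234"   # hypothetical
  service = "cloudfunctions.googleapis.com"

  # Leave the API enabled even if this resource is later destroyed
  disable_on_destroy = false
}
</code></pre>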
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-no-service-accountservice-agents-confusion">3. No Service Account/Service Agents Confusion<a href="https://misczak.github.io/blog/building-on-gcp#3-no-service-accountservice-agents-confusion" class="hash-link" aria-label="Direct link to 3. No Service Account/Service Agents Confusion" title="Direct link to 3. No Service Account/Service Agents Confusion" translate="no">​</a></h3>
<p>As mentioned in the first item of this section, I once granted the <code>Compute Instance Admin</code> predefined role to a GCP service account in my project instead of the Compute System Service Agent. This role was required to allow my Compute Engine instances to make use of Instance Schedules, which would start and stop them at predefined times.</p>
<p>The reason for this mix-up was partially due to the fact that Service Agents are hidden on the IAM page until you toggle the box labeled <strong>Include Google-provided role grants</strong>. I also found it counterintuitive because every other permission I needed for my workload to function correctly was granted to the service account that I had attached to the Compute Engine instance running the workload.</p>
<p>I can't recall having a mix-up like this while using AWS; there is a pretty clear map of what permissions need to be granted to which roles and how you can attach that role to compute workloads (instance profile, IRSA, etc.).</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-the-arn">4. The ARN<a href="https://misczak.github.io/blog/building-on-gcp#4-the-arn" class="hash-link" aria-label="Direct link to 4. The ARN" title="Direct link to 4. The ARN" translate="no">​</a></h3>
<p>The Amazon Resource Name, or ARN, is a unique identifier assigned to an AWS resource. An ARN can represent a role, an EC2 instance, an S3 bucket, a VPC, and so on. Maybe it is the fact that I learned AWS first, but I have become so accustomed to looking for the ARN and using that as the identifier for a resource that it felt like my toolbox was missing something without an equivalent in GCP. Instead, for some GCP resources, you'll have a name in the format of something like <code>projects/[PROJECT NAME]/locations/[REGION]/functions/my-function</code>, but that name is not as useful for you in other projects and across the organization. Whereas AWS will often ask you to provide the ARN for a resource in a permissions policy or for a principal in a trust policy, you're more likely to be splitting resources into projects in GCP.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-private-service-connect">5. Private Service Connect<a href="https://misczak.github.io/blog/building-on-gcp#5-private-service-connect" class="hash-link" aria-label="Direct link to 5. Private Service Connect" title="Direct link to 5. Private Service Connect" translate="no">​</a></h3>
<p>While Private Google Access was easier to set up for the use case of a workload in a subnet with only private IPs trying to access Google services, Private Service Connect supports the use case of establishing a one-way connection from your VPC to another VPC, whether that is in your organization or another organization - such as those of service providers. This makes it roughly equivalent to AWS' Private Link.</p>
<p>However, where Private Link allows you to <a class="" href="https://misczak.github.io/blog/mongodb-world-2022-talk-look-ma-no-public-ip">create some pretty interesting architectures</a> by pairing it with peering for transitive routing and sharing of endpoints, Private Service Connect endpoints used for third party services can only be accessed from within the VPC where they are created, and only by subnets in the same region. You cannot use a peering connection to "hop" to the VPC and region where the Private Service Connect endpoint lives, and then use that to travel further along to your destination in a transitive manner.</p>
<p>The one exception to this is that you can use a Cloud VPN to get to the VPC with the Private Service Connect endpoint; in some cases, it may be recommended to use a Cloud VPN between two VPCs if you want to share a Private Service Connect endpoint between them. Obviously, this introduces additional complexity over the Private Link setup required by AWS.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="conclusion">Conclusion<a href="https://misczak.github.io/blog/building-on-gcp#conclusion" class="hash-link" aria-label="Direct link to Conclusion" title="Direct link to Conclusion" translate="no">​</a></h2>
<p>All in all, I've enjoyed my time building on GCP, as it's been a great learning experience and a useful method of challenging assumptions that I've developed over time by heavily using AWS. Since I spent some time learning how it does IAM inside and out, I feel pretty comfortable using it to support our current project and any future ones that may come up.</p>
            <category>AWS</category>
            <category>GCP</category>
        </item>
        <item>
            <title><![CDATA[AWS re:Invent 2022 Security Recap]]></title>
            <link>https://misczak.github.io/blog/aws-reinvent-2022-security-recap</link>
            <guid>https://misczak.github.io/blog/aws-reinvent-2022-security-recap</guid>
            <pubDate>Thu, 15 Dec 2022 00:00:00 GMT</pubDate>
            <description><![CDATA[Highlighting some of the announcements I found most notable in terms of security at AWS re:Invent this year.]]></description>
<content:encoded><![CDATA[<p>The annual AWS re:Invent conference has come and gone. As usual, there is an overwhelming number of new product launches, feature enhancements, and other service offering announcements to parse through. You could spend several days just sifting through all of the information on 'new stuff'. What I was interested in this time, however, was some of the announcements for security-related services and features, especially those that solve pain points I've experienced in the past.</p>
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="amazon-security-lake---preview">Amazon Security Lake - Preview<a href="https://misczak.github.io/blog/aws-reinvent-2022-security-recap#amazon-security-lake---preview" class="hash-link" aria-label="Direct link to Amazon Security Lake - Preview" title="Direct link to Amazon Security Lake - Preview" translate="no">​</a></h2>
<p>This one seemed to get all the press and attention from the jump, and for good reason - it's an easy way to centralize data from security-related sources in both AWS and your on-premises environment. They're doing some automatic conversions to <a href="https://github.com/ocsf/" target="_blank" rel="noopener noreferrer" class="">Open Cybersecurity Schema Framework</a> for supposed interoperability. A number of AWS services like Route 53, CloudTrail, Lambda, S3, Security Hub, GuardDuty, and Macie will support this right away, but you can also create your own source integrations as long as you give your source the ability to write to Security Lake and invoke AWS Glue with the following policies:</p>
<ul>
<li class=""><code>AWSGlueServiceRole</code> Managed Policy</li>
<li class="">The following inline policy:</li>
</ul>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"Version"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"2012-10-17"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"Statement"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token property" style="color:#36acaa">"Sid"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"S3WriteRead"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token property" 
style="color:#36acaa">"Effect"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Allow"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token property" style="color:#36acaa">"Action"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token string" style="color:#e3116c">"s3:GetObject"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token string" style="color:#e3116c">"s3:PutObject"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token property" style="color:#36acaa">"Resource"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" style="color:#e3116c">"arn:aws:s3:::aws-security-data-lake-{region}-xxxx/*"</span><span class="token 
plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></div></code></pre></div></div>
<p>However, the aspect of this that really caught my eye was the <a href="https://docs.aws.amazon.com/security-lake/latest/userguide/integrations-third-party.html" target="_blank" rel="noopener noreferrer" class="">ecosystem of third-party integrations</a> already available for use in the Preview. Security Lake has source integrations for a lot of commonly used services like Ping Identity, Okta, Orca, CrowdStrike, Zscaler, and more - but the subscriber integrations are even more interesting. Sure, you can integrate with Splunk or Sumo Logic, but there are also large consulting firms such as PwC, Deloitte, Accenture, and others that offer to do the analysis and anomaly detection for you. Security Lake seems like it could be a boon to firms that act as Managed Security Service Providers (MSSPs) by really streamlining the ability to aggregate and provide access to data from an organization's disparate systems.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="cloudwatch-logs-sensitive-data-protection">CloudWatch Logs Sensitive Data Protection<a href="https://misczak.github.io/blog/aws-reinvent-2022-security-recap#cloudwatch-logs-sensitive-data-protection" class="hash-link" aria-label="Direct link to CloudWatch Logs Sensitive Data Protection" title="Direct link to CloudWatch Logs Sensitive Data Protection" translate="no">​</a></h2>
<p>This one is for anyone who has tried to run an App Sec program and get various application teams to go back and fix their logging. Instead of trying to convince teams to spend precious sprint cycles on fixing logging, you can just set up a data protection policy in the application's CloudWatch Log Group and specify the data that you want to have redacted.</p>
<p>This approach doesn't have to be something you continuously come back and check on either - you can have alarms that fire on the <code>LogEventsWithFindings</code> metric that will tabulate how many times sensitive information was redacted in a log group. That metric could also be useful if you want to show improvement across teams as you burn down this particular area of risk. Additionally, you can offload reports of these findings to another log group, S3, or through Kinesis to the destination of your choosing.</p>
<p><img decoding="async" loading="lazy" alt="Redaction of sensitive information in a CloudWatch log group" src="https://misczak.github.io/assets/images/cloudwatchsensitive-bcb0b7b8c8a61014d403e135aff825fe.png" width="2156" height="896" class="img_ev3q"></p>
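<p>For those managing log groups in Terraform, the AWS provider exposes this through the <code>aws_cloudwatch_log_data_protection_policy</code> resource. A hedged sketch - the log group name is hypothetical, and I'm only redacting email addresses here:</p>
<pre><code class="language-terraform">resource "aws_cloudwatch_log_data_protection_policy" "redact_pii" {
  log_group_name = "/app/orders-service"   # hypothetical log group

  policy_document = jsonencode({
    Name    = "redact-pii"
    Version = "2021-06-01"
    Statement = [
      {
        # Count findings (feeds the LogEventsWithFindings metric)
        Sid            = "Audit"
        DataIdentifier = ["arn:aws:dataprotection::aws:data-identifier/EmailAddress"]
        Operation      = { Audit = { FindingsDestination = {} } }
      },
      {
        # Mask the matched values in the log events
        Sid            = "Redact"
        DataIdentifier = ["arn:aws:dataprotection::aws:data-identifier/EmailAddress"]
        Operation      = { Deidentify = { MaskConfig = {} } }
      }
    ]
  })
}
</code></pre>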
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="cloudwatch-cross-account-observability">CloudWatch Cross-Account Observability<a href="https://misczak.github.io/blog/aws-reinvent-2022-security-recap#cloudwatch-cross-account-observability" class="hash-link" aria-label="Direct link to CloudWatch Cross-Account Observability" title="Direct link to CloudWatch Cross-Account Observability" translate="no">​</a></h2>
<p>AWS accounts are often treated as an isolation boundary in organizations, with individual teams having some level of control over their own account, even if they are part of the same AWS organization. However, there may be times when you want to implement some form of log or telemetry data capture from many accounts without imposing an unnecessary burden on them with bespoke tooling or excessive permissions.</p>
<p>Cross-Account Observability in CloudWatch sets out to solve exactly that problem by allowing you to designate a central "monitoring account" and one or more "source accounts" that will feed the monitoring account data and logs. Instead of having to manually implement some form of regular data capture-and-forward, CloudWatch will do the plumbing for you, after you provide the list of source accounts and opt-in to the sharing from each source account.</p>
<p><img decoding="async" loading="lazy" alt="Choosing the method to link the source account to the monitoring account in CloudWatch" src="https://misczak.github.io/assets/images/cloudwatchcrossaccount-afd0ee3823522c177d1e6c47061968e1.png" width="1024" height="912" class="img_ev3q"></p>
<p>One caveat with this feature - it will only work for the region it is configured in. If you span multiple regions in the source accounts, you'll have to configure this in each region to feed into the monitoring account in all of those regions.</p>
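<p>A minimal sketch of what that opt-in looks like from a source account, using the Observability Access Manager (OAM) APIs that back this feature - the sink ARN below is a placeholder you would get from creating a sink in the monitoring account:</p>

```python
# Sketch: link a source account to a monitoring account's sink.
# The sink ARN is a placeholder; the real value comes from create_sink
# run in the monitoring account.
link_params = {
    "LabelTemplate": "$AccountName",   # how this source account shows up centrally
    "ResourceTypes": [                 # which telemetry to share
        "AWS::CloudWatch::Metric",
        "AWS::Logs::LogGroup",
        "AWS::XRay::Trace",
    ],
    "SinkIdentifier": "arn:aws:oam:us-east-1:111122223333:sink/EXAMPLE",
}

# Run from each source account, in each region you want to share:
# import boto3
# boto3.client("oam").create_link(**link_params)
```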
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="vpc-lattice---preview">VPC Lattice - Preview<a href="https://misczak.github.io/blog/aws-reinvent-2022-security-recap#vpc-lattice---preview" class="hash-link" aria-label="Direct link to VPC Lattice - Preview" title="Direct link to VPC Lattice - Preview" translate="no">​</a></h2>
<p>I've been in a lot of AWS networking discussions that involve some combination of VPC Peering, Transit Gateways, Private Endpoints, and VPN attachments. Depending on the requirements, there's often at least one good answer and design that can be worked out, but it will often come with some drawbacks - additional infrastructure to set up and manage, additional safeguards that have to be put into place, or even a rearrangement of existing resources in terms of VPC and subnet topology.</p>
<p>VPC Lattice hopes to provide another solution for this type of problem by implementing a logical <strong>service network</strong> to abstract away the realities of networking and allow services in different accounts and VPCs to talk to each other via DNS. If the picture below reminds you of ELBs, you're not wrong - a lot of the same terminology and principles apply.</p>
<p><img decoding="async" loading="lazy" alt="A diagram of a basic VPC Lattice service network setup, with services, listeners, rules, and target groups" src="https://misczak.github.io/assets/images/vpclattice-50fb0a3cf51ed72d4471b1246a30b3fc.png" width="1024" height="576" class="img_ev3q"></p>
<p>There are listeners that dictate what type of traffic is expected; those listeners have rules with priorities and conditions that determine which actions to take to forward traffic to the appropriate group of targets. The <em>really</em> nice part about this service, however, is that you can associate an IAM resource policy with individual services in the network to only allow certain services and principals access to designated services.</p>
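<p>As a rough illustration of that IAM integration, here is a hypothetical auth policy that only lets one role invoke a Lattice service - the role ARN and service ID are made up, and the attach step mirrors the <code>put_auth_policy</code> API:</p>

```python
import json

# Sketch: an IAM auth policy allowing only a single (hypothetical) role to
# invoke a VPC Lattice service.
auth_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/orders-service"},
            "Action": "vpc-lattice-svcs:Invoke",
            "Resource": "*",
        }
    ],
}

# The policy is attached to the service as a JSON string:
policy_document = json.dumps(auth_policy)

# With boto3 and a real service ID:
# import boto3
# boto3.client("vpc-lattice").put_auth_policy(
#     resourceIdentifier="svc-EXAMPLE", policy=policy_document
# )
```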
<p>It's networking without networking - <a href="https://youtu.be/Pex_0zg9EsE?t=112" target="_blank" rel="noopener noreferrer" class="">and yet it's all still networking</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="kms-external-key-store">KMS External Key Store<a href="https://misczak.github.io/blog/aws-reinvent-2022-security-recap#kms-external-key-store" class="hash-link" aria-label="Direct link to KMS External Key Store" title="Direct link to KMS External Key Store" translate="no">​</a></h2>
<p>AWS services that encrypt data at rest use a key specific to that service, known as the data encryption key. However, because the service needs access to that key, it has to remain with the service itself. This setup means the key likely lives next to the data it's protecting - if that one area of storage is compromised, both the key and the data are lost.</p>
<p>To solve this problem, the data encryption key is itself encrypted by a root key that the customer manages in AWS Key Management Service (KMS). The root key is generated and stored in a hardware security module (HSM) that is tamper resistant. The key material never leaves the HSM. This works fine if you are using KMS to manage your root key and only need to encrypt and decrypt data keys for AWS services - but what happens if you want to integrate with services that don't live on AWS and don't have any connectivity to KMS?</p>
<p>That's where AWS KMS External Key Store (XKS) comes into play. Instead of using AWS KMS and forcing every service to talk to it, your root key can be generated and remain inside an HSM outside of AWS. Services that live outside of AWS can talk directly to this external HSM as they normally would. But what about services you may still be using that reside in AWS?</p>
<p><img decoding="async" loading="lazy" alt="A diagram showing how services in an AWS VPC that typically use KMS APIs will talk to the XKS proxy in front of an external HSM" src="https://misczak.github.io/assets/images/xks-4c687920ccd8c03e145351a45833d428.png" width="800" height="298" class="img_ev3q"></p>
<p>With XKS, these AWS services will still make their API calls to AWS KMS; however, KMS will be aware that an External Key Store is configured, and instead forward the requests to your external HSM via an XKS proxy. This proxy is designed to translate the AWS KMS API request to a format that the external HSM expects. This setup lets you run services both inside and outside of AWS with just one location for your root keys that can remain firmly under your control.</p>
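<p>To make the moving parts concrete, here's a sketch of the configuration KMS needs in order to register an external key store - the endpoint, path, and credential values are placeholders for illustration:</p>

```python
# Sketch: parameters for registering an external key store with KMS.
# Endpoint, path, and credential values are placeholders, not real values.
xks_store_params = {
    "CustomKeyStoreName": "my-external-keystore",
    "CustomKeyStoreType": "EXTERNAL_KEY_STORE",
    "XksProxyConnectivity": "PUBLIC_ENDPOINT",   # or VPC_ENDPOINT_SERVICE
    "XksProxyUriEndpoint": "https://xks.example.com",
    "XksProxyUriPath": "/example/kms/xks/v1",    # path must end in /kms/xks/v1
    "XksProxyAuthenticationCredential": {
        "AccessKeyId": "EXAMPLE_ACCESS_KEY_ID",
        "RawSecretAccessKey": "EXAMPLE_SECRET",
    },
}

# With boto3 and a reachable XKS proxy:
# import boto3
# boto3.client("kms").create_custom_key_store(**xks_store_params)
```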
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="summary">Summary<a href="https://misczak.github.io/blog/aws-reinvent-2022-security-recap#summary" class="hash-link" aria-label="Direct link to Summary" title="Direct link to Summary" translate="no">​</a></h2>
<p>That's just a drop in the bucket of announcements. There were also improvements to managing controls in Control Tower, the Verified Access preview (which I have yet to dig into), which aims to enable secure remote access without a VPN, and more improvements for finding sensitive data in S3 with Macie. Hopefully I'll have time to try out each of them before re:Inforce sneaks up on me later this year.</p>]]></content:encoded>
            <category>AWS</category>
            <category>Security</category>
            <category>networking</category>
        </item>
        <item>
            <title><![CDATA[MongoDB World 2022 Talk: Look Ma, No Public IP!]]></title>
            <link>https://misczak.github.io/blog/mongodb-world-2022-talk-look-ma-no-public-ip</link>
            <guid>https://misczak.github.io/blog/mongodb-world-2022-talk-look-ma-no-public-ip</guid>
            <pubDate>Tue, 16 Aug 2022 00:00:00 GMT</pubDate>
            <description><![CDATA[A lightning talk about connecting to complex, distributed architectures using only Private Endpoints and VPC/VNET Peering.]]></description>
            <content:encoded><![CDATA[<p>This is a lightning talk I gave at <a href="https://www.mongodb.com/world-2022" target="_blank" rel="noopener noreferrer" class="">MongoDB World 2022</a>.</p>
<!-- -->
<p>As your application continues to grow and scale, you may choose to take advantage of powerful MongoDB Atlas features, such as multi-region clusters and sharding, in order to provide a good user experience. While these features are useful, they can also introduce complexities to your application’s architecture if configured incorrectly. In this session, we’ll look at common architectures when designing multi-region sharded clusters and walk through how MongoDB Atlas allows your team to securely connect to each of the clusters without exposing unnecessary components to the public internet.</p>
<p>You can watch the talk <a href="https://www.youtube.com/watch?v=DQb63kSfqSY" target="_blank" rel="noopener noreferrer" class="">here</a>.</p>]]></content:encoded>
            <category>mongodb</category>
            <category>AWS</category>
            <category>Azure</category>
            <category>Security</category>
            <category>networking</category>
        </item>
        <item>
            <title><![CDATA[Tracking Temperature and Humidity at Home with Time Series Data]]></title>
            <link>https://misczak.github.io/blog/tracking-temperature-and-humidity-at-home-with-time-series-data</link>
            <guid>https://misczak.github.io/blog/tracking-temperature-and-humidity-at-home-with-time-series-data</guid>
            <pubDate>Mon, 30 May 2022 00:00:00 GMT</pubDate>
            <description><![CDATA[Generating temperature and humidity time series data and using MongoDB's time series collections, charts and window functions to analyze it.]]></description>
            <content:encoded><![CDATA[Generating temperature and humidity time series data and using MongoDB's time series collections, charts and window functions to analyze it.<!-- -->
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="background">Background<a href="https://misczak.github.io/blog/tracking-temperature-and-humidity-at-home-with-time-series-data#background" class="hash-link" aria-label="Direct link to Background" title="Direct link to Background" translate="no">​</a></h2>
<p>Several years back, I received a Raspberry Pi 3 Model B+ kit from <a href="https://www.canakit.com/" target="_blank" rel="noopener noreferrer" class="">CanaKit</a> alongside <a href="https://amzn.to/3wPWOMI" target="_blank" rel="noopener noreferrer" class="">this book</a>. I spent some time doing the basic stuff with it (blinking LEDs, running a Linux server, etc.) but eventually turned it into a <a href="https://retropie.org.uk/" target="_blank" rel="noopener noreferrer" class="">RetroPie</a> and installed lots of retro games on it. After a while, I lost interest in tinkering with it and just let it collect dust in my office for a number of years.</p>
<p>I was recently cleaning out my office and rediscovered both the old Raspberry Pi and the book, and was thumbing through the projects when I noticed one that caught my eye. Listed in the book as Project 12, it's a simple temperature and humidity data logger that didn't really draw my interest a few years ago. Since then, however, I've moved to a new place where the bedrooms are essentially on the third floor (making them much warmer than the rest of the house). Because of this setup, one of the things that my wife and I constantly ask each other is how warm or cold it is in my son's nursery compared to the downstairs portions of the house. With this in mind, I decided to see if I could combine this simple Raspberry Pi project with the new <a href="https://www.mongodb.com/developer/products/mongodb/new-time-series-collections/" target="_blank" rel="noopener noreferrer" class="">Time Series collections</a> that MongoDB offers starting with version 5.0. The idea was to make a tool that would display the current conditions of my son's nursery while also giving me the ability to show how the conditions changed throughout the day and look for patterns if I felt the need to do so.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="acquiring-the-hardware">Acquiring the Hardware<a href="https://misczak.github.io/blog/tracking-temperature-and-humidity-at-home-with-time-series-data#acquiring-the-hardware" class="hash-link" aria-label="Direct link to Acquiring the Hardware" title="Direct link to Acquiring the Hardware" translate="no">​</a></h2>
<p>To build the temperature and humidity sensor, the book lists the following components:</p>
<ul>
<li class="">Raspberry Pi (obviously)</li>
<li class="">Breadboard</li>
<li class="">DHT22 temperature and humidity sensor (can substitute DHT11 or AM2302 as well)</li>
<li class="">4.7k ohm resistor</li>
<li class="">Jumper wires</li>
</ul>
<p>That's not an overwhelming number of components, but it's still a bunch of pieces that will make for a pretty messy package if you're trying to leave it somewhere in a discreet manner. You would have to connect GND on the Pi to the breadboard's blue rail, 3.3V on the Pi to the breadboard's red rail, and then connect the sensor to the breadboard and Pi as well, using the resistor on the 3.3V connection for DHT22 pin 2.</p>
<p>Instead, I found <a href="https://thepihut.com/products/dht22-temperature-humidity-sensor" target="_blank" rel="noopener noreferrer" class="">this DHT22 sensor</a> that comes with a handy 3-pin wire that you can plug directly into the Pi. DOUT goes to GPIO (pin 4) on the Pi, VCC goes to pin 1 on the Pi, and GND goes to pin 6 on the Pi. Using the case that came with my Pi from CanaKit, I was able to enclose the Pi and just had the wires escape out the top to the sensor, greatly cleaning up the overall appearance.</p>
<p><img decoding="async" loading="lazy" alt="The enclosed Raspberry Pi and DHT22 sensor" src="https://misczak.github.io/assets/images/rpi-ffb42ae5cfc13155eae932a43c5a971e.png" width="999" height="749" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="booting-the-pi">Booting the Pi<a href="https://misczak.github.io/blog/tracking-temperature-and-humidity-at-home-with-time-series-data#booting-the-pi" class="hash-link" aria-label="Direct link to Booting the Pi" title="Direct link to Booting the Pi" translate="no">​</a></h2>
<p>This was by far the easiest part of this project. The <a href="https://www.raspberrypi.com/software/" target="_blank" rel="noopener noreferrer" class="">Raspberry Pi OS</a> (formerly known as Raspbian) gives you an easy-to-use Linux OS. Better yet, the <a href="https://www.youtube.com/watch?v=ntaXWS8Lk34" target="_blank" rel="noopener noreferrer" class="">Imager utility</a> helps you set up your install so it will work exactly the way you want right out of the box. You can set up a user for SSH so that once it powers up, you can connect directly to it in a headless fashion. Once you have your Pi booted up and are able to SSH into it, the real work begins.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="creating-the-time-series-database">Creating the Time Series Database<a href="https://misczak.github.io/blog/tracking-temperature-and-humidity-at-home-with-time-series-data#creating-the-time-series-database" class="hash-link" aria-label="Direct link to Creating the Time Series Database" title="Direct link to Creating the Time Series Database" translate="no">​</a></h2>
<p>We'll be using the Time Series collections feature of MongoDB 5.0. The easiest way to get started is to spin up a <a href="https://www.mongodb.com/docs/atlas/tutorial/deploy-free-tier-cluster/" target="_blank" rel="noopener noreferrer" class="">free tier cluster on MongoDB Atlas</a> and then connect in with the new mongo shell, <a href="https://www.mongodb.com/docs/mongodb-shell/" target="_blank" rel="noopener noreferrer" class="">mongosh</a>. You'll also want to make sure you create a database user with credentials that the script can use to connect to the database, as well as add an entry on the IP Access List to allow the connection in the first place.</p>
<p>Once connected to your cluster through <code>mongosh</code>, you'll want to create the time series collection as follows:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">use ts</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">db.createCollection("homeclimate", { </span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    timeseries: { </span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        timeField: "timestamp", </span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        metaField: "metadata", </span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        granularity: "seconds" </span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    }, expireAfterSeconds: 604800 </span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    } )</span><br></div></code></pre></div></div>
<p>The above command does the following:</p>
<ul>
<li class="">Creates a time series collection called 'homeclimate' in the 'ts' database</li>
<li class="">Establishes a granularity of seconds for document ingestion</li>
<li class="">Tells MongoDB that the 'timestamp' field will be used to represent the time of each reading, and the 'metadata' field will hold the information used to identify where the reading is coming from (such as what sensor)</li>
<li class="">Makes documents in this collection expire after 604800 seconds (exactly 7 days), as we don't want to pay for an ever-increasing data size, and any data older than that is probably of low analytical value</li>
</ul>
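<p>For reference, here is a sketch of the equivalent setup from Python with PyMongo, whose <code>create_collection</code> accepts the same <code>timeseries</code> and <code>expireAfterSeconds</code> options - the connection call is left commented out since it needs your real Atlas connection string:</p>

```python
# Sketch: the mongosh command above expressed as PyMongo options.
# SEVEN_DAYS shows where the 604800 value comes from.
SEVEN_DAYS = 7 * 24 * 60 * 60   # 604800 seconds

timeseries_options = {
    "timeField": "timestamp",    # when each reading was taken
    "metaField": "metadata",     # which sensor it came from
    "granularity": "seconds",
}

# With pymongo installed and a real connection string:
# import pymongo
# db = pymongo.MongoClient("<MONGODB CONNECTION STRING>").ts
# db.create_collection(
#     "homeclimate",
#     timeseries=timeseries_options,
#     expireAfterSeconds=SEVEN_DAYS,
# )
```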
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="scripting-data-collection">Scripting Data Collection<a href="https://misczak.github.io/blog/tracking-temperature-and-humidity-at-home-with-time-series-data#scripting-data-collection" class="hash-link" aria-label="Direct link to Scripting Data Collection" title="Direct link to Scripting Data Collection" translate="no">​</a></h2>
<p>The projects book linked to the <a href="https://github.com/adafruit/Adafruit_Python_DHT" target="_blank" rel="noopener noreferrer" class="">Adafruit Python DHT library</a> but if you go to the GitHub repo, you'll see that it's deprecated. Its replacement is the <a href="https://github.com/adafruit/Adafruit_CircuitPython_DHT" target="_blank" rel="noopener noreferrer" class="">CircuitPython DHT library</a>, which can be easily installed on the Raspberry Pi OS using pip:</p>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">pip3 install adafruit-circuitpython-dht</span><br></div></code></pre></div></div>
<p>From there, it's time to actually start pulling data from the sensor with a continuously running script. Create a new Python script and edit it (I recommend using Visual Studio Code in Remote Mode to make this easier) to contain the following:</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> time</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> board</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> adafruit_dht</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> datetime</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> pymongo</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Initial the dht device, with data pin connected to:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">dhtDevice </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> adafruit_dht</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">DHT22</span><span class="token punctuation" style="color:#393A34">(</span><span 
class="token plain">board</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">D4</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># Set up python client for MongoDB</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">client </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> pymongo</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">MongoClient</span><span class="token punctuation" style="color:#393A34">(</span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain">MONGODB CONNECTION STRING</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">db </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> client</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">ts</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">collection </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> db</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">homeclimate</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">sensor </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span 
class="token number" style="color:#36acaa">1</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">while</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">True</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">try</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># Pull the values right from the sensor device</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        temperature_c </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> dhtDevice</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">temperature</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        temperature_f </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> temperature_c </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">9</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">5</span><span class="token 
punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">32</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        humidity </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> dhtDevice</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">humidity</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># Write a document with the temperature and humidity to the time series collection</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        document </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"metadata"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token string" style="color:#e3116c">"sensorId"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> sensor</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"type"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"climate"</span><span class="token punctuation" 
style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" style="color:#e3116c">"timestamp"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> datetime</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">datetime</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">utcnow</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" style="color:#e3116c">"temperature_fahrenheit"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> temperature_f</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" style="color:#e3116c">"temperature_celsius"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> temperature_c</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">                    </span><span class="token string" style="color:#e3116c">"humidity"</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> humidity </span><span class="token punctuation" style="color:#393A34">}</span><span class="token 
plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        doc_id </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> collection</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">insert_one</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">document</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">inserted_id</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token string" style="color:#e3116c">"Recorded reading to document {}"</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">format</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">doc_id</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        time</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">sleep</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">10</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token 
plain">    </span><span class="token keyword" style="color:#00009f">except</span><span class="token plain"> RuntimeError </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> error</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># Errors will happen but need to keep going. Can make up for missed readings </span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token comment" style="color:#999988;font-style:italic"># in the Time Series collection</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">print</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">error</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">args</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        time</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">sleep</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">2.0</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span 
class="token keyword" style="color:#00009f">continue</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">except</span><span class="token plain"> Exception </span><span class="token keyword" style="color:#00009f">as</span><span class="token plain"> error</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        dhtDevice</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">exit</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">raise</span><span class="token plain"> error</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    time</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">sleep</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">10.0</span><span class="token punctuation" style="color:#393A34">)</span><br></div></code></pre></div></div>
<p>There are a few things going on in this script:</p>
<ol>
<li class="">First, we're setting up the sensor by using the driver to build a device object and specifying that it's connected to GPIO pin 4 on the Raspberry Pi.</li>
<li class="">We're creating a connection to our MongoDB cluster and specifying the database and collection we'll be storing data in. We also set up this sensor's identifier. Since this was my first sensor, I gave it a sensor ID of 1; future sensors will increment this value.</li>
<li class="">In an infinite while loop, we grab the temperature and humidity values from the sensor. Since I live in the US, I also convert the temperature to Fahrenheit and put both temperature values, alongside humidity, into a MongoDB document. Since we're using a Time Series collection, I include an embedded document with my metadata values - a sensor ID and a sensor type that may come in handy in the future as I scale these out. The only other value in the document is the current UTC timestamp, which is critical for building the time series view.</li>
<li class="">After taking the reading and writing it to the database, we wait ten seconds before doing it again. This interval can be adjusted up or down depending on how granular you want your time series data to be.</li>
</ol>
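<p>To make the document shape concrete, here is a minimal Python sketch of the reading document the script inserts. The field names <code>timestamp</code>, <code>humidity</code>, <code>temperature_fahrenheit</code>, and <code>metadata.sensorId</code> come from the pipeline used later in this post; the <code>temperature_celsius</code> and <code>sensorType</code> names are illustrative assumptions for the remaining fields.</p>

```python
from datetime import datetime, timezone

def build_reading(sensor_id, temp_c, humidity):
    # Assemble a single time series document. The `timestamp` field is the
    # collection's timeField and `metadata` is its metaField; the embedded
    # metadata lets future sensors share the same collection.
    return {
        "timestamp": datetime.now(timezone.utc),
        "metadata": {"sensorId": sensor_id, "sensorType": "DHT22"},  # assumed names
        "temperature_celsius": temp_c,
        "temperature_fahrenheit": temp_c * 9 / 5 + 32,  # convert for US readers
        "humidity": humidity,
    }

doc = build_reading(1, 21.0, 45.2)
print(doc["temperature_fahrenheit"])  # 69.8
```

<p>In the real script this dictionary is passed straight to the collection's <code>insert_one</code> inside the loop.</p>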
<p>If you've configured your MongoDB connection string correctly and added the entry in the IP Access List in Atlas, running this script should start writing to your Time Series collection. You should see the results after just a few seconds by going to the Collections view in Atlas:</p>
<p><img decoding="async" loading="lazy" alt="Documents populating the time series collection" src="https://misczak.github.io/assets/images/tscollectionview-49696fb269a68512342abdd1d4e56223.png" width="1978" height="1015" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="using-mongodb-charts-to-visualize-the-data">Using MongoDB Charts to Visualize the Data<a href="https://misczak.github.io/blog/tracking-temperature-and-humidity-at-home-with-time-series-data#using-mongodb-charts-to-visualize-the-data" class="hash-link" aria-label="Direct link to Using MongoDB Charts to Visualize the Data" title="Direct link to Using MongoDB Charts to Visualize the Data" translate="no">​</a></h2>
<p>Now that the readings are coming into MongoDB, you can use the Charts feature of MongoDB Atlas to create a dashboard and display useful information and visualizations derived from the raw data. Start by clicking on the <strong>Charts</strong> tab at the top of MongoDB Atlas, then select <strong>Data Sources</strong> in the left hand menu. On that page, click the <strong>Add Data Source</strong> button in the upper right and then select your cluster. It should only have one database - <code>ts</code> - and one collection - <code>homeclimate</code>. Select those and click <strong>Finish</strong> to set up the data source.</p>
<p><img decoding="async" loading="lazy" alt="Using the time series collection as a Charts data source" src="https://misczak.github.io/assets/images/chartsdatasource-42ec0bc1b6d11155186bdef0e0ad6cc4.png" width="1418" height="734" class="img_ev3q"></p>
<p>Now that the time series collection shows up in the Data Sources list, look to the <em>Pipeline</em> column near the right hand side. Click the <strong>Add Pipeline</strong> button. We're going to use the <a href="https://www.mongodb.com/docs/manual/reference/operator/aggregation/setWindowFields/" target="_blank" rel="noopener noreferrer" class="">$setWindowFields</a> aggregation operator to build a window function on the time series collection; specifically, we're going to run a calculation on only the readings from the last hour.</p>
<p>Inside the Aggregation Pipeline Edit modal that appears, paste in the following pipeline:</p>
<div class="language-js codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-js codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">{</span><span class="token literal-property property" style="color:#36acaa">$setWindowFields</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token literal-property property" style="color:#36acaa">partitionBy</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"$metadata.sensorId"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token literal-property property" style="color:#36acaa">sortBy</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token literal-property property" style="color:#36acaa">timestamp</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" 
style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token literal-property property" style="color:#36acaa">output</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token literal-property property" style="color:#36acaa">lastHourAverageTemp</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token literal-property property" style="color:#36acaa">$avg</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"$temperature_fahrenheit"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token literal-property property" style="color:#36acaa">window</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token literal-property property" style="color:#36acaa">documents</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" 
style="color:#393A34">[</span><span class="token operator" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">360</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token literal-property property" style="color:#36acaa">lastHourAverageHumidity</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token literal-property property" style="color:#36acaa">$avg</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"$humidity"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token literal-property property" style="color:#36acaa">window</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span 
class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token literal-property property" style="color:#36acaa">documents</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token operator" style="color:#393A34">-</span><span class="token number" style="color:#36acaa">360</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">]</span><br></div></code></pre></div></div>
<p>This window function creates two new fields for us to use in MongoDB Charts - <code>lastHourAverageTemp</code> and <code>lastHourAverageHumidity</code>. Note that I used the original Fahrenheit temperature as the source for the new average temperature field - depending on where you're located, you may want to swap in Celsius instead. Since we're collecting readings every 10 seconds in our original script, we go back 360 readings (6 readings a minute multiplied by 60 minutes in an hour). Click <strong>Save</strong> when you've pasted this in.</p>
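<p>If the <code>documents: [-360, 0]</code> window is unfamiliar, this pure-Python sketch shows the computation it performs on each document: the mean of the current reading and up to 360 readings before it. This is only an illustration of the window semantics, not how Atlas executes the pipeline.</p>

```python
def last_hour_average(values, window=360):
    # Mirror a $setWindowFields documents: [-360, 0] window: for each
    # position, average the value there together with up to `window`
    # preceding values (fewer are available for the earliest readings).
    out = []
    for i in range(len(values)):
        start = max(0, i - window)
        window_vals = values[start:i + 1]  # inclusive of the current document
        out.append(sum(window_vals) / len(window_vals))
    return out

temps = [70.0, 72.0, 74.0]
print(last_hour_average(temps))  # [70.0, 71.0, 72.0]
```

<p>Note how the first few outputs average over fewer documents - the same thing happens in the pipeline until a full hour of readings exists.</p>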
<p>Now click on <strong>Dashboards</strong> on the left hand navigation, then select the <strong>Add Dashboard</strong> button on the far right. Call the dashboard whatever you want. Once it's open, we'll have to add some charts. Click on Add Chart and you'll be brought to the MongoDB Charts editor. If you've never used it before, I recommend checking out the <a href="https://www.mongodb.com/docs/charts/" target="_blank" rel="noopener noreferrer" class="">Charts documentation</a> to check out all the features it has.</p>
<p>For now, we're going to add a simple chart to display the current temperature. In the upper left under data source, select <strong>ts.homeclimate</strong>, which should be the only option available. Choose <strong>Text</strong> as the <em>Chart Type</em>, then select <strong>Top Item</strong>, as we want the most recent temperature reading. Now drag <code>timestamp</code> to the <em>Sort</em> field placeholder, and click the sort button to the right of it to make sure it's in descending order. Drag <code>temperature_fahrenheit</code> to the <em>Display</em> field placeholder and you should now get the most recent reading's Fahrenheit temperature value. Click <strong>Save and Close</strong> at the top.</p>
<p><img decoding="async" loading="lazy" alt="The Current Temperature Chart Configuration" src="https://misczak.github.io/assets/images/currenttemp-b6a9aec952d2438b2c11af34ffc61f26.png" width="1515" height="1002" class="img_ev3q"></p>
<p>Now we can add another chart and do the exact same thing, but instead of the regular <code>temperature_fahrenheit</code> field being used as the reading, we can use the <code>lastHourAverageTemp</code> field that was added via the pipeline we attached to the data source. This gives us a view of the average temperature over the last hour in case the most recent reading is an outlier, like if someone opens a window or starts running a heater.</p>
<p><img decoding="async" loading="lazy" alt="The Last Hour Average Temperature Chart Configuration" src="https://misczak.github.io/assets/images/averagetemp-f014c769f79f424119062e630bf4bfaf.png" width="1749" height="1044" class="img_ev3q"></p>
<p>Then we can add a third chart to visualize the change in temperature over time from the sensor. Add a new chart and this time, for <em>Chart Type</em> select <strong>Line</strong>. In the Encode tab, for the <em>X-Axis</em> select <code>timestamp</code> and turn on Binning, selecting <em>Hour</em> from the dropdown. For the <em>Y-Axis</em> select <code>temperature_fahrenheit</code>, and under the Aggregate dropdown select <em>Mean</em>. The combination of these two settings will combine all of an hour's readings into one group and take the average to represent that hour on the line chart. Finally, under <em>Series</em> we can select <code>sensorId</code> under <code>metadata</code>. While we only have one sensor currently, if we add more sensors in the future, this will show different lines for each sensor.</p>
<p>Now on the Filter tab, drag <code>timestamp</code> in as the filter and select <em>Period</em>, <em>Previous</em>, and <em>3 Day</em> to limit the chart to only consider the previous three days worth of data. Again, you can adjust this period up or down to suit your needs.</p>
<p><img decoding="async" loading="lazy" alt="Visualization of the Hourly Change in Temperature" src="https://misczak.github.io/assets/images/changeintemp-882d7c692f4ea8afeb8b217d7ed74840.png" width="1919" height="1201" class="img_ev3q"></p>
<p>If you save and close, you'll have a nice dashboard of three charts showing some basic temperature information from your sensor. As the sensor continues writing data, the line chart will become more useful as well. You can repeat this process to create three charts for the humidity data, giving you a full representation of readings from the sensor.</p>
<p><img decoding="async" loading="lazy" alt="The Complete Dashboard of Temperature and Humidity Data" src="https://misczak.github.io/assets/images/completedashboard-2d797cd60fa9562a8ab05f6d19d8bd89.png" width="1793" height="933" class="img_ev3q"></p>]]></content:encoded>
            <category>mongodb</category>
            <category>raspberry pi</category>
            <category>python</category>
        </item>
        <item>
            <title><![CDATA[Building Event Driven Experiences with MongoDB Realm and AWS]]></title>
            <link>https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws</link>
            <guid>https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws</guid>
            <pubDate>Sun, 20 Mar 2022 00:00:00 GMT</pubDate>
            <description><![CDATA[Using MongoDB Realm triggers and functions to connect to AWS services.]]></description>
            <content:encoded><![CDATA[<p>Most developers by now are familiar with MongoDB, a NoSQL database that stores data as documents instead of rows. MongoDB's fully managed service product, MongoDB Atlas, comes with MongoDB Realm, which is a set of services that help facilitate mobile and web development by providing a scalable, serverless backend for your application. Realm offers a lot of services (more than we can cover in just this post), but today I wanted to focus on how to use two of them in tandem to help connect a MongoDB Atlas database into a distributed, event driven architecture that can be built on AWS.</p>
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="pre-requisites">Pre-requisites<a href="https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws#pre-requisites" class="hash-link" aria-label="Direct link to Pre-requisites" title="Direct link to Pre-requisites" translate="no">​</a></h2>
<p>Before we get started, there are a couple of things you should have set up already. These include:</p>
<ol>
<li class="">
<p>An AWS account with an IAM user created, and an access key created for that IAM user. The user should have permissions to create, edit, and retrieve parameters from AWS Systems Manager and put objects, list buckets, and get objects from S3.</p>
</li>
<li class="">
<p>A MongoDB Atlas account. It is <a href="https://www.mongodb.com/cloud/atlas/register" target="_blank" rel="noopener noreferrer" class="">free to sign up</a>, and there is a free tier that you can use to follow along with this post. The Community version of MongoDB does not include MongoDB Realm, so this post does not apply to it.</p>
</li>
</ol>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="emulating-an-application-environment">Emulating an application environment<a href="https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws#emulating-an-application-environment" class="hash-link" aria-label="Direct link to Emulating an application environment" title="Direct link to Emulating an application environment" translate="no">​</a></h2>
<p>We're going to have to set up a few things to make this look and feel like an environment a real application is using. Most modern applications are not just writing data to a database and reading it back; rather, they are consuming events from or publishing events to a message queue or stream, interacting with cloud storage, and relying on slight changes to data to kick off workflows. To keep things simple, our goal will be to monitor our database for new documents that are inserted and to upload a copy of them to S3 at the exact moment they are written to the database. In our theoretical application, we are using the copy of the document to do some additional processing or analysis with an AWS service that is reading from S3.</p>
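<p>As a rough sketch of the copy-to-S3 step, here is how a document could be turned into the key and body for an S3 upload. The <code>events/&lt;id&gt;.json</code> key scheme and the helper name are illustrative assumptions; the actual upload later in this post is done from a Realm function using the aws-sdk's S3 <code>putObject</code> call.</p>

```python
import json

def document_to_s3_payload(doc, bucket):
    # Build the parameters an S3 put would use for this document. The key
    # scheme (events/<_id>.json) is an assumption for illustration.
    key = f"events/{doc['_id']}.json"
    # default=str stringifies non-JSON types such as ObjectId or datetime
    body = json.dumps(doc, default=str)
    return {"Bucket": bucket, "Key": key, "Body": body}

payload = document_to_s3_payload({"_id": "abc123", "status": "new"}, "myname-event-bucket")
print(payload["Key"])  # events/abc123.json
```

<p>The point is simply that each inserted document becomes one uniquely keyed object in the bucket, which downstream AWS services can then pick up.</p>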
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="set-up-mongodb-cluster">Set up MongoDB Cluster<a href="https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws#set-up-mongodb-cluster" class="hash-link" aria-label="Direct link to Set up MongoDB Cluster" title="Direct link to Set up MongoDB Cluster" translate="no">​</a></h3>
<p>Log into your MongoDB Atlas account. Create a project to act as the container for your cluster, then create a cluster. If you want to use the free tier, choose the <strong>Shared</strong> category at the top of the Create Cluster page. Use a recommended, free-tier region such as <code>us-east-1</code> or <code>us-east-2</code>, then scroll down and select the <strong>M0 Sandbox</strong> Cluster Tier. Change the name of the cluster from <strong>Cluster0</strong> to <strong>Sandbox</strong> and click on <strong>Create Cluster</strong>.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="set-up-the-s3-bucket">Set up the S3 Bucket<a href="https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws#set-up-the-s3-bucket" class="hash-link" aria-label="Direct link to Set up the S3 Bucket" title="Direct link to Set up the S3 Bucket" translate="no">​</a></h3>
<p>While the MongoDB cluster is creating, head over to the AWS console and login as your IAM user. Go to Amazon S3 in the AWS console and choose <strong>Create bucket</strong>. Give it a name like <code>{yourname}-event-bucket</code> and select the same region that you created your MongoDB Atlas cluster in. Leave the other options as the default selections and click <strong>Create bucket</strong> at the bottom of the page.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="set-up-parameters-in-ssm">Set up parameters in SSM<a href="https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws#set-up-parameters-in-ssm" class="hash-link" aria-label="Direct link to Set up parameters in SSM" title="Direct link to Set up parameters in SSM" translate="no">​</a></h3>
<p>In the AWS console, search for AWS Systems Manager (SSM) and navigate to the service page. Once there, select <strong>Parameter Store</strong> on the left hand navigation, under Application Management. Select <strong>Create Parameter</strong>.</p>
<p>On the Create parameter page, enter <code>event-target-bucket</code> as the name for the parameter. Leave the Tier as Standard, the Type as String, and the Data type as text. Under value, enter the name of the S3 bucket you created in the previous step.</p>
<p><img decoding="async" loading="lazy" alt="Setting the SSM Parameter for S3 bucket name" src="https://misczak.github.io/assets/images/ssm-76b82a5f3c1962d6b35f7cafa43db5c5.png" width="896" height="829" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="configuring-mongodb-atlas-and-realm">Configuring MongoDB Atlas and Realm<a href="https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws#configuring-mongodb-atlas-and-realm" class="hash-link" aria-label="Direct link to Configuring MongoDB Atlas and Realm" title="Direct link to Configuring MongoDB Atlas and Realm" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="load-data-into-cluster">Load Data Into Cluster<a href="https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws#load-data-into-cluster" class="hash-link" aria-label="Direct link to Load Data Into Cluster" title="Direct link to Load Data Into Cluster" translate="no">​</a></h3>
<p>Return to the MongoDB Atlas console. When your sandbox is finished provisioning, click the <strong>...</strong> button and choose <strong>Load Sample Dataset</strong>. This will give us a few databases and collections of data to use for our testing purposes.</p>
<p><img decoding="async" loading="lazy" alt="Loading sample data into MongoDB Atlas" src="https://misczak.github.io/assets/images/atlassandbox-272a484a71947136b5edb05cc44bede9.png" width="735" height="391" class="img_ev3q"></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="set-up-realm-app">Set up Realm App<a href="https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws#set-up-realm-app" class="hash-link" aria-label="Direct link to Set up Realm App" title="Direct link to Set up Realm App" translate="no">​</a></h3>
<p>While that data is being loaded, click on the <strong>Realm</strong> tab near the top of the screen. We want to select <strong>Create a New App</strong> and name it <code>AWS-Event-App</code>. Under "Link your Database", select <strong>Use an existing MongoDB Atlas Data Source</strong> and choose the Sandbox cluster you created. Then click <strong>Create Realm Application</strong>.</p>
<p><img decoding="async" loading="lazy" alt="Creating a MongoDB Realm App" src="https://misczak.github.io/assets/images/createrealmapp-d1c9662551f7ac2e6c9da097b37cf352.png" width="920" height="655" class="img_ev3q"></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="set-up-realm-values-and-secrets">Set up Realm Values and Secrets<a href="https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws#set-up-realm-values-and-secrets" class="hash-link" aria-label="Direct link to Set up Realm Values and Secrets" title="Direct link to Set up Realm Values and Secrets" translate="no">​</a></h3>
<p>In order to hook into our AWS account and use the bucket and the parameter we just set up, we have to authenticate to the account from MongoDB Realm. Since we'll be using our AWS IAM user's access key and secret key, we don't want to hardcode them into any code we end up writing or checking into source control. Therefore, we will use Realm's Values feature to store these values securely.</p>
<p>Realm's Values feature allows you to store two types of values - regular values and secrets. Secrets cannot be directly accessed by the Realm API; instead, they must be mapped to a regular value in order to be retrieved. This prevents you from inadvertently exposing a secret you did not intend to.</p>
<p>Click on <strong>Values</strong> in the left hand navigation of your Realm app. Click on <strong>Create New Value</strong>. For the value's name, enter <code>AccessKeyID</code>. Choose <code>Value</code> as the type, then for Content select <code>Custom Content</code> and paste in your Access Key wrapped in double quotations to mark it as a string. Then click on <strong>Save Draft</strong>.</p>
<p><img decoding="async" loading="lazy" alt="Setting up the Access Key ID as a Realm Value" src="https://misczak.github.io/assets/images/realmvalue1-d2563c1a8a5fbf510d2f73ddb7b5fd46.png" width="1219" height="658" class="img_ev3q"></p>
<p>Click on <strong>Create New Value</strong> again. Enter <code>SecretAccessKeySecret</code> as the value name, and this time select <code>Secret</code> under Type. For the value of secret, paste in your AWS IAM user's Secret Access Key, then click <strong>Save Draft</strong>.</p>
<p><img decoding="async" loading="lazy" alt="Setting up the Secret Key as a Realm Secret" src="https://misczak.github.io/assets/images/realmsecret-d0a57fd15978d2a581e3ddbe45821192.png" width="1224" height="465" class="img_ev3q"></p>
<p>We've entered the Secret Access Key, but we cannot directly access it via the Realm API since it is a secret. We now have to create a value and map it to that secret. Click on <strong>Create New Value</strong> a third time. Enter <code>SecretAccessKey</code> and select <code>Value</code> as the type. Under Add Content, choose <code>Link to Secret</code> and select the <code>SecretAccessKeySecret</code> you just created. Click <strong>Save Draft</strong>.</p>
<p><img decoding="async" loading="lazy" alt="Linking a Realm Value to the Realm Secret" src="https://misczak.github.io/assets/images/realmvalue2-a153ec8b22b6345f4e6d59b595392609.png" width="1233" height="522" class="img_ev3q"></p>
<p>We have one more value to set up before we're done. Click on <strong>Create New Value</strong> again, and this time enter <code>Region</code> as the value name. Leave Type as <code>Value</code> and Add Content as <code>Custom Content</code>. In the content field, enter the AWS region where you set up your S3 bucket and SSM parameter, again wrapped in double quotations. Then click <strong>Save Draft</strong>.</p>
<p>You may ask why we bothered with SSM at all to store the S3 bucket name. It is true that we could also store the bucket name here in Realm's values; however, for the scope of this post we will assume the application environment has already been using SSM for parameters such as these, and therefore it is less work for us to leave it that way.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="create-realm-functions">Create Realm Functions<a href="https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws#create-realm-functions" class="hash-link" aria-label="Direct link to Create Realm Functions" title="Direct link to Create Realm Functions" translate="no">​</a></h3>
<p>Now we are ready to start writing some code. First, let's bring in the AWS SDK, which we will use to make calls to the specific AWS services from our code. Select <strong>Functions</strong> in the left hand navigation then click the <strong>Dependencies</strong> tab at the top of the page. Select <strong>Add Dependency</strong>. On the modal that appears, enter <code>aws-sdk</code> as the package name. You can leave the version blank, but if you run into issues, you may want to come back and change the version to 2.737.0, which is what I used to write this post. When you're done, click <strong>Add</strong>.</p>
<p><img decoding="async" loading="lazy" alt="Adding the aws-sdk library as a dependency" src="https://misczak.github.io/assets/images/realmnpmlibrary-5338e908fac75cc732216e98515c7b1e.png" width="592" height="411" class="img_ev3q"></p>
<p>The <code>aws-sdk</code> node library should be added as a dependency. Now click the <strong>Create New Function</strong> button in the upper right. In the Add Function page, enter <code>MoveDocToQueue</code> as the function name and for now choose <code>System</code> under Authentication. Click on the <strong>Function Editor</strong> tab at the top, and paste this code in over what is currently there.</p>
<pre><code class="language-js">exports = async function(event) {

  const AWS = require('aws-sdk');

  const config = {
    accessKeyId: context.values.get("AccessKeyID"),
    secretAccessKey: context.values.get("SecretAccessKey"),
    region: context.values.get("Region")
  };

  // Look up the name of the target bucket from SSM Parameter Store
  const SSMparams = {
    Name: 'event-target-bucket',
    WithDecryption: false
  };

  const doc = JSON.stringify(event.fullDocument);
  const SSM = new AWS.SSM(config);

  const ssmPromise = await SSM.getParameter(SSMparams).promise();
  const bucketName = ssmPromise.Parameter.Value;

  // Write the full inserted document to the bucket, keyed by its _id
  const S3params = {
    Bucket: bucketName,
    Key: "queue-" + event.fullDocument._id,
    Body: doc
  };

  const S3 = new AWS.S3(config);

  await S3.putObject(S3params).promise()
    .then(function(data) {
      console.log('Put Object Success');
    })
    .catch(function(err) {
      console.log(err);
    });
};
</code></pre>
<p><img decoding="async" loading="lazy" alt="Configuring the Realm Function" src="https://misczak.github.io/assets/images/realmfunction1-87f5677075253b0bc2097a60694209d1.png" width="1223" height="542" class="img_ev3q"></p>
<p><img decoding="async" loading="lazy" alt="Editing the code of the Realm Function" src="https://misczak.github.io/assets/images/realmfunction2-32335381231704645ccc982f9f91623f.png" width="1237" height="750" class="img_ev3q"></p>
<p>Normally with Realm functions, you can click on <strong>Run</strong> near the bottom right to test out the function. However, since this function will require a change event in the database to run, we will need to map it to a trigger first and modify some data to test it out. For now, click on <strong>Save Draft</strong>.</p>
<p>We now have the actual code we want to run to move documents into our S3 bucket. But how do we get it to run whenever we insert data? The answer is a Realm Trigger. Select <strong>Triggers</strong> in the left-hand navigation, then add a new trigger.</p>
<p>On the Add Trigger page, enter <code>moveDocToQueueTrigger</code> as the name, then select your Sandbox cluster under Trigger Source Details. For database, select <code>sample_mflix</code>, and for collection, select <code>movies</code>. Under Operation Type, ensure <code>Insert</code> is checked, as we want this trigger to fire on every new document inserted into the <code>sample_mflix.movies</code> collection. Keep scrolling down and ensure that Full Document is set to <code>ON</code>. Under Select an Event Type, choose Function and select the <code>MoveDocToQueue</code> function we just created. Then click <strong>Save</strong>.</p>
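<p>With Full Document turned on, the change event handed to the function carries the entire inserted document. As a rough sketch (the field values below are purely illustrative, not real data from the cluster), an insert change event looks something like this, which is why the function can read <code>event.fullDocument</code>:</p>
<pre><code class="language-js">// Approximate shape of an insert change event (all values illustrative).
// The Realm function reads event.fullDocument, serialized by Full Document = ON.
const changeEvent = {
  operationType: "insert",
  ns: { db: "sample_mflix", coll: "movies" },
  documentKey: { _id: "573a1390f29313caabcd42e8" },
  fullDocument: {
    _id: "573a1390f29313caabcd42e8",
    title: "Example Movie",
    year: 2021
  }
};
</code></pre>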
<p><img decoding="async" loading="lazy" alt="Configuring the Realm Trigger Part 1" src="https://misczak.github.io/assets/images/realmtrigger1-a401e76e59b7ba18a829d9993d65f7c6.png" width="1196" height="856" class="img_ev3q"></p>
<p><img decoding="async" loading="lazy" alt="Configuring the Realm Trigger Part 2" src="https://misczak.github.io/assets/images/realmtrigger2-98b72abf58dbc3b9ef13e832ac21d1fb.png" width="1186" height="561" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="deploying-and-testing">Deploying and Testing<a href="https://misczak.github.io/blog/building-event-driven-experiences-with-mongodb-realm-and-aws#deploying-and-testing" class="hash-link" aria-label="Direct link to Deploying and Testing" title="Direct link to Deploying and Testing" translate="no">​</a></h2>
<p>We are now ready to try our first test! On the blue banner at the top of the page, you can now choose to Deploy the changes to your Realm app. The values, secrets, function, and trigger should now be ready to go. Open up a connection to your MongoDB cluster and insert a new document into the <code>sample_mflix.movies</code> collection. You can do this via the <a href="https://www.mongodb.com/try/download/shell" target="_blank" rel="noopener noreferrer" class="">mongo shell</a>, a MongoDB driver, <a href="https://www.mongodb.com/products/compass" target="_blank" rel="noopener noreferrer" class="">MongoDB Compass</a>, or just by using the Data Explorer built into Atlas to duplicate an existing document in that collection. Keep in mind that if you are using the shell, a driver, or Compass, you will need to <a href="https://docs.atlas.mongodb.com/security/add-ip-address-to-list/" target="_blank" rel="noopener noreferrer" class="">add your current IP address to the IP Access list</a> in your Atlas project.</p>
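<p>If you want something concrete to paste in, here is a minimal test document (the fields and values are made up for this walkthrough; any new document in the collection will fire the trigger):</p>
<pre><code class="language-js">// A minimal, illustrative test document for sample_mflix.movies.
const testMovie = {
  title: "Realm Trigger Test",
  year: 2021,
  genres: ["Documentary"]
};

// In mongosh, connected to your Atlas cluster, the insert would be:
//   use sample_mflix
//   db.movies.insertOne({ title: "Realm Trigger Test", year: 2021, genres: ["Documentary"] })
</code></pre>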
<p>After you have inserted a new document, return to MongoDB Realm and select Logs in the left-hand navigation. You should see an entry in the History showing when your trigger was invoked. Expanding the entry will show you the logs from that particular function invocation - if you set up your values and secrets correctly, you should see the log message "Put Object Success", meaning that our attempt to put the document into the S3 bucket appears to be successful. (If you do not see this output, double-check your values and secrets, your IAM user permissions, and your S3 and SSM configuration.)</p>
<p><img decoding="async" loading="lazy" alt="Seeing the log output showing a successful upload" src="https://misczak.github.io/assets/images/logoutput-ef3f2ec72050272ab8ce95dd49fa530c.png" width="957" height="491" class="img_ev3q"></p>
<p>Let's double-check that this actually sent the document to S3 by going back to the AWS console and navigating to the S3 service page. Find the bucket you created at the beginning of this process and open it up. You should see an object whose name starts with <code>queue-</code> followed by the <code>_id</code> value of the document you inserted. The object is named this way because we set this as the Key value when setting up the S3 parameters in our Realm function. If you see the document, your setup is successful!</p>
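<p>That key-naming scheme can be sketched as a one-liner mirroring the <code>Key</code> field in <code>S3params</code> (the helper name here is ours, not part of the Realm app):</p>
<pre><code class="language-js">// Mirrors the Key built in the Realm function's S3params (helper name is ours).
function objectKeyFor(doc) {
  return "queue-" + doc._id;
}

objectKeyFor({ _id: "573a1390f29313caabcd42e8" });  // "queue-573a1390f29313caabcd42e8"
</code></pre>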
<p><img decoding="async" loading="lazy" alt="Seeing the document show up in S3" src="https://misczak.github.io/assets/images/s3result-31ed0b3832e9ffa3557053b2477e26cb.png" width="1199" height="265" class="img_ev3q"></p>
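<p>The same pattern extends to other AWS services. As one hedged sketch, a hypothetical Kinesis variant of the function might build its <code>putRecord</code> parameters like this (the stream name is made up, and the helper function is ours for illustration):</p>
<pre><code class="language-js">// Builds putRecord parameters for a hypothetical Kinesis variant of the function.
function buildKinesisParams(event) {
  return {
    StreamName: "realm-doc-stream",                // hypothetical stream name
    PartitionKey: String(event.fullDocument._id),  // partition by document _id
    Data: JSON.stringify(event.fullDocument)
  };
}

// Inside the Realm function, in place of the S3 putObject call:
//   const kinesis = new AWS.Kinesis(config);
//   await kinesis.putRecord(buildKinesisParams(event)).promise();
</code></pre>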
<p>You can continue experimenting by bringing in other AWS services as well, such as sending the document to a Kinesis data stream instead of just an S3 bucket.</p>]]></content:encoded>
            <category>mongodb</category>
            <category>AWS</category>
            <category>nodejs</category>
        </item>
    </channel>
</rss>