
TechGirlKB

Performance | Scalability | WordPress | Linux | Insights


Git

NPM: No user exists for uid 1000

I thought I would write a quick post about this issue, as I’ve encountered it several times. Note that the user could be any uid, not just 1000 or 1001; it all depends on which user launched your build container from your deployment software.

The issue: When performing a build step with npm in a Docker container, it throws this error on git checkouts:

npm ERR! Error while executing:
npm ERR! /usr/bin/git ls-remote -h -t ssh://[email protected]/repo.git
npm ERR! 
npm ERR! No user exists for uid 1000
npm ERR! fatal: Could not read from remote repository.
npm ERR! 
npm ERR! Please make sure you have the correct access rights
npm ERR! and the repository exists.
npm ERR! 
npm ERR! exited with error code: 128
 
npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2023-05-08T19_50_34_229Z-debug.log

What was equally frustrating was that when I tested the same command in the same container locally (instead of inside our deployment tools), it ran without any issues.

What causes this issue?

The crux of the issue is this: when npm/Node tries to check out a package from git, it uses the ownership of the node_modules and package.json files to determine which user should be used to pull git packages.

When your build/deploy tool mounts files into the Docker container, the user that owns those files on the host might not exist inside the container. It also might not be the user you want checking out the files! And by default, like it or not, Docker logs you into the container as the root user.

So to summarize:

  • Docker logs you in as root to perform the npm build commands
  • The files you’ve mounted into the container might be owned by a user that only exists on the deployment server and not inside the Docker container
  • npm defaults to using the owner of the node_modules files to choose which user it should use to perform git/ssh commands
  • This results in the error that the user does not exist
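
You can see the mismatch for yourself from a shell inside the container. This is just a quick diagnostic sketch; the mount path and uid here are examples, and getent may not be present in very minimal images:

# Who am I inside the container? (Docker defaults to root, uid 0)
id
# Who owns the mounted files? ls -ln shows the raw numeric uid from the host (e.g. 1000)
ls -ln /source/frontend
# Does that uid map to a user inside the container? No output means no matching user
getent passwd 1000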

The fix

The fix in this case is just to perform a “chown” of the files you’ve mounted from the deployment server prior to running your npm build commands.

For example, given the scenario that I’ve mounted my files to /source on the container, and my build files are now inside /source/frontend:

$ chown -R root:root /source/frontend/; cd /source/frontend && npm run build 

You can replace the path and the npm command with whatever your npm build script is for your own environment. The important part is the change of the ownership at the beginning of the command.
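
If you want the ownership fix and the build in a single container invocation, it might look roughly like this. The image name, mount path, and build script are placeholders for whatever your environment uses:

# hypothetical one-shot build: mount the source, fix ownership, then build
docker run --rm -v "$(pwd)":/source node:18 \
  sh -c 'chown -R root:root /source/frontend && cd /source/frontend && npm run build'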


Have you had issues with this error in NPM? Have experiences you want to share? Feel free to leave a comment, or contact me.

SSH: You Don’t Exist!

If you’ve ever been told you don’t exist by a software package, you…

  • Might be a DevOps Engineer
  • Might also start questioning your life decisions

Over the holidays we have been in the process of adding a new git repository as an npm package to our build step for a number of projects. This can be problematic in a pipeline, since git also requires you to manage things like SSH keys, and the pipeline environment is often obscured enough that you can’t easily add them.

This situation caused a number of errors in our build step that we (incorrectly) assumed were caused by a bad underlying server. Turns out… we just played ourselves.

The authenticity of host ‘(your git host)’ can’t be established.

The first of this comedy of errors came from this wonderful prompt. You’ve probably seen it before, the first time you connect to a new git host. Usually you can just say “Yes, git overlords, I accept your almighty fingerprint” and we all move on with our lives. But in a container in our build step, it’s not an interactive shell. So instead, it just hangs there forever until someone wonders why that deploy never happened, and checks on it.

After only 94 deploy attempts in an effort to figure this out, we finally realized two things:

  • The npm install was taking place in a cached build step (that our deploy system conveniently placed at the very bottom of the configuration page instead of, you know, before the build steps).
  • All our attempts to fix the issue were being placed in the actual build step, which runs after the npm install, and were therefore fruitless.

Anyways, once we figured that simple piece of wisdom out, we were able to resolve it by adding this line before the npm install:

mkdir -p -m 0700 ~/.ssh && ssh-keyscan <your git host> > ~/.ssh/known_hosts
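
If you want to confirm the host key actually made it into the file before the npm install runs, ssh-keygen can search a known_hosts file. The host name here is just an example:

# prints the matching entry if the host is already known; no output means it's missing
ssh-keygen -F bitbucket.org -f ~/.ssh/known_hosts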

Could not create leading directories

The next error we encountered was probably caused by something that changed on the underlying server, but we can’t be certain. All of a sudden, public git packages started failing because git couldn’t write to a cache/tmp directory: the intermediate directories didn’t exist yet.

npm ERR! code 128
npm ERR! Command failed: git clone --depth=1 -q -b v1.4.1 git://github.com/hubspot/messenger.git /root/.npm/_cacache/tmp/git-clone-2e2bbd46
npm ERR! fatal: could not create leading directories of '/root/.npm/_cacache/tmp/git-clone-2e2bbd46': Permission denied

The issue in this case was that the user couldn’t create the new directory for the git clone, because the parent directories either didn’t exist or weren’t writable by that user. As this wasn’t a problem before, we suspect the directory permissions on the underlying server had changed. Ultimately what fixed it was changing npm’s prefix to a location that both exists and is writable:

npm config set prefix /usr/local
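
A couple of quick sanity checks can confirm where npm now thinks its prefix and cache live, and whether the cache path is actually writable:

npm config get prefix              # should now print /usr/local
npm config get cache               # the _cacache/tmp path from the error lives under here
ls -ld "$(npm config get cache)"   # check the directory exists and who owns it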

You don’t exist, go away!

And finally, this supremely unhelpful error. A little research showed it’s actually an SSH error: it occurs when you try to SSH as a user ID that doesn’t exist. So like, I guess it makes sense in that exact situation. But our user is “root”, and it definitely exists. If it didn’t, this whole environment would probably collapse in on itself.

npm ERR! Error while executing:
npm ERR! /usr/bin/git ls-remote -h -t <your git host>
npm ERR!
npm ERR! You don't exist, go away!
npm ERR! fatal: The remote end hung up unexpectedly
npm ERR!
npm ERR! exited with error code: 128

This error presented itself when trying to install a private git repository as an npm package for the first time (for this particular app and container).

After about 59 tries to figure out what exactly was wrong with the user, container, and anything else in the environment, we finally noticed something different in this project’s package.json file — it was doing the npm install with the “global” -g flag. Thinking back to the last issue, I decided to try to change the prefix (which I had already tried, and it didn’t help), but this time with the -g flag as well.

npm config set -g prefix /usr/local

Like magic, it worked.

Conclusion

Build steps can be a frustrating troubleshooting environment. When you don’t have access to the server itself, it can be cumbersome and noisy to track down the cause of errors. And those errors don’t always present themselves in the same way: most of these did not occur when testing from the same container locally, and many of them produced little to nothing in a Google search. I hope this article helps some weary DevOps souls out there! Feel free to comment with other weird build step issues you’ve encountered as well, or contact me.

Adding version control to an existing application

Most of us begin working on projects, websites, or applications that are already version controlled in one way or another. If you encounter one that’s not, it’s fairly easy to start a git repository from exactly where the project is at that moment. Recently, however, I ran into an application which was only halfway version controlled. By that I mean the actual application code was version controlled, but it was deployed from ansible code hosted on a server that was NOT version controlled. This made the deploy process frustrating for a number of reasons.

  • If your deploy fails, is it the application code or the ansible code? If the latter, is it because something changed? If so, what? It’s nearly impossible to tell without version control.
  • Not only did this application use ansible to deploy, it also used capistrano within the ansible roles.
  • While the application itself had its own AMI that could be replicated across blue-green deployments in AWS, the source server performing the deploy did not — meaning a server outage could mean a devastating loss.
  • Much of the ansible (and capistrano) code had not been touched or updated in roughly 4 years.
  • To top it off, this app is a Ruby on Rails application, and Ruby was installed with rbenv instead of rvm, allowing multiple versions of ruby to be installed.
  • It’s on a separate AWS account from everything else, adding the fun mystery of figuring out which services it’s actually using, and which are just there because someone tried something and gave up.

As you might imagine, after two separate incidents of late nights trying to follow the demented rabbit trail of deployment issues in this app, I had enough. I was literally Lucille Bluth yelling at this disaster of an app.

It was a hot mess.

Do you ever just get this uncontrollable urge to take vengeance for the time you’ve lost just sorting through an unrelenting swamp of misery caused by NO ONE VERSION-CONTROLLING THIS THING FROM THE BEGINNING? Well, I did. So, below, read how I sorted this thing out.

Start with the basics

First of all, we created a repository for the ansible/deployment code and committed the existing code from the server as-is. Well, kind of. It turns out there were some keys and other secrets that shouldn’t be checked into a git repo willy-nilly, so we had to do some strategic editing.

Then I did some mental white-boarding, planning out how to go about this metamorphosis. I knew the new version of this app’s deployment code would need a few things:

  • Version control (obviously)
  • Figure out which secure items were actually needed (there were definitely some superfluous ones) and encrypt them using ansible-vault.
  • Eliminate the need for a bastion/deployment server altogether — AWS CodeDeploy, Bitbucket Pipelines, or other deployment tools can accomplish blue-green deployments without needing an entirely separate server for it.
  • Upgrade the CentOS version in use (up to 7 from 6.5)
  • Filter out unnecessary work-arounds hacked into ansible over the years (ANSIBLE WHAT DID THEY DO TO YOU!? :sob:)
  • Fix the janky way Passenger was installed and switch it from httpd/apache as its base over to Nginx
  • A vagrant/local version of this app — I honestly don’t know how they developed this app without this the whole time, but here we are.

So clearly I had my work cut out for me. But if you know me, you also know I will stop at nothing to fix a thing that has done me wrong enough times. I dove in.

Creating a vagrant

Since I knew what operating system and version I was going to build, I started with my basic ansible + vagrant template. I had it pull the regular “centos/7” box as our starting point. To start I was given a layout like this to work with:

+ app_dev
  - deploy_script.sh
  - deploy_script_old.sh
  - bak_deploy_script_old_KEEP.sh
  - playbook.yml
  - playbook2.yml
  - playbook3.yml
  - adhoc_deploy_script.sh
  + group_vars
    - localhost
    - localhost_bak
    - localhost_old
    - localhost_template
  + roles
    + role1
      + tasks
        - main.yml
      + templates
        - application.yml
        - database.yml
    + role2
      + tasks
        - main.yml
      + templates
        - application.yml
        - database.yml
    + role3
      + tasks
        - main.yml
      + templates
        - application.yml
        - database.yml

There were several versions of old vars files and scripts left over from the years of non-version-control, and inside the group_vars folder there were sensitive keys that should not be checked into the git repo in plain text. Additionally, a “templates” directory existed in a different form in every role, even though only one role used it.

I re-arranged the structure and filtered out some old versions of things to start:

+ app_dev
  - README.md
  - Vagrantfile
  + provisioning
    - web_playbook.yml
    - database_playbook.yml
    - host.vagrant
    + group_vars
      + local
        - local
      + develop
        - local
      + staging
        - staging
      + production
        - production
      + vaulted_vars
        - local
        - develop
        - staging
        - production
    + roles
      + role1
        + tasks
          - main.yml
        + templates
          - application.yml
          - database.yml
      + role2
        + tasks
          - main.yml
      + role3
        + tasks
          - main.yml
    + scripts
      - deploy_script.sh
      - vagrant_deploy.sh

Inside the playbooks I laid out the roles in the order they appeared to run in deploy_script.sh, so ansible could use them in the vagrant build process. From there, it was a lot of vagrant up, finding out where it failed this time, and finding a better way to run the tasks (if they were even needed, which they often were not).
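
For reference, the iteration loop for that process is just a handful of Vagrant commands:

vagrant up --provision             # build the box and run the ansible playbooks
vagrant provision                  # re-run only the ansible provisioning after fixing a task
vagrant destroy -f && vagrant up   # start over from a clean box when needed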

Perhaps the hardest part was figuring out the capistrano portion of the deploy process. If you’re not familiar, capistrano is a deployment tool for Ruby that lets you deploy to servers remotely. It also handles things like keeping old versions of releases, syncing assets, and migrating the database. For a command as simple as bundle exec cap production deploy (yes, every environment was “production” to this app, sigh), there were a lot of moving parts to figure out. In the end I got it working by setting up a separate “production.rb” file, specifically for vagrant, for the cap deploy to use, which allows the box to deploy to itself.

# 192.168.67.4 is the vagrant webserver IP I setup in Vagrant
role :app, %w{192.168.67.4}
role :primary, %w{192.168.67.4}
set :branch, 'develop'
set :rails_env, 'production'
server '192.168.67.4', user: 'vagrant', roles: %w{app primary}
set :ssh_options, {:forward_agent => true, keys: ['/path/to/vagrant/ssh/key']}

The trick here is allowing the capistrano deploy to ssh to itself — so make sure your vagrant private key is specified to allow this.
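
If you’re not sure where that private key lives, vagrant ssh-config prints the IdentityFile path it generated. The path below assumes the default machine name and the VirtualBox provider:

vagrant ssh-config
# IdentityFile is typically something like:
#   .vagrant/machines/default/virtualbox/private_key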

Deploying on AWS

To deploy on AWS, I needed to create an AMI, or image, from which new servers could be duplicated in the future. I started with a fairly clean CentOS 7 AMI I had created a week or so earlier, and went from there. I used ansible-pull to check out the correct git repository and branch for the newly created ansible app code, then used ansible-playbook to work through the app deployment sequence on an actual AWS server. In the original deploy code I brought down, some playbooks could only be run on AWS (they require data from ansible’s ec2_metadata_facts module), so this step also involved troubleshooting the pieces that didn’t run locally.
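
The ansible-pull step looked something like the following. The repository URL and branch are placeholders; the playbook name matches the layout shown earlier:

# clone/update the deployment repo on the instance and run the playbook locally
ansible-pull -U [email protected]:myorg/app_dev.git -C develop provisioning/web_playbook.yml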

After several prototype servers, I determined that the AMI should contain the base packages needed to install Ruby and Passenger (with Nginx), as well as rbenv and Ruby itself installed into the correct paths. The deploy itself then installs any additional packages added to the Gemfile, runs bundle exec cap production deploy, and swaps the new servers into the ELB (Elastic Load Balancer) on AWS once they’re deemed “healthy.”

This troubleshooting process also required me to copy over the database(s) in use by the old account (it turns out this is possible with the “Share” option for RDS snapshots in AWS, so that was blissfully easy), create a new Redis instance, copy all the S3 assets to a bucket in the new account, and create a CloudFront distribution to serve those assets, with the appropriate security groups to lock all of these services down. Last, I updated the vaulted variables in ansible to point at the new AMIs, RDS instances, Redis instances, and CloudFront/S3 resources. After verifying things still worked as they should, I saved the AMI for easily replicable future use.
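
For the S3 piece, the copy between accounts can be done with the AWS CLI, assuming the credentials in use can read the old bucket and write to the new one. The bucket names here are placeholders:

# copy all existing assets from the old account's bucket to the new one
aws s3 sync s3://old-account-assets s3://new-account-assets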

Still to come

A lot of progress has been made on this app, but there’s more still to come. After thorough testing, we’ll need to switch over the DNS to the new ELB CNAME and run entirely from the new account. And there is pipeline work in the future too — whereas before this app was serving as its own “blue-green” deployment using a “bastion” server of sorts, we’ll now be deploying with AWS CodeDeploy to accomplish the same thing. I’ll be keeping the blog updated as we go. Until then, I can rest easy knowing this app isn’t quite the hot mess I started with.

Rewriting history — git history, that is

If you’ve ever worked with a team of contractors in software development, you may notice some of their (or even your!) commits are made under the incorrect email address or username. If you need to clean things up for compliance or maybe just your own sanity, turns out there’s a fairly easy way to rewrite any of those invalid authors’ bad commits.

Gather your list(s)

Start by checking out all branches of your repository on your local machine — we’ll need to scan them for invalid authors. Here’s a nifty shell script from user octasimo on github that does just that:

#!/bin/bash
# create a local tracking branch for every remote branch (skipping HEAD and master)
for branch in $(git branch -a | grep remotes | grep -v HEAD | grep -v master); do
  git branch --track "${branch#remotes/origin/}" "$branch"
done

Once all branches are checked out, we’ll need to scan them for invalid authors. Use this command to get a list of those emails:

git log --all --format='%cE' | sort -u

Now you can filter that list to show only the invalid ones. For example, if my corporate email domain is “@company.com”, I can use grep -v to exclude the valid addresses like so:

git log --all --format='%cE' | sort -u | grep -v '@company.com'

Now you have a list of all the invalid emails that have been used to commit changes to your repository. From there, make a list determining which proper (corporate) email should have been used.

Pipe the list into the script

For rewriting these emails, you can use the script provided by GitHub itself. The script takes an “old email” (one of the bad emails we found above) and a “correct email,” the email that should have been used to make the commit.

#!/bin/sh

git filter-branch --env-filter '
OLD_EMAIL="[email protected]"
CORRECT_NAME="Your Correct Name"
CORRECT_EMAIL="[email protected]"

if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]
then
    export GIT_COMMITTER_NAME="$CORRECT_NAME"
    export GIT_COMMITTER_EMAIL="$CORRECT_EMAIL"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]
then
    export GIT_AUTHOR_NAME="$CORRECT_NAME"
    export GIT_AUTHOR_EMAIL="$CORRECT_EMAIL"
fi
' --tag-name-filter cat -- --branches --tags

To use this script, just insert the invalid email as the value of the “OLD_EMAIL” variable, the correct name for this user as the value for the “CORRECT_NAME” variable, and the corporate email as the value for the “CORRECT_EMAIL” variable.

Quick tip: if you have to rewrite multiple emails as I did, you will want to change that first line to git filter-branch -f

Once you’ve run the script for each email that needs to be rewritten, push your changes to the repository with git, and all should be cleaned up!
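
Since filter-branch rewrites history, a plain git push will be rejected; the push has to be forced. Coordinate with your team first, because this replaces the remote branches and tags:

# force-push all rewritten branches and tags
git push --force --tags origin 'refs/heads/*'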
