Linux

Resolving 502, 503, and 504 errors

If you’ve ever run into 502, 503, or 504 errors (we’ll refer to them as 50x errors from here on out), you probably know how frustrating they can be to troubleshoot. Learn what each means in this article!

What are 50x errors?

Let’s start at the beginning: what does a 50x error mean? According to the HTTP Status Code guide, here’s what each translates to:

502 Bad Gateway: “the server, while acting as a gateway or proxy, received an invalid response from the upstream server.”

503 Service Unavailable: “the server is not ready to handle the request.”

504 Gateway Timeout: “the server, while acting as a gateway or proxy, cannot get a response in time.”

Those descriptions, unfortunately, aren’t specific enough to be very useful to most users.

Here’s how I would describe these errors:

A service (whichever service first received the request) attempted to forward it on to somewhere else (proxy/gateway), and didn’t get the response it expected.

The 502 error

In the case of a 502 error, the service that forwarded the request onward received an invalid response. This could mean the upstream service killed the request while processing it, or that the service crashed partway through. It could also mean the proxy received some other response that it considered invalid. The best places to look are the logs of the service that first received the request (nginx, varnish, etc.) and the logs of the upstream service it proxies to (apache, php-fpm, etc.).

For example, on a server I currently manage, nginx acts as a “traffic director” or “reverse proxy” that receives traffic first. It then forwards each request to a backend PHP processing service called php-fpm. When I received a 502 error, I saw an error like this in my nginx error logs:

2019/01/11 08:11:31 [error] 16467#0: *7599 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.0.0.1, server: localhost, request: "GET /example/authenticate?code=qwie2347jerijowerdsb23485763fsiduhwer HTTP/1.1", upstream: "fastcgi://unix:/run/php-fpm/www.sock:", host: "example.com"

This error tells me that nginx passed the request “upstream” to my php-fpm service, and did not receive response headers back (i.e. the request was killed). When looking in the php-fpm error logs, I saw the cause of my issue:

[11-Jan-2019 08:11:31] WARNING: [pool www] child 23857 exited on signal 11 (SIGSEGV) after 33293.155754 seconds from start
[11-Jan-2019 08:11:31] NOTICE: [pool www] child 20246 started

Notice the timestamps are exactly the same, which strongly suggests this request caused the php-fpm child to crash. In our case, the issue was a corrupted cache: as soon as the cache files were cleared, the 502 error was gone. Often, however, you will need to enable core dumps or strace the process to diagnose further. You can read more about that in my article on Segmentation Faults.
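If you have more than a handful of 502s, matching timestamps by eye gets tedious. Below is a minimal Python sketch, assuming the nginx and php-fpm log formats shown above and that both services log in the same time zone. The function names and the one-second tolerance are hypothetical choices for illustration, not features of either tool.

```python
import re
from datetime import datetime

# nginx error lines start with "YYYY/MM/DD HH:MM:SS [error]" and mention "upstream"
NGINX_UPSTREAM = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[error\].*upstream")
# php-fpm crash lines look like "[11-Jan-2019 08:11:31] WARNING: ... exited on signal 11"
FPM_CRASH = re.compile(r"^\[(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2})\] WARNING: .*exited on signal (\d+)")


def nginx_upstream_errors(lines):
    """Timestamps of nginx errors that involve an upstream."""
    found = []
    for line in lines:
        m = NGINX_UPSTREAM.match(line)
        if m:
            found.append(datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S"))
    return found


def fpm_crashes(lines):
    """(timestamp, signal number) for each php-fpm child that exited on a signal."""
    found = []
    for line in lines:
        m = FPM_CRASH.match(line)
        if m:
            found.append((datetime.strptime(m.group(1), "%d-%b-%Y %H:%M:%S"), int(m.group(2))))
    return found


def correlate(nginx_lines, fpm_lines, tolerance_seconds=1):
    """Pair each nginx upstream error with php-fpm crashes logged within the tolerance."""
    crashes = fpm_crashes(fpm_lines)
    matches = []
    for error_ts in nginx_upstream_errors(nginx_lines):
        for crash_ts, signal in crashes:
            if abs((error_ts - crash_ts).total_seconds()) <= tolerance_seconds:
                matches.append((error_ts, crash_ts, signal))
    return matches
```

For example, reading /var/log/nginx/error.log and your php-fpm error log (its path varies by distribution) into lists of lines and passing them to correlate() returns each upstream error that lines up with a crashed child, along with the signal number (11 is SIGSEGV).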

A 502 error could also mean the upstream service killed the process (because of a timeout on a long-running process, for example) or, if the request is being proxied between servers, that the destination server is unreachable.

The 503 error

A 503 error most often means the “upstream” server, or server receiving the request, is unavailable. I’ve most often experienced this error when using load balancers on AWS and Rackspace, and it almost always means that the server configured under the load balancer is out of service.

This happened to me once or twice when building new servers and disabling old ones without adding the new servers to the load balancer configuration. The load balancer, with no healthy hosts assigned, returns a 503 error because it cannot forward the request to any host.

Luckily this error is easily resolved, as long as you have the proper access to your web management console to edit the load balancer configuration! Simply add a healthy host into the configuration, save, and your change should take effect pretty quickly.
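Conceptually, the load balancer's decision looks something like the Python sketch below. It is a simplified, hypothetical model: the Backend class and route() function are made up for illustration, and real load balancers on AWS or Rackspace run their own health checks. It shows why a pool that is empty, or where every host is unhealthy, turns every request into a 503.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool  # result of the load balancer's most recent health check


def route(backends, start=0):
    """Round-robin over backends, skipping unhealthy ones; 503 if none are left."""
    for i in range(len(backends)):
        backend = backends[(start + i) % len(backends)]
        if backend.healthy:
            return 200, f"forwarded to {backend.name}"  # 200 stands in for "request forwarded"
    return 503, "Service Unavailable: no healthy hosts"


# Old server disabled and new server never added: every request gets a 503.
print(route([Backend("old-web-1", healthy=False)]))
# After adding a healthy host, requests flow again.
print(route([Backend("old-web-1", healthy=False), Backend("new-web-1", healthy=True)]))
```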

The 504 error

Last but not least, a 504 error means there was a gateway timeout. For services like Cloudflare, which sit in front of your website, this often means that the proxy service (Cloudflare in this example) timed out while waiting for a response from the “origin” server, or the server where your actual website content resides.

On some web hosts, it could also mean your website is receiving too much concurrent traffic. If you are using a service like Apache as a backend for PHP processing, it may be limited to a fixed number of worker processes or threads, which caps the number of concurrent requests your server can handle. As a result, requests left waiting in the queue can wait longer than the proxy's timeout, resulting in a 504 error. If your website receives a lot of concurrent traffic, a setup like nginx with php-fpm is a good fit because it allows for higher concurrency and faster request processing. Introducing caching layers is another way to help requests process more quickly. In this situation, note that 504 errors will likely be intermittent as traffic levels rise and fall on your website.
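To see why these 504s come and go with traffic, here is a rough Python model. It assumes a fixed pool of backend workers, a burst of requests that all arrive at once and each take the same time to process, and a proxy that gives up after a fixed timeout (60 seconds is nginx's default for proxy_read_timeout and fastcgi_read_timeout). The worker counts and timings in the example are hypothetical.

```python
def simulate_burst(requests, workers, service_time, gateway_timeout):
    """Return (served, timed_out) for a burst of requests that all arrive at t=0.

    Each worker handles one request at a time, so request i has to wait for
    i // workers earlier "rounds" to finish before it can start.
    """
    served = timed_out = 0
    for i in range(requests):
        start = (i // workers) * service_time
        finish = start + service_time
        if finish > gateway_timeout:
            timed_out += 1  # the proxy gave up first: the visitor sees a 504
        else:
            served += 1
    return served, timed_out


# 10 workers, 2 seconds per request, 60 second gateway timeout
print(simulate_burst(50, 10, 2.0, 60))    # quiet period: (50, 0)
print(simulate_burst(400, 10, 2.0, 60))   # traffic spike: (300, 100)
```

Raising the worker count or shortening each request (for example with caching) pushes the 504s out, which matches the advice above.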

Last, checking your firewall settings is another good step. If a firewall between the proxy and the “upstream” service silently drops the request, the proxy waits until its connection times out, which causes a 504 error for your users. (A firewall that actively rejects the connection usually produces a quick “connection refused” and a 502 instead.) Note that in this scenario, you would see the 504 error consistently rather than intermittently as with high traffic.
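If you suspect a firewall, a quick TCP connection test from the proxy to the upstream can tell a silent drop apart from an active rejection. This Python sketch uses only the standard library. The probe_upstream() name and the three-second timeout are illustrative assumptions, and it should be run from the machine doing the proxying.

```python
import socket


def probe_upstream(host, port, timeout=3.0):
    """Try a TCP connection to the upstream and classify what happened.

    "open"    - something accepted the connection
    "refused" - nothing is listening, or a firewall rejected the connection
    "timeout" - no answer at all (host down, or a firewall silently dropping packets)
    "error"   - some other failure, such as "no route to host" (some REJECT rules cause this)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

Call it with your own upstream's address and port. A result of "timeout" lines up with the consistent 504s described above, while "refused" usually points to a stopped service or a rejecting firewall rule.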

Conclusion

To wrap things up, remember that a 50x error indicates that one service is passing a request “upstream” to another service. This could mean two servers talking to each other, or multiple services within the same server. Using the steps above will hopefully help in guiding you to a solution!

Have you encountered other causes of these errors? Have any feedback, suggestions, or tips? Let me know in the comments below, or contact me.

Writing a basic cron job in Linux

Cron jobs are one of the most common ways developers schedule repeated tasks on a Linux system. A cron job simply runs a script or command on your system on a regular schedule. Writing a cron job happens in three basic steps:

Write the script you wish to execute (e.g. mycron.php)
Determine how often you’d like the script to run
Write a crontab for it

Writing the script

Start by writing your script or command. It can be in any language you like, but most often crons are written in bash or PHP. Create your cron.php or cron.sh file if needed. Here’s a basic PHP file that just prints “hello world” to stdout (standard output):

<?php
echo "hello world";
?>

Or a bash script that does the same thing:

#!/usr/bin/env bash
echo "hello world"

Or if you want to have some fun, on a Mac you can use the say command to have your computer speak a line instead.

say -v fred "I love Linux"

You can save the say command as a bash file (cron.sh for example) and invoke it with bash.

Save your file with the appropriate file extension, then move on to the next step: determining how often the command should run.

Determining cron schedule

Now that you have your script written, it’s time to make some decisions about how frequently you wish the script to be executed.

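Crontab gives you the freedom to run the script or command every minute, hour, day, month, or any combination thereof. If you want the script to run only Wednesday through Friday at 10am, that’s possible with cron. More specific schedules, like 5pm on the third Thursday of the month, need a small workaround: when both the day-of-month and day-of-week fields are set, cron runs the job when either one matches, not only when both do.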

When it comes to deciding a cron schedule, less is more. By that I mean, for the health of your server it’s best to run it as infrequently as possible to conserve server resources. Keep in mind that the more resource-intensive your script is, the more likely it is that your crons will disrupt the experience of your end users.

The basic format of a cron is:

* * * * * command to run

Each of those asterisks represents a particular unit of time, in this order:

minute (0-59) of the hour you wish the command to execute. * means every minute of the hour.
hour (0-23) of the day you wish the command to execute. * means every hour of the day.
day (1-31) of the month you wish the command to execute. * means every day of the month.
month (1-12) of the year you wish the command to execute. * means every/any month of the year.
day of the week (0-6, where 0 is Sunday) you wish the command to execute. * means any day of the week.

Using these guidelines you can format your schedule. Below are some examples:

# every minute of every hour/day/month/year
* * * * *

# every hour, on the hour
0 * * * * 

# every day at 10am
0 10 * * *

# every Thursday at midnight
0 0 * * 4

# every Monday, Wednesday, and Friday at 3:30pm
30 15 * * 1,3,5
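To make the field order concrete, here is a small Python sketch that checks whether a given date and time matches a five-field schedule. It handles *, single values, lists, ranges, and */step, and follows the classic Vixie cron rule that when both the day-of-month and day-of-week fields are restricted, a match on either one is enough. It is a hypothetical helper for experimenting, not a replacement for your system's cron, and it skips extras like month and day names.

```python
from datetime import datetime


def parse_field(field, low, high):
    """Expand one cron field ("*", "5", "1,3,5", "1-5", "*/15") into a set of ints."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr, when):
    """Return True if the five-field cron expression fires at datetime `when`."""
    minute, hour, dom, month, dow = expr.split()
    if when.minute not in parse_field(minute, 0, 59):
        return False
    if when.hour not in parse_field(hour, 0, 23):
        return False
    if when.month not in parse_field(month, 1, 12):
        return False
    dom_ok = when.day in parse_field(dom, 1, 31)
    # Python counts Monday=0..Sunday=6; cron counts Sunday=0..Saturday=6 (7 is also Sunday).
    cron_dow = (when.weekday() + 1) % 7
    dow_ok = cron_dow in {d % 7 for d in parse_field(dow, 0, 7)}
    # If either day field starts with "*", both must match;
    # if both are restricted, matching EITHER one is enough.
    if dom.startswith("*") or dow.startswith("*"):
        return dom_ok and dow_ok
    return dom_ok or dow_ok


print(cron_matches("30 15 * * 1,3,5", datetime(2024, 1, 3, 15, 30)))  # True (a Wednesday)
```

You can check the examples above against dates you care about before installing them.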

If you want to check your syntax, there’s a handy website called crontab.guru which can help you test out some schedule combinations.

Writing the crontab

The crontab command is how you install your schedule; the cron daemon then executes the jobs. With crontab we will take the script and the schedule and combine them so the script or command runs on the schedule you’d like.

Start by listing any existing cron schedules for your user with crontab -l.

If you’ve never written a cron before, you’ll probably see the following:

$ crontab -l
crontab: no crontab for [yourusername]

Now to edit the crontab, we’ll use crontab -e. This opens your default editor (set by the VISUAL or EDITOR environment variable), which on many Linux systems is vi or vim, so here’s a brief tutorial if you’ve never used it before.

Hit i to enter “insert” mode and type your schedule first, followed by the command. If your cron script is written in PHP and is located in /var/www/html/cron.php, you can use the syntax below:

* * * * * /usr/bin/php /var/www/html/cron.php

Or if you want to use the command right in crontab, you can do that too:

* * * * * echo "I love Linux" >> /var/log/cron-stdout.log

Notice in the command above I also redirected stdout to a log file. This is optional, but it can help you confirm your cron is running as scheduled!

And of course, if you wanted to prank a coworker when they leave their Mac unlocked at work:

* * * * * say -v fred "I am buying nachos for everyone at 5pm"

Once you’ve finished editing, hit the esc key to exit “insert” mode, then type :wq and press Enter to write the file and quit. You should see the message “crontab: installing new crontab” after saving.

… And that’s it! Now your cron will execute on your schedule. Before we part ways, a few words of caution:

Most managed web hosts don’t allow you to edit system crons; doing so usually requires root SSH access on the Linux machine.
If you’re using WordPress, you can easily get around this restriction by using WordPress cron (WP-Cron) and managing it with a simple tool like WP Crontrol.
Cron is a word that’s easy to mistype. Pro-tip: nobody knows what you want when you’re asking about “corn jobs” 😉

How about you? Any pro-tips for those looking to create a cron job? Leave a comment, or Contact Me.

How the InnoDB Buffer Pool Allocates and Releases Memory

As you may know or have noticed, Memory utilization can be difficult to truly understand. While tools like free -m can certainly help, they aren’t necessarily a true indication of health or unhealth. For example, if I see 90% Memory utilization, that’s not necessarily a sign that it’s time to add more resources. The nature of Memory is to store temporary data for faster access the next time it is needed, and because of this, it tends to hold onto as much temporary data as it can until it needs to purge something to make more space.

About the InnoDB Buffer Pool

InnoDB (a table storage engine for MySQL) has a specific pool of Memory, called the InnoDB Buffer Pool, allocated to caching data and indexes for InnoDB tables. Generally speaking, it’s safest to have an InnoDB Buffer Pool at least as large as the database(s) on your server environment so that all tables can fit into the available Memory.

As queries access various database tables, the table and index data they touch is loaded into the InnoDB Buffer Pool for faster access. If your tables are larger than the space allocated, InnoDB has to evict data and read it back from disk far more often. And if the pool (plus everything else running on the server) is larger than the physical Memory available, the operating system will start writing to swap. As I covered in my recent article on Memory and IOWait, that makes for increasingly painful performance issues.

The InnoDB Buffer Pool is clingy

Yep, that’s right. Like I mentioned above, Memory tends to hold onto the things it’s storing for faster access. That means it doesn’t purge items out of Memory until it actually needs the space. Instead, InnoDB uses a variation of the Least Recently Used (LRU) algorithm to identify the least-needed pages in the pool and evict them to make room for new ones. So unless your server has simply never needed to store much in the InnoDB Buffer Pool, it will almost always show high utilization, and that’s not a bad thing! Not unless you are also seeing swap usage. Swap usage means the server has run short of Memory because something (in my experience, generally MySQL) is using more than its share, forcing the system to write to disk instead. And if that disk is a rotational disk (SATA/HDD) instead of SSD, that can spiral out of control very easily.
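The “hold on until space is needed” behavior is easy to see in a plain LRU cache. The Python sketch below is a simplified, hypothetical model: InnoDB actually uses a variation of LRU with a midpoint insertion strategy, and it caches pages rather than whole tables.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry only when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest entry first, newest last
        self.evicted = []           # keys pushed out, in eviction order

    def get(self, key):
        if key not in self.pages:
            return None  # a "miss": the caller would have to read from disk
        self.pages.move_to_end(key)  # mark as most recently used
        return self.pages[key]

    def put(self, key, value):
        if key in self.pages:
            self.pages.move_to_end(key)
        elif len(self.pages) >= self.capacity:
            old_key, _ = self.pages.popitem(last=False)  # drop the least recently used entry
            self.evicted.append(old_key)
        self.pages[key] = value


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)     # cache is now "100% utilized", but nothing has been evicted
cache.get("a")        # "a" becomes the most recently used entry
cache.put("c", 3)     # only now does something have to go
print(cache.evicted)  # ['b']
```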

All this to say, the InnoDB Buffer Pool will hang onto stuff, and that’s because it’s doing its job–storing database tables for faster access the next time they are needed. So don’t take high utilization as a sign of outright unhealth! Be sure to factor swap usage into the equation as well.

Allocating Memory to the InnoDB Buffer Pool

InnoDB Buffer Pool size and settings are typically configured in your /etc/mysql/my.cnf file. Here you can set variables like:

[mysqld]
innodb_buffer_pool_size = 256M
innodb_io_capacity = 3000
innodb_io_capacity_max = 5000

…And more! There’s a whole host of settings you can configure for the InnoDB Buffer Pool, described in the MySQL documentation. General guidelines for configuring the pool: ensure it’s smaller than the total amount of Memory on your server, and ensure it’s larger than or the same size as the database(s) on your server. From there you can test your website while fine-tuning the settings to see which size performs best.
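The two guidelines above can be turned into a quick starting point. This Python sketch is a hypothetical helper: the 25% growth allowance and the 75% cap on total RAM are assumptions chosen for illustration, not MySQL defaults, so adjust them for whatever else runs on your server.

```python
import math


def suggest_buffer_pool_mb(database_mb, ram_mb, growth=1.25, max_ram_fraction=0.75):
    """Suggest innodb_buffer_pool_size in MB.

    Aim for the data size plus some room to grow, but never more than a
    fraction of total RAM. Returns (size_mb, fits_entirely).
    """
    target = math.ceil(database_mb * growth)  # at least as large as the data
    cap = int(ram_mb * max_ram_fraction)      # leave Memory for the OS and other services
    if target <= cap:
        return target, True
    return cap, False  # the data won't fit: consider more RAM or trimming the database


def my_cnf_line(size_mb):
    return f"innodb_buffer_pool_size = {size_mb}M"


size, fits = suggest_buffer_pool_mb(200, 2048)
print(my_cnf_line(size), fits)  # innodb_buffer_pool_size = 250M True
```

Treat the result as a first guess to test against, not a final answer.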

Have any comments or questions? Experience to share regarding the InnoDB Buffer Pool? Let me know in the comments, or Contact Me.

Troubleshooting with ngrep

If you’ve ever wanted to monitor and filter through network traffic in realtime, ngrep is about to be your new best friend.

ngrep stands for “network grep” and can be a very useful tool for packet sniffing, troubleshooting, and more. It’s like standard GNU grep (which I talk about a lot in my parsing logs article), but for the network layer. That means you can use regex (regular expressions) to filter and parse network traffic as it happens. You can find some common examples in the ngrep documentation. In the following sections we’ll explore what packet sniffing is and why it might be useful to you.

What is packet sniffing?

In short, packet sniffing allows you to inspect the data within each network packet transmitted through your network. Packet sniffing is very useful when troubleshooting network connections. It can show you information like the size of packet data sent and received, headers, set cookies and cookie values, and even (yikes!) form data, including site login details, if it was sent over unencrypted HTTP.

Packet sniffing helps you do a lot of detective work specifically around what is sent over the network. This means it can help you troubleshoot everything from bandwidth usage spikes to identifying network security issues.

That being said, packet sniffers can also be used by hackers and users with malicious intentions to “listen in” on your network. This is one reason why HTTPS is so important–it encrypts the data being transmitted between the web browser and the web server for your site.

Using ngrep to packet sniff

Now let’s dive into some usage examples of ngrep. Please note, in order to use ngrep you will need to be using a compatible operating system (Linux and Mac OS X are both supported), and you will need root access on your server.

Start by connecting to your server via SSH and entering a sudo screen. If you’re not familiar, you can open a sudo screen with the following command, provided you have the right access level:

sudo screen -S SCREENNAME

Once you’re logged into your screen, start simple by watching any traffic crossing port 80 (the default port for HTTP traffic):

ngrep -d any port 80

You’ll notice some information flowing across the screen (provided the server is currently receiving some traffic on port 80), but a lot of it will be hard-to-read, unhelpful junk.

In order to get actually useful information we’ll need to filter the results. Let’s try separating it out with “-W byline” instead, and filter for only results that include “GET /” on port 80.

ngrep -q -W byline -i "GET /" port 80

This should yield some more readable results. You should now see lines for headers, remote IP addresses, and set cookies.

Using this same syntax you can grep for other things, for example:

ngrep -q -W byline -i "POST /wp-login.php" port 80

Be aware, the command above will show any usernames and passwords sent from the browser to the web server in plain text. However, if you serve your website via HTTPS with SSL/TLS encryption, this information will be sent over port 443 and will be encrypted. A great example of why SSL is so important!

ngrep options

Once you learn the basics like the examples above, you can experiment with the optional flags available with ngrep. Below are some interesting and helpful ones, though you can find the full list on the man page.

-e – show empty packets as well. Normally these are discarded.

-W normal|byline|single|none – specify how you’d like to see the packet data. “Byline” is perhaps most useful in that it allows you to view the data mostly without wrapped content, making the packet entries more easily readable.

-O file – dump matched packets to a file in pcap format.

-q – quiet mode. Only outputs the headers and related payloads.

-i – ignore case (matches UPPERCASE, lowercase, or MiXeD).

-n num – match only num number of packets before exiting.

-v – invert and instead show only things that DON’T match your regex.

host host – strictly speaking this isn’t a flag but part of the filter expression (like port 80 above); it filters for a specific hostname or IP address.

-I pcap_dump – use a file named “pcap_dump” as input for ngrep.
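If it helps to reason about how the match expression combines with the -i, -v, and -n flags, here is a rough Python model of that filtering step applied to a list of payload strings. It is only an illustration under simplifying assumptions (no packet capture, no filter expression, plain Python regex), not how ngrep is implemented.

```python
import re


def ngrep_like(payloads, pattern, ignore_case=False, invert=False, limit=None):
    """Model ngrep's matching: -i -> ignore_case, -v -> invert, -n num -> limit."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    shown = []
    for payload in payloads:
        matched = regex.search(payload) is not None
        if matched != invert:  # -v flips which payloads are shown
            shown.append(payload)
            if limit is not None and len(shown) >= limit:
                break  # -n: stop after num matches
    return shown


payloads = ["GET / HTTP/1.1", "get /about HTTP/1.1", "POST /wp-login.php HTTP/1.1"]
print(ngrep_like(payloads, "GET /"))                                 # ['GET / HTTP/1.1']
print(ngrep_like(payloads, "GET /", ignore_case=True))               # ['GET / HTTP/1.1', 'get /about HTTP/1.1']
print(ngrep_like(payloads, "GET /", ignore_case=True, invert=True))  # ['POST /wp-login.php HTTP/1.1']
```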

In conclusion

ngrep is a helpful tool for monitoring and filtering packets of data sent over the network. I hope this guide has helped you learn more about ngrep! Have any favorite ngrep commands you use, use cases to share, or questions? Let me know in the comments, or contact me!

I/O, IOWait, and MySQL

Memory can be a fickle, difficult thing to measure. When it comes to server performance, Memory usage data can be misleading. Processes tend to indicate they are using the full amount of Memory allocated to them when viewing server status in tools like htop. In truth, one of the few reliable health indicators for Memory is swap usage. In this article we will explain swap, Memory usage, IOWait, and common issues with Memory.

Web Server Memory

On a web server, Memory is allocated to the various services on your server: Apache, Nginx, MySQL, and so on. These processes tend to “hold on” to the Memory allocated to them, so much so that it can be nearly impossible to determine how much Memory a process is actively using. On web servers, files requested by services are kept in cached Memory (RAM) for easy access. Even when files are not actively being used, the Memory holding them still looks as though it is being utilized. When a file is frequently read or written, it is much faster and more efficient for the system to keep it in cached Memory.

Measuring Memory usage with the “free” command

In Linux you can use the free command to easily show how much Memory is being utilized. I like to use the -h flag as well, to more easily read the results. This command will show where your Memory is being utilized: total, free, used, cache, and buffers.

Perhaps most importantly, the free command will indicate whether or not you are writing to swap.

Swap

In a web server environment, when services use more Memory than the server has available, the system will begin to write to swap. This means the web server is using disk space as a supplement for Memory. Writing to swap is slow and inefficient, forcing the CPU to wait while Memory pages are written to and read back from disk. The most obvious warning flag for Memory concerns is swap usage: writing to swap is a clear indicator that Memory is being overused in some capacity. You can measure swap usage with the free command described above. However, a live monitor of usage like htop may be more useful.
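On Linux, the free command reads its numbers from /proc/meminfo, and swap in use is simply SwapTotal minus SwapFree. Here is a minimal Python sketch that does the same calculation. The function names are illustrative, and the sample values below are made up.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values (mostly in kB)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        info[key.strip()] = int(rest.split()[0])
    return info


def swap_used_kb(info):
    return info["SwapTotal"] - info["SwapFree"]


sample = """MemTotal:        2048000 kB
MemAvailable:     900000 kB
SwapTotal:       1048572 kB
SwapFree:         948572 kB"""
print(swap_used_kb(parse_meminfo(sample)))  # 100000
```

On a live Linux server you would pass in the contents of /proc/meminfo instead of the sample text. A result that keeps growing is the warning sign described above.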

htop will show whether Memory as a whole on a web server is being over-utilized, or whether a specific service is over-utilizing its allocated Memory. A good indicator is to compare the total Memory row to the swap row. If Memory is not fully utilized but there is still swap usage, this may indicate that a single service is abusing Memory (or that swapping happened during an earlier spike, since pages can stay in swap after the pressure has passed).

Why is writing to swap slow?

So why would writing to swap be slow, while writing to Memory (RAM) is not? I think this article sums it up best. But basically, on a rotational disk there’s a certain amount of latency involved in moving the drive head and waiting for the disk to rotate to the correct storage point. During this time, the CPU (processor) sits idle waiting for the data, which shows up as IOWait.

I/O and IOWait

Any read/write process, including writing and reading pages from Memory, is an I/O process. I/O stands for input/output, but for the purposes of this article you can consider I/O to be read and write operations. Reading and writing pages in Memory takes well under a microsecond. Reading and writing from swap is a different story: because swap is disk space used in place of Memory, each access on a rotational disk can take several milliseconds while the disk seeks to the right location, and that latency adds up to IOWait. IOWait is time the processor (CPU) spends waiting for I/O processes to complete.
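On Linux, you can measure IOWait yourself. The first line of /proc/stat holds cumulative CPU time counters, and the fifth counter is iowait. Sampling it twice and comparing the change against the total gives roughly the “wa” percentage that top reports. This Python sketch assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the sample counters are made up.

```python
def cpu_times(stat_text):
    """Return the first eight counters of the aggregate "cpu" line in /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(value) for value in line.split()[1:9]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before_text, after_text):
    """Share of CPU time spent in iowait between two /proc/stat snapshots."""
    before = cpu_times(before_text)
    after = cpu_times(after_text)
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0  # index 4 is iowait


before = "cpu  100 0 50 800 50 0 0 0 0 0"
after = "cpu  120 0 60 860 110 0 0 0 0 0"
print(iowait_percent(before, after))  # 40.0
```

On a real server, read /proc/stat, wait a second or so, read it again, and pass both snapshots in.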

IOWait can be problematic on its own, but the problem is compounded by IOPS (I/O operations per second) rate limiting. Some datacenter providers set a low threshold for input/output operations. When the rate of I/O operations rises beyond this limit, the operations are throttled. This compounds our IOWait issue, because now the CPU must wait even longer for I/O processes to complete. If the throttling or Memory usage becomes too egregious, your provider might even have a trigger to automatically reboot the server.

MySQL and IOWait

In my experience with WordPress, the service that tends to use the most Memory is MySQL by far. This can be for a number of reasons. When a WordPress query accesses a MySQL database, the tables, rows, and indexes it needs are loaded into Memory. Most modern servers have an allocation of Memory for MySQL called the InnoDB Buffer Pool. If this pool is too small for the data being queried, InnoDB has to keep evicting pages and reading them back from disk; if the pool is configured larger than the Memory actually available, the operating system will push that data out to swap instead. A common cause of Memory overutilization is extremely large database tables. If these large tables are used often, they will need to be stored in Memory, and if your InnoDB Buffer Pool is smaller than your large table, MySQL will end up going to disk for that data instead.

Most often when troubleshooting Memory issues, I find the cause to be unoptimized databases. By ensuring the proper storage engine and reducing database bloat, many Memory and IOWait issues can be avoided from the start. If your database cannot be optimized further, it’s time to optimize your InnoDB Buffer Pool or server hardware instead. MySQL has a guide to optimizing InnoDB Disk I/O you can use for fine tuning.

Table storage engines

Another common MySQL issue happens when the MyISAM table storage engine is used. MyISAM tables cannot use the InnoDB Buffer Pool because they do not use the InnoDB storage engine. Instead, MyISAM caches only index blocks in its key buffer (sized by key_buffer_size) and relies on the operating system’s file cache for table data. Anything not in those caches must be read from disk, which, as mentioned above, is not nearly as performant as Memory. And reading and writing from disk is an I/O operation that can easily cause IOWait.

Beyond the performance implications of not using the InnoDB Buffer Pool, MyISAM is not ideal for production websites that frequently write data to tables. MyISAM locks an entire table while a write operation updates or adds a row. This means any other requests or MySQL connections attempting to update the table at the same time might experience errors or delays. By contrast, InnoDB allows row-level locking. With a WordPress website, transients, settings, posts, comments, and more are frequently updating the database. This makes the InnoDB table storage engine a much better fit for WordPress websites.

Partitions and Drives

One way hosting providers have found to avoid IOWait issues is to separate MySQL into its own partition or disk. While this does not necessarily remove the IOWait altogether, it logically separates the storage experiencing IOWait from the rest of the web server (the separation is strongest when MySQL has its own physical disk). This means the partition serving website traffic is not impacted beyond slow query performance in high IOWait conditions. For even faster performance, consider an SSD for your MySQL partition. SSDs, or Solid State Drives, use non-rotational storage known as “flash.” While the cost per GB of storage space is higher with SSDs, they are far more performant in terms of IOPS.

Troubleshooting Broken Proxies

If you’re using a performance-optimized server ecosystem, there’s a good chance that you’re using at least one proxy relationship. Whether the proxy is from another server or firewall, or from one web server to another within the same environment, broken proxies can sometimes be nebulous to troubleshoot.

Today we’ll look specifically at the Nginx and Apache proxy relationship, when using both web servers. Curious about the benefits of using Nginx, Apache, or both? Check out the Web Server Showdown.

What is a proxy?

Before we dive in too far, let’s examine: what is a proxy? A proxy, sometimes referred to as an “application gateway,” is a web server that acts as an intermediary to another web server or service. In our example, Nginx functions as a proxy server, passing requests to your caching mechanism or to Apache. Apache processes the request and passes the response back to Nginx. Nginx in turn passes it to the original requestor.

What is a broken proxy?

Now that we understand the proxy, we can answer the question: what is a broken proxy? A broken proxy refers to when the intermediary service passes a request along but doesn’t get a valid response back. So in our example, Nginx passes the request to Apache. Something happens at the Apache level that causes the request to be lost, and Apache now has nothing to hand back to Nginx.

Nginx, however, is still responsible for telling the original requestor… something! It responds by telling the requestor that it had a bad gateway (proxy), with a 502 (Bad Gateway) or 504 (Gateway Timeout) HTTP response.

Troubleshooting broken proxies

A common problem with proxies is that they can be difficult to troubleshoot. How do you know which service did not respond to the request Nginx (the proxy server) sent? And how do you know why the service did not complete the request?

A good place to start is your logs. Your Nginx error logs will indicate when an upstream error occurred, and may help offer some context, such as the port the request was sent to. These logs will usually be in the log path on your server (/var/log/nginx/ for many), labeled error.log.

Your Nginx error log files will usually show which port produced the upstream error, which is your first clue. In my example, I can look for which service is running on that port. If I know Apache or my caching mechanism is operating on that port, I can know that service is responsible for the error.

In my case, port 6789 was my Apache service, so I knew Apache did not fulfill the request. Now I can check my Apache error logs for more information. These are generally stored alongside your other logs on the server, like /var/log/apache2/error.log. If you have multiple sites on the same server, you may have each site’s errors logged in separate files here instead.
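When an error log holds many upstream errors, a short script can tally which upstream they point at. This Python sketch assumes nginx's upstream: "scheme://host:port/..." field (or the unix: socket form shown earlier in the php-fpm example). The services mapping is hypothetical, and you would fill it in yourself, for example from the output of ss -ltnp, which lists listening TCP ports and their processes.

```python
import re
from collections import Counter

# Matches e.g. upstream: "http://127.0.0.1:6789/index.php"
UPSTREAM = re.compile(r'upstream: "(?P<scheme>[a-z]+)://(?P<target>[^"]+)"')


def upstream_target(line):
    """Return (host, port) or ("unix", socket_path) for the upstream in an nginx error line."""
    m = UPSTREAM.search(line)
    if m is None:
        return None
    target = m.group("target")
    if target.startswith("unix:"):
        return ("unix", target[len("unix:"):].rstrip(":"))
    hostport = target.split("/", 1)[0]
    host, sep, port = hostport.rpartition(":")
    if not sep or hostport.endswith("]"):  # no explicit port: use the scheme's default
        return (hostport, 443 if m.group("scheme") == "https" else 80)
    return (host, int(port))


def count_by_service(lines, services):
    """Tally upstream errors per service name, using a port/socket -> name mapping."""
    counts = Counter()
    for line in lines:
        target = upstream_target(line)
        if target is not None:
            counts[services.get(target[1], f"unknown ({target[1]})")] += 1
    return counts


line = 'upstream: "fastcgi://unix:/run/php-fpm/www.sock:", host: "example.com"'
print(upstream_target(line))  # ('unix', '/run/php-fpm/www.sock')
```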

Some common reasons Apache might not complete a request:

The request timed out (max_execution_time reached)
The request used too much Memory and was killed
A segmentation fault occurred
The Apache service is not on or currently restarting

Many times your Apache error logs will let you know if the above is causing the issue. If it doesn’t, you may need to consult your firewall or security services on the server to see if the requests were blocked for other security reasons.

Caveats

Keep in mind: even if Apache experiences errors (like a 500 error due to theme or plugin code), as long as Apache entirely processes the request it will simply pass this HTTP status code up to Nginx to serve to your users. So remember, 502 errors will typically only result if there is no response from Apache back to Nginx.
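The rule of thumb above can be summed up in a few lines. This Python sketch is a simplified, hypothetical model of how a reverse proxy like Nginx picks the status code for the visitor (it ignores options such as custom error pages), and the UpstreamResult class is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpstreamResult:
    status: Optional[int] = None  # HTTP status, if a complete response came back
    timed_out: bool = False       # nothing arrived before the proxy's timeout


def status_for_client(result: UpstreamResult) -> int:
    if result.status is not None:
        return result.status  # pass Apache's answer through, even a 500
    if result.timed_out:
        return 504            # Gateway Timeout
    return 502                # reset, refused, or otherwise invalid: Bad Gateway


print(status_for_client(UpstreamResult(status=500)))      # 500
print(status_for_client(UpstreamResult(timed_out=True)))  # 504
print(status_for_client(UpstreamResult()))                # 502
```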

And also remember that broken proxies are not always within the same server environment. If you use a firewall or full site CDN service, the requests are proxied through these external servers as well. If you experience a 502 error and can’t find that request in your access logs, looking to the logs on your external firewall should be your next step.

Have you experienced issues with 502 errors on your server? What was the cause? Have any other solutions or recommendations to include? Let me know in the comments, or Contact Me.