
TechGirlKB

Performance | Scalability | WordPress | Linux | Insights



Writing a basic cron job in Linux

Cron jobs are one of the most common ways to schedule repeated tasks on a Linux system: a cron job simply runs a script or command on a regular schedule. Writing one happens in three basic steps:

  1. Write the script you wish to execute (e.g. mycron.php)
  2. Determine how often you’d like the script to run
  3. Write a crontab entry for it

Writing the script

Start by writing your script or command. It can be in any language you like, but crons are most often written in bash or PHP. Create your mycron.php or cron.sh file as needed. Here’s a basic PHP file that just prints “hello world” to stdout (standard out):

<?php
echo "hello world";
?>

Or a bash script that does the same thing:

#!/usr/bin/env bash
echo "hello world";

Or if you want to have some fun, on a Mac you can use the say command to have your computer speak a line instead.

say -v fred "I love Linux"

You can save the say command as a bash file (cron.sh for example) and invoke it with bash. 

Save your file with the appropriate file extension and give it a quick test run before moving on to the next step: determining how often the command should run.
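
A quick manual test, using the example file names from above:

# make the bash version executable and run it once
chmod +x cron.sh
./cron.sh

# or run the PHP version directly
php mycron.php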

Determining cron schedule

Now that you have your script written, it’s time to make some decisions about how frequently you wish the script to be executed. 

Crontab allows you the freedom to run the script or command every minute, hour, day, month, or any combination thereof. If you want the script to run every Thursday at 5pm, or only Wednesday through Friday at 10am, that’s possible with cron.

When it comes to deciding on a cron schedule, less is more. By that I mean, for the health of your server it’s best to run the job as infrequently as you can get away with, to conserve server resources. Keep in mind that the more resource-intensive your script is, the more likely it is that your crons will disrupt the experience of your end users.

The basic format of a cron is:

* * * * * command to run 

Each of those asterisks represents a particular unit of time:

  1. minute (0-59) of the hour you wish the command to execute. * means every minute of the hour.
  2. hour (0-23) of the day you wish the command to execute. * means every hour of the day.
  3. day (1-31) of the month you wish the command to execute. * means every day of the month.
  4. month (1-12) of the year you wish the command to execute. * means every/any month of the year.
  5. day of the week (0-6, where 0 is Sunday) you wish the command to execute. * means any day of the week.

Using these guidelines you can format your schedule. Below are some examples:

# every minute of every hour/day/month/year
* * * * *

# every hour, on the hour
0 * * * *

# every day at 10am
0 10 * * *

# every Thursday at midnight
0 0 * * 4

# every Monday, Wednesday, and Friday at 3:30pm
30 15 * * 1,3,5

If you want to check your syntax, there’s a handy website called crontab.guru which can help you test out some schedule combinations.

Writing the crontab

The crontab command is how you install cron jobs so the cron daemon can execute them. With crontab we’ll combine the script and the schedule so the script or command actually runs on the schedule you’d like.

Start by listing any existing cron schedules on your machine with crontab -l

If you’ve never written a cron before, you’ll probably see the following:

$ crontab -l
crontab: no crontab for [yourusername]

Now to edit the crontab, we’ll use crontab -e. This will open your system’s default editor (usually vim), so here’s a brief tutorial if you’ve never used it before.

Hit i to enter “insert” mode, then enter your schedule followed directly by the command to run. If your cron script is written in PHP and is located at /var/www/html/cron.php, you can use the syntax below:

* * * * * /usr/bin/php /var/www/html/cron.php

Or if you want to use the command right in crontab, you can do that too:

* * * * * echo "I love Linux" >> /var/log/cron-stdout.log

Notice in the command above I also added a log file to store the stdout results of the command. This is an option as well and can help you ensure your cron is running as scheduled! 
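
If you also want to capture errors (not just stdout), you can redirect stderr into the same log. A small variation on the PHP example above:

* * * * * /usr/bin/php /var/www/html/cron.php >> /var/log/cron-stdout.log 2>&1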

And of course, if you wanted to prank a coworker when they leave their Mac unlocked at work:

* * * * * say -v fred "I am buying nachos for everyone at 5pm"

Once you’ve finished editing, hit the esc key to exit “insert” mode, then type :wq to write and quit the file. You should see the message “crontab: installing new crontab” after saving.

… And that’s it! Now your cron will execute on your schedule. Before we part ways, a few words of caution: 

  • Most managed web hosts don’t allow you to edit system crons; doing so usually requires root SSH access on the Linux machine.
  • If you’re using WordPress, you can easily get around this restriction by using WordPress cron and managing it with a simple tool like WP Crontrol (see the sketch after this list).
  • Cron is a word that’s easy to mistype. Pro-tip: nobody knows what you want when you’re asking about “corn jobs” 😉
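
For the WordPress route, here’s a minimal sketch of managing WP-Cron from the command line, assuming WP-CLI is installed and example.com stands in for your site:

# list the events WordPress has scheduled (run from the site root)
wp cron event list

# run whatever is due right now
wp cron event run --due-now

# or, if you do have crontab access, trigger wp-cron.php on a real schedule
*/15 * * * * wget -q -O - "https://example.com/wp-cron.php?doing_wp_cron" >/dev/null 2>&1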

How about you? Any pro-tips for those looking to create a cron job? Leave a comment, or Contact Me. 

How the InnoDB Buffer Pool Allocates and Releases Memory

As you may know or have noticed, Memory utilization can be difficult to truly understand. While tools like free -m can certainly help, they aren’t necessarily a true indication of health or unhealth. For example, if I see 90% Memory utilization, that’s not exactly an indication that it’s time to add more resources. The nature of Memory is to store temporary data for faster access the next time it is needed, and because of this, it tends to hold onto as much temporary data as it can, until it needs to purge something out to make more space.

90% utilization but no swap usage

About the InnoDB Buffer Pool

InnoDB (a table storage engine for MySQL) has a specific pool of Memory allocated to MySQL processes involving InnoDB tables, called the InnoDB Buffer Pool. Generally speaking, it’s safest to have an InnoDB Buffer Pool at least the same size as the database(s) in your server environment to ensure all tables can fit into the available Memory.
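
To sanity-check that sizing rule, you can total up your databases from information_schema and compare the result against your pool size. A quick sketch from the shell, assuming you can run mysql as a privileged user:

mysql -e "SELECT table_schema AS db,
                 ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS size_mb
          FROM information_schema.tables
          GROUP BY table_schema;"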

As queries access various database tables, they are added to Memory in the InnoDB Buffer Pool for faster access by CPU processes. And if the tables being stored in Memory are larger than what is allocated, the tables will be written to swap instead. As I covered in my recent article on Memory and IOWait, that makes for increasingly painful performance issues. 

The InnoDB Buffer Pool is clingy

Yep, that’s right. Like I mentioned above, Memory tends to hold onto the things it’s storing for faster access. That means it doesn’t purge items out of Memory until it actually needs more space to do so. Instead, it uses an algorithm called Least Recently Used (LRU) to identify the least-needed items in cache, and purge that one item out to make room for the next item. So unless your server has simply never had the need to store much in the InnoDB Buffer Pool, it will almost always show high utilization–and that’s not a bad thing! Not unless you are also seeing swap usage. That means something (in my experience, generally MySQL) is overusing its allocated Memory and is being forced to write to disk instead. And if that disk is a rotational disk (SATA/HDD) instead of SSD, that can spiral out of control very easily. 

All this to say, the InnoDB Buffer Pool will hang onto stuff, and that’s because it’s doing its job–storing database tables for faster access the next time they are needed. So don’t take high utilization as a sign of outright unhealth! Be sure to factor swap usage into the equation as well.  

Allocating Memory to the InnoDB Buffer Pool

InnoDB Buffer Pool size and settings are typically configured in your /etc/mysql/my.cnf file. Here you can set variables like:

innodb_buffer_pool_size = 256M
innodb_io_capacity = 3000
innodb_io_capacity_max = 5000

…And more! There’s a whole host of settings you can configure for your InnoDB Buffer Pool in the MySQL documentation. General guidelines for configuring the pool settings: ensure it’s smaller than the total amount of Memory on your server, and ensure it’s larger than or the same size as the database(s) on your server. From there you can perform testing on your website while fine-tuning the settings to see which size is most effective for performance.
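
While testing, you can get a rough sense of how full the pool actually is from MySQL’s status counters. A quick check (interpretation is a heuristic, not a hard rule):

mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';"
# pages are 16KB by default; if pages_free stays near zero and reads keep
# hitting disk, the pool is likely too small for your working set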


Have any comments or questions? Experience to share regarding the InnoDB Buffer Pool? Let me know in the comments, or Contact Me. 

Troubleshooting with ngrep

If you’ve ever wanted to monitor and filter through network traffic in realtime, ngrep is about to be your new best friend. 

ngrep stands for “network grep” and can be a very useful tool for packet sniffing, troubleshooting, and more. It’s like standard GNU grep (which I talk about a lot in my parsing logs article) but for the network layer. That means you can use regex (or regular expressions) to filter and parse through active network connections. Check out some common examples in the ngrep documentation here. In the following sections we’ll explore what packet sniffing is and why it might be useful to you.

What is packet sniffing?

In short, packet sniffing allows you to inspect the data within each network packet transmitted through your network. Packet sniffing is very useful when troubleshooting network connections. It can show you information like the size of packet data sent and received, headers, set cookies and cookie values, and even (yikes!) form data, including site login details, if sent over HTTP (unencrypted).

Packet sniffing helps you do a lot of detective work specifically around what is sent over the network. This means it can help you troubleshoot everything from bandwidth usage spikes to identifying network security issues. 

That being said, packet sniffers can also be used by hackers and users with malicious intentions to “listen in” on your network. This is one reason why HTTPS is so important–it encrypts the data being transmitted between the web browser and the web server for your site. 

Using ngrep to packet sniff

Now let’s dive into some usage examples of ngrep. Please note, in order to use ngrep you will need to be using a compatible operating system (Linux and Mac OS X are both supported), and you will need root access on your server. 

Start by connecting to your server via SSH and entering a sudo screen. If you’re not familiar, you can open a sudo screen with the following command, provided you have the right access level:

sudo screen -S SCREENNAME

Once you’re logged into your screen, start simple by watching any traffic crossing port 80 (HTTP traffic is processed by this port):

ngrep -d any port 80

You’ll notice some information flowing across the screen (provided the server is currently receiving some traffic on port 80), but a lot of it will be super unhelpful junk.

In order to get actually useful information we’ll need to filter the results. Let’s try separating it out with “-W byline” instead, and filter for only results that include “GET /” on port 80.

ngrep -q -W byline -i "GET /" port 80

This should yield some more readable results. You should now see lines for headers, remote IP addresses, and set cookies. 

Using this same syntax you can grep for other things as well, for example:

ngrep -q -W byline -i "POST /wp-login.php" port 80

Be aware, the command above will show any usernames and passwords sent from the browser to the web server in plain text. However, if you are using SSL/TLS encryption to serve your website via HTTPS, this information will be sent over port 443 instead and will be encrypted. A great example of why SSL is so important!

ngrep options

Once you learn the basics like the examples above, you can experiment with the optional flags available with ngrep. Below are some examples of interesting and helpful flags, though you can find the full list on the man page; a combined example follows the list.

-e – show empty packets as well. Normally these are discarded.

-W normal|byline|single|none – specify how you’d like to see the packet data. “Byline” is perhaps most useful in that it keeps the data mostly unwrapped, making the packet entries more easily readable.

-O – dump the output to a file

-q – quiet mode. Only outputs the headers and related payloads.

-i – ignore case (matches UPPERCASE, lowercase, or MiXeD).

-n num – match only num number of packets before exiting.

-v – invert and instead show only things that DON’T match your regex.

host host – a BPF filter term (not a flag) that restricts matches to a specific hostname or IP address.

-I pcap_dump – use a file named “pcap_dump” as input for ngrep.
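
Putting a few of these together, here’s one example of what a combined command might look like: capture the first 20 login POSTs from a single client (203.0.113.10 is a placeholder IP) and save the raw packets for later review.

ngrep -q -W byline -i -n 20 -O login.pcap "POST /wp-login.php" port 80 and host 203.0.113.10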

In conclusion

ngrep is a helpful tool for monitoring and filtering packets of data sent over the network. I hope this guide has helped you learn more about it! Have any favorite ngrep commands you use, use cases to share, or questions? Let me know in the comments, or contact me!

I/O, IOWait, and MySQL

Memory can be a fickle, difficult thing to measure. When it comes to server performance, Memory usage data can be misleading. When you view server status in tools like htop, processes tend to appear to be using the full amount of Memory allocated to them. In truth, one of the only health indicators for Memory is swap usage. In this article we will explain swap, Memory usage, IOWait, and common issues with Memory.

Web Server Memory

On a web server, Memory is allocated to the various services on your server: Apache, Nginx, MySQL, and so on. These processes tend to “hold on” to the Memory allocated to them, so much so that it can be nearly impossible to determine how much Memory a process is actively using. On web servers, the files requested by services are kept in cached Memory (RAM) for easy access. Even when files are not actively being used, the Memory holding the files still looks as though it is being utilized. When a file is frequently written or read, it is much faster and more efficient for the system to keep the file in cached Memory.

Measuring Memory usage with the “free” command

In Linux you can use the free command to easily show how much Memory is being utilized. I like to use the -h (human-readable) flag as well, to more easily read the results. This command will show where your Memory is going: total, used, free, shared, and buffers/cache.

Perhaps most importantly, the free command will indicate whether or not you are writing to swap.
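
For reference, the output looks roughly like this (the numbers are illustrative); the Swap row is the one to keep an eye on:

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.8G        2.1G        1.2G        120M        4.5G        5.3G
Swap:          2.0G          0B        2.0G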

Swap

In a web server environment, when a service over-utilizes the allocated Memory, it will begin to write to swap. This means the web server is writing to disk space as a supplement for Memory. Writing to swap is slow and inefficient, causing the CPU to have to wait while the Memory pages are being written to disk. The most obvious warning flag for Memory concern is swap usage. Writing to swap is a clear indicator that Memory is being overused in some capacity. You can measure swap usage using the free command described above. However, it may be more useful to look at a live monitor of usage like htop instead.

htop will show whether Memory as a whole on a web server is being over-utilized, or whether a specific service is over-utilizing its allocated Memory. A good indicator is to look at the total Memory row compared to the swap row. If Memory is not fully utilized but there is still swap usage, this indicates a single service is abusing Memory.

Why is writing to swap slow?

So why would writing to swap be slow, while writing to Memory (RAM) is not? I think this article sums it up best. But basically, there’s a certain amount of latency involved in rotating a mechanical disk to the correct storage location. During this time the CPU (processor) sits idle, which shows up as IOWait.

I/O and IOWait

Any read/write process, including writing and reading pages from Memory, is an I/O process. I/O stands for input/output, but for the purposes of this article you can consider I/O to be read and write operations. Writing and reading pages to and from Memory tends to take a few milliseconds. However, writing and reading from swap is a different story. Because swap is disk space being used instead of Memory, the latency caused by rotating the disk to the correct location to access the correct information adds up to IOWait. IOWait is time the processor (CPU) spends waiting for I/O processes to complete.

IOWait can be problematic on its own, but the problem is compounded by IOPs rate limiting. Some datacenter providers have a low threshold for input/output operations. When the rate of I/O operations increases beyond this limitation, these operations are then throttled. This compounds our IOWait issue, because now the CPU must wait even longer for I/O processes to complete. If the throttling or Memory usage becomes too egregious, your data center might even have a trigger to automatically reboot the server.

MySQL and IOWait

In my experience with WordPress, the service that tends to use the most Memory is MySQL by far. This can be for a number of reasons. When a WordPress query accesses a MySQL database, the tables, rows, and indexes must be stored in Memory. Most modern servers have an allocation of Memory for MySQL called the InnoDB Buffer Pool. If this pool is overutilized, MySQL will begin to store those tables, rows, and indexes to swap instead. A common cause of Memory overutilization is extremely large database tables. If these large tables are used often, they will need to be stored in Memory. If your InnoDB Buffer Pool is smaller than your large table, MySQL will write this data to swap instead.

Most often when troubleshooting Memory issues, I find the cause to be unoptimized databases. By ensuring the proper storage engine and reducing database bloat, many Memory and IOWait issues can be avoided from the start. If your database cannot be optimized further, it’s time to optimize your InnoDB Buffer Pool or server hardware instead. MySQL has a guide to optimizing InnoDB Disk I/O you can use for fine tuning.

Table storage engines

Another common MySQL issue happens when the MyISAM table storage engine is used. MyISAM tables cannot use the InnoDB Buffer Pool, as they do not use the InnoDB storage engine. Instead, MyISAM caches only its indexes in a key buffer; the table data itself is read from and written to disk, relying on the operating system’s file cache. As aforementioned, disk is not nearly as performant as Memory, and reading and writing table data from disk are I/O operations that can easily cause IOWait.

Beyond the performance implications of not using the InnoDB Buffer Pool, MyISAM is less than ideal for databases on production websites that frequently write data to tables. MyISAM locks an entire table while a write operation is updating or adding a row. This means any other requests or MySQL connections attempting to update the table at the same time might experience errors or delays. By contrast, InnoDB allows row-level locking. On a WordPress website, transients, settings, posts, comments, and more are frequently updating the database. This makes the InnoDB table storage engine a much better fit for WordPress websites.
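
If you want to check whether any of your tables are still on MyISAM, and convert them, a sketch like the following works; wp_posts and your_database are placeholder names:

mysql -e "SELECT table_schema, table_name
          FROM information_schema.tables
          WHERE engine = 'MyISAM'
            AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');"

# convert one table at a time (take a backup first)
mysql your_database -e "ALTER TABLE wp_posts ENGINE=InnoDB;"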

Partitions and Drives

One way hosting providers have found to avoid IOWait issues is to separate MySQL onto its own partition or disk. While this does not necessarily remove the IOWait altogether, it logically separates the partition experiencing IOWait from the one serving the web. This means the partition serving website traffic is not impacted beyond slow query performance in high IOWait conditions. For even faster performance, consider an SSD for your MySQL partition. SSDs, or Solid State Drives, use non-rotational “flash” storage. While the cost per GB of storage is higher with SSDs, they are far more performant in terms of IOPs.

 

Troubleshooting Broken Proxies

If you’re using a performance-optimized server ecosystem, there’s a good chance that you’re using at least one proxy relationship. Whether the proxy is from another server or firewall, or from one web server to another within the same environment, broken proxies can sometimes be nebulous to troubleshoot.

Today we’ll look specifically at the Nginx and Apache proxy relationship, when using both web servers. Curious about the benefits of using Nginx, Apache, or both? Check out the Web Server Showdown.

What is a proxy?

Before we dive in too far, let’s examine: what is a proxy? A proxy, sometimes referred to as an “application gateway,” is a web server that acts as an intermediary to another web server or service. In our example, Nginx functions as the proxy server, passing requests to your caching mechanism or to Apache. Apache processes the request and passes the response back to Nginx, which in turn passes it along to the original requestor.

What is a broken proxy?

Now that we understand the proxy, we can answer the question: what is a broken proxy? A broken proxy is when the intermediary service passes a request along but never gets a response back. So in our example, Nginx passes the request to Apache, something happens at the Apache level, and the request is now gone: Apache has nothing to hand back to Nginx.

Nginx, however, is still responsible for telling the original requestor… something! It responds with a 502 (Bad Gateway) or, if the upstream simply timed out, a 504 (Gateway Timeout) HTTP response.

Troubleshooting broken proxies

A common problem with proxies is that they can be difficult to troubleshoot. How do you know which service did not respond to the request Nginx (the proxy server) sent? And how do you know why the service did not complete the request?

A good place to start is your logs. Your Nginx error logs will indicate when an upstream error occurred, and may help offer some context, such as the port the request was sent to. These logs will usually be in the log path on your server (/var/log/nginx/ for many), labeled error.log.

Your Nginx error log files will usually show which port produced the upstream error, which is your first clue. In my example, I can look for which service is running on that port. If I know Apache or my caching mechanism is operating on that port, I know which service is responsible for the error.
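
For example, a quick way to pull recent upstream failures out of the Nginx error log (the log line below is illustrative, not from a real server):

grep upstream /var/log/nginx/error.log | tail -n 5

# 2021/03/01 12:00:01 [error] 1234#0: *5678 connect() failed (111: Connection refused)
#   while connecting to upstream, client: 203.0.113.10, server: example.com,
#   request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:6789/"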

In the example above, port 6789 was my Apache service, so I know Apache did not fulfill the request. Now I can check my Apache error logs for more information. These are also generally stored in your server’s log path, like /var/log/apache2/error.log. If you have multiple sites on the same server, each site’s errors may be logged to separate files here instead.

Some common reasons Apache might not complete a request:

  • The request timed out (max_execution_time reached)
  • The request used too much Memory and was killed
  • A segmentation fault occurred
  • The Apache service is stopped or is currently restarting

Many times your Apache error logs will let you know if one of the above is causing the issue. If they don’t, you may need to consult your firewall or security services on the server to see if the requests were blocked for other security reasons.

Caveats

Keep in mind: even if Apache experiences errors (like a 500 error due to theme or plugin code), as long as Apache entirely processes the request it will simply pass this HTTP status code up to Nginx to serve to your users. So remember, 502 errors will typically only result if there is no response from Apache back to Nginx.

And also remember that broken proxies are not always within the same server environment. If you use a firewall or full-site CDN service, requests are proxied through these external servers as well. If you experience a 502 error and can’t find that request in your access logs, looking at the logs on your external firewall should be your next step.


Have you experienced issues with 502 errors on your server? What was the cause? Have any other solutions or recommendations to include? Let me know in the comments, or Contact Me.

 

 

Installing Varnish on Ubuntu

In a few of my posts I’ve talked about the benefits of page cache systems like Varnish. Today we’ll demonstrate how to install it! Before continuing, be aware that this guide assumes you’re using Ubuntu on your server.

Why use Varnish?

Firstly, let’s talk about why page cache is fantastic. For dynamic page generation languages like PHP, building a page takes substantially more server processing power than serving a static file (like HTML). Since the page has to be rebuilt for each new user who requests it, the server does a lot of redundant work. The upside is more customization for your users, since you can tell the server to build the page differently based on different conditions (geolocation, referrer, device, campaign, etc.).

That being said, using persistent page cache is an easy way to get the best of both worlds: the cache holds onto a static copy of the generated page for a period of time, and the page is rebuilt fresh whenever the cache expires. In short, page cache allows your pages to load in a few milliseconds rather than a full second or more.

Installing Varnish

To install Varnish on a system running Ubuntu, you’ll use the apt package manager. While logged into your server (as a non-root user with sudo access), run the following:

sudo apt install varnish

Be sure the Varnish service is stopped while you configure it! You can stop the Varnish service like this:

sudo systemctl stop varnish

Now it’s time to configure the Varnish settings. Make a copy of the default configuration file like so:

cd /etc/varnish
sudo cp default.vcl mycustom.vcl

Make sure Varnish is configured for the right port (we want port 80 by default) and the right file (our mycustom.vcl file):

sudo nano /etc/default/varnish
DAEMON_OPTS="-a :80 \
-T localhost:6082 \
-f /etc/varnish/mycustom.vcl \
-S /etc/varnish/secret \
-s malloc,256m"

Configuring Varnish

The top of your mycustom.vcl file should read like this by default:

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

This block defines the “backend”: the host and port to which Varnish passes uncached requests. Now we want the web server behind Varnish to listen on that port. If you’re using Apache, modify the port in your /etc/apache2/ports.conf file and in /etc/apache2/sites-enabled/000-default.conf to reference port 8080, as sketched below (if Nginx is your backend, change the listen directive in its server block instead).
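
For Apache, the edits look roughly like this (paths are the Ubuntu defaults mentioned above):

# /etc/apache2/ports.conf
Listen 8080

# /etc/apache2/sites-enabled/000-default.conf
<VirtualHost *:8080>
    # ... existing site configuration ...
</VirtualHost>

# then restart Apache to pick up the change
sudo systemctl restart apache2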

From here you can begin to customize your configuration! You can tell Varnish which requests to add X-Group headers for, which pages to strip cookies on, how and when to purge the cache, and more. You probably only want to cache GET and HEAD requests, since POST requests should never be cached. Here’s a basic rule that passes (skips cache for) any request that isn’t GET or HEAD, tagging it with an X-Pass-Method header:

sub vcl_recv {
    # note: in VCL 4.0 and later this is req.method rather than req.request
    if (req.request != "GET" && req.request != "HEAD") {
        set req.http.X-Pass-Method = req.request;
        return (pass);
    }
}

And here’s an excerpt which says not to cache anything with the path “wp-admin” (a common need for sites with WordPress):

sub vcl_recv {
    if (req.http.host == "mysite.com" && req.url ~ "^/wp-admin") {
        return (pass);
    }
}

There are a ton of other fun custom configurations you can add. To research the available options and experiment with them, check out the book from Varnish.

Once you’ve added in your customizations, be sure to start Varnish:

sudo systemctl start varnish

Now what?

Now you have Varnish installed and configured! Your site will cache pages and purge the cache based on the settings you’ve configured in the mycustom.vcl file. Caching, and caching aggressively, will greatly benefit your site’s performance, and it’ll help your site scale to support more traffic at a time. Enjoy!
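
To confirm the cache is actually working, you can request a page twice and look for Varnish’s response headers, or watch the hit/miss counters (yoursite.com is a placeholder, and counter names can vary slightly between Varnish versions):

# a cache hit typically shows Age > 0 and a Via: varnish header
curl -sI http://yoursite.com/ | grep -iE "age|x-varnish|via"

# print the hit/miss counters once
varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss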

 

Have more questions about Varnish? Confused about how cache works? Any cool cache rules you use in your own environment? Let me know in the comments or contact me.
