The Amazon AWS Java SQS Client is *Not* Thread-safe

I recently added long polling to an Amazon SQS project that processes thousands of messages a minute. The idea was simple:

  • Spawn N threads (say, 60) that repeatedly check an SQS queue using long polling
  • Each thread requests at most one message at a time for maximum concurrency, restarting if no message is found
  • Each time a message is found, the thread processes it and acknowledges it via deleteMessage() (failing to do so causes the message to go back on the queue once the visibility timeout expires)

For convenience, I used the Java Concurrency API’s ScheduledExecutorService.scheduleWithFixedDelay() method, scheduling each task with a 1 millisecond delay, although I could have accomplished the same thing using the Thread class and an infinite while() loop. With short polling, this kind of structure would tend to thrash, but with long polling, each thread just waits when no messages are available. Note: For whatever reason, Java does not allow a 0 millisecond delay for this method (it throws an IllegalArgumentException), so 1 millisecond it is!
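Here is a minimal JDK-only sketch of the two points above: scheduleWithFixedDelay() rejecting a 0 millisecond delay, and the Thread-plus-while() loop alternative. The class and method names are illustrative, and pollOnce is a hypothetical placeholder for the long-polling receive-and-delete step (readAndProcessMessages() in the sample class later in this post).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PollingLoops {

    /** Returns true if scheduleWithFixedDelay() rejects a 0 ms delay. */
    static boolean zeroDelayRejected() {
        ScheduledExecutorService service = Executors.newSingleThreadScheduledExecutor();
        try {
            service.scheduleWithFixedDelay(() -> { }, 0, 0, TimeUnit.MILLISECONDS);
            return false;
        } catch (IllegalArgumentException e) {
            return true; // the JDK requires delay > 0 for fixed-delay scheduling
        } finally {
            service.shutdownNow();
        }
    }

    /**
     * Thread-plus-while-loop alternative: each worker calls pollOnce back to back
     * until it is interrupted. pollOnce is a hypothetical stand-in for a
     * long-polling receive-and-delete step.
     */
    static Thread[] startWorkers(int numberOfThreads, Runnable pollOnce) {
        Thread[] workers = new Thread[numberOfThreads];
        for (int i = 0; i < numberOfThreads; i++) {
            workers[i] = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    pollOnce.run(); // no artificial delay needed between polls
                }
            }, "poller-" + i);
            workers[i].start();
        }
        return workers;
    }

    /** Interrupts every worker and waits up to timeoutMillis each; true if all stopped. */
    static boolean stopWorkers(Thread[] workers, long timeoutMillis) {
        for (Thread worker : workers) {
            worker.interrupt();
        }
        try {
            for (Thread worker : workers) {
                worker.join(timeoutMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (Thread worker : workers) {
            if (worker.isAlive()) {
                return false;
            }
        }
        return true;
    }
}
```

Unlike scheduleWithFixedDelay(), the loop needs no minimum delay; an interrupted worker stops once its current pollOnce call returns.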

Noticing the Problem
When I started testing my new version based on long polling, I noticed something quite odd. While the messages all seemed to be processed quickly (1-10 milliseconds) and there were no errors in the logs, the AWS Console showed 50+ messages in flight. Based on the number of messages being processed per second and the time it took to process them, the in-flight counter should have been only 3-4 messages at any given time, but it consistently stayed high.

Isolating the Issue
I knew it had something to do with long polling, since with short polling I had never seen that many messages consistently in flight, but it took a long time to isolate the bug. I discovered that in certain circumstances the Amazon AWS Java SQS Client is not thread-safe. Apparently, the deleteMessage() call can block if too many other threads are performing long polling. For example, if you set the long polling wait time to 10 seconds, deleteMessage() can block for 10 seconds; if you set it to 20 seconds, deleteMessage() can block for 20 seconds, and so on. Below is a sample class that reproduces the issue. You may have to run it multiple times and/or increase the number of polling threads, but you should see intermittent delays in deleting messages between the “1: Message read from queue” and “2: Message deleted from queue” log lines.

package net.selikoff.aws;

import java.util.concurrent.*;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.sqs.*;
import com.amazonaws.services.sqs.model.*;

public class SQSThreadSafeIssue {
	private final String queueName;
	private final AmazonSQS sqsClient;
	private final int numberOfThreads;
	
	public SQSThreadSafeIssue(Regions region, String queueName, int numberOfThreads) {
		super();
		this.queueName = queueName;
		this.sqsClient = AmazonSQSClientBuilder.standard().withRegion(region).build(); // Relies on locally available AWS creds
		this.numberOfThreads = numberOfThreads;
	}
	
	private void readAndProcessMessages(ReceiveMessageRequest receiveMessageRequest) {
		final ReceiveMessageResult result = sqsClient.receiveMessage(receiveMessageRequest);
		if(result!=null && result.getMessages()!=null && result.getMessages().size()>0) {
			result.getMessages().forEach(m -> {
				final long start = System.currentTimeMillis();
				System.out.println("1: Message read from queue");
				sqsClient.deleteMessage(new DeleteMessageRequest(queueName, m.getReceiptHandle()));
				System.out.println("2: Message deleted from queue in "+(System.currentTimeMillis()-start)+" milliseconds");
			});
		}
	}
	
	private void createMessages(int count) {
		for(int i=0; i<count; i++) {
			sqsClient.sendMessage(queueName, "test "+System.currentTimeMillis());
		}
	}
	
	public void produceThreadSafeProblem(int numberOfMessagesToAdd) {
		// Start up and add some messages to the queue
		createMessages(numberOfMessagesToAdd);
		
		// Create thread executor service
		final ScheduledExecutorService queueManagerService = Executors.newScheduledThreadPool(numberOfThreads);
		
		// Create reusable request object with 20 second long polling
		final ReceiveMessageRequest receiveMessageRequest = new ReceiveMessageRequest();
		receiveMessageRequest.setQueueUrl(queueName);
		receiveMessageRequest.setMaxNumberOfMessages(1);
		receiveMessageRequest.setWaitTimeSeconds(20);
		
		// Schedule some thread processors
		for(int i=0; i<numberOfThreads; i++) {
			queueManagerService.scheduleWithFixedDelay(() -> readAndProcessMessages(receiveMessageRequest),0,1,TimeUnit.MILLISECONDS);
		}
	}
	
	public static void main(String[] args) {
		final SQSThreadSafeIssue issue = new SQSThreadSafeIssue(Regions.YOUR_REGION_HERE, "YOUR_QUEUE_URL_HERE", 60); // passed to setQueueUrl() and DeleteMessageRequest, so use the queue URL
		issue.produceThreadSafeProblem(5);
	}
}

Below is sample output from this class, showing that each message took 20 seconds (the long polling time) to be deleted.

1: Message read from queue
1: Message read from queue
1: Message read from queue
1: Message read from queue
1: Message read from queue
2: Message deleted from queue in 20059 milliseconds
2: Message deleted from queue in 20098 milliseconds
2: Message deleted from queue in 20024 milliseconds
2: Message deleted from queue in 20035 milliseconds
2: Message deleted from queue in 20038 milliseconds

Note: The SQSThreadSafeIssue class requires Java 8 or higher along with the AWS SDK for Java libraries to compile and run. It uses version 1.11.278 of the Amazon AWS Java SDK, the latest available from AWS at the time of writing (although not yet in mvnrepository.com).

Understanding the Problem
Now that we see messages are taking 20 seconds (the long polling time) to be deleted, the large number of messages in flight makes total sense. If the messages take 20 seconds to be deleted, what we are seeing is the total number of messages received over the last 20-second window that are still waiting to be deleted, which is not a ‘true measure’ of the messages actually being processed. The more threads you add, say 100-200, the easier the issue is to reproduce. What’s especially interesting is that the polling threads don’t seem to block each other. For example, if 50 messages come in at once and 100 threads are available, all 50 messages get read immediately, while not a single deleteMessage() call is allowed through.
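The “20-second window” reasoning is Little’s law: in a steady state, the average number of items in a system equals the arrival rate times the average time each item spends there. A minimal sketch, assuming a steady state; the rate of 3 messages per second is a made-up illustration, not a measurement from the project.

```java
public class InFlightEstimate {

    /** Little's law: in-flight messages = arrival rate (msg/s) x time in flight (s). */
    static double expectedInFlight(double messagesPerSecond, double secondsInFlight) {
        if (messagesPerSecond < 0 || secondsInFlight < 0) {
            throw new IllegalArgumentException("rate and time must be non-negative");
        }
        return messagesPerSecond * secondsInFlight;
    }

    public static void main(String[] args) {
        // Hypothetical rate of 3 messages per second
        System.out.println(expectedInFlight(3, 1));  // 3.0  (each message in flight ~1 s)
        System.out.println(expectedInFlight(3, 20)); // 60.0 (each delete blocked ~20 s)
    }
}
```

The processing rate is unchanged in both cases; only the time each message spends in flight grows, and the in-flight count grows with it.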

So where does the problem lie? That’s easy. Despite being advertised as @ThreadSafe in the API documentation, the AmazonSQS client is certainly not thread-safe and appears to have a maximum number of connections available. While I imagine this doesn’t come up often with the default short polling, it is not difficult to reproduce this problem when long polling is enabled in a multi-threaded environment.
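The “maximum number of connections” explanation can be illustrated without AWS at all. This is a hypothetical model, not the SDK’s actual implementation: SimulatedClient uses a java.util.concurrent.Semaphore as a stand-in for a client’s fixed-size connection pool, each long poll holds a connection for pollMillis, and a delete needs a connection only briefly.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ConnectionPoolModel {

    /** Hypothetical stand-in for a client with a fixed number of connections. */
    static final class SimulatedClient {
        private final Semaphore connections;

        SimulatedClient(int maxConnections) {
            this.connections = new Semaphore(maxConnections, true); // fair: FIFO waiters
        }

        /** Borrows a connection, signals that it has one, holds it for holdMillis, then returns it. */
        void call(long holdMillis, CountDownLatch acquired) throws InterruptedException {
            connections.acquire();
            try {
                acquired.countDown();
                TimeUnit.MILLISECONDS.sleep(holdMillis);
            } finally {
                connections.release();
            }
        }
    }

    /**
     * Starts `pollers` long polls on readClient (pass its pool size so every connection
     * is busy), then times one instant "delete" on deleteClient. Returns the wait in ms.
     */
    static long deleteWaitMillis(SimulatedClient readClient, SimulatedClient deleteClient,
                                 int pollers, long pollMillis) {
        CountDownLatch allPolling = new CountDownLatch(pollers);
        Thread[] threads = new Thread[pollers];
        for (int i = 0; i < pollers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    readClient.call(pollMillis, allPolling);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        try {
            allPolling.await(); // every long poll now holds a connection
            long start = System.nanoTime();
            deleteClient.call(0, new CountDownLatch(1));
            long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            for (Thread t : threads) {
                t.join();
            }
            return waited;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
    }
}
```

Passing the same client twice mirrors the single shared AmazonSQS instance: the delete waits roughly as long as a long poll. Passing two separate clients mirrors the two-client workaround: the delete goes through immediately.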

Finding a Solution
The solution? Oh, that’s trivial. So trivial, I was tempted to leave it as an exercise for the reader! But since I’m hoping AWS developers will read this article and fully understand the bug so they can apply a patch, here goes...

You just need to create two AmazonSQS instances in the constructor of SQSThreadSafeIssue: one for reading (the receiveMessage() call) and one for deleting (the deleteMessage() call). Once you have two distinct clients, the deletes all happen within a few milliseconds. When I applied this fix to the original project I was working on, the number of in-flight messages dropped significantly, to a level much closer to what I expected.
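A minimal sketch of the two-client workaround, written against the same AWS SDK for Java 1.x classes the sample uses. SQSTwoClients and its field names are illustrative, and queueUrl is assumed to be the full queue URL. The clientWithMoreConnections() method is a separate option worth testing, on the assumption that the limit observed here is the SDK’s HTTP connection pool: in SDK 1.x, ClientConfiguration.withMaxConnections() sets that pool size, which defaults to 50, fewer than the 60 polling threads used above.

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.DeleteMessageRequest;
import com.amazonaws.services.sqs.model.Message;
import com.amazonaws.services.sqs.model.ReceiveMessageRequest;

public class SQSTwoClients {
    private final String queueUrl;
    private final AmazonSQS readClient;   // used only for long-polling receiveMessage() calls
    private final AmazonSQS deleteClient; // used only for deleteMessage() calls

    public SQSTwoClients(Regions region, String queueUrl) {
        this.queueUrl = queueUrl;
        // Two independent clients, so deletes never queue up behind long polls
        this.readClient = AmazonSQSClientBuilder.standard().withRegion(region).build();
        this.deleteClient = AmazonSQSClientBuilder.standard().withRegion(region).build();
    }

    public void readAndProcessMessages(ReceiveMessageRequest receiveMessageRequest) {
        for (Message m : readClient.receiveMessage(receiveMessageRequest).getMessages()) {
            // ... process the message, then acknowledge it through the other client
            deleteClient.deleteMessage(new DeleteMessageRequest(queueUrl, m.getReceiptHandle()));
        }
    }

    /** Alternative to test (assumption: the limit is the SDK's HTTP connection pool). */
    static AmazonSQS clientWithMoreConnections(Regions region, int maxConnections) {
        ClientConfiguration config = new ClientConfiguration().withMaxConnections(maxConnections);
        return AmazonSQSClientBuilder.standard()
                .withRegion(region)
                .withClientConfiguration(config)
                .build();
    }
}
```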

Although this workaround is easy to apply, it should not be necessary; you should be able to reuse the same client. In fact, AWS documentation often encourages you to do so. The fact that the Amazon SQS client is not thread-safe when long polling is enabled is a very serious issue, one I’m hoping AWS will resolve in a timely manner!

Completing Toastmasters Pathways Level 2 Before Level 1 Is Approved

See my main Presentation Mastery Pathways page for some context.

I completed all the projects in level 1 on January 4. My level 1 only got approved today due to some difficulties in processing. I had lots of speaking opportunities in the club, though, and I completed all the level 2 projects before I got access to them in Base Camp. This blog post is about that journey!

Each path has three required projects for level 2. All paths include the “Introduction to Toastmasters Mentoring” project. The Presentation Mastery path also has “Understanding your Communication Style” and “Effective Body Language.”

For level 2, you can do the three projects in any order. I describe them here in the order I did them.

Introduction to Toastmasters Mentoring

On January 10th, a speaker at our Speechcraft session cancelled, so I jumped in with this speech. Since level 2 was locked, I went online to see if anyone had uploaded the PDF. I found that a club had shared the “Introduction to Toastmasters Mentoring” PDF online (the link no longer works). The description of mentoring vs. coaching was excellent. I like how the project has you speak about a time you were mentored.

For more on this project or how to download the evaluation sheet, see my starting level 2 blog post.

Effective Body Language

On January 16th, I was giving members of my New York club an unofficial preview of Pathways. I chose to use the “Effective Body Language” speech for this. I couldn’t find the PDF manual online, so instead I went to the “Speeches and Evaluations” section of Base Camp and downloaded the evaluation sheet. I gave my speech and got evaluated.

Now (as I write this blog post), I’m reading the actual project. I learned that I was supposed to get feedback from a mentor or reviewer while practicing. Oops. The online project also contains good tips on posture, stance, position and movement. I need to move more deliberately when I speak! There were also good descriptions of the four types of gestures: descriptive, emphatic, suggestive and prompting. There was a video and great interactive exercises. Finally, there were references to culture and the visually impaired.

Understanding your Communication Style

One of my clubs meets on Thursdays at lunchtime. At the January 18th meeting, we had a speaker cancel the evening before. This happens sometimes. Work is, of course, the priority! As a DTM, I knew it would be no trouble to put together a speech the night before. And it was a perfect opportunity to complete level 2!

The same club that shared the “Introduction to Toastmasters Mentoring” PDF also shared the “Understanding your Communication Style” PDF online.

I read the PDF. It contains an excellent 12-question “test” where you answer how you view yourself. Then you add up the scores to determine your communication style:

  • Analytical
  • Direct
  • Initiating
  • Supporting

You can then read how each communication style interacts with the others. (The online version is better because it automatically tallies your score and lets you control the order in which you read about the styles.)

Not surprisingly, I’m mainly analytical/direct. Then I had to write a speech. Since Pathways is new, I chose to include a couple of sentences on what each style was. Then I had the audience vote on which they thought was my predominant style. The majority of my speech was me telling a story of a strength (or perceived weakness) in my interactions with each of the four styles. It wound up being a great speech. Our VPE even suggested that I save it for the humorous speech contest.

This is a great project and really shows the benefit of the Pathways educational program!

Submitting level 2

Since I did all three projects on paper, I went back and clicked through them in Base Camp. Then I emailed my evaluations to our club leadership for approval. And now that they know what to do, getting access to Level 3 should be fast!

 

Approving a Pathways level request

I submitted my Pathways Level 1 award in early January. Due to a combination of club officer vacations and the officers not knowing what to do (since Pathways is new), it took over a month to get it approved. I completed level 2 in that time.

Today, I screenshared with the club President and we approved my Level 1 award. Now that we know what to do, it should take under ten minutes. Here’s the process to quickly and easily approve a level for members of your Toastmasters club!

Step 1 – the club officer gathers evidence that the member has completed the projects in the path

There are a few options for doing this:

  1. The member provides sufficient evidence that he/she has completed the speeches in the path. I went this route and emailed all my evaluations to the President and VPE. (I chose this option because I’m a member of two clubs, so the officers of the Pathways club have no other way of validating.)
  2. Past meeting agendas
  3. Speak easy or another online tracking system

I recommend having the member at least provide you with the dates for validation if not the evaluations.

Step 2 – sign in to Base Camp Manager

The club President, VPE or Secretary has to do this step.

  1. Sign into toastmasters.org
  2. Click “pathways”
  3. Click “go to basecamp”
  4. In the middle tile, you’ll have two options – you could choose “log in as a member” – but don’t. That takes you to Base Camp rather than Base Camp Manager. Instead, click the button under it to go to Base Camp Manager.
  5. Click “Pending Requests”
  6. Click the member’s name to view the transcript and verify it. This could mean cross-referencing PDF evaluations or looking at agendas.
  7. Then click the green checkbox to approve or the red x to reject. Either way, you can leave an optional comment to submit.

This process (starting from step 5) is described in the official docs with screenshots.

The member gets an email. I got mine a few minutes after the club officer hit approve.

Step 3 – getting DCP credit

Then go back into toastmasters.org and file an education award. This updates the member’s title and earns credit toward the club’s DCP. It’s easy, though, with no need to type in titles:

  1. Club central
  2. Submit education awards
  3. Select member from pull down
  4. Education – level 1 (or whatever level)
  5. Submit