How We Talk to Governments In a Modern Way

Chris Newhouse
Flexport Engineering
Aug 19, 2019


Governments, as we would recognize them, have existed for thousands of years, but trade between states and communities has been around even longer. When governments and trade started to co-exist, customs was born. Most of Flexport’s business is focused on moving containers from their origin to their destination (an already tricky challenge). However, customs has a different set of concerns: What are the actual products in the container? Where were they made? How much do they cost? Who are the transacting parties involved?

As a software engineer on the customs team, I get to work on the problems related to these questions, including how we interact with governments in a modern way in order to provide global customs services to our clients.

Today, there are many third-party software vendors (3PVs) that specialize in various parts of the process of interacting with governments on customs requirements. In the early days of Flexport customs, it made more sense to “buy” these services from vendors than to build them in-house. This was primarily because:

  • We didn’t have to be experts in all the minutiae of the customs requirements and logic that these vendors had already figured out and handled for us
  • We could use familiar APIs (e.g. RESTful endpoints with JSON payloads)
  • We could get things working in a few days vs. much longer times to build things in-house
  • We didn’t have to maintain and update the resulting system

Over time, and in the interest of becoming The Operating System for Global Trade, the arguments for “building” started to pick up steam and eventually outweighed the “buy” option. Building our own government integrations would allow us to:

  • Offer new capabilities that weren’t possible through our preferred 3PVs
  • Generate and capture more data with higher fidelity
  • Increase the speed of filings and queries
  • Reduce the delay in receiving and processing responses from government agencies
  • Increase security by reducing the number of parties with access to our data
  • Reduce the costs to our clients

Challenges

Eventually, we built our own integrations with various government agencies, and it’s been a big success for us. However, achieving this did not come easily, as it introduced a large set of business and technical challenges. Let’s dive into a few of the more interesting technical hurdles we had to clear with US Customs and Border Protection (CBP) in order to make this happen.

Multiple Systems

Most API providers I’ve used will consolidate all the capabilities they’d like to expose under one gateway/interface: there is one set of documentation, one authentication scheme, one host, one request/response payload format, etc. This is not (currently) the case with CBP. There are several different systems with different capabilities and interfaces, requiring us to thoroughly learn, implement and maintain abstractions around each of them. Which one(s) to use depends entirely on what you are trying to accomplish.

CATAIR (vs JSON)

Most modern APIs support one or more of the popular data formats (usually JSON or XML). They are familiar to developers, have robust support in most languages and frameworks, and are also relatively human-readable. Depending on which of CBP’s systems you are interacting with, you’re actually much more likely to encounter something like CATAIR, a fixed-width, looping, plain-text-based format. It might look something like this:

Example of a CATAIR request
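
To make “fixed-width” and “looping” a bit more concrete, here is a minimal Python sketch that reads a made-up message. The record types ("H" and "L"), field names and column positions are all invented for illustration and are not taken from the CATAIR specification; the point is only that meaning comes from column position rather than from keys or tags, and that some record types repeat under a parent record.

```python
# Hypothetical layout, NOT real CATAIR. Each line is one fixed-width record.
# "H" is a header record; "L" is a line-item record that may repeat ("loop").
# Values are (start, end) column offsets within the line.
HEADER_FIELDS = {"filer_code": (1, 4), "entry_number": (4, 12)}
ITEM_FIELDS = {"tariff_code": (1, 11), "quantity": (11, 17)}


def slice_fields(line, fields):
    """Cut a fixed-width line into named fields by column position."""
    return {name: line[start:end].strip() for name, (start, end) in fields.items()}


def parse_message(text):
    """Group repeating item records under the header record that precedes them."""
    entries = []
    for line in text.splitlines():
        if line.startswith("H"):
            entries.append({**slice_fields(line, HEADER_FIELDS), "items": []})
        elif line.startswith("L"):
            item = slice_fields(line, ITEM_FIELDS)
            item["quantity"] = int(item["quantity"])  # zero-padded number
            entries[-1]["items"].append(item)
    return entries


SAMPLE = "\n".join([
    "HABC00001234",
    "L6109100012000010",
    "L6203424011000250",
])

parsed = parse_message(SAMPLE)
print(parsed)
```

Even in this toy version, nothing in the raw text says which characters are the quantity; you only know by consulting the layout, which is a large part of why these formats are harder to debug than JSON.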

MQ (vs HTTPS)

Most modern APIs are REST-style interfaces served over HTTPS. This combination is also familiar and well supported nearly everywhere today (e.g. curl). CBP’s systems, however, require communication with an MQ server over an MQ protocol. MQ has a lot of positive qualities, and there are use cases where it’s preferable to REST/HTTPS, but it’s also harder to work with for a few reasons:

  • MQ is less common, and therefore less supported. You will almost certainly need special drivers/libraries for your language/framework, and in some cases you may not be able to find any.
  • MQ is an asynchronous paradigm (whereas REST/HTTPS is synchronous). This means that messages and requests from Flexport are written to CBP via one queue, and the responses from CBP are read by us from another. It’s kind of like cooking a pizza in an oven fed by a conveyor belt: you put the uncooked pizza on the belt, it disappears inside the oven, and a bit later it comes out the other side all cooked.
  • Because it’s asynchronous, there’s also no guarantee that responses will come quickly, or in the same order as the requests that triggered them. Also, unexpected “push” messages are occasionally sent to our read queue, like when customs updates the status of an entry, or updates HTS Codes.

VPN (vs open web)

For added security, in addition to a normal set of user:pass credentials, CBP requires that you communicate with its systems over a VPN rather than across the open web. This means any Flexport server that would like to directly communicate with any of CBP’s systems must route its traffic through a secure tunnel from our network to theirs, or our connection attempts will not go anywhere.

Solution(s)

As you can see, communicating with the external systems of regulatory bodies in the world of trade can be quite difficult, and there are a number of challenges to overcome. As a technology company, solving problems like these is part of the value and differentiation Flexport brings to the table, and part of the opportunity we are tapping into. Here’s a simplified outline of some of the ways we tackled the challenges described above.

VPN

We set up a group on AWS that allows all the hosts it contains to route the appropriate traffic from our network through the required VPN to CBP’s network. All the parts of our system that need to communicate with CBP directly must be deployed into this group, but otherwise that particular challenge is solved.

MQ

We needed a way to communicate with MQ. At the time our solution was originally developed, Flexport was nearly entirely a Rails shop, so we had hoped to find a package that would allow us to communicate with MQ directly from our backend. Unfortunately, we could not find anything robust enough for our needs in Rubyland, and the next best options for us were written for Java and Node. We chose to develop this bit of the system in Node, and to implement it as a microservice that simply shuffles messages back-and-forth between SQS on the Flexport side and MQ on the CBP side. SQS has support in nearly all modern languages, and can be accessed from any machine in our system with the proper credentials. We call the resulting microservice “the Shim”.

We created this service by running a cluster of Docker containers on hosts that have access to the VPN, and there are a number of positive characteristics of the design:

  • The cluster only needs to communicate with 2 pretty specific egress and ingress points (SQS on the one side, MQ on the other), and the rest can be locked down.
  • Since the intention of the Shim is to simply shuffle messages back and forth, the code is very stable and we don’t need to update it or redeploy often.
  • The Shim processes do not require a lot of CPU/memory, so we don’t need a lot of them. This helps us save on our infrastructure costs, but more importantly it limits the number of hosts that have access to the sensitive VPN, making things more secure.
  • No other parts of the system need to be deployed into the VPN since they can all communicate with the Shim via SQS.
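
The core of that “shuffling” idea can be sketched in a few lines. This is only an illustration in Python, not the Shim itself (which is written in Node): the standard library's `queue.Queue` stands in for both SQS and MQ, and the function and queue names are hypothetical.

```python
import queue


def drain(source, sink):
    """Move every message currently waiting on `source` to `sink`, unchanged."""
    moved = 0
    while True:
        try:
            message = source.get_nowait()
        except queue.Empty:
            return moved
        sink.put(message)
        moved += 1


def shim_pass(sqs_outbound, mq_write, mq_read, sqs_inbound):
    """One relay pass in each direction; the relay never inspects the payload."""
    sent = drain(sqs_outbound, mq_write)    # our side -> CBP side
    received = drain(mq_read, sqs_inbound)  # CBP side -> our side
    return sent, received


# In-memory stand-ins for the four queues involved (assumed names).
sqs_outbound, mq_write = queue.Queue(), queue.Queue()
mq_read, sqs_inbound = queue.Queue(), queue.Queue()

sqs_outbound.put("request-1")
sqs_outbound.put("request-2")
mq_read.put("response-to-an-earlier-request")

counts = shim_pass(sqs_outbound, mq_write, mq_read, sqs_inbound)
print(counts)
```

Because the relay treats messages as opaque, all knowledge of message formats and business logic can live elsewhere, which is what keeps this piece small and rarely in need of redeployment.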

CATAIR, etc

We’ve now solved the problems stemming directly from the VPN, and most of the MQ challenges (more on this coming up), but there is still more to go. The two systems can send messages to each other, but CBP still communicates in those fixed-width formats described above whereas our backend likes data structures that are a bit more traditional. To solve this, we created a set of codecs that can perform bi-directional transformation between the CBP-specific messages and JSON-like data blobs that our application can work with. The system we devised for this translation was one of the more difficult challenges in this process and is outside the scope of this blog post, but in short it is a robust, extensible and powerful module.
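
Our actual codecs are out of scope here, but the general shape of a bi-directional, declarative codec can be sketched as follows. This is a simplified Python illustration under stated assumptions: the `Field` and `RecordCodec` classes, the field names and the widths are all hypothetical, and the padding rules (text left-justified with spaces, non-negative numbers right-justified with zeros) are a common fixed-width convention rather than a statement about any particular CBP record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    width: int
    numeric: bool = False  # numeric fields hold non-negative integers

    def encode(self, value):
        text = str(value)
        if len(text) > self.width:
            raise ValueError(f"{self.name!r} does not fit in {self.width} columns")
        # Numbers are zero-padded on the left, text is space-padded on the right.
        return text.rjust(self.width, "0") if self.numeric else text.ljust(self.width)

    def decode(self, raw):
        return int(raw) if self.numeric else raw.rstrip()


class RecordCodec:
    """Translate between a dict and one fixed-width line, driven by a field list."""

    def __init__(self, fields):
        self.fields = fields
        self.width = sum(field.width for field in fields)

    def encode(self, data):
        return "".join(field.encode(data[field.name]) for field in self.fields)

    def decode(self, line):
        if len(line) != self.width:
            raise ValueError(f"expected {self.width} columns, got {len(line)}")
        decoded, position = {}, 0
        for field in self.fields:
            decoded[field.name] = field.decode(line[position:position + field.width])
            position += field.width
        return decoded


# Hypothetical record layout, for illustration only.
codec = RecordCodec([
    Field("manufacturer_id", 15),
    Field("country", 2),
    Field("value_usd", 10, numeric=True),
])

record = {"manufacturer_id": "CNACME123", "country": "CN", "value_usd": 45000}
line = codec.encode(record)
round_tripped = codec.decode(line)
print(repr(line))
print(round_tripped)
```

Describing each record once as data, and deriving both directions from that description, is one way to keep encoding and decoding from drifting apart as layouts change.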

Sync vs. Async

We still have the problem of synchronicity: we are sending out messages to the MQ server, but are receiving responses to those requests “sometime” later. To solve this, we need to accomplish two things: (1) know which request a received message is responding to and (2) ensure each message gets to the right place so it can be handled. To recycle the pizza oven analogy I used above, it’s like putting a whole bunch of uncooked pizzas onto the belt and watching them go into the oven, but they can come out in any order after any amount of time, so it’s hard to know when each pizza will be done and to whom it belongs when it comes out.

Fortunately, CBP’s specifications allow us space to include some “correlation data” with our payloads for this very purpose: whatever data we put in the correlation area of our requests will be part of any messages that are eventually sent to us by CBP in response to those requests. For example, by putting serialized object identifiers into the correlation area, we can now know which record(s) in our system a given response should affect.
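
Here is a minimal Python sketch of that idea. Everything in it is an assumption made for illustration: messages are modeled as plain dicts rather than fixed-width text, the identifiers are serialized as JSON, and `fake_cbp` is a stand-in that simply echoes the correlation data back and answers in reverse order to mimic out-of-order responses.

```python
import json


def build_request(payload, record_type, record_id):
    """Attach a serialized object identifier as correlation data."""
    correlation = json.dumps({"type": record_type, "id": record_id})
    return {"correlation": correlation, "payload": payload}


def fake_cbp(requests):
    """Stand-in for the remote side: echoes correlation data, replies out of order."""
    return [
        {"correlation": request["correlation"], "payload": "ACK " + request["payload"]}
        for request in reversed(requests)
    ]


def route_response(response, records):
    """Use the echoed correlation data to find the record the response affects."""
    ref = json.loads(response["correlation"])
    records[(ref["type"], ref["id"])]["last_response"] = response["payload"]


records = {
    ("Entry", 1): {"last_response": None},
    ("Entry", 2): {"last_response": None},
}
requests = [
    build_request("QUERY A", "Entry", 1),
    build_request("QUERY B", "Entry", 2),
]

for response in fake_cbp(requests):
    route_response(response, records)

print(records)
```

The responses arrive in the opposite order from the requests, yet each one still lands on the right record, because the routing decision depends only on the data echoed back with the response.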

For many of the actions we support, an asynchronous response is to be expected (e.g. bill matching for some filings can take hours or days), but for others (e.g. querying a Manufacturer), a more “synchronous” experience is desired. For the asynchronous-friendly actions, our Rails backend translates the request payload to the appropriate format (with correlation data included) and then puts the message directly onto the appropriate Shim-bound SQS queue. The eventual asynchronous response is picked up by a worker, logged, decoded, correlated and then acted upon.

To achieve a synchronous experience (e.g. when a User is waiting for a response on the frontend), we added a layer of lightweight HTTPS servers written in pure Ruby, which form a managed, load-balanced, internal-only API we call the “Gateway”. A User-triggered action on the frontend might make a normal Ajax request to the backend, which in turn encodes the request payload for the action and makes a standard HTTPS request to the Gateway with it. The Gateway server that receives the request adds some information to the correlation area to identify itself and the specific request it is handling, and then passes the message along to the appropriate Shim-bound SQS queue while holding the HTTPS request open.

Unless the request times out, threading in the Gateway processes, combined with a bit of SQS routing and the correlation data baked into the payloads, allows us to go round-trip with CBP and match the response messages to the backend request that is waiting for them. The backend can then do what it needs to do and respond to the Ajax request from the User on the frontend, completing the full request/response lifecycle in a synchronous-feeling manner.
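
The “hold the request open until the matching response arrives” part can be sketched like this. It is a single-process Python illustration, not the Gateway itself (which is written in Ruby and sits behind a load balancer): the `Gateway` class, its method names, and the `fake_cbp` responder thread are hypothetical, and an in-memory `queue.Queue` stands in for the Shim-bound SQS queue.

```python
import queue
import threading
import uuid
from concurrent.futures import Future, TimeoutError as FutureTimeout


class Gateway:
    def __init__(self, outbound):
        self.outbound = outbound  # stand-in for the Shim-bound queue
        self.waiting = {}         # request id -> Future of the blocked caller
        self.lock = threading.Lock()

    def call(self, payload, timeout=5.0):
        """Send a request and block until its response arrives or time runs out."""
        request_id = uuid.uuid4().hex
        future = Future()
        with self.lock:
            self.waiting[request_id] = future
        self.outbound.put({"correlation": request_id, "payload": payload})
        try:
            return future.result(timeout=timeout)
        finally:
            with self.lock:
                self.waiting.pop(request_id, None)

    def deliver(self, response):
        """Hand a response to whichever caller is waiting; drop it if nobody is."""
        with self.lock:
            future = self.waiting.get(response["correlation"])
        if future is not None and not future.done():
            future.set_result(response["payload"])


def fake_cbp(outbound, gateway):
    """Stand-in responder: echoes correlation data and upper-cases the payload."""
    while True:
        message = outbound.get()
        if message is None:  # sentinel used to stop the thread
            return
        gateway.deliver({
            "correlation": message["correlation"],
            "payload": message["payload"].upper(),
        })


outbound = queue.Queue()
gateway = Gateway(outbound)
worker = threading.Thread(target=fake_cbp, args=(outbound, gateway), daemon=True)
worker.start()

reply = gateway.call("query manufacturer")
print(reply)

# With the responder stopped, the same call gives up after its timeout.
outbound.put(None)
worker.join()
try:
    gateway.call("query manufacturer", timeout=0.05)
    timed_out = False
except FutureTimeout:
    timed_out = True
print(timed_out)
```

With several Gateway servers, the correlation data also has to say which server is holding the request so the response can be routed back to it; this sketch leaves that routing step out.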

With all of this, we now have a system with which we can communicate with CBP to accomplish our current and future needs. Here’s what it looks like:

Simplified diagram of the system

TODO:

The system I’ve just described has served us well, but it’s not perfect and there are still challenges that remain. Here are a few of them:

  • We could probably get rid of the Gateway part of the system, and instead use WebSockets to get messages to the frontend and still feel synchronous enough.
  • We could have leveraged a Kafka-like system instead of simple queues in order to make routing simpler, more flexible, and more powerful.
  • Flexport has not ended up using Node much anywhere else, but other backend languages that could support MQ communication have been adopted. We could consider re-implementing the Shim in Java or Clojure, for example.
  • Local development is still challenging. CBP provides a “sandbox”-like MQ to test with, but the VPN, Docker, SQS, Shim, Gateway, routing and other considerations are still a bit tough to coordinate locally.
  • Despite the immense complexity it has to represent, CBP’s documentation is actually quite complete and impressive. Regardless, dealing with message formats like CATAIR when building, enhancing or debugging parts of the system is still quite challenging.
