Efficiently handle Data and Integrations in Spring Boot

With great data comes great repositories…

Mohammed Atif
Engineering@Zemoso

--

The moment we talk about Spring Boot, the first thing that comes to mind is Controller-Service-DAO. But backend applications are not built just to fetch data from a database and serve it to the front end. One of their core responsibilities is gathering data from several sources besides the database and performing some processing on top of it (and then storing the processed data back to those sources as required).

In this article I will walk through some of the common coding mistakes made while dealing with external data sources and how we can avoid them.

The Data Web

A simple Data Flow diagram

Consider the above diagram. Our Spring Boot Application is fetching data from multiple sources.

  1. Database: Usual SQL or NoSQL database
  2. IDP: Identity Provider, usually to store and fetch user profile information after Authentication
  3. Rule: A simple rule engine server to fetch Business Rules
  4. Payment Data Provider: In case integrated with Payment Facilitator
  5. Domain Data Providers: Our application might be integrated with some third party Applications or Servers that perform dedicated domain specific heavy weight operations

In most applications the data web can get really complex, and the application might be fetching and storing data from multiple data sources.

Common Mistakes

source: https://giphy.com/

Fetching and Storing the data in Service Layer

Many developers end up writing the logic that fetches data from third party sources in the service layer. That sounds good most of the time but has some major drawbacks:

  • Tight Coupling: When we write the integrations in Service layer, we unknowingly end up having a tight coupling to the Integration or Data Provider
    Suppose we have a payment integration with PayPal and fetch the payment details in 5 different services. Now, due to some policy or pricing change, you decide to use Stripe. That means you have to change the payment fetching logic in 5 different classes. And the worst part: this logic might not be that straightforward.

The above example might sound debatable, but don’t worry, I will elaborate on it in the solutions part.

  • Scattered code: As the code size increases, integrations become more and more unmanageable. With the integration and data fetching code all over the place, it becomes really difficult to track and manage the code (even in a microservice architecture).
  • Migration Hell: One common situation with most integrations, when we rely solely on their SDK, is library updates and deprecations. Say we are using the AWS Java SDK to fetch and store files in S3, and then AWS decides to upgrade the SDK to add some features and to deprecate and remove others. If this AWS logic is scattered across multiple files, searching and upgrading them becomes a mess.
  • And many more issues…
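One way to picture the alternative to scattered integration code is to keep every provider call inside a single class and let the services depend on that class. The sketch below is only an illustration under stated assumptions: `LegacyStorageSdk` is a hypothetical in-memory stand-in for a vendor SDK (it is not the AWS SDK), and `FileStorage`, `InvoiceService` and `ReportService` are invented names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a vendor SDK; a real SDK would call a remote service.
class LegacyStorageSdk {
    private final Map<String, String> objects = new HashMap<>();

    void putObject(String bucket, String key, String body) {
        objects.put(bucket + "/" + key, body);
    }

    String getObject(String bucket, String key) {
        return objects.get(bucket + "/" + key);
    }
}

// The only class that knows about the SDK. An SDK upgrade or a provider
// switch is confined to this one place.
class FileStorage {
    private static final String BUCKET = "app-files"; // assumed bucket name
    private final LegacyStorageSdk sdk;

    FileStorage(LegacyStorageSdk sdk) {
        this.sdk = sdk;
    }

    void save(String name, String content) {
        sdk.putObject(BUCKET, name, content);
    }

    String load(String name) {
        return sdk.getObject(BUCKET, name);
    }
}

// Services hold business logic only and never reference the SDK.
class InvoiceService {
    private final FileStorage storage;

    InvoiceService(FileStorage storage) {
        this.storage = storage;
    }

    void archiveInvoice(String invoiceId, String text) {
        storage.save("invoices/" + invoiceId, text);
    }
}

class ReportService {
    private final FileStorage storage;

    ReportService(FileStorage storage) {
        this.storage = storage;
    }

    String invoiceSummary(String invoiceId) {
        String invoice = storage.load("invoices/" + invoiceId);
        return invoice == null
                ? "missing"
                : "Invoice " + invoiceId + ": " + invoice.length() + " chars";
    }
}
```

With this shape, both services keep working unchanged if `FileStorage` is later rewritten against a different SDK version.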

Not using Interface-Implementation pattern

While many developers smartly escape the above issue by following the Single Responsibility Principle and extracting the integration and data fetching logic into a separate class, many end up coding directly against that class instead of an interface. In most cases it works well, but it can still be a bad design and cause multiple code management issues going forward. Some common problems of using classes directly are:

  • Pseudo hard coupling: Extracting the integration-specific code to a class helps to loosely couple your code to some extent, but it doesn’t eliminate the coupling completely.
  • Ad Hoc Changes: Many developers end up making workarounds and gradually deviate from Single Responsibility, making the code harder to manage.
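A minimal sketch of the interface-implementation idea follows. Everything in it is hypothetical: `PaymentDetailsProvider`, `PaymentInfo`, the two implementations and the in-memory maps (which stand in for real PayPal or Stripe calls) are assumptions made for illustration, not code from either provider.

```java
import java.util.Map;
import java.util.Optional;

// Application-owned view of a payment, not tied to any provider.
record PaymentInfo(String id, long amountInCents, String currency) {}

// The contract the rest of the application depends on.
interface PaymentDetailsProvider {
    Optional<PaymentInfo> findPayment(String paymentId);
}

// Hypothetical PayPal-backed implementation; the map stands in for API calls.
class PaypalPaymentDetailsProvider implements PaymentDetailsProvider {
    private final Map<String, PaymentInfo> data =
            Map.of("pp_1", new PaymentInfo("pp_1", 1999, "USD"));

    @Override
    public Optional<PaymentInfo> findPayment(String paymentId) {
        return Optional.ofNullable(data.get(paymentId));
    }
}

// Hypothetical Stripe-backed implementation.
class StripePaymentDetailsProvider implements PaymentDetailsProvider {
    private final Map<String, PaymentInfo> data =
            Map.of("ch_1", new PaymentInfo("ch_1", 2500, "USD"));

    @Override
    public Optional<PaymentInfo> findPayment(String paymentId) {
        return Optional.ofNullable(data.get(paymentId));
    }
}

// The service only sees the interface, so swapping providers does not touch it.
class SubscriptionService {
    private final PaymentDetailsProvider payments;

    SubscriptionService(PaymentDetailsProvider payments) {
        this.payments = payments;
    }

    boolean hasPaid(String paymentId) {
        return payments.findPayment(paymentId)
                .map(p -> p.amountInCents() > 0)
                .orElse(false);
    }
}
```

In a Spring Boot application the choice of implementation would typically be made once, in configuration, rather than in each service.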

Directly using the Integration SDK

Many developers end up directly using the SDKs provided by the integration or data source provider. That sounds really convenient but turns out to be a terrible decision in no time. Doing this is one of the worst mistakes, as it adds up all the issues I listed above.

Directly using Exceptions and Objects provided by the SDK

One often ignored mistake is propagating the Exceptions and Objects provided by the SDK. This small mistake spoils all the effort we put into avoiding the other listed mistakes. Consider this example:

import com.stripe.Payment;
import com.stripe.PaymentException;
import java.util.Optional;

public interface PaymentDataRepository {
    // Anti-pattern: both the return type and the checked exception are SDK types
    Optional<Payment> getPaymentDetail(String paymentId) throws PaymentException;
}

In the above code, though we have created a separate interface, we ended up depending on the Object (Payment) and Exception (PaymentException) provided by the integration SDK (Stripe). This small mistake more or less kills the whole effort of keeping the code loosely coupled to third party providers. As a side effect, if the SDK changes or deprecates these types, you will have to change these references all across the code.

This issue is beyond the scope of this article
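For contrast with the interface shown earlier, here is a rough sketch of one way to keep SDK types from leaking. The `Sdk*` classes are hypothetical stand-ins so that the example compiles without any vendor library, and `PaymentDetail`, `PaymentLookupException` and `SdkBackedPaymentRepository` are assumed names, not part of any real SDK.

```java
import java.util.Optional;

// Hypothetical stand-ins for SDK types.
class SdkPayment {
    final String id;
    final long amount;

    SdkPayment(String id, long amount) {
        this.id = id;
        this.amount = amount;
    }
}

class SdkPaymentException extends Exception {
    SdkPaymentException(String message) {
        super(message);
    }
}

class SdkClient {
    SdkPayment fetch(String id) throws SdkPaymentException {
        if (id.startsWith("bad_")) {
            throw new SdkPaymentException("provider rejected " + id);
        }
        return id.equals("unknown") ? null : new SdkPayment(id, 4200);
    }
}

// Application-owned types: callers never see an SDK class.
record PaymentDetail(String id, long amountInCents) {}

class PaymentLookupException extends RuntimeException {
    PaymentLookupException(String message, Throwable cause) {
        super(message, cause);
    }
}

interface PaymentDataRepository {
    Optional<PaymentDetail> getPaymentDetail(String paymentId);
}

class SdkBackedPaymentRepository implements PaymentDataRepository {
    private final SdkClient client;

    SdkBackedPaymentRepository(SdkClient client) {
        this.client = client;
    }

    @Override
    public Optional<PaymentDetail> getPaymentDetail(String paymentId) {
        try {
            SdkPayment raw = client.fetch(paymentId);
            // Map the SDK object to the application's own type at the boundary.
            return Optional.ofNullable(raw).map(p -> new PaymentDetail(p.id, p.amount));
        } catch (SdkPaymentException e) {
            // Translate the SDK exception so callers never catch a vendor type.
            throw new PaymentLookupException("Could not load payment " + paymentId, e);
        }
    }
}
```

Whether the translated exception should be checked or unchecked is a design choice; an unchecked one is used here only to keep the interface signature free of `throws` clauses.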

Not using different Data and DTO objects

Considering we are not using GraphQL, most of the time our Controllers are designed to return the JSON that serves a specific purpose on the front end.

While most organisations are moving to Resource Oriented API design, where each API serves a specific resource irrespective of what the front end needs and the data is then orchestrated in an intermediate layer like GraphQL, many still prefer to design APIs dedicated to directly serving the front end. In such cases, defining separate DTO objects becomes really important.

One major problem that arises from not splitting DTO and Data Objects is that any change to the data source or database schema will end up causing multiple workarounds just to keep the application running.
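As a small illustration of the split, the sketch below keeps a storage-side model and a front-end-facing DTO apart, with one mapper between them. The field names and the `BookEntity`, `BookPreviewDto` and `BookMapper` types are assumptions made for this example.

```java
// Persistence-side model: mirrors the storage schema (hypothetical fields).
record BookEntity(long id, String title, String authorName, String s3Key, String internalNotes) {}

// API-side DTO: only what the front end needs for a preview.
record BookPreviewDto(String title, String author) {}

// The single place that knows how the two shapes relate. A schema change
// is absorbed here instead of rippling into controllers.
final class BookMapper {
    private BookMapper() {}

    static BookPreviewDto toPreview(BookEntity entity) {
        return new BookPreviewDto(entity.title(), entity.authorName());
    }
}
```

If the `authorName` column were renamed or moved to another table, only `BookEntity` and `BookMapper` would need to change, and the JSON returned to the front end would stay the same.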

Apart from the above mentioned issues, writing test cases also becomes difficult when the integration is not structured correctly.

Handle complex Integrations and Data Providers

Now that we have seen some relatable mistakes that we might make in our Spring Boot Application, let us see how we can fix them and make our code much cleaner and more manageable.

Bonus Panel: The techniques below are not limited to Data Providers from Data Sources or Integrations but can be extended to various other use cases too. But let’s not get distracted for the rest of the article 😜

Let us consider a simple use case

  1. You are building an Application similar to Coursera or O’Reilly
  2. Your end user pays you to subscribe to books or courses
  3. Your application then provides the books or courses to the users based on their subscription
  4. You might also want to show the preview of the books to guests and subscribed users

While this whole application is much more complex, let us just focus on the payment part and serving the books to the subscribed users and guests.

Let us split the whole scenario into two smaller parts

  1. Handling Payment
  2. Serving the Books from Database and S3 repository

Handling Payment

Class diagram for Payment Flow

As you may have noticed, I have just added a StripeWrapper class between StripeSDK and StripeService. This looks like a small change, and at first glance it seems unnecessary. But let’s have a look at the Stripe SDK first.

Credits: https://stripe.com/docs/api/authentication

The Stripe SDK comes with static class methods, i.e. you cannot inject a Stripe dependency into your service; instead you have to call the methods directly when and where required. This can lead to tight coupling in code, and mocking Stripe during tests can become a nightmare.

So wrapping these static calls within a wrapper lets us add the Stripe SDK as an injected dependency to our service, and writing test cases becomes much more convenient.

Not only tests: in case of SDK updates you will have all your Stripe calls in one place, and managing the migrations becomes easier without touching the actual business logic in your Service class.

Many SDKs like Firebase, DataLoggers, etc. use static methods to perform some operations, and introducing wrappers on top of them can make your code much more manageable.
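The wrapper idea can be sketched as follows. This is not the real Stripe API: `StaticChargeSdk` is a hypothetical class that only mimics the static-method style, and `ChargeWrapper`, `SdkChargeWrapper` and `ReceiptService` are invented names for illustration.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical SDK that exposes only static methods.
final class StaticChargeSdk {
    private static final Map<String, Long> CHARGES = Map.of("ch_1", 1500L);

    private StaticChargeSdk() {}

    static Long retrieveAmount(String chargeId) {
        return CHARGES.get(chargeId);
    }
}

// Injectable seam: services depend on this interface, not on static calls.
interface ChargeWrapper {
    Optional<Long> amountFor(String chargeId);
}

// Production implementation: the only place that touches the static SDK.
class SdkChargeWrapper implements ChargeWrapper {
    @Override
    public Optional<Long> amountFor(String chargeId) {
        return Optional.ofNullable(StaticChargeSdk.retrieveAmount(chargeId));
    }
}

class ReceiptService {
    private final ChargeWrapper charges;

    ReceiptService(ChargeWrapper charges) {
        this.charges = charges;
    }

    String receiptLine(String chargeId) {
        return charges.amountFor(chargeId)
                .map(cents -> "Paid " + cents + " cents")
                .orElse("No charge found");
    }
}
```

Because `ChargeWrapper` has a single method, a test can replace it with a lambda, for example `new ReceiptService(id -> Optional.of(100L))`, without any static mocking.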

Wait!!! Aren’t we talking about fetching and managing data? Where is that in the above example?

Let us take a closer look at the Stripe SDK

Charge charge = Charge.retrieve(
    "ch_random_charge",
    requestOptions
);

Stripe already has methods that internally handle the data fetching and storing operations. So the wrapper should be self-sufficient and can act as the data repository. But depending on the number of operations, and applying the Interface Segregation Principle, you can further split it into multiple smaller wrappers. In the end, the target must be code with a Single Responsibility that is easy to manage.

I hope this solves the controversy that I raised while talking about the common mistakes.

Serving the Books

Class diagram for Book Information Flow

Here you can notice that I have just introduced a new Repository with its implementation, but there are no wrappers on the AWS SDK. Check the example below: the AWS SDK uses the builder pattern to create the instance, which can be externalised in a configuration class, and the instance can be injected as a Bean.

Credits: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/creating-clients.html

Well, that sounds good. But why do we have a separate JPA repository and S3 repository, and an intermediate layer?

Let us break down this Hypothetical Problem into smaller units

  1. JPA Repository serves the book preview like Name, Author, Description to both subscribed and guest users
  2. S3 Repository serves the whole book to subscribed users
  3. BooksSubscriptionRepository orchestrates between the two to provide respective content to the BooksService

OK, the first two statements sound straightforward, but what does orchestrating data between multiple repositories mean?

Sample Interface-Implementation diagram

Consider the above data model. As we can see, there is a single data provider and it has two different implementations. Each of the implementations in turn has a different source to provide data: it can be a cloud provider, a database, a message queue or a cache.

Each of them may or may not contain the same data. In case they have the same data, they might be used for serving different loads. The possibilities are endless, but a service that needs data only cares about getting the data, no matter how. Many times, for various reasons, you might end up switching the source from one type to another, and in that case the rest of the application must not be affected. So the data orchestrator internally picks the right source to provide the data in the best possible way without affecting the logic that utilises the data.

In our example, our Book Data Repository switches between S3 and the database to serve the preview or the full book as required by the caller. In future, if you decide to serve the preview from S3 too, all you have to do is switch the source in the data provider.
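A rough sketch of such an orchestrating repository is given below. Only the name `BooksSubscriptionRepository` comes from the discussion above; `BookContent`, `BookPreviewSource`, `FullBookSource`, `OrchestratingBooksRepository` and the `subscribed` flag are assumptions. In a real application the two sources might be backed by a Spring Data JPA repository and an S3 client respectively.

```java
import java.util.Optional;

// What callers receive, regardless of which source produced it.
record BookContent(String title, String body) {}

// Hypothetical source for previews (for example, a database).
interface BookPreviewSource {
    Optional<BookContent> findPreview(String bookId);
}

// Hypothetical source for full books (for example, object storage).
interface FullBookSource {
    Optional<BookContent> findFullBook(String bookId);
}

// Single entry point used by the service layer.
interface BooksSubscriptionRepository {
    Optional<BookContent> getBook(String bookId, boolean subscribed);
}

class OrchestratingBooksRepository implements BooksSubscriptionRepository {
    private final BookPreviewSource previews;
    private final FullBookSource fullBooks;

    OrchestratingBooksRepository(BookPreviewSource previews, FullBookSource fullBooks) {
        this.previews = previews;
        this.fullBooks = fullBooks;
    }

    @Override
    public Optional<BookContent> getBook(String bookId, boolean subscribed) {
        // Subscribers get the full book; everyone else gets the preview.
        return subscribed
                ? fullBooks.findFullBook(bookId)
                : previews.findPreview(bookId);
    }
}
```

Moving previews to the object store later would mean passing a different `BookPreviewSource` to the constructor, while the service that calls `getBook` stays untouched.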

Summary

  1. Always try to separate out the data fetching logic and data processing logic (business logic)
  2. In case Integrations come with an SDK that uses static methods to perform certain operations then write custom wrappers on top of it.
  3. Use different DTO and Model Objects
  4. Never propagate the Exceptions and Objects from Integrations to higher layers unless absolutely required
  5. Never overload service with business logic and data management logic
  6. Write data orchestrating repositories or adapters whenever required, but do not overdo it

Hope this article helped in designing the code to streamline data flow into your Application. Happy Coding!

Sample Project: Coming Soon…

Disclaimer

The above example is designed purely for demonstration purposes. Though it is good enough as an architecture, it might not be a good reference for Stripe or AWS integration into your application.

Spend sufficient time understanding the different available design patterns, as different problems require different solutions and the above approach might not work for all use cases.

Check out https://medium.com/engineering-zemoso to find many more interesting Software Engineering articles spread across multiple domains.

Do share the feedback and recommendations to make this article better for future readers.
