
Business process automation with Camunda: fault-tolerant implementation of BPM schemes

In today’s digitally driven world, maintaining a competitive edge requires streamlined and efficient business processes. Automation stands out as a key solution for achieving this. According to Statista, the business process management (BPM) market is expected to reach 14.4 billion U.S. dollars by 2025. The rising popularity of and demand for BPM tools like Camunda, known for its flexibility and scalability, testify to this trend. As businesses seek reliable tools to optimize their operations, Camunda emerges as a forerunner, paving the way for innovative, fault-tolerant automation solutions in the industry.

What is Camunda?

In simple terms, Camunda is an open-source platform for workflow and decision automation that brings business users and software developers together. Through its robust set of tools and features, Camunda offers ways to design, implement, and optimize BPMN (Business Process Model and Notation) workflows, making business operations smoother and more transparent.

Camunda, Spring Boot & BPMN: understanding the concepts

Three key players have reshaped the business process management landscape: Camunda, Spring Boot, and BPMN. Each has carved out its niche, offering unique functionalities that address distinct facets of process management. However, when combined, they transform into an unparalleled powerhouse, capable of revolutionizing digital enterprise operations.

Camunda: This isn’t just another tool in the vast BPM toolbox; it’s a standout. As a robust open-source platform, Camunda specializes in workflow and decision automation. Its primary objective? To seamlessly fuse the worlds of business strategists and software developers. By doing so, it ensures that the conceptualization, design, and implementation of business processes are efficient, transparent, and cohesive.

Spring Boot: Spring Boot takes the strengths of the Spring framework and elevates them. By offering a streamlined method to build standalone Java applications, it has become the go-to for developers wanting to minimize boilerplate code and dive straight into the heart of project-specific functionalities. Its power lies in its flexibility and its convention-over-configuration approach, which champions the idea of smart defaults. This approach allows developers to build scalable applications faster, ensuring timely delivery and consistent performance.

BPMN: If we were to personify BPMN, it would be the eloquent linguist of the business world. As a globally recognized standard, BPMN provides a visual vocabulary for drafting business processes, making them easily understandable to a wide range of stakeholders. This universal language ensures that the technical nuances of a process are decipherable by both the tech-savvy coder and the business strategist, fostering collaborative dialogues and more informed decision-making.

The synergy of Camunda’s automation capabilities, Spring Boot’s development ease, and BPMN’s standardized notation presents businesses with a dynamic trifecta. Together, they ensure that BPM schemes transition from mere theoretical constructs on paper to actionable, real-world implementations. The end goal? To cultivate business processes that are agile, resilient, and perfectly aligned with the evolving demands of the contemporary digital enterprise landscape.
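To make this trio concrete, here is a minimal sketch of how the Camunda engine typically runs inside a Spring Boot application, assuming the camunda-bpm-spring-boot-starter dependency is on the classpath; the package and class names are illustrative:

```java
package com.example.workflow; // hypothetical package name

import org.camunda.bpm.spring.boot.starter.annotation.EnableProcessApplication;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// A minimal Spring Boot application embedding the Camunda engine.
@SpringBootApplication
@EnableProcessApplication // registers BPMN resources from src/main/resources with the engine
public class WorkflowApplication {

    public static void main(String[] args) {
        SpringApplication.run(WorkflowApplication.class, args);
    }
}
```

With the starter on the classpath, the engine, its database schema, and the deployed BPMN diagrams come up together with the application, which is exactly the convention-over-configuration approach described above.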

BPMN basic components

For those unfamiliar with BPMN, understanding its essential components is crucial. These components form the foundation of any BPMN diagram.

Events

These signify something that happens during a process. Events can start, interrupt, or end a flow, and they are often represented as circles.

Gateways

Gateways handle decision-making within the process. Based on conditions, they control the flow of the process, usually depicted as diamonds.

Activities

Activities represent work being done. They can be tasks or sub-processes and are displayed as rounded rectangles.

Connecting objects

These elements, including sequence flows, message flows, and associations, illustrate the sequence of processes and the flow of messages.

Swimlanes

These categorize BPMN elements either by role (e.g., manager, accountant) or system (e.g., an ERP system).

Artifacts

These offer additional information about the process. Common artifacts include data objects, groups, and annotations.
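To see how these components fit together in code rather than on a diagram, here is a small sketch using Camunda's BPMN model fluent builder; the process key, task names, and condition expressions are made up for illustration:

```java
import org.camunda.bpm.model.bpmn.Bpmn;
import org.camunda.bpm.model.bpmn.BpmnModelInstance;

// Builds a tiny process containing the core BPMN components: a start and two end
// events, a user task and a service task (activities), an exclusive gateway, and
// conditional sequence flows (connecting objects).
public class BasicComponentsExample {

    public static void main(String[] args) {
        BpmnModelInstance model = Bpmn.createExecutableProcess("order-check")
            .startEvent("orderReceived")                      // event
            .userTask("reviewOrder").name("Review order")     // activity
            .exclusiveGateway("approvalGateway")              // gateway
            .condition("approved", "${approved}")             // conditional sequence flow
                .serviceTask("shipOrder").name("Ship order")
                .endEvent("shipped")
            .moveToLastGateway()
            .condition("rejected", "${!approved}")
                .endEvent("rejectedEnd")
            .done();

        Bpmn.validateModel(model);                            // sanity-check the model
        System.out.println(Bpmn.convertToString(model));      // print the generated BPMN XML
    }
}
```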

Pros and cons of Camunda

As with any technological solution, Camunda brings a mix of advantages and challenges. Here’s a comprehensive look into its pros and cons.

Pros:

  • Flexible and easy integration with Java applications through Spring Boot.
  • An intuitive modeler interface for BPMN 2.0.
  • Detailed analytics on process metrics.

Cons:

  • Might have a steeper learning curve for non-technical users.
  • It’s a strong starting point, but think of it as just the base – while Camunda is a powerful workflow engine, you’ll still need further software development.

Streamlining overburdened BPMN diagrams

Harsh reality

Camunda is designed to make developers and analysts speak the same language, but often, reality intervenes. 

Microservices fail, users enter incorrect data, anything can happen. When that happens, the beautiful analytical diagram begins to be embellished with error handlers, loggers, and alternative pathways. The analyst designs a beautiful, succinct, and comprehensible scheme: it has a few delegates and provides logical paths for the process flow under various circumstances. This is how a provisional scheme looks when it gets into the hands of a developer:

However, there are downsides. Such a scheme might contain a brief task description, like “check the client”, which implies several stages, decision-making based on each outcome, and compiling the derived decisions into a single result, possibly with the subsequent transfer of this result to external systems.

It’s clear that at this point, error handlers, loggers, and technical service elements appear on the diagram or in the code. One “analytical” task thus becomes voluminous and complex in its Java implementation, or the number of steps on the scheme grows, each accompanied by handlers and alternative pathways. As a result, the scheme quickly becomes convoluted and difficult to support and modify, and adding new functionality may require restructuring a vast area of both the scheme and the delegate code. In essence, it ends up containing a massive number of identical elements.

Here’s how the previous scheme might look in a real deployment: 

Clearly, the scheme has expanded and become more cumbersome. But there are advantages: all tasks have become atomic, and branches of behavior in the event of errors have emerged.

Realizing the problem

If we try to separate and encapsulate the scheme and the business logic of the Java code, we can do the following:

  • Avoid duplicating similar elements on the scheme.
  • Use a universal and reusable implementation of delegates in the Java code.
  • Optimize and accelerate the flow of the process.
  • Simplify the handling of technical errors and establish a process behavior logic when they arise – almost without the involvement of Java code. This will significantly simplify debugging and manual analysis of failed processes that are in an incident.
  • Drastically reduce the number of processes that “fall” into incidents when technical exceptions arise.
  • Lay a solid foundation for further development.

To make the product easier to work with, it’s better to decompose the scheme into atomic tasks, reduce the total number of scheme elements, decrease the number of service handlers, shrink the Java code of each delegate, and reuse universal delegates, refactoring them on the spot when necessary. All of this naturally implies writing unit tests for every delegate and for the main paths of the process.
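As an illustration of the last point, here is a minimal sketch of what a unit test for such a reusable delegate might look like, assuming JUnit 5 and Mockito on the test classpath; the delegate itself is a hypothetical, deliberately tiny stand-in:

```java
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.mockito.Mockito.*;

import org.camunda.bpm.engine.delegate.BpmnError;
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;
import org.junit.jupiter.api.Test;

class CheckClientDelegateTest {

    // Hypothetical, deliberately small delegate standing in for a real universal delegate.
    static class CheckClientDelegate implements JavaDelegate {
        @Override
        public void execute(DelegateExecution execution) {
            Object clientId = execution.getVariable("clientId");
            if (clientId == null) {
                throw new BpmnError("CLIENT_NOT_FOUND");
            }
            execution.setVariable("clientVerified", true);
        }
    }

    private final CheckClientDelegate delegate = new CheckClientDelegate();

    @Test
    void writesResultIntoProcessContext() throws Exception {
        DelegateExecution execution = mock(DelegateExecution.class);
        when(execution.getVariable("clientId")).thenReturn("42");

        delegate.execute(execution);

        // The delegate is expected to write its result back into the process context.
        verify(execution).setVariable("clientVerified", true);
    }

    @Test
    void throwsBpmnErrorWhenClientIsMissing() {
        DelegateExecution execution = mock(DelegateExecution.class);
        when(execution.getVariable("clientId")).thenReturn(null);

        assertThrows(BpmnError.class, () -> delegate.execute(execution));
    }
}
```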

Decomposition and atomization

If you look closely at the process application and analyze its nodes, you can see many repetitive functions: queries to external systems, logging, error handling, sending callbacks, etc. In other words, one needs to critically assess the process application, identify objects from it that can be easily encapsulated… But into what? Into Java code? No, that would be illogical, because in this case, the scheme would be closely tied to its Java implementation. In this situation, it makes sense to consider process pools.

A process pool is a scheme of a separate process that will have its own context. It is noteworthy that it’s convenient to extract atomic pieces of functionality from the main process into such pools, as well as all repetitive moments: sending notifications, requests to external systems, etc.

There can be many process pools, and it is logical to group them thematically: for example, queries to a particular microservice, alerting, or sending various notifications. Interaction between such pools can be easily set up using Camunda messaging. Each time such a pool is called in the Camunda engine, a message is passed containing a header, the parent process number for returning a response, and the set of data this specific small pool needs to operate.

Here we see how the main process (bottom) sends a message to which the starter of another pool is subscribed. When the event occurs, the second pool starts a new instance of the process, makes a request, and sends a response back to the main process, after which it successfully completes. During this time, the main process waits for the response event from the external pool to which it sent a request. When the message arrives, the process continues. If there is no response within the specified time interval, the process understands that the external computation is unavailable or has failed, and terminates.
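In code, this message exchange might look roughly like the sketch below, using the engine's message correlation API; the message names, variable names, and bean names are assumptions for illustration:

```java
import org.camunda.bpm.engine.RuntimeService;
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;
import org.springframework.stereotype.Component;

// Main pool side: starts the atomic "client check" pool by message and passes along
// the parent process id so the response can find its way back.
@Component("startClientCheckDelegate")
public class StartClientCheckDelegate implements JavaDelegate {

    private final RuntimeService runtimeService;

    public StartClientCheckDelegate(RuntimeService runtimeService) {
        this.runtimeService = runtimeService;
    }

    @Override
    public void execute(DelegateExecution execution) {
        runtimeService.createMessageCorrelation("startClientCheck")
            .setVariable("parentProcessInstanceId", execution.getProcessInstanceId())
            .setVariable("clientId", execution.getVariable("clientId"))
            .correlateStartMessage(); // starts a new instance of the minor pool
    }
}

// Minor pool side: the final delegate correlates the response back to the message
// catch event on which the parent process is waiting.
@Component("sendClientCheckResultDelegate")
class SendClientCheckResultDelegate implements JavaDelegate {

    private final RuntimeService runtimeService;

    SendClientCheckResultDelegate(RuntimeService runtimeService) {
        this.runtimeService = runtimeService;
    }

    @Override
    public void execute(DelegateExecution execution) {
        runtimeService.createMessageCorrelation("clientCheckCompleted")
            .processInstanceId((String) execution.getVariable("parentProcessInstanceId"))
            .setVariable("clientCheckResult", execution.getVariable("result"))
            .correlate();
    }
}
```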

What this offers:

  • Opportunity for code reuse. If you need to call the same code several times under different conditions throughout the process, you can simply create specific messages and call the corresponding atomic process pools.
  • Encapsulation of the software implementation from its business representation. It doesn’t matter how the main scheme is redesigned or which paths the process takes: all interactions have already been moved to separate minor processes, which gives complete flexibility. Just form a request and wait for a response.
  • The number and likelihood of main-process crashes are significantly reduced. Before such a division, the process could be in any of four states:
      • the response has arrived;
      • the response didn’t come because the external microservice crashed;
      • the response didn’t come because the main process crashed while sending the request;
      • the response didn’t come because the timeout was exceeded.

With this division, the process is always in exactly one state: the response either came, or the process waited and ended. For the business, it matters how exactly the process ended: whether with an error or not. But this is a proper conclusion, not an incident. This is important because a process that isn’t stuck in an incident doesn’t “consume” resources, and errors can easily be logged, gathered into statistics, alerted on, and analyzed.

  • It no longer matters what happens with the minor processes. They can do whatever they want: crash, run… Only the result is important: the response from the external resource. And even then, not always, because the main process shouldn’t guarantee the functionality of external systems. For instance, there might be no sense in the process waiting for a response from the notification microservice since there could be no response at all. 
  • The complexity of the main process is greatly reduced. Complex logic can be distributed among separate small pools, which are easier to debug. For example, client verification might look something like this:

Here, we can see that in the external pool, multiple tasks are called simultaneously. Let’s delve deeper into this point.

Parallelization of process computations

Camunda allows for the concurrent execution of branches of process computations. For this purpose, there’s a special gateway, the Parallel Gateway, which can split the flow into parallel branches or merge multiple parallel computations back into one stream. It’s clear that, to accelerate a process, it is advantageous to delegate certain tasks to parallel threads. If the logic is independent, it can be executed in parallel, for example, by making simultaneous requests to external systems and waiting for responses from all of them at once:

Each time at such a gateway, there will be overhead associated with creating new threads to divide the work and merge the results. One may encounter various locking exceptions, and, of course, it isn’t always necessary or justified to act this way, especially without testing, but the benefits are evident.

With sequential execution, the total execution time equals the sum of the execution times of each operation. In contrast, with parallel execution, it equates to the execution time of the longest operation. Given the conditions of non-instant responses from external sources, retries, and failures, this difference is far from insignificant. Another undeniable advantage is the form of “free retries”, i.e., while the longest request is being executed, the other tasks hypothetically have the opportunity to fail several times and attempt to redo their actions without impacting the overall task execution time.
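For reference, a fork/join like the one described above could be sketched with the BPMN fluent builder as follows; the task ids and delegate bean names are illustrative assumptions:

```java
import org.camunda.bpm.model.bpmn.Bpmn;
import org.camunda.bpm.model.bpmn.BpmnModelInstance;

// Forks three independent external requests with a Parallel Gateway and joins them again.
public class ParallelRequestsExample {

    public static BpmnModelInstance build() {
        return Bpmn.createExecutableProcess("parallel-requests")
            .startEvent()
            .parallelGateway("fork")
                .serviceTask("callCrm")
                    .camundaDelegateExpression("${crmRequestDelegate}")
                .parallelGateway("join")
            .moveToNode("fork")
                .serviceTask("callScoring")
                    .camundaDelegateExpression("${scoringRequestDelegate}")
                .connectTo("join")
            .moveToNode("fork")
                .serviceTask("callNotification")
                    .camundaDelegateExpression("${notificationRequestDelegate}")
                .connectTo("join")
            .endEvent()
            .done();
    }
}
```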

Exceptions and repetition attempts

Broke? It happens. The out-of-the-box version of Camunda has the capability to retry a failed transaction. By “transaction”, we mean Camunda’s internal mechanism for executing delegate code. The start of a transaction can be, for example, the “async before” or “async after” marker on a task in the modeler. When the engine encounters this marker, it commits its information to the database and starts a new asynchronous thread. This is important. To delve deeper, by “transaction”, we mean the execution section between the calls to the .complete() method in TaskService, followed by recording information to the database. These transactions, like others, are atomic.

When a technical exception arises, i.e., any non-business error, for example, dividing by zero or a forgotten null check, the transaction rolls back and tries to start again. By default, it does this three times consecutively without any pauses. A retry attempt starts when a regular exception arises; in the BPMN world, this is called a technical exception, as opposed to a BpmnError. A BpmnError, by contrast, stops the process without any retry attempts. Imagine how this enhances the resilience of the process.

It makes sense to make the most of this feature. Therefore, every delegate that calls an external system has these markers set, specifying the number of retries and the pause between them, and the delegate code separates the logic for when the process should be terminated from when it shouldn’t. This gives full control over the exception handling and retry mechanisms. As a result, the process tries to redo a failed task several times, and only after a series of failures does it produce an error.
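A minimal sketch of such a delegate is shown below; it assumes the task is marked “async before” with a retry time cycle like R3/PT5M configured in the modeler, and the variable and error code names are made up:

```java
import org.camunda.bpm.engine.delegate.BpmnError;
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;
import org.springframework.stereotype.Component;

// The service task carrying this delegate is assumed to be marked "async before"
// with a retry time cycle such as R3/PT5M (three retries, five minutes apart).
@Component("scoringRequestDelegate")
public class ScoringRequestDelegate implements JavaDelegate {

    @Override
    public void execute(DelegateExecution execution) throws Exception {
        // Any technical exception thrown here (timeout, connection refused, NPE, ...)
        // propagates to the engine, which rolls back the transaction and retries the
        // job according to the configured retry time cycle.
        Integer score = requestScore((String) execution.getVariable("clientId"));

        if (score == null) {
            // A business-level outcome: throwing BpmnError stops the retries and lets
            // the scheme route the token along the matching error boundary event or
            // event subprocess instead of creating an incident.
            throw new BpmnError("CLIENT_CHECK_FAILED", "Scoring returned no result");
        }
        execution.setVariable("clientScore", score);
    }

    // Stub standing in for the real call to an external scoring service.
    private Integer requestScore(String clientId) throws Exception {
        return clientId == null ? null : 100;
    }
}
```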

Perhaps, the biggest challenge is the handling of technical exceptions and BPMN-related errors, as well as designing the logic of their handling for a continuous flow of the process. We’ve already discussed some errors related to handling responses from external sources when talking about dividing into process pools. We’d like to remind you that the very call was encapsulated into a separate mini-process, and the main one either received a response and proceeded further or, due to a timeout, followed the “I didn’t receive a response” route.

Now, let’s look at that very small process:

Do you see the frame? It’s a subprocess. It contains specific tasks and captures errors thrown by internal tasks. Moreover, on such frames, the job executor is capable of creating a job for the timer, which sets the execution time for everything inside the subprocess.

How does it work? The execution flow reaches the subprocess, creates parallel timer processing, and waits either for the completion of what’s inside or, if the timer runs out first, it will follow the timer route. If an exception is thrown during the process, which the subprocess frame captures, the process will stop its execution on the current branch and follow the error branch.

It’s also evident that there’s an option to create response dispatches for critical requests. Note that error capturing works only for BpmnError with a specific code. Therefore, technically, it’s essential to catch any exception and throw a BpmnError with the required code, which works for the ErrorBoundaryEvent.
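A sketch of that catch-and-convert step, with hypothetical names, might look like this:

```java
import org.camunda.bpm.engine.delegate.BpmnError;
import org.camunda.bpm.engine.delegate.DelegateExecution;
import org.camunda.bpm.engine.delegate.JavaDelegate;

// Inside the minor pool, the last line of defence: convert anything unexpected into a
// BpmnError with a known code so the boundary event on the subprocess frame can catch it.
public class GuardedRequestDelegate implements JavaDelegate {

    @Override
    public void execute(DelegateExecution execution) {
        try {
            callExternalSystem(execution);
        } catch (Exception e) {
            // Error boundary events react to BpmnError codes, not to plain Java exceptions.
            throw new BpmnError("EXTERNAL_CALL_FAILED", e.getMessage());
        }
    }

    // Stub standing in for the actual integration call.
    private void callExternalSystem(DelegateExecution execution) throws Exception {
        // the real HTTP or messaging call would go here
    }
}
```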

Error handling in the main process works similarly. From several tasks, logical units are singled out that can be placed in a subprocess frame, with a listener set up for a specific error code. But there are two nuances here. The first is that creating multiple identical branches with error handling, differing only in the error code, is inconvenient: if the error-handling strategy or, say, the logging changes, many delegates on the scheme would need to be reworked, which isn’t desirable. Therefore, one might consider looking into event-based subprocesses.

At its core, this is a separate subprocess of the process pool, which starts only when a certain event it’s subscribed to occurs. For instance, if you subscribe such a subprocess to the BpmnError event with a code, say, MyCustomBusinessError, then when this event occurs, the handler will be triggered, and upon its completion, the process will end correctly. Yes, it didn’t end in success, but it ended correctly. In these subprocesses, you can also implement different handling logic for the same event depending on external conditions, for example, optionally notifying about an application error when the process passes a conditional point.

The second nuance is much more complicated. In real life, the life cycle of each process is likely divided into two business stages: before lead generation and after it. If an error occurred before the data was formatted into a lead, the process could probably just be terminated, notifying about the difficulties encountered. Once the lead is generated, this is no longer possible.

We also don’t recommend ending processes if legal obligations arise during the process, for instance, once a contract is signed. How do we handle such errors? Some technical errors, like those associated with the unavailability of external services, are handled by automatic retries within a pre-agreed timeout. But what if the process has crashed, the retries have been exhausted, and the hypothetical external microservice is still down?

Manual optimization

We come to the concept of manual resolution, also known as compensations.

How does it work? Any errors are caught, delegates are given the opportunity to retry if necessary, and if luck still doesn’t favor them, the process goes into an error state, but with the appropriate code, for instance, COMPENSATION_ERROR. This code is caught by another event-based subprocess, which processes, logs, and notifies, and, importantly, cannot fail unexpectedly: it throws an uncatchable technical exception and crashes into an incident only where it is designed to.

Why do it this way? For monitoring, you can use EXCAMAD – an external admin panel for Camunda, an analogue to Cockpit, with powerful features. It highlights processes in incidents in red. These processes can be modified or restarted from the desired point. For instance, you can place the necessary variable value in the context and restart the process from the point right after the problematic one. This is convenient, straightforward, and allows for manual problem resolution with minimal effort.
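Under the hood, such tools rely on the engine’s own APIs. A rough sketch of the same “patch a variable and restart right after the failed step” operation through Camunda’s process instance modification API, with hypothetical activity and variable names, could look like this:

```java
import org.camunda.bpm.engine.RuntimeService;

// Manual resolution of a stuck instance: drop the failed token, patch the context,
// and continue right after the problematic activity.
public class IncidentResolver {

    private final RuntimeService runtimeService;

    public IncidentResolver(RuntimeService runtimeService) {
        this.runtimeService = runtimeService;
    }

    public void resumeAfterFailedCheck(String processInstanceId) {
        runtimeService.createProcessInstanceModification(processInstanceId)
            .cancelAllForActivity("checkClientTask")         // remove the stuck token
            .startAfterActivity("checkClientTask")           // continue after the failed step
            .setVariable("clientCheckResult", "MANUAL_OK")   // patch the process context
            .execute();
    }
}
```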

Business process automation with Camunda: real-life examples

Renowned for its open-source platform and user-friendly interface, Camunda has empowered numerous enterprises to optimize their workflows. Let’s explore a few real-life examples.

Banking and Finance

Münchener Hypothekenbank eG, an independent real estate bank, transitioned to using the Camunda workflow engine to enhance and automate internal processes, specifically mail handling and inter-departmental loan application coordination. Previously, their system was rigid, lacked flexibility, and led to complexities that increased error rates.

In their move towards a Java-based microservice architecture, they selected Camunda based on internal recommendations and worked closely with WDW Consulting Group. Some benefits they obtained immediately from Camunda were off-the-shelf functions, while others needed more development. This transition resulted in a centralized task list used by all staff and provided flexibility to maintain individual processes without affecting others.

The most notable outcome has been a significant improvement in the processing speed of loan applications. This benefits both staff and end customers. As a testament to its success, other departments are now looking to adopt Camunda, and the bank has even hired more developers to further support its implementation.

Insurance

SV Informatik, a subsidiary of SV SparkassenVersicherung, specializes in custom IT solutions for insurance firms. They incorporated Camunda to automate various processes across departments, leading to notable time savings and improved customer response times. The company adopted Camunda in 2018 as a solution to their search for an effective business process modeling tool, with a focus on improving processes and enhancing collaboration between IT and other departments.

Since its implementation, Camunda has automated tasks like motor vehicle insurance policy cancellations and policy document requests. A notable achievement was the 80% automated processing of online storm damage reports. This proved especially valuable during the 2021 floods and storms in Germany. Tools like Camunda Optimize and Camunda Cockpit facilitate process monitoring and optimization.

Hospitality

In 2020, the SV Group, operating in Germany, Switzerland, and Austria, launched a disruptive digital platform called ‘likeMagic’ with Camunda’s assistance. This platform provided a seamless guest experience, from booking to check-out, with outcomes including a 95% self-check-in/out rate and a 9 out of 10 guest happiness score. The innovation reduced staffing needs and integrated platforms like Airbnb seamlessly. Recognizing its potential, SV Group offered ‘likeMagic’ to other hospitality providers. By 2023, they expanded from 2 to over 30 customers in the DACH region, with plans for a broader European reach and targeting 15,000 rooms by year-end.

Wrapping up

Camunda’s transformative potential lies not just in its core functionalities but in its ability to redefine business operations at a fundamental level. Combined with Spring Boot, it opens a doorway to seamless integrations and enhanced scalability. Understanding the nuts and bolts of BPMN is paramount to leveraging Camunda’s full potential. As businesses evolve in this digital age, tools like Camunda stand out, offering dynamic solutions that can pivot and adapt to ever-changing needs. It’s not just about automating processes; it’s about innovating workflows, enhancing efficiency, and driving tangible results that make a difference. Embrace the power of Camunda, and let your business soar to new horizons.
