Using Apache Kafka data streaming, Innowise delivered a seamless data pipeline for informed decision-making and analytics.
Our client is a multinational corporation that specializes in manufacturing a diverse range of passenger and commercial vehicles, motorcycles, engines, and turbomachinery.
Detailed information about the client cannot be disclosed under the provisions of the NDA.
The automotive manufacturer, a global entity with branches and dealer centers across continents, faced a significant data management dilemma. Various units within the company operated independently, resulting in inefficiencies and a lack of insight into operations, sales, project management, and more.
Multiple data sources led to duplicate efforts, inconsistent data quality, and a significant drain on resources as teams in different locations struggled to reconcile information. This fragmentation hindered the manufacturer’s ability to make informed, strategic decisions swiftly and effectively.
In addition, the client struggled to access the real-time data needed for strategic decision-making. Delays in data sharing and processing led to missed opportunities and belated responses as market trends and consumer preferences evolved rapidly.
The client sought a comprehensive solution to unify disparate data sources into a cohesive system and ensure scalability to adapt to future business expansions.
Innowise offered a transformative approach centered around integrating Apache Kafka to address the client’s challenges. Simply put, we turned the customer’s existing information flows into Kafka data streams to ensure uninterrupted data flow, real-time analytics, and comprehensive visualizations.
Our initial task was to create an architecture that offloads information from data sources and transmits it to Apache Kafka. We chose Apache Kafka for its exceptional ability to handle large-scale, high-throughput, real-time data streams in a fault-tolerant, scalable, and distributed manner. First, we built a connector for Codebeamer, a comprehensive project management platform the client used for software development and collaboration.
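A connector that feeds Kafka carries a small set of producer settings that determine its durability and throughput. Below is a minimal sketch of such a configuration; the broker address is a placeholder and the tuning values are illustrative defaults, not the client's actual setup:

```java
import java.util.Properties;

public class ProducerSettings {
    // Build producer properties for a source connector.
    // Values are illustrative, not the client's configuration.
    public static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers); // Kafka broker list
        props.put("acks", "all");                // wait for full acknowledgment (fault tolerance)
        props.put("enable.idempotence", "true"); // avoid duplicate records on retries
        props.put("linger.ms", "20");            // short delay so records can batch together
        props.put("batch.size", "65536");        // up to 64 KiB per partition batch
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```

Setting `acks=all` together with idempotence trades a little latency for the delivery guarantees a fault-tolerant pipeline needs.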
Initially, Innowise’s specialists analyzed Codebeamer’s API documentation comprehensively to identify the most efficient methods for extracting project data, including work items, changesets, and user activities. We also examined the authentication mechanism, data request limits, and the API’s return formats.
Based on the API analysis, we designed the connector architecture with a focus on modularity, scalability, and fault tolerance. Our software engineers used Java to code the connector, which was responsible for linking to Codebeamer’s API, fetching data, and writing it to a Kafka topic. We implemented a converter to transform the data from Codebeamer’s format to a Kafka-compatible format, mapping various data fields to Kafka’s key-value pairs and handling schema variations. Finally, our project team implemented robust configuration handling, enabling users to dynamically specify API credentials, polling intervals, and target Kafka topics.
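The converter step described above can be sketched as a pure function from a fetched item to a Kafka-style key/value pair. The field names below (`id`, `status`) are assumptions for illustration, not Codebeamer's exact schema:

```java
import java.util.AbstractMap;
import java.util.Map;

public class WorkItemConverter {
    // Map a Codebeamer-style work item to a Kafka-style key/value pair.
    // Key = the item's stable id (drives partitioning), value = a flat JSON string.
    // Field names are illustrative assumptions, not the platform's real schema.
    public static Map.Entry<String, String> toRecord(Map<String, Object> item) {
        String key = String.valueOf(item.get("id"));
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : item.entrySet()) {
            if (!first) json.append(",");
            json.append("\"").append(e.getKey()).append("\":\"")
                .append(e.getValue()).append("\"");
            first = false;
        }
        json.append("}");
        return new AbstractMap.SimpleEntry<>(key, json.toString());
    }
}
```

Keying records by item id means all updates to one work item land in the same partition, preserving their order for downstream consumers.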
In the first stage, the connector polled Codebeamer’s API to fetch new and updated data at configurable intervals. Then, it transformed the data into a Kafka-compatible format, ensuring each piece of information was represented as a discrete event. We utilized batch processing capabilities to efficiently handle large volumes of data without overwhelming Codebeamer’s API or the Kafka cluster.
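The batching part of that poll cycle can be sketched as a small helper that splits one poll's results into bounded chunks (the scheduling itself could sit on something like `ScheduledExecutorService`; the batch size here is an arbitrary example):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchPoller {
    // Split one polled result set into fixed-size batches so a single
    // large poll never overwhelms the source API or the Kafka cluster.
    public static <T> List<List<T>> toBatches(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                    items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }
}
```

Each batch can then be produced to Kafka and acknowledged as a unit, giving natural backpressure between the poller and the cluster.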
We also developed a custom Kafka connector for an FTP data source, a critical component for consolidating various files and formats, including JSON, XML, and CSV. The connector interfaced with the FTP server, efficiently monitored it for new and updated files, and extracted and transported them into the Kafka ecosystem.
We implemented a robust file-watching mechanism to detect when new files were added or existing files were modified. To handle the diversity of file formats (JSON, XML, CSV), we incorporated intelligent parsing logic that automatically recognized and correctly processed each file type. This was crucial for transforming the structured and semi-structured data within these files into a uniform format suitable for streaming through Kafka.
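The format-recognition step can be sketched as a dispatcher that routes each detected file to the right parser. The extension-based rule here is a simplified assumption; the actual connector may also inspect file contents:

```java
import java.util.Locale;

public class FormatDispatcher {
    public enum Format { JSON, XML, CSV, UNKNOWN }

    // Pick a parser from the file name's extension. A file-watching loop
    // (e.g. over an FTP listing) would call this for every new or changed file.
    public static Format detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".json")) return Format.JSON;
        if (lower.endsWith(".xml"))  return Format.XML;
        if (lower.endsWith(".csv"))  return Format.CSV;
        return Format.UNKNOWN;      // skip or quarantine unrecognized files
    }
}
```

Files that resolve to `UNKNOWN` can be routed to a dead-letter location instead of silently dropped, which keeps the pipeline auditable.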
Backend
Java 17 (JVM), Kotlin, Spring
CI/CD
On-premise CI/CD pipeline
Our project team followed a well-structured project course, with deliverables at the end of each stage to ensure alignment with the client’s goals. Our approach was grounded in the Scrum framework, facilitating flexibility, continuous improvement, and robust client engagement throughout the project.
Initially, our business analysts conducted workshops with the client to understand their data landscape, identify key data sources, and define the Kafka integration scope. Based on that information, they mapped out a comprehensive project plan and a list of requirements for the Kafka connectors.
Upon gathering the requirements, our developers designed and built the architecture for the Kafka connectors. QA engineers then performed extensive testing, including unit, integration, and performance tests, to ensure the connectors’ reliability and efficiency.
Finally, we deployed the connectors into the client’s environment, providing training sessions for the client’s team on managing and utilizing new data streaming solutions.
Throughout the project, communication with the client was a top priority. We utilized Slack for daily communication and Zoom for weekly check-ins and sprint reviews. Task tracking and project management were managed through Jira, enabling transparent visibility into project progress and accountability for all team members.
At present, our project team makes minor adjustments as needed, and the client plans to engage us for further data streaming projects.
1 Product Owner
1 Solution Architect
1 Technical Lead
2 Back-End Developers
Innowise developed a system of connectors that aggregates information from the customer’s data sources and transforms it into Apache Kafka data streams. By integrating disparate data sources into a unified, real-time Apache Kafka streaming pipeline, we addressed the core challenges of data fragmentation, scalability, and integration. The automotive manufacturer now benefits from reduced data silos, better-informed decision-making, and transparent analytics that foster business growth.
Our Kafka-based data streaming solution is built to scale so the client can grow quickly and add new data sources without compromising performance.
36% increase in decision-making accuracy
44% boost in data accessibility
© 2007-2024 Innowise. All Rights Reserved.