
Feature-rich document processing platform for banks and enterprises

Building the core parsing module for Leganta’s contract management platform that breaks down complex legal documents into structured, searchable data and integrates AI for automated field classification and semantic content analysis (DORA / NIS2 ready).

Employees: <50
Region: Europe (Germany)
Client since: 2024

About the client

LEGANTA® is a Germany-based technology company building a document management platform for organizations that handle large volumes of contracts, primarily financial institutions and enterprises. The product's core idea is straightforward: instead of making people scroll through 60- or 80-page PDFs looking for what they need, the system converts those documents into structured, searchable objects that users can filter, update, and work with directly. Key applications include semantic contract transformations for DORA and NIS2 compliance.

Leganta came to Innowise to build the central piece of that product: a module that takes a raw contract PDF and breaks it down into semantic sections that their existing internal system can then process.


The Innowise team took ownership of a significant portion of the new product right from the start of our engagement. They have worked very closely with our technical lead to learn the current code base, assist in designing its architecture, and have been involved in making architectural decisions since day one of the project. Over the entire course of this collaboration, we have enjoyed good communication, with frequent daily standup meetings and regularly scheduled sync sessions.

Hugo Christian Rieß CEO, LEGANTA
Letter of recommendation, Page 1

Challenge

Leganta needed a reliable, automated way to take a raw PDF contract and transform it into structured objects, so experts didn’t have to do it by hand. Building that module from scratch was the core challenge on this project.

  • Time-consuming manual processing. Employees previously read through massive contracts to extract specific entities manually. This manual routine slowed down operations and increased the risk of human error.
  • Information overload. Corporate agreements contain large amounts of text. Users need a way to isolate the key data objects so they can efficiently prepare documents for ERP integrations or electronic signatures.
  • Legal compliance. Automated text modification presents severe legal risks. The system must preserve the exact original wording of legal clauses to prevent any misinterpretation or contractual disputes.
  • No database or parsing logic in place. The client had no existing foundation for contract parsing, but knew they wanted to use MongoDB. The project required setting up a database from scratch and building all the core logic on top of it to support the new functionality.
  • Unpredictable document formats. Corporate contracts come with varying styles, irregular layouts, and complex tables of contents. Leganta needed a reliable algorithm to extract text from these unpredictable PDF files precisely.
  • Cloud and on-premise deployment. Leganta required the platform to run both as a cloud-hosted solution and as a local on-premise installation to serve different enterprise clients. The architecture had to rely on containerization tools such as Docker and Kubernetes to support both hosting environments from the start.

Solution

To address these challenges, Innowise built the document parsing module from scratch. The work covered backend logic, the frontend interface, and deployment infrastructure, with the two developers splitting responsibilities across the full stack.

Document parsing and semantic segmentation

The first task was building the parsing engine. We started by integrating Apache POI to extract text content from uploaded PDF contracts, along with the formatting metadata embedded in each file. We used that metadata (heading styles, paragraph breaks, and font weights) as the signals that drive the parsing logic.

  • Our team developed a custom segmentation algorithm that breaks the extracted text into semantic units: individual clauses, sections, and data fields that users can then view, edit, and work with directly.
  • We developed the segmentation rules and tested them against real contract samples until the outputs were consistent and meaningful. We store all parsed sections as structured objects in MongoDB.
  • On the frontend, we built a two-pane interface. We put the original PDF on the left so users always have the source document in view, and we built an editable table of parsed sections on the right. This way, users can compare the source against the extracted data at any point.
  • Our experts also extended an open-source PDF rendering library. The free version didn't handle certain edge cases, so we manually brought it up to the level of paid alternatives.
  • We also built a set of editing tools so users can correct the output where needed. They can merge sections that the algorithm split incorrectly, adjust titles, fill in fields, and change any part of the structure before saving. We designed the flow to be fast, since parsing accuracy depends on document quality, and users often need to make corrections.
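
The segmentation and merge steps described above can be illustrated with a minimal sketch. It is not the production algorithm: the `StyledLine` and `Section` shapes, the heading heuristic (bold or larger-than-body font on a short line), and the `mergeWithNext` helper are all assumptions made for illustration. Body text is carried over unchanged, in line with the requirement to preserve the original wording.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits styled lines into sections using heading signals. */
final class ContractSegmenter {

    /** One extracted line plus the formatting metadata used as a signal (assumed shape). */
    record StyledLine(String text, boolean bold, double fontSize) {}

    /** A parsed section: a title and the verbatim body lines below it. */
    record Section(String title, List<String> body) {}

    // Assumed heuristic: a short line that is bold or larger than the body font.
    static boolean isHeading(StyledLine line, double bodyFontSize) {
        boolean emphasized = line.bold() || line.fontSize() > bodyFontSize;
        return emphasized && line.text().strip().length() <= 80;
    }

    /** Groups lines into sections; text before the first heading becomes "Preamble". */
    static List<Section> segment(List<StyledLine> lines, double bodyFontSize) {
        List<Section> sections = new ArrayList<>();
        String title = null;
        List<String> body = new ArrayList<>();
        for (StyledLine line : lines) {
            if (line.text().isBlank()) {
                continue; // paragraph breaks carry no text of their own
            }
            if (isHeading(line, bodyFontSize)) {
                if (title != null || !body.isEmpty()) {
                    sections.add(new Section(title == null ? "Preamble" : title, List.copyOf(body)));
                }
                title = line.text().strip();
                body = new ArrayList<>();
            } else {
                body.add(line.text()); // body text is kept verbatim
            }
        }
        if (title != null || !body.isEmpty()) {
            sections.add(new Section(title == null ? "Preamble" : title, List.copyOf(body)));
        }
        return sections;
    }

    /** Merges a section with the one after it, as a user would to undo a wrong split. */
    static List<Section> mergeWithNext(List<Section> sections, int index) {
        Section first = sections.get(index);
        Section second = sections.get(index + 1);
        List<String> mergedBody = new ArrayList<>(first.body());
        mergedBody.add(second.title()); // the wrongly detected heading becomes body text again
        mergedBody.addAll(second.body());
        List<Section> result = new ArrayList<>(sections);
        result.set(index, new Section(first.title(), List.copyOf(mergedBody)));
        result.remove(index + 1);
        return result;
    }

    public static void main(String[] args) {
        List<StyledLine> lines = List.of(
                new StyledLine("1. Definitions", true, 12),
                new StyledLine("\"Agreement\" means this contract.", false, 10),
                new StyledLine("2. Term", true, 12),
                new StyledLine("This Agreement runs for two years.", false, 10),
                new StyledLine("Renewal", true, 10), // bold phrase mistaken for a heading
                new StyledLine("It renews automatically.", false, 10));
        List<Section> sections = segment(lines, 10.0);
        sections.forEach(s -> System.out.println(s.title() + " -> " + s.body()));
        System.out.println(mergeWithNext(sections, 1));
    }
}
```

In this sketch a bold in-line phrase is split off as its own section, which is the kind of case the merge tool exists for: the user merges it back into the preceding section before saving.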

Template system for recurring document types

Once the core parsing was working, we built a template system on top of it. The idea came from a practical observation: organizations that process large volumes of similar contracts, such as banks using standardized loan agreements, repeatedly encounter the same document structures.

  • We built a save-as-template function that lets users capture a fully structured and corrected document as a reusable pattern. When a new contract with a similar structure arrives, the system applies that pattern automatically during parsing.
  • For template-matched documents, accuracy on the first pass is substantially higher, and the time users spend on manual review drops accordingly.
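
How a saved pattern might be matched to an incoming contract can be sketched as follows. The source does not describe the actual matching logic, so everything here is an assumption: the `Template` shape, the title normalization, the Jaccard similarity over section titles, and the threshold value are hypothetical choices for illustration only.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: picks the saved template whose section titles best match a new document. */
final class TemplateMatcher {

    /** A saved pattern: a name plus the section titles of a corrected document (assumed shape). */
    record Template(String name, List<String> sectionTitles) {}

    // Normalizes a title so that "1. DEFINITIONS" and "Definitions" compare equal.
    static String normalize(String title) {
        return title.toLowerCase(Locale.ROOT)
                .replaceAll("^[\\d.\\s]+", "") // drop leading numbering such as "2.1 "
                .replaceAll("\\s+", " ")
                .strip();
    }

    // Jaccard similarity between the two sets of normalized titles, in [0, 1].
    static double similarity(List<String> a, List<String> b) {
        Set<String> left = new HashSet<>();
        a.forEach(t -> left.add(normalize(t)));
        Set<String> right = new HashSet<>();
        b.forEach(t -> right.add(normalize(t)));
        Set<String> union = new HashSet<>(left);
        union.addAll(right);
        if (union.isEmpty()) {
            return 0.0;
        }
        left.retainAll(right); // left now holds the intersection
        return (double) left.size() / union.size();
    }

    /** Returns the best-scoring template at or above the threshold, if any. */
    static Optional<Template> bestMatch(List<String> titles, List<Template> templates, double threshold) {
        Template best = null;
        double bestScore = threshold;
        for (Template t : templates) {
            double score = similarity(titles, t.sectionTitles());
            if (score >= bestScore) {
                best = t;
                bestScore = score;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<Template> templates = List.of(
                new Template("loan", List.of("Definitions", "Loan Amount", "Interest", "Repayment")),
                new Template("nda", List.of("Definitions", "Confidential Information", "Term")));
        List<String> incoming = List.of(
                "1. DEFINITIONS", "2. Loan amount", "3. Interest", "4. Repayment", "5. Governing Law");
        System.out.println(bestMatch(incoming, templates, 0.6).map(Template::name).orElse("no match"));
    }
}
```

A set-based score ignores section order, which keeps the sketch short; a real matcher would likely also weigh ordering and nesting.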

AI integration for field classification

In parallel with our work, the client’s experts developed a GPT-based classification layer that sits on top of the parsed sections. Its job is to classify each section against the platform’s internal entity types.

  • Our responsibility was to ensure the parsed output fed into that layer cleanly. To that end, we structured the sections to be consistently bounded and well-formed so the AI classification could work reliably on top of them.
  • We coordinated closely with the client's team, who built the AI layer on their end, to agree on the handoff format between the two layers.
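
What "consistently bounded and well-formed" could mean in practice is sketched below. The actual handoff format is not described in the source, so the `ParsedSection` record, its character offsets, and the `validate` checks are hypothetical. The sketch checks that sections do not overlap, stay inside the source text, and carry exactly the original wording.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: checks parsed sections before they are handed to a classification layer. */
final class HandoffValidator {

    /** Assumed handoff unit: a section with character offsets into the extracted source text. */
    record ParsedSection(String id, String title, int start, int end, String text) {}

    /** Returns a list of problems; an empty list means the sections are safe to hand off. */
    static List<String> validate(String source, List<ParsedSection> sections) {
        List<String> problems = new ArrayList<>();
        int previousEnd = 0;
        for (ParsedSection s : sections) {
            // Sections must be ordered, non-empty, non-overlapping, and inside the source.
            if (s.start() < previousEnd || s.end() <= s.start() || s.end() > source.length()) {
                problems.add(s.id() + ": bounds overlap or fall outside the source");
                continue;
            }
            // The section text must match the source exactly, with no rewording.
            if (!source.substring(s.start(), s.end()).equals(s.text())) {
                problems.add(s.id() + ": text differs from the original wording");
            }
            previousEnd = s.end();
        }
        return problems;
    }

    public static void main(String[] args) {
        String source = "1. Term\nTwo years.\n2. Fees\nEUR 100 per month.";
        int split = source.indexOf("2. Fees");
        List<ParsedSection> sections = List.of(
                new ParsedSection("s1", "Term", 0, split, source.substring(0, split)),
                new ParsedSection("s2", "Fees", split, source.length(), "2. Fees\nEUR 100 per year."));
        validate(source, sections).forEach(System.out::println);
    }
}
```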

Project and document management layer

Around the parsing engine, we built the full management layer that users interact with day to day.

  • Our team built the project structure, which lets users group related documents together under a single contract negotiation or deal. We also built the document upload and lifecycle flow, and the full CRUD layer for managing both projects and documents.
  • We set up H2 as a lightweight, self-contained store for credentials and roles, keeping it separate from the main document data in MongoDB.
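
The project-and-document grouping can be pictured with a small in-memory sketch. The real layer persists to MongoDB and its data model is not described in the source, so the `Project` and `Document` records, the `Status` lifecycle values, and the `ProjectStore` class with its map-based storage are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical sketch: in-memory CRUD for projects that group related documents. */
final class ProjectStore {

    enum Status { UPLOADED, PARSED, REVIEWED } // assumed lifecycle states

    record Document(String id, String fileName, Status status) {}

    record Project(String id, String name, List<Document> documents) {}

    // A map stands in for the database collection in this sketch.
    private final Map<String, Project> projects = new LinkedHashMap<>();

    Project create(String name) {
        Project project = new Project(UUID.randomUUID().toString(), name, List.of());
        projects.put(project.id(), project);
        return project;
    }

    Optional<Project> find(String id) {
        return Optional.ofNullable(projects.get(id));
    }

    Project rename(String id, String newName) {
        Project current = require(id);
        Project updated = new Project(current.id(), newName, current.documents());
        projects.put(id, updated);
        return updated;
    }

    /** Adds an uploaded document to a project; it starts in the UPLOADED state. */
    Project addDocument(String projectId, String fileName) {
        Project current = require(projectId);
        List<Document> documents = new ArrayList<>(current.documents());
        documents.add(new Document(UUID.randomUUID().toString(), fileName, Status.UPLOADED));
        Project updated = new Project(current.id(), current.name(), List.copyOf(documents));
        projects.put(projectId, updated);
        return updated;
    }

    boolean delete(String id) {
        return projects.remove(id) != null;
    }

    private Project require(String id) {
        return find(id).orElseThrow(() -> new NoSuchElementException("Unknown project: " + id));
    }

    public static void main(String[] args) {
        ProjectStore store = new ProjectStore();
        Project deal = store.create("Loan negotiation");
        store.addDocument(deal.id(), "loan-agreement.pdf");
        System.out.println(store.find(deal.id()).orElseThrow());
    }
}
```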

Infrastructure and deployment

We wrote Dockerfiles for all services, configured Kubernetes deployments and services, set up ingress with TLS certificates, and built the CI pipeline on GitHub Actions to handle the build, image push, and publish steps.

  • We deployed the platform on infrastructure provided by Syseleven, Leganta's German cloud partner.
  • We structured the containerized setup to also support on-premise deployment at client sites, which Leganta requires for some of their enterprise customers.

The collaboration with the Leganta team worked well from the start. The client's tech lead was available, clear about what they needed, and open when we had a different take on something. We came in, got familiar with what was already there, and figured out the architecture together from that point. The scope was genuinely open-ended at the beginning, and the only hard requirement was MongoDB, so a lot of the technical decisions happened through ongoing discussion. That kind of working collaboration is easier when the other side knows their product well, and the Leganta team did. We've been on this project since early 2024, and the working rhythm has stayed consistent throughout.

Dmitry Nazarevich, Chief Technology Officer

Technologies

Backend

Java 17, Spring Boot

Frontend

Vue.js, Vuetify, TypeScript, Pinia

Database (main)

MongoDB

Database (auth)

H2

PDF processing

Apache POI

CI

GitHub Actions

Testing

Unit tests, integration tests (backend), Selenium (frontend)

Containers

Docker, Kubernetes

Team

Back-End Developer
Full-Stack Developer
Innowise team

Results

Project duration
February 2024 to 2025

The parsing module is live and in production. Leganta uses it as the entry point into their contract management workflow.

  • Contract structuring time cut from hours to seconds. Now the system produces an initial parsed structure in around 10 seconds. During demos, a full contract, reviewed, corrected where needed, and completely filled in, was ready within an hour. For documents that match an existing template, the initial parse is close to the final version with minimal correction required.
  • Templates make repetitive work faster each time. Once a contract has been structured and saved as a template, subsequent documents of the same type automatically reuse that structure. Organizations handling high volumes of similar agreements, with banks being the primary target, see the benefit compound across every contract processed.
  • Platform deployed and running in production. The platform supports both cloud infrastructure and on-premise deployment for enterprise clients who need it. The team has maintained a consistent bi-weekly release cycle since the project started.
  • Semantic transformation engine. LEGANTA® provides a semantic transformation capability that converts any document into freely selectable target structures. This enables precise alignment with customer‑specific objectives and seamless integration into existing IT landscapes. At its core, the engine interprets documents as semantic information spaces. It restructures and enriches them so that organizations can embed the resulting data directly into their operational, compliance, risk, or analytic systems, without manual remodeling.
  • Seamless system integration. The solution integrates with the client's existing authentication and other modules and supports data exports to other internal systems.

The team delivered everything that was planned.
