
Feature-rich document processing platform for banks and enterprises

Building the core parsing module for Leganta’s contract management platform that breaks down complex legal documents into structured, searchable data and integrates AI for automated field classification and semantic content analysis (DORA / NIS2 ready).

Employees <50
Region Europe (Germany)
Client since 2024

Client overview


LEGANTA® is a Germany-based technology company building a document management platform for organizations that handle large volumes of contracts, primarily financial institutions and enterprises. The product’s core idea is straightforward: instead of making people scroll through 60- or 80-page PDFs looking for what they need, the system converts those documents into structured, searchable objects that users can filter, update, and work with directly. Key applications include semantic contract transformations for DORA and NIS2.

Leganta came to Innowise to build the central piece of that product: a module that takes a raw contract PDF and breaks it down into semantic sections that Leganta’s existing internal system can then process.


The Innowise team took ownership of a significant portion of the new product right from the start of our engagement. They have worked very closely with our technical lead to learn the current code base, assist in designing its architecture, and have been involved in making architectural decisions since day one of the project. Over the entire course of this collaboration, we have enjoyed good communication, with frequent daily standup meetings and regularly scheduled sync sessions.

Hugo Christian Rieß CEO, LEGANTA
Letter of recommendation, Page 1

Challenge

Leganta needed a reliable, automated way to take a raw PDF contract and transform it into structured objects, so experts didn’t have to do it by hand. Building that module from scratch was the core challenge on this project.

  • Time-consuming manual processing. Employees previously read through massive contracts to extract specific entities manually. This manual routine slowed down operations and increased the risk of human error.
  • Information overload. Corporate agreements contain excessive amounts of text. Users require a method to isolate crucial data objects to prepare documents for ERP integrations or electronic signatures efficiently.
  • Legal compliance. Automated text modification presents severe legal risks. The system must preserve the exact original wording of legal clauses to prevent any misinterpretation or contractual disputes.
  • No database or parsing logic in place. The client had no existing foundation for contract parsing, but knew they wanted to use MongoDB. The project required setting up a database from scratch and building all the core logic on top of it to support the new functionality.
  • Unpredictable document formats. Corporate contracts come with varying styles, irregular layouts, and complex tables of contents. Leganta needed a reliable algorithm to extract text from these unpredictable PDF files precisely.
  • Cloud and on-premise deployment. Leganta required the platform to operate seamlessly as both a cloud-hosted solution and a local on-premise installation to satisfy various enterprise clients. The foundational architecture had to leverage versatile containerization tools such as Docker and Kubernetes to support these dual hosting environments from the start.

Solution

To address these challenges, Innowise built the document parsing module from scratch. The work covered backend logic, the frontend interface, and deployment infrastructure, with the two developers splitting responsibilities across the full stack.

Document parsing and semantic segmentation

The first task was building the parsing engine. We started by integrating Apache POI to extract text content from uploaded PDF contracts, along with the formatting metadata embedded in each file. That metadata (heading styles, paragraph breaks, and font weights) provides the signals that drive the parsing logic.

  • Our team developed a custom segmentation algorithm that breaks the extracted text into semantic units: individual clauses, sections, and data fields that users can then view, edit, and work with directly.
  • We developed the segmentation rules and tested them against real contract samples until the outputs were consistent and meaningful. We store all parsed sections as structured objects in MongoDB.
  • On the frontend, we built a two-pane interface. We put the original PDF on the left so users always have the source document in view, and we built an editable table of parsed sections on the right. This way, users can compare the source against the extracted data at any point.
  • Our experts also extended an open-source PDF rendering library. The free version didn't handle certain edge cases, so we brought it up to the level of paid alternatives ourselves.
  • We also built a set of editing tools so users can correct the output where needed. They can merge sections that the algorithm split incorrectly, adjust titles, fill in fields, and change any part of the structure before saving. We designed the flow to be fast, since parsing accuracy depends on document quality, and users often need to make corrections.
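
The heading-driven segmentation described above can be pictured with a small sketch. It is an illustration only, not the algorithm used in the product: the `TextLine` and `Section` records, the `isHeading` heuristic, and the body font size are assumptions, and the extraction step that would supply each line with its formatting metadata is left out. Body text is copied through unchanged, in line with the requirement to preserve the original wording.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group extracted lines into sections using formatting metadata.
final class SectionSegmenter {

    // One extracted line plus two of the formatting signals named in the text (assumed shape).
    record TextLine(String text, boolean bold, double fontSize) {}

    // A parsed section: a title and the original, unmodified body text.
    record Section(String title, String body) {}

    private final double bodyFontSize;

    SectionSegmenter(double bodyFontSize) {
        this.bodyFontSize = bodyFontSize;
    }

    // Assumed heuristic: a short, non-blank line that is bold or larger than body text.
    public boolean isHeading(TextLine line) {
        return !line.text().isBlank()
                && line.text().length() <= 80
                && (line.bold() || line.fontSize() > bodyFontSize);
    }

    // Walk the lines once; every heading closes the current section and opens a new one.
    public List<Section> segment(List<TextLine> lines) {
        List<Section> sections = new ArrayList<>();
        List<String> body = new ArrayList<>();
        String title = null;
        for (TextLine line : lines) {
            if (isHeading(line)) {
                flush(sections, title, body);
                title = line.text().strip();
            } else if (!line.text().isBlank()) {
                body.add(line.text()); // wording is kept exactly as extracted
            }
        }
        flush(sections, title, body);
        return sections;
    }

    // Emit the pending section (if any) and reset the body buffer.
    private static void flush(List<Section> out, String title, List<String> body) {
        if (title == null && body.isEmpty()) {
            return; // nothing collected yet
        }
        // Text that appears before the first heading is filed under a placeholder title.
        out.add(new Section(title == null ? "Preamble" : title, String.join("\n", body)));
        body.clear();
    }
}
```

Calling `new SectionSegmenter(11.0).segment(lines)` on the lines of one document returns the sections in reading order. A real contract would need more signals than this, which is why the editing tools for merging and retitling sections matter.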

Template system for recurring document types

Once the core parsing was working, we built a template system on top of it. The idea came from a practical observation: organizations that process large volumes of similar contracts, such as banks using standardized loan agreements, repeatedly encounter the same document structures.

  • We built a save-as-template function that lets users capture a fully structured and corrected document as a reusable pattern. When a new contract with a similar structure arrives, the system applies that pattern automatically during parsing.
  • For template-matched documents, accuracy on the first pass is substantially higher, and the time users spend on manual review drops accordingly.
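
One way to picture template matching is as a similarity check between the headings of a new document and the headings stored with a saved template. The source does not describe how the platform decides that a contract has "a similar structure", so everything in the sketch below is an assumption made for illustration: the `Template` record, the Jaccard-style `similarity` score, and the threshold value are all hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: pick the saved template whose headings best overlap a new document's.
final class TemplateMatcher {

    // A saved template, reduced here to a name and its section headings (assumed shape).
    record Template(String name, List<String> headings) {}

    private final double threshold;

    TemplateMatcher(double threshold) {
        this.threshold = threshold;
    }

    // Lower-case the heading and drop numbering and punctuation, so that
    // "1. Definitions" and "§1 Definitions" compare as equal.
    static String normalize(String heading) {
        return heading.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]+", " ").strip();
    }

    // Jaccard similarity of the two normalized heading sets: |common| / |union|.
    static double similarity(List<String> a, List<String> b) {
        Set<String> left = new HashSet<>();
        a.forEach(h -> left.add(normalize(h)));
        Set<String> right = new HashSet<>();
        b.forEach(h -> right.add(normalize(h)));
        if (left.isEmpty() && right.isEmpty()) {
            return 0.0; // no headings on either side: treat as no evidence of a match
        }
        Set<String> common = new HashSet<>(left);
        common.retainAll(right);
        Set<String> union = new HashSet<>(left);
        union.addAll(right);
        return (double) common.size() / union.size();
    }

    // Return the highest-scoring template at or above the threshold, if any.
    // On equal scores the template that appears later in the list wins.
    public Optional<Template> bestMatch(List<String> headings, List<Template> templates) {
        Template best = null;
        double bestScore = threshold;
        for (Template candidate : templates) {
            double score = similarity(headings, candidate.headings());
            if (score >= bestScore) {
                best = candidate;
                bestScore = score;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

When `bestMatch` returns an empty result, the document would simply go through the generic parsing path and could later be saved as a new template.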

AI integration for field classification

In parallel with our work, the client’s experts developed a GPT-based classification layer that sits on top of the parsed sections. Its job is to classify each section against the platform’s internal entity types.

  • Our responsibility was to ensure the parsed output fed into that layer cleanly. To that end, we structured the sections to be consistently bounded and well-formed so the AI classification could work reliably on top of them.
  • We coordinated closely with the client's team, who built the AI layer on their end, on the handoff format between the two layers.
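
To make "consistently bounded and well-formed" concrete, the sketch below checks a list of parsed sections before it is handed to a downstream classifier. The actual handoff format agreed with the client is not described in the source, so the `ParsedSection` fields and the three checks (unique id, contiguous order, non-empty text) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: sanity-check parsed sections before passing them to a classifier.
final class SectionHandoff {

    // Assumed handoff shape: a stable id, the position in the document, a title, and the text.
    record ParsedSection(String id, int order, String title, String text) {}

    // Returns a list of human-readable problems; an empty list means the batch looks well-formed.
    public static List<String> validate(List<ParsedSection> sections) {
        List<String> problems = new ArrayList<>();
        Set<String> seenIds = new HashSet<>();
        for (int i = 0; i < sections.size(); i++) {
            ParsedSection section = sections.get(i);
            if (section.id() == null || section.id().isBlank()) {
                problems.add("section " + i + ": missing id");
            } else if (!seenIds.add(section.id())) {
                problems.add("section " + i + ": duplicate id " + section.id());
            }
            // Sections are expected in document order, numbered 0, 1, 2, ...
            if (section.order() != i) {
                problems.add("section " + i + ": order is " + section.order());
            }
            if (section.text() == null || section.text().isBlank()) {
                problems.add("section " + i + ": empty text");
            }
        }
        return problems;
    }
}
```

Running a check like this on the parsing side keeps malformed input from reaching the classification layer, where the cause of a wrong label would be harder to trace.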

Project and document management layer

Around the parsing engine, we built the full management layer that users interact with day to day.

  • Our team built the project structure, which lets users group related documents together under a single contract negotiation or deal. We also built the document upload and lifecycle flow, and the full CRUD layer for managing both projects and documents.
  • We set up H2 as a lightweight, self-contained store for credentials and roles, keeping it separate from the main document data in MongoDB.

Infrastructure and deployment

We wrote Dockerfiles for all services, configured Kubernetes deployments and services, set up ingress with TLS certificates, and built the CI pipeline on GitHub Actions to handle the build, image push, and publish steps.

  • We deployed the platform on infrastructure provided by Syseleven, Leganta's German cloud partner.
  • We structured the containerized setup to also support on-premise deployment at client sites, which Leganta requires for some of their enterprise customers.

The collaboration with the Leganta team worked well from the start. The client's tech lead was available, clear about what they needed, and open when we had a different take on something. We came in, got familiar with what was already there, and figured out the architecture together from that point. The scope was genuinely open-ended at the beginning, and the only hard requirement was MongoDB, so a lot of the technical decisions happened through ongoing discussion. That kind of working collaboration is easier when the other side knows their product well, and the Leganta team did. We've been on this project since early 2024, and the working rhythm has stayed consistent throughout.

Dmitry Nazarevich, CTO

Technologies

Backend

Java 17, Spring Boot

Frontend

Vue.js, Vuetify, TypeScript, Pinia

Database (main)

MongoDB

Database (auth)

H2

PDF processing

Apache POI

CI

GitHub Actions

Testing

Unit tests, integration tests (backend), Selenium (frontend)

Container

Docker, Kubernetes

Team

Back-end developer
Full-stack developer

Results

Project duration
February 2024 to 2025

The parsing module is live and in production. Leganta uses it as the entry point into their contract management workflow.

  • Contract structuring time cut from hours to seconds. The system now produces an initial parsed structure in around 10 seconds. In demos, a full contract was reviewed, corrected where needed, and completely filled in within an hour. For documents that match an existing template, the initial parse is close to the final version and needs minimal correction.
  • Templates make repetitive work faster each time. Once a contract has been structured and saved as a template, subsequent documents of the same type automatically reuse that structure. Organizations handling high volumes of similar agreements, with banks being the primary target, see the benefit compound across every contract processed.
  • Platform deployed and running in production. The platform supports both cloud infrastructure and on-premise deployment for enterprise clients who need it. The team has maintained a consistent bi-weekly release cycle since the project started.
  • Semantic transformation engine. LEGANTA® provides a semantic transformation capability that converts any document into freely selectable target structures. This enables precise alignment with customer‑specific objectives and seamless integration into existing IT landscapes. At its core, the engine interprets documents as semantic information spaces. It restructures and enriches them so that organizations can embed the resulting data directly into their operational, compliance, risk, or analytic systems, without manual remodeling.
  • Seamless system integration. The solution integrates with the client's existing authentication and other modules, and supports data exports to other internal systems.

The team delivered everything in the planned scope.
