AI Framework Helps Automatically Read Spanish Notarized Documents
A new AI-based method aims to make Spanish notarized documents more machine-readable. Pedro A. Villa-García, Raúl Alonso-Calvo, and Miguel García-Remesal have developed a framework that automatically extracts essential information from these legally binding texts.
Notarized documents are a central part of Spain's contract and property law. They confirm, for example, real estate transactions and other agreements in a way that is difficult to dispute in court. Banks, insurance companies, and authorities use them, and millions of documents are produced annually. However, processing them is still largely manual because the texts are free-form and dense with legal jargon.
The information extraction framework developed by the researchers focuses specifically on Spanish notarized documents, whose structure and language differ from those of ordinary text. The challenge is compounded by the fact that far fewer ready-made datasets and language models are available for Spanish than for English.
The proposed solution is based on a so-called end-to-end approach, in which the model learns directly from the original documents and the data associated with them, without separate, manually constructed rule sets. The goal is for the system to identify and extract parties, dates, and key contract terms as they appear in the documents.
The research paves the way for automating business processes in fields that rely heavily on notarized documents. Successful information extraction could, for example, reduce paperwork and errors and speed up decision-making at banks and authorities.
Source: Information extraction framework for Spanish notarized documents using end-to-end data, Artificial Intelligence and Law.
This text was generated with AI assistance and may contain errors. Please verify details from the original source.
Original research: Information extraction framework for Spanish notarized documents using end-to-end data
Publisher: Artificial Intelligence and Law
Authors: Pedro A. Villa-García, Raúl Alonso-Calvo, Miguel García-Remesal
December 24, 2025