Close Menu
    DevStackTipsDevStackTips
    • Home
    • News & Updates
      1. Tech & Work
      2. View All

      Designing Better UX For Left-Handed People

      July 25, 2025

      This week in AI dev tools: Gemini 2.5 Flash-Lite, GitLab Duo Agent Platform beta, and more (July 25, 2025)

      July 25, 2025

      Tenable updates Vulnerability Priority Rating scoring method to flag fewer vulnerabilities as critical

      July 24, 2025

      Google adds updated workspace templates in Firebase Studio that leverage new Agent mode

      July 24, 2025

      I ran with the Apple Watch and Samsung Watch 8 – here’s the better AI coach

      July 26, 2025

      8 smart home gadgets that instantly upgraded my house (and why they work)

      July 26, 2025

      I tested Panasonic’s new affordable LED TV model – here’s my brutally honest buying advice

      July 26, 2025

      OpenAI teases imminent GPT-5 launch. Here’s what to expect

      July 26, 2025
    • Development
      1. Algorithms & Data Structures
      2. Artificial Intelligence
      3. Back-End Development
      4. Databases
      5. Front-End Development
      6. Libraries & Frameworks
      7. Machine Learning
      8. Security
      9. Software Engineering
      10. Tools & IDEs
      11. Web Design
      12. Web Development
      13. Web Security
      14. Programming Languages
        • PHP
        • JavaScript
      Featured

      NativePHP Is Entering Its Next Phase

      July 26, 2025
      Recent

      NativePHP Is Entering Its Next Phase

      July 26, 2025

      Medical Card Generator Android App Project Using SQLite

      July 26, 2025

      The details of TC39’s last meeting

      July 26, 2025
    • Operating Systems
      1. Windows
      2. Linux
      3. macOS
      Featured

      Elden Ring Nightreign’s Patch 1.02 update next week is adding a feature we’ve all been waiting for since launch — and another I’ve been begging for, too

      July 26, 2025
      Recent

      Elden Ring Nightreign’s Patch 1.02 update next week is adding a feature we’ve all been waiting for since launch — and another I’ve been begging for, too

      July 26, 2025

      The next time you look at Microsoft Copilot, it may look back — but who asked for this?

      July 26, 2025

      5 Open Source Apps You Can use for Seamless File Transfer Between Linux and Android

      July 26, 2025
    • Learning Resources
      • Books
      • Cheatsheets
      • Tutorials & Guides
    Home»Development»Machine Learning»Google DeepMind Introduces Aeneas: AI-Powered Contextualization and Restoration of Ancient Latin Inscriptions

    Google DeepMind Introduces Aeneas: AI-Powered Contextualization and Restoration of Ancient Latin Inscriptions

    July 26, 2025

    The discipline of epigraphy, focused on studying texts inscribed on durable materials like stone and metal, provides critical firsthand evidence for understanding the Roman world. The field faces numerous challenges including fragmentary inscriptions, uncertain dating, diverse geographical provenance, widespread use of abbreviations, and a large and rapidly growing corpus of over 176,000 Latin inscriptions, with approximately 1,500 new inscriptions added annually.

    To address these challenges, Google DeepMind developed Aeneas: a transformer-based generative neural network that performs restoration of damaged text segments, chronological dating, geographic attribution, and contextualization through retrieval of relevant epigraphic parallels.

    Challenges in Latin Epigraphy

    Latin inscriptions span more than two millennia, from roughly the 7th century BCE to the 8th century CE, across the vast Roman Empire comprising over sixty provinces. These inscriptions vary from imperial decrees and legal documents to tombstones and votive altars. Epigraphers traditionally restore partially lost or illegible texts using detailed knowledge of language, formulae, and cultural context, and attribute inscriptions to certain timeframes and locations by comparing linguistic and material evidence.

    However, many inscriptions suffer from physical damage with missing segments of uncertain lengths. The wide geographic dispersion and diachronic linguistic changes make dating and provenance attribution complex, especially when combined with the sheer corpus size. Manual identification of epigraphic parallels is labor-intensive and often limited by specialized expertise localized to certain regions or periods.

    Latin Epigraphic Dataset (LED)

    Aeneas is trained on the Latin Epigraphic Dataset (LED), an integrated and harmonized corpus of 176,861 Latin inscriptions aggregating records from three major databases. The dataset includes approximately 16 million characters covering inscriptions spanning seven centuries BCE to eight centuries CE. About 5% of these inscriptions have associated grayscale images.

    The dataset uses character-level transcriptions employing special placeholder tokens: - marks missing text of a known length while # denotes missing segments of unknown length. Metadata includes province-level provenance over 62 Roman provinces and dating by decade.

    Model Architecture and Input Modalities

    Aeneas’s core is a deep, narrow transformer decoder based on the T5 architecture, adapted with rotary positional embeddings for effective local and contextual character processing. The textual input is processed alongside optional inscription images (when available) through a shallow convolutional network (ResNet-8), which feeds image embeddings to the geographical attribution head only.

    The model includes multiple specialized task heads to perform:

    • Restoration: Predict missing characters, supporting arbitrary-length unknown gaps using an auxiliary neural classifier.
    • Geographical Attribution: Classify inscriptions among 62 provinces by combining text and visual embeddings.
    • Chronological Attribution: Estimate text date by decade using a predictive probabilistic distribution aligned with historical date ranges.

    Additionally, the model generates a unified historically enriched embedding by combining outputs from the core and task heads. This embedding enables retrieval of ranked epigraphic parallels using cosine similarity, incorporating linguistic, epigraphic, and broader cultural analogies beyond exact textual matches.

    Training Setup and Data Augmentation

    Training occurs on TPU v5e hardware with batch sizes up to 1024 text-image pairs. Losses for each task are combined with optimized weighting. The data is augmented by random text masking (up to 75% characters), text clipping, word deletions, punctuation dropping, image augmentations (zoom, rotation, brightness/contrast adjustments), dropout, and label smoothing to improve generalization.

    Prediction uses beam search with specialized non-sequential logic for unknown-length text restoration, ensuring multiple restoration candidates ranked by joint probability and length.

    Performance and Evaluation

    Evaluated on the LED test set and through a human-AI collaboration study with 23 epigraphers, Aeneas demonstrates marked improvements:

    • Restoration: Character error rate (CER) reduced to approximately 21% when Aeneas support is provided, compared to 39% for unaided human experts. The model itself achieves around 23% CER on the test set.
    • Geographical Attribution: Achieves around 72% accuracy in correctly classifying the province among 62 options. With Aeneas assistance, historians improve accuracy up to 68%, outperforming either alone.
    • Chronological Attribution: Average error in date estimation is approximately 13 years for Aeneas, with historians aided by Aeneas reducing error from about 31 years to 14 years.
    • Contextual Parallels: Epigraphic parallels retrieved are accepted as useful starting points for historical research in approximately 90% of cases and increase historians’ confidence by an average of 44%.

    These improvements are statistically significant and highlight the model’s utility as an augmentation to expert scholarship.

    Case Studies

    Res Gestae Divi Augusti:
    Aeneas’s analysis of this monumental inscription reveals bimodal dating distributions reflecting scholarly debates about its compositional layers and stages (late first century BCE and early first century CE). Saliency maps highlight date-sensitive linguistic forms, archaic orthography, institutional titles, and personal names, mirroring expert epigraphic knowledge. Parallels retrieved predominantly include imperial legal decrees and official senatorial texts sharing formulaic and ideological features.

    Votive Altar from Mainz (CIL XIII, 6665):
    Dedicated in 211 CE by a military official, this inscription was accurately dated and geographically attributed to Germania Superior and related provinces. Saliency maps identify key consular dating formulas and cultic references. Aeneas retrieved highly related parallels including a 197 CE altar sharing rare textual formulas and iconography, revealing historically meaningful connections beyond direct text overlap or spatial metadata.

    Integration in Research Workflows and Education

    Aeneas operates as a cooperative tool, not a replacement for historians. It accelerates searching for epigraphic parallels, aids restoration, and refines attribution, freeing scholars to focus on higher-level interpretation. The tool and dataset are openly available via the Predicting the Past platform under permissive licenses. An educational curriculum has been co-developed targeting high school students and educators, promoting interdisciplinary digital literacy by bridging AI and classical studies.


    FAQ 1: What is Aeneas and what tasks does it perform?

    Aeneas is a generative multimodal neural network developed by Google DeepMind for Latin epigraphy. It assists historians by restoring damaged or missing text in ancient Latin inscriptions, estimating their date within about 13 years, attributing their geographical origin with around 72% accuracy, and retrieving historically relevant parallel inscriptions for contextual analysis.

    FAQ 2: How does Aeneas handle incomplete or damaged inscriptions?

    Aeneas can predict missing text segments even when the length of the gap is unknown, a capability known as arbitrary-length restoration. It uses a transformer-based architecture and specialized neural network heads to generate multiple plausible restoration hypotheses, ranked by likelihood, facilitating expert evaluation and further research.

    FAQ 3: How is Aeneas integrated into historian workflows?

    Aeneas provides historians with ranked lists of epigraphic parallels and predictive hypotheses for restoration, dating, and provenance. These outputs boost historians’ confidence and accuracy, reduce research time by quickly suggesting relevant texts, and support collaborative human-AI analysis. The model and datasets are openly accessible via the Predicting the Past platform.


    Check out the Paper, Project and Google DeepMind Blog. All credit for this research goes to the researchers of this project. SUBSCRIBE NOW to our AI Newsletter

    The post Google DeepMind Introduces Aeneas: AI-Powered Contextualization and Restoration of Ancient Latin Inscriptions appeared first on MarkTechPost.

    Source: Read More 

    Facebook Twitter Reddit Email Copy Link
    Previous ArticleNVIDIA AI Releases GraspGen: A Diffusion-Based Framework for 6-DOF Grasping in Robotics
    Next Article Building a GPU-Accelerated Ollama LangChain Workflow with RAG Agents, Multi-Session Chat Performance Monitoring

    Related Posts

    Machine Learning

    How to Evaluate Jailbreak Methods: A Case Study with the StrongREJECT Benchmark

    July 26, 2025
    Machine Learning

    RoboBrain 2.0: The Next-Generation Vision-Language Model Unifying Embodied AI for Advanced Robotics

    July 26, 2025
    Leave A Reply Cancel Reply

    For security, use of Google's reCAPTCHA service is required which is subject to the Google Privacy Policy and Terms of Use.

    Continue Reading

    CVE-2025-45042 – Tenda AC9 Command Injection Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    LogoRRR – cross-platform log analysis tool

    Linux

    CVE-2025-47733 – Microsoft Power Apps SSRF Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    CVE-2025-7463 – Tenda FH1201 HTTP POST Request Handler Buffer Overflow Vulnerability

    Common Vulnerabilities and Exposures (CVEs)

    Highlights

    CVE-2024-51977 – HP Printer Information Disclosure

    June 25, 2025

    CVE ID : CVE-2024-51977

    Published : June 25, 2025, 8:15 a.m. | 2 hours, 42 minutes ago

    Description : An unauthenticated attacker who can access either the HTTP service (TCP port 80), the HTTPS service (TCP port 443), or the IPP service (TCP port 631), can leak several pieces of sensitive information from a vulnerable device. The URI path /etc/mnt_info.csv can be accessed via a GET request and no authentication is required. The returned result is a comma separated value (CSV) table of information. The leaked information includes the device’s model, firmware version, IP address, and serial number.

    Severity: 5.3 | MEDIUM

    Visit the link for more details, such as CVSS details, affected products, timeline, and more…

    Microsoft Warns of Tax-Themed Email Attacks Using PDFs and QR Codes to Deliver Malware

    April 3, 2025

    CVE-2025-6617 – D-Link DIR-619L Stack-Based Buffer Overflow Vulnerability

    June 25, 2025

    Orbit by Mozilla (AI Add-on for Firefox) Shuts Down This Month

    June 9, 2025
    © DevStackTips 2025. All rights reserved.
    • Contact
    • Privacy Policy

    Type above and press Enter to search. Press Esc to cancel.