Translation 2.0: Methodological Note on my Pucci Study as a Proof of Concept

The Forgotten Code: Validating a Century-Old Translation System with AI

The study was conceived as a proof of concept (PoC) rather than a benchmark exercise. Its objective is to test, empirically and under controlled conditions, the operational validity of Federico Pucci’s interlingual method (1931) in three steps:

reconstruct the rule-based procedure he proposed,
instruct contemporary large language models (LLMs) to execute that procedure on the same canonical excerpts Pucci used, and
quantify the divergence between the resulting outputs and Pucci’s original translations.
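
The third step, quantifying divergence, can be sketched in a few lines. This is a minimal illustration only: it assumes lowercased whitespace tokenization and uses the similarity ratio from Python's difflib as a stand-in distance, and neither is stated here to be the metric used in the study. The function names and sample strings are hypothetical placeholders, not Pucci's text.

```python
from difflib import SequenceMatcher
from statistics import mean


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; the study may tokenize differently.
    return text.lower().split()


def divergence(candidate: str, reference: str) -> float:
    """Word-level divergence in [0, 1]; 0.0 means identical token sequences."""
    matcher = SequenceMatcher(None, tokenize(candidate), tokenize(reference), autojunk=False)
    return 1.0 - matcher.ratio()


def mean_divergence(candidates: dict[str, str], reference: str) -> float:
    """Average divergence of several system outputs from a single reference."""
    return mean(divergence(text, reference) for text in candidates.values())


# Placeholder strings (hypothetical, not taken from the 1931 translations).
reference = "the quick brown fox"
outputs = {"system_a": "the quick brown fox", "system_b": "the quick red fox"}
print(mean_divergence(outputs, reference))
```

A ratio-based distance is convenient because it is bounded and symmetric, but any edit-based measure could be substituted as long as the same one is applied to every output.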

The findings (consistently low average deviations on the target passages, and comparable results on the additional language directions explored) indicate that what generalizes is the method itself. As such, Pucci’s rules can plausibly operate as an explainable, symbolic component within a modern neuro-symbolic architecture.

The PoC is deliberately narrow in scope. It does not claim multi-genre or multi-language robustness. The experimental corpus is restricted to Pucci’s two canonical passages (Dante, it→fr; Voltaire, fr→it), because the sole question at stake is: Is Pucci’s 1931 procedure operational and replicable today? Within that frame, the answer is yes.

The study seeks a historical–conceptual “existence proof”—or proof of feasibility—showing that a pre-RBMT rule set can be instantiated a century later with traceability. To keep inference tight and attributable, the design uses:

1. Gold Reference (R). Pucci’s 1931 “mechanical” translations serve as the reference.

They are not AI outputs; each condition Cᵢ is compared against them in the distance calculations D(Cᵢ→R).

2. Controlled Contrast (C₀/C₁).

C₀: the LLM/NMT system without Pucci’s rules;
C₁: the same system with Pucci’s rules explicitly enforced via instruction.

3. Intra-model Ablation. In the comparisons of Group 2 (§3.2), Pucci’s rules are removed (C₀), then re-activated (C₁). Because inputs and model weights are identical, any difference between the two outputs isolates the causal effect of the rules. Observed edit counts (C₀ → C₁):

ChatGPT: 20 deletions + 22 additions = 42 edits
Claude: 20 deletions + 19 additions = 39 edits
Grok: 24 deletions + 28 additions = 52 edits

With input and model held constant, toggling the rules yields a substantial change in output (between 39 and 52 edits, depending on the model), which makes chance an implausible explanation. Mill’s method of difference states that if one case has the outcome and another, otherwise identical, case does not, and the only difference between the two is a single factor X, then X is (part of) the cause of that outcome. Here the only manipulated factor is rule activation, so the effect is attributable to the rules themselves.
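
Edit counts of this kind can be obtained from a word-level diff between the C₀ and C₁ outputs. The sketch below assumes that a deletion is a word present only in C₀ and an addition a word present only in C₁, counted with Python's difflib. The diff tool actually used is not specified here, so this is one plausible reading rather than the study's procedure. The function name and sample strings are hypothetical; the REPORTED table only restates the figures given above.

```python
from difflib import SequenceMatcher


def edit_counts(c0: str, c1: str) -> tuple[int, int]:
    """Return (deletions, additions) between two outputs, counted in words."""
    a, b = c0.split(), c1.split()
    deletions = additions = 0
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b, autojunk=False).get_opcodes():
        if tag in ("delete", "replace"):
            deletions += i2 - i1  # words that exist only in C0
        if tag in ("insert", "replace"):
            additions += j2 - j1  # words that exist only in C1
    return deletions, additions


# Figures reported in the text: model -> (deletions, additions).
REPORTED = {"ChatGPT": (20, 22), "Claude": (20, 19), "Grok": (24, 28)}
TOTALS = {model: d + a for model, (d, a) in REPORTED.items()}

# Hypothetical placeholder outputs, not the study's data.
print(edit_counts("a b c d", "a x c d e"))
print(TOTALS)
```

Counting a replaced word as one deletion plus one addition matches the "deletions + additions" layout of the reported figures, but other conventions (for example, counting a substitution once) would give smaller totals.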

Interpretation and Limits

The PoC provides evidence of operability and replicability of Pucci’s rule set on the defined tasks. It does not claim broad generalization across genres, domains, or arbitrary language pairs.
The inference is appropriately conservative: causal attribution is confined to the contrast tested.

Next Steps

Given the PoC’s restricted perimeter, subsequent work should:

Broaden corpora (beyond the canonical excerpts) and extend to additional language pairs and registers;
Run a pilot with human post-editing and contemporary automatic metrics (BLEU / chrF / METEOR) to assess practical and conceptual value at scale;
Incorporate controls (placebo rules, further ablations) to stress-test attribution.

Openness, Falsifiability, and Reproducibility

We will release rules, prompts, scripts, and protocols on GitHub to enable independent replication and attempted refutation of the hypothesis “Pucci’s rules affect the output.” The setup is designed as a Popperian test [aimed at refutation, not confirmation; passing it increases confidence by surviving serious attempts to break the claim]. The hypothesis would be falsified if, for the same model:

activating the rules does not produce a stable effect (C₁ ≈ C₀);
ablations fail to yield the expected error profiles; or
the pipeline is not traceable (i.e., edits cannot be linked to specific rules, or the sequence cannot be replayed with the same result).
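
Two of these conditions (a stable effect and traceability) lend themselves to a mechanical check. The sketch below is an assumption-laden illustration: the Edit record, the min_edits threshold, and the sample values are all hypothetical, and the second condition (expected error profiles under ablation) is left out because it depends on rule-specific expectations that are not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edit:
    token: str
    rule_id: Optional[str]  # None = the edit could not be linked to any rule


def effect_is_stable(edit_totals: list[int], min_edits: int) -> bool:
    """True if every repeated C0 -> C1 run shows at least `min_edits` edits."""
    return bool(edit_totals) and all(n >= min_edits for n in edit_totals)


def is_traceable(edits: list[Edit], first_run: str, replay: str) -> bool:
    """True if every edit names a rule and replaying the sequence gives the same text."""
    return all(e.rule_id is not None for e in edits) and first_run == replay


def hypothesis_survives(edit_totals, min_edits, edits, first_run, replay) -> bool:
    # The hypothesis is falsified as soon as one checked condition fails.
    return effect_is_stable(edit_totals, min_edits) and is_traceable(edits, first_run, replay)


# Hypothetical values for illustration only.
edits = [Edit("vie", "rule-07"), Edit("nouvelle", "rule-12")]
print(hypothesis_survives([40, 41, 43], 10, edits, "la vie nouvelle", "la vie nouvelle"))
```

Fixing the threshold before running the experiment matters: choosing min_edits after seeing the results would weaken the test as an attempt at refutation.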

A progressive research programme in Lakatos’s sense could then be designed and implemented, in line with his Methodology of Scientific Research Programmes (MSRP). In that framework, a programme is a sequence of theories built around a ‘hard core’ of commitments, protected by a ‘protective belt’ of auxiliary hypotheses, and it is judged progressive when it yields novel, corroborated predictions. The aim would be an evolving, testable structure of ideas likely to produce theoretical and empirical progress while keeping goals measurable and achievable. Without prejudging future technical choices, reproducibility workshops could usefully experiment with a documented FST pipeline (finite-state transducer: analysis → transfer → generation), which lends itself to inter-team comparisons and helps to make each step explicit, recorded, testable, and open to (in)validation.
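
To make the analysis → transfer → generation idea concrete, here is a toy three-stage pipeline. It uses plain dictionary lookups rather than a real finite-state toolkit, and the three small tables are invented for illustration: they are not Pucci's keys or rules. What the sketch is meant to show is the staged structure and the per-step trace that would make each output word attributable to a lookup.

```python
# Toy tables, invented for illustration (NOT taken from Pucci's 1931 method).
ANALYSIS = {"la": "it:DET.F.SG", "vita": "it:vita.N", "nuova": "it:nuovo.ADJ.F.SG"}
TRANSFER = {
    "it:DET.F.SG": "fr:DET.F.SG",
    "it:vita.N": "fr:vie.N",
    "it:nuovo.ADJ.F.SG": "fr:nouveau.ADJ.F.SG",
}
GENERATION = {"fr:DET.F.SG": "la", "fr:vie.N": "vie", "fr:nouveau.ADJ.F.SG": "nouvelle"}

STAGES = [("analysis", ANALYSIS), ("transfer", TRANSFER), ("generation", GENERATION)]


def run_stage(name, table, symbols, trace):
    """Map each symbol through one table, logging (stage, input, output)."""
    out = []
    for symbol in symbols:
        if symbol not in table:
            raise KeyError(f"{name}: no rule for {symbol!r}")
        out.append(table[symbol])
        trace.append((name, symbol, table[symbol]))
    return out


def translate(sentence: str):
    """Run the three stages in order; return the output text and the full trace."""
    trace = []
    symbols = sentence.lower().split()
    for name, table in STAGES:
        symbols = run_stage(name, table, symbols, trace)
    return " ".join(symbols), trace


text, trace = translate("La vita nuova")
print(text)
for step in trace:
    print(step)
```

Because every stage is a deterministic lookup, replaying the same input yields the same trace, which is the property the traceability condition asks for. A real FST would add states and context-dependent transitions on top of this word-by-word mapping.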

Institutional Context and Community Invitation

A broader project along these lines has been proposed to the Italian CNR, more than 75 years after Pucci’s first contact with the institution, creating an opportunity to revisit this intellectual heritage, recognize Pucci’s contributions, and coordinate replications and exchanges across the community. An open inquiry framework—shared datasets, replication labs, and methodological guidance—would allow the “Pucci” hypothesis to be tested, refined, or discarded across diverse texts, registers, and languages.

Concluding Remark

After a long journey, Pucci’s 1949 letter finds a natural continuation. His opening aim—“enabling people who know only their own language to translate from one language to another”—now admits a traceable, rule-guided instantiation within contemporary systems. Nearly a century later, Pucci’s system is no longer a utopia: its feasibility has been demonstrated in its intended domain; the broader programme now is to determine where, and how far, the method extends.

P.S.

Abstract of the study: The Forgotten Code: Validating a Century-Old Translation System with AI

A pioneering rule-based mechanical translation system (precursor of modern RBMTs) was first presented in December 1929 by its inventor, Federico Pucci, who later published the full method in a book titled "Il traduttore meccanico ed il metodo per corrispondersi fra Europei conoscendo ciascuno solo la propria lingua: Parte I", in Salerno (Italy), in 1931. This study illustrates how AI breathes new life into the system of international keys and ideograms devised by Pucci to translate from/into any Romance language (at least as a first step). The methodology involves having the AIs retranslate, following Pucci's method, the two text excerpts originally translated in 1931 and clearly documented in his publication: a passage from Dante's La Vita Nuova, translated from Italian into French, and a passage from Voltaire's Zadig, translated from French into Italian. The result is notable: the two texts, translated 94 years apart using the same method--by Pucci in 1931 and by AIs in 2025--show a low average difference, with only minor variations observed. With Pucci's system thus validated, it became feasible to have the AIs reproduce the excerpts in English, Spanish, and German according to his method. The results were consistent, and Pucci--via Artificial Intelligence--was tasked with translating more modern and technical texts, thereby reviving, nearly a century later, an invention that had remained almost entirely unknown and never applied beyond its creator, now brought to wider attention and opened to possible experimentation. Such a demonstration would not only affirm Pucci's historical status but also place him among the precursors and intellectual contributors to machine translation, whose work merits examination alongside figures such as Troyanskij, Booth, and Weaver, with possible consequences for how the history of the field is understood.

Friday, 3 October 2025
