MirTankov EN Localization Project

2025

Python

Localization

Automation

Project Overview

The MirTankov EN Localization Project aims to fully translate the Russian MMO game "Mir Tankov" into English. I developed two Python tools (a PO file merger and an AI-powered translator) and used them to sync and translate 250+ `.po` files, totaling over 350,000 characters, in under 1.5 hours.

My Contributions


PO File Merger

A Python tool to batch-merge .po translation files for Mir Tankov (and other games). It syncs English and Russian .po files that share a filename: wherever the English file provides a translation for the same msgid, that translation replaces the Russian one, and entries with no English match keep their Russian text.

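import os
from shutil import copyfile

import polib
from langdetect import DetectorFactory, detect  # assumed: detect() comes from langdetect
from langdetect.lang_detect_exception import LangDetectException

DetectorFactory.seed = 0  # make langdetect results reproducible between runs


def detect_po_language(po_file, sample_size=10):
    """Guess the language of a .po file from a sample of its translated strings."""
    po = polib.pofile(po_file)
    texts = [entry.msgstr for entry in po if entry.msgstr.strip()]
    sample_texts = texts[:sample_size]
    langs = []
    for text in sample_texts:
        try:
            langs.append(detect(text))
        except LangDetectException:
            # Raised for strings with no detectable features (digits, symbols only)
            continue
    if langs:
        # Majority vote over the sampled strings
        return max(set(langs), key=langs.count)
    return 'unknown'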

# Directories
en_dir = "en"
ru_dir = "ru"
merged_dir = "merged"

# Ensure folders exist
for folder in [en_dir, ru_dir, merged_dir]:
    os.makedirs(folder, exist_ok=True)

# Get all PO filenames from both folders
ru_files = {f for f in os.listdir(ru_dir) if f.endswith('.po')}
en_files = {f for f in os.listdir(en_dir) if f.endswith('.po')}

# Only process files that exist in Russian
for ru_filename in sorted(ru_files):
    ru_path = os.path.join(ru_dir, ru_filename)
    en_path = os.path.join(en_dir, ru_filename)
    merged_path = os.path.join(merged_dir, ru_filename)

    if os.path.isfile(en_path):
        # Merge if English file exists
        ru_lang = detect_po_language(ru_path)
        en_lang = detect_po_language(en_path)
        if ru_lang != "ru" or en_lang != "en":
            print(f"Warning: Detected {ru_lang} for {ru_filename} (ru) and {en_lang} for {ru_filename} (en)")

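        ru_po = polib.pofile(ru_path)
        en_po = polib.pofile(en_path)
        # Key by (msgctxt, msgid) so identical msgids in different contexts stay apart,
        # and skip empty English msgstr so untranslated strings never erase Russian text
        en_map = {(entry.msgctxt, entry.msgid): entry.msgstr for entry in en_po if entry.msgstr}

        replaced = 0
        for entry in ru_po:
            en_text = en_map.get((entry.msgctxt, entry.msgid))
            if en_text is not None and entry.msgstr != en_text:
                entry.msgstr = en_text
                replaced += 1
        ru_po.save(merged_path)
        print(f"{ru_filename}: Merged, replaced {replaced} entries.")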

    else:
        # No English file, just copy Russian file as-is
        copyfile(ru_path, merged_path)
        print(f"{ru_filename}: English version of the file not found. Copied Russian file as-is.")

# Optionally, warn about English files that don't have a Russian original
for en_filename in sorted(en_files - ru_files):
    print(f"{en_filename}: Russian version of the file not found. Skipped, no file saved in merged.")

print("Done! All merged files saved in 'merged' folder.")
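The core merge rule can be exercised without any files on disk. The sketch below is an illustration, not code from the project: the helper names merge_entries and entry are hypothetical, and SimpleNamespace objects stand in for polib.POEntry, since the merge only needs the msgctxt, msgid and msgstr attributes. It also keys entries by (msgctxt, msgid) and ignores empty English strings, two safeguards worth having when catalogs contain context-qualified or untranslated entries.

```python
from types import SimpleNamespace
from typing import Iterable


def merge_entries(ru_entries: Iterable, en_entries: Iterable) -> int:
    """Copy English msgstr values onto matching Russian entries in place.

    Works on any objects exposing msgctxt, msgid and msgstr attributes,
    such as the entries of a polib.POFile. Returns the number of replacements.
    """
    # Key by (msgctxt, msgid): the same msgid may appear in several contexts.
    # Skip empty English msgstr so untranslated strings never erase Russian text.
    en_map = {(e.msgctxt, e.msgid): e.msgstr for e in en_entries if e.msgstr}
    replaced = 0
    for entry in ru_entries:
        english = en_map.get((entry.msgctxt, entry.msgid))
        if english is not None and entry.msgstr != english:
            entry.msgstr = english
            replaced += 1
    return replaced


def entry(msgid, msgstr, msgctxt=None):
    # Hypothetical stand-in for polib.POEntry with only the attributes used above
    return SimpleNamespace(msgid=msgid, msgstr=msgstr, msgctxt=msgctxt)


ru = [entry("battle", "В бой!"), entry("garage", "Ангар"), entry("crew", "Экипаж")]
en = [entry("battle", "Battle!"), entry("garage", "")]  # "garage" untranslated in EN

print(merge_entries(ru, en))   # 1
print([e.msgstr for e in ru])  # ['Battle!', 'Ангар', 'Экипаж']
```

With real catalogs, load both files with polib.pofile(), pass the two POFile objects in, and then call save() on the Russian one.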

AI Translation Snippets

Snippet 1

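import re

import polib

# Placeholders to protect: %(name)s / %(name)d, bare %s / %d, and {name}
PLACEHOLDER_PATTERN = re.compile(r"%\([^)]+\)[sd]|%[sd]|\{[^}]+\}")


def mask_placeholders(text):
    text = text.replace('\n', '\\n')  # Escape real line breaks as literal "\n"
    placeholders = PLACEHOLDER_PATTERN.findall(text)
    mapping = {}
    masked = text
    for idx, ph in enumerate(placeholders):
        token = f"<<PH_{idx}>>"  # Numbered token the model is told to leave alone
        mapping[token] = ph
        masked = masked.replace(ph, token)
    return masked, mapping


def unmask_placeholders(text, mapping):
    for token, ph in mapping.items():
        token_id = re.findall(r"\d+", token)[0]
        # The token as sent, plus variants the model sometimes rewrites it into.
        # Longer variants come first so "%(PH_0)s" is not half-replaced by "(PH_0)".
        possible_patterns = [
            re.escape(token),
            re.escape(token.replace("<<", "[[").replace(">>", "]]")),
            rf"%\(PH_{token_id}\)[sd]?",
            rf"\(PH_{token_id}\)",
            rf"\[PH_{token_id}\]",
            rf"PH_{token_id}(?!\d)",  # (?!\d) stops PH_1 from matching inside PH_10
        ]
        for pat in possible_patterns:
            # A function replacement inserts the placeholder literally
            text = re.sub(pat, lambda _match: ph, text)
    text = text.replace('\\n', '\n')  # Restore real line breaks
    # Drop any leftover token fragments the model invented
    text = re.sub(r"[\[\(<%]*PH_\d+[\]\)>d%s]*", "", text)
    return text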
Excluding game tags from being translated
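Masking keeps placeholders away from the model, but it cannot prove they all came back. A cheap post-check is to compare placeholder counts before and after translation. This is a sketch that is not part of the original script; placeholder_diff is a hypothetical helper that reuses the placeholder regex from mask_placeholders.

```python
import re
from collections import Counter

# Same pattern as in mask_placeholders: %(name)s / %(name)d, %s / %d, {name}
PLACEHOLDER_PATTERN = re.compile(r"%\([^)]+\)[sd]|%[sd]|\{[^}]+\}")


def placeholder_diff(source: str, translation: str) -> tuple[Counter, Counter]:
    """Return (missing, extra): placeholder counts lost or added by translation."""
    src = Counter(PLACEHOLDER_PATTERN.findall(source))
    dst = Counter(PLACEHOLDER_PATTERN.findall(translation))
    # Counter subtraction keeps only positive counts
    return src - dst, dst - src


missing, extra = placeholder_diff(
    "Игрок %(player)s уничтожил {count} танков",
    "Player %(player)s destroyed {count} tanks",
)
print(missing, extra)  # Counter() Counter()

missing, extra = placeholder_diff("Получено %d опыта за %s", "Earned %d XP")
print(missing, extra)  # Counter({'%s': 1}) Counter()
```

Any non-empty missing or extra counter marks a string worth reviewing by hand before it ships in the game.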


Snippet 2

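import re

# mask_placeholders() and unmask_placeholders() are defined in Snippet 1;
# OPENAI_MODEL and get_language_name() are defined elsewhere in the full script.


def translate_batch(batch, client):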
    masked_batch = []
    all_mappings = []
    lang_set = set()
    for text in batch:
        masked, mapping = mask_placeholders(text)
        masked_batch.append(masked)
        all_mappings.append(mapping)
        lang_set.add(get_language_name(text))

    if "Russian" in lang_set:
        src_lang = "Russian"
    elif "Chinese" in lang_set:
        src_lang = "Chinese"
    else:
        src_lang = "source language"

    batch_text = "\n".join(f"{i+1}. {line}" for i, line in enumerate(masked_batch))
    resp = client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[
            {"role": "system", "content": (
                f"You are a professional video game localization specialist working on World of Tanks. "
                f"Translate these {src_lang} interface strings into clear, natural English for the game World of Tanks. "
                "Use proper and established World of Tanks in-game terminology. "
                "Be concise, prefer official wording, and preserve formatting.\n"
                "NEVER change or touch placeholders like %(foo), %s, %d, {foo}, <<PH_0>>. "
                "Return ONLY the translations, in the same order, numbered as in input, nothing else."
            )},
            {"role": "user", "content": batch_text}
        ],
        temperature=0.1,
    )
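    reply = resp.choices[0].message.content.strip()
    # Match reply lines to inputs by their number, not by position, so a
    # skipped or blank line cannot shift translations onto the wrong strings
    numbered = {}
    for line in reply.split('\n'):
        match = re.match(r"^\s*(\d+)\.\s*(.*)$", line)
        if match:
            numbered[int(match.group(1))] = match.group(2).strip()
    translations = []
    for i, mapping in enumerate(all_mappings, start=1):
        line = numbered.get(i, "")
        # An empty string means "no translation"; the caller keeps the original
        translations.append(unmask_placeholders(line, mapping) if line else "")
    return translations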

Translation function
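translate_batch numbers each input line and expects the model to answer in the same numbered format. The helpers below isolate that protocol from the OpenAI call so it can be tested offline. They are a hypothetical sketch (build_batch_text, parse_numbered_reply and chunks are not names from the project): replies are matched to inputs by their number, and the list of strings is cut into fixed-size batches in the same way as the BATCH_SIZE loop in Snippet 3.

```python
import re

NUMBERED_LINE = re.compile(r"^\s*(\d+)\.\s*(.*)$")


def build_batch_text(lines: list[str]) -> str:
    """Number each string as '1. ...', '2. ...' before sending it to the model."""
    return "\n".join(f"{i}. {line}" for i, line in enumerate(lines, start=1))


def parse_numbered_reply(reply: str, expected: int) -> list[str]:
    """Map a numbered reply back onto the batch, one slot per input string.

    Missing numbers come back as "" so the caller can leave that entry as-is.
    """
    by_number = {}
    for raw in reply.strip().splitlines():
        match = NUMBERED_LINE.match(raw)
        if match:
            by_number[int(match.group(1))] = match.group(2).strip()
    return [by_number.get(i, "") for i in range(1, expected + 1)]


def chunks(items: list, size: int):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


batch = ["Ангар", "В бой!", "Экипаж"]
print(build_batch_text(batch))
reply = "1. Garage\n\n3. Crew"  # made-up reply that skipped item 2
print(parse_numbered_reply(reply, len(batch)))  # ['Garage', '', 'Crew']
print(list(chunks(list(range(5)), 2)))          # [[0, 1], [2, 3], [4]]
```

Matching by number rather than by position means a reply that drops or merges a line yields an empty string for that item, instead of silently shifting every later translation by one.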


Snippet 3

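import os

import polib
from tqdm import tqdm

# is_english(), translate_batch(), BATCH_SIZE and the OpenAI client come from
# the rest of the script. The function name below is a placeholder: the
# original snippet starts inside the function body.
def translate_po_file(input_path, output_path, client):
    po = polib.pofile(input_path)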
    entries_to_translate = []
    indexes_to_translate = []

    # Identify which entries need translation (non-English)
    for idx, entry in enumerate(po):
        if entry.msgstr and entry.msgstr != entry.msgid:
            if not is_english(entry.msgstr):
                entries_to_translate.append(entry)
                indexes_to_translate.append(idx)

    if not entries_to_translate:
        print(f"No strings to translate in {input_path}")
        po.save(output_path)
        return

    translations = []
    for i in tqdm(range(0, len(entries_to_translate), BATCH_SIZE),
                  desc=f"Translating '{os.path.basename(input_path)}' (batches of {BATCH_SIZE})"):
        batch = [entry.msgstr for entry in entries_to_translate[i:i+BATCH_SIZE]]
        batch_translations = translate_batch(batch, client)
        translations.extend(batch_translations)

    # Assign translations back only to entries that were translated
    for idx, translated in zip(indexes_to_translate, translations):
        if translated:
            po[idx].msgstr = translated

    po.save(output_path)
    print(f"✅ Saved: {output_path}")
Separating languages and translating non-English entries in batches
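Snippet 3 relies on an is_english() helper whose implementation is not shown. As a hypothetical sketch only, assuming the catalogs mix just English, Russian and Chinese, a script-based check can be enough: strip placeholders, then count Cyrillic and CJK letters among the remaining letters.

```python
import re

# Hypothetical stand-in for the script's is_english(); the real one is not shown.
CYRILLIC = re.compile(r"[\u0400-\u04FF]")
CJK = re.compile(r"[\u4E00-\u9FFF]")
PLACEHOLDER = re.compile(r"%\([^)]+\)[sd]|%[sd]|\{[^}]+\}")


def is_english(text: str, max_foreign_ratio: float = 0.0) -> bool:
    """Return True if the string has no (or very few) Cyrillic/CJK letters."""
    # Ignore placeholders such as {tankName} so they do not skew the count
    stripped = PLACEHOLDER.sub("", text)
    letters = [ch for ch in stripped if ch.isalpha()]
    if not letters:
        return True  # digits, symbols or placeholders only: nothing to translate
    foreign = sum(1 for ch in letters if CYRILLIC.match(ch) or CJK.match(ch))
    return foreign / len(letters) <= max_foreign_ratio


samples = ["Battle!", "В бой!", "Tier {level} танк", "%(count)s / 15", "排名"]
print([is_english(s) for s in samples])  # [True, False, False, True, False]
```

A script-based check is deterministic and fast, but it cannot tell English from other Latin-script languages; a statistical detector like the one used in detect_po_language would be needed for that.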

The full script is available on my GitHub.

Screenshots

Script running preview

Mixed and fully done translation

Translation in game

Translation in game 2

Project Repositories

Conclusion

The MirTankov localization project has been one of my most technically rewarding and practically impactful solo projects.

The project pushed me to combine scripting, automation, and AI in a meaningful way that directly improved the usability of a game for non-Russian speakers. By automating the merging of .po files and translating hundreds of thousands of characters with GPT, I cut what would normally be weeks of manual work down to under two hours of processing.

This project deepened my Python skills, especially around reading and writing files. It also taught me how to handle sensitive formatting and placeholders while keeping translations accurate across languages. Contributing to a public GitHub repo alongside other community members gave it an open-source spirit that I’m proud to be part of. Overall, it reinforced my passion for building tools that can help others translate content into other languages in the future.