Python
Localization
Automation
Project Overview
The MirTankov EN Localization Project focuses on fully translating the Russian MMO game "Mir Tankov" to English. By developing two Python tools (a PO file merger and an AI-powered translator), I automated the process of syncing and translating 250+ `.po` files, totaling over 350,000 characters, in under 1.5 hours.
My Contributions
- Created full translation automation with OpenAI's API
- Built a PO file parser to handle key-value translation merges
- Designed a placeholder masking system to avoid syntax breaks
- Developed file scanning, matching, and backup logic
- Contributed to LocalizedTanki, a public GitHub project supporting international MirTankov communities
PO File Merger
A Python tool to batch-merge .po translation files for Mir Tankov (and other games). It syncs English and Russian .po files by always replacing Russian translations with their English counterparts where available, preserving unmatched Russian entries.
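import os
from shutil import copyfile

import polib
from langdetect import detect  # assumption: detect() comes from the langdetect package


def detect_po_language(po_file, sample_size=10):
    """Guess a PO file's language from its first few translated strings."""
    po = polib.pofile(po_file)
    texts = [entry.msgstr for entry in po if entry.msgstr.strip()]
    sample_texts = texts[:sample_size]
    langs = []
    for text in sample_texts:
        try:
            langs.append(detect(text))
        except Exception:
            # Skip strings the detector cannot classify (too short, symbols only)
            continue
    if langs:
        # The most frequently detected language wins
        return max(set(langs), key=langs.count)
    return 'unknown'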
# Directories
en_dir = "en"
ru_dir = "ru"
merged_dir = "merged"

# Ensure folders exist
for folder in [en_dir, ru_dir, merged_dir]:
    os.makedirs(folder, exist_ok=True)

# Get all PO filenames from both folders
ru_files = {f for f in os.listdir(ru_dir) if f.endswith('.po')}
en_files = {f for f in os.listdir(en_dir) if f.endswith('.po')}

# Only process files that exist in Russian
for ru_filename in sorted(ru_files):
    ru_path = os.path.join(ru_dir, ru_filename)
    en_path = os.path.join(en_dir, ru_filename)
    merged_path = os.path.join(merged_dir, ru_filename)
    if os.path.isfile(en_path):
        # Merge if English file exists
        ru_lang = detect_po_language(ru_path)
        en_lang = detect_po_language(en_path)
        if ru_lang != "ru" or en_lang != "en":
            print(f"Warning: Detected {ru_lang} for {ru_filename} (ru) and {en_lang} for {ru_filename} (en)")
        ru_po = polib.pofile(ru_path)
        en_po = polib.pofile(en_path)
        en_map = {entry.msgid: entry.msgstr for entry in en_po}
        replaced = 0
        for entry in ru_po:
            if entry.msgid in en_map:
                if entry.msgstr != en_map[entry.msgid]:
                    entry.msgstr = en_map[entry.msgid]
                    replaced += 1
        ru_po.save(merged_path)
        print(f"{ru_filename}: Merged, replaced {replaced} entries.")
    else:
        # No English file, just copy Russian file as-is
        copyfile(ru_path, merged_path)
        print(f"{ru_filename}: English version of the file not found. Copied Russian file as-is.")

# Optionally, warn about English files that don't have a Russian original
for en_filename in sorted(en_files - ru_files):
    print(f"{en_filename}: Russian version of the file not found. Skipped, no file saved in merged.")

print("Done! All merged files saved in 'merged' folder.")
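Stripped of file handling, the merge rule is small. The sketch below restates it on plain dictionaries that map msgid to msgstr, so it runs without polib. The function name `merge_translations` and the sample strings are illustrative and are not taken from the project.

```python
def merge_translations(ru_entries, en_entries):
    """Return (merged, replaced_count) for two msgid -> msgstr mappings.

    English text wins wherever a msgid exists in both mappings.
    Russian-only entries are kept unchanged and English-only entries
    are ignored, mirroring the merge loop in the script.
    """
    merged = {}
    replaced = 0
    for msgid, ru_text in ru_entries.items():
        en_text = en_entries.get(msgid)
        if en_text is not None and en_text != ru_text:
            merged[msgid] = en_text
            replaced += 1
        else:
            merged[msgid] = ru_text
    return merged, replaced


# Hypothetical sample data: two shared keys, one Russian-only, one English-only
ru = {"menu/play": "В бой!", "menu/exit": "Выход", "menu/new": "Новинки"}
en = {"menu/play": "Battle!", "menu/exit": "Exit", "menu/old": "Legacy"}

merged, replaced = merge_translations(ru, en)
```

Here `menu/new` keeps its Russian text because no English counterpart exists, and `menu/old` never reaches the output because the Russian file set defines what gets merged.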
AI Translation: Snippets
Snippet 1
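import re

# Matches %(name)s / %(name)d, %s / %d, and {name} placeholders
PLACEHOLDER_PATTERN = re.compile(r"%\([^)]+\)[sd]|%[sd]|\{[^}]+\}")


def mask_placeholders(text):
    """Replace each placeholder with a <<PH_n>> token the model should leave alone."""
    text = text.replace('\n', '\\n')  # Escape real line breaks as literal "\n"
    placeholders = PLACEHOLDER_PATTERN.findall(text)
    mapping = {}
    masked = text
    for idx, ph in enumerate(placeholders):
        token = f"<<PH_{idx}>>"
        mapping[token] = ph
        masked = masked.replace(ph, token)
    return masked, mapping


def unmask_placeholders(text, mapping):
    """Restore placeholders, tolerating common ways the model mangles tokens."""
    for token, ph in mapping.items():
        token_id = re.findall(r"\d+", token)[0]
        # Most specific variants first; (?!\d) keeps PH_1 from matching inside PH_10
        possible_patterns = [
            re.escape(token),
            re.escape(token.replace("<<", "[[").replace(">>", "]]")),
            rf"%\(PH_{token_id}\)[sd]?",
            rf"\(PH_{token_id}\)",
            rf"\[PH_{token_id}\]",
            rf"PH_{token_id}(?!\d)",
        ]
        for pat in possible_patterns:
            # A function replacement inserts ph literally, without escape processing
            text = re.sub(pat, lambda _match, ph=ph: ph, text)
    text = text.replace('\\n', '\n')  # Turn literal "\n" back into real line breaks
    # Drop any leftover tokens that could not be matched to a placeholder
    text = re.sub(r"[\[\(<%]*PH_\d+[\]\)>d%s]*", "", text)
    return text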
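To see what masking buys, it helps to trace one string through a full round trip. The following is a minimal, self-contained sketch of the same idea, not the project's code: the names `mask` and `unmask`, the sample sentence, and the stand-in model reply are all made up for illustration, and the tolerant fallback patterns from Snippet 1 are left out.

```python
import re

# Same placeholder shapes as in Snippet 1: %(name)s, %s, %d and {name}
PLACEHOLDER = re.compile(r"%\([^)]+\)[sd]|%[sd]|\{[^}]+\}")


def mask(text):
    """Swap each placeholder for a numbered <<PH_n>> token."""
    mapping = {}

    def to_token(match):
        token = f"<<PH_{len(mapping)}>>"
        mapping[token] = match.group(0)
        return token

    return PLACEHOLDER.sub(to_token, text), mapping


def unmask(text, mapping):
    """Put the original placeholders back in place of their tokens."""
    for token, placeholder in mapping.items():
        text = text.replace(token, placeholder)
    return text


source = "Игрок %(name)s уничтожил %d танков за {time}"
masked, mapping = mask(source)

# Stand-in for a model reply: words translated and reordered, tokens untouched
reply = "In <<PH_2>>, player <<PH_0>> destroyed <<PH_1>> tanks"
restored = unmask(reply, mapping)
```

Because the model only ever sees opaque tokens, it can reorder the sentence freely while the format specifiers come back byte for byte.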
Snippet 2
def translate_batch(batch, client):
    """Translate a list of strings in one API call, protecting placeholders."""
    masked_batch = []
    all_mappings = []
    lang_set = set()
    for text in batch:
        masked, mapping = mask_placeholders(text)
        masked_batch.append(masked)
        all_mappings.append(mapping)
        lang_set.add(get_language_name(text))

    # Name the source language in the prompt when it is known
    if "Russian" in lang_set:
        src_lang = "Russian"
    elif "Chinese" in lang_set:
        src_lang = "Chinese"
    else:
        src_lang = "source language"

    # Number the lines so the reply can be matched back to the inputs
    batch_text = "\n".join(f"{i+1}. {line}" for i, line in enumerate(masked_batch))
    resp = client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[
            {"role": "system", "content": (
                "You are a professional video game localization specialist working on World of Tanks. "
                f"Translate these {src_lang} interface strings into clear, natural English for the game World of Tanks. "
                "Use proper and established World of Tanks in-game terminology. "
                "Be concise, prefer official wording, and preserve formatting.\n"
                "NEVER change or touch placeholders like %(foo), %s, %d, {foo}, <<PH_0>>. "
                "Return ONLY the translations, in the same order, numbered as in input, nothing else."
            )},
            {"role": "user", "content": batch_text}
        ],
        temperature=0.1,
    )
    # Ignore blank lines so translations stay aligned with their inputs
    output_lines = [
        line for line in resp.choices[0].message.content.strip().split('\n')
        if line.strip()
    ]
    translations = []
    for line, mapping in zip(output_lines, all_mappings):
        line = line.strip()
        line = re.sub(r"^\d+\.\s*", "", line)  # Strip the "1. " numbering
        translations.append(unmask_placeholders(line, mapping))
    return translations
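Pairing reply lines with inputs by position, as Snippet 2 does, assumes the model returns exactly one line per input. A more defensive variant keys each line on its number, so a skipped or extra line affects only that item. This is a sketch of one possible approach; the helper name `parse_numbered_reply` and the sample reply are hypothetical.

```python
import re

# "1. text" or "1) text", with optional leading whitespace
NUMBERED_LINE = re.compile(r"^\s*(\d+)[.)]\s*(.*)$")


def parse_numbered_reply(reply, expected_count):
    """Map a numbered reply onto a list of length expected_count.

    Items the reply does not cover stay None, so the caller can keep the
    original string or retry instead of shifting every later translation.
    """
    results = [None] * expected_count
    for line in reply.splitlines():
        match = NUMBERED_LINE.match(line)
        if not match:
            continue  # blank lines or commentary from the model
        index = int(match.group(1)) - 1
        if 0 <= index < expected_count:
            results[index] = match.group(2).strip()
    return results


# Hypothetical reply that skips item 2 and adds a stray comment
reply = "1. Battle!\n\n3. Exit\nNote: item 2 was unclear."
parsed = parse_numbered_reply(reply, 3)
```

A `None` entry fits a caller that only writes back truthy translations, leaving the untranslated string in place for a later pass.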
Snippet 3
# Function name and signature are assumed; the original snippet shows only the body
def translate_po_file(input_path, output_path, client):
    po = polib.pofile(input_path)
    entries_to_translate = []
    indexes_to_translate = []

    # Identify which entries need translation (non-English)
    for idx, entry in enumerate(po):
        if entry.msgstr and entry.msgstr != entry.msgid:
            if not is_english(entry.msgstr):
                entries_to_translate.append(entry)
                indexes_to_translate.append(idx)

    if not entries_to_translate:
        print(f"No strings to translate in {input_path}")
        po.save(output_path)
        return

    # Send the strings to the model in fixed-size batches
    translations = []
    for i in tqdm(range(0, len(entries_to_translate), BATCH_SIZE),
                  desc=f"Translating '{os.path.basename(input_path)}' (batches of {BATCH_SIZE})"):
        batch = [entry.msgstr for entry in entries_to_translate[i:i+BATCH_SIZE]]
        batch_translations = translate_batch(batch, client)
        translations.extend(batch_translations)

    # Assign translations back only to entries that were translated
    for idx, translated in zip(indexes_to_translate, translations):
        if translated:
            po[idx].msgstr = translated

    po.save(output_path)
    print(f"✅ Saved: {output_path}")
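Snippet 3 relies on an `is_english` helper that is not shown here. One plausible way to write such a check, assuming the source strings are Russian, is to look for Cyrillic letters. The sketch below is a guess at the idea under that assumption, not the project's actual implementation, and it uses the hypothetical name `looks_english` to keep the two apart.

```python
import re

# Cyrillic Unicode block (U+0400 to U+04FF), which covers the Russian alphabet
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def looks_english(text):
    """Heuristic: treat a string as English if it has no Cyrillic letters.

    Strings made only of digits, punctuation, or placeholders also count
    as English here, which keeps them out of the translation queue.
    """
    return CYRILLIC.search(text) is None


samples = ["Battle!", "В бой!", "%(name)s: 100%", "Танк T-34"]
flags = [looks_english(s) for s in samples]
```

A character-range check like this is cheap and deterministic, but it only separates Cyrillic text from everything else; other source languages, such as the Chinese case handled in Snippet 2, would need a different test.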
The full script is available on my GitHub.
Screenshots
Project Repositories
Conclusion
The MirTankov localization project has been one of my most technically rewarding and practically impactful solo projects.
The project pushed me to combine scripting, automation, and AI in a meaningful way that directly improved the usability of a game for non-Russian speakers. By automating the merging of .po files and translating hundreds of thousands of characters with GPT, I cut what would normally be weeks of manual work down to under two hours of processing.
This project deepened my Python skills, especially in reading and writing files. It also taught me how to handle sensitive formatting and placeholders while maintaining accuracy across languages.
Contributing to a public GitHub repo alongside other community members gave the project an open-source spirit that I'm proud to be part of. Overall, it reinforced my passion for building tools that help others translate content into other languages.