How Facebook’s TransCoder AI compiler converts code between high-level languages
Transcompilers convert source code from one high-level programming language to another. Facebook researchers have developed a ‘neural transcompiler’ dubbed TransCoder AI, a system that can convert code between high-level languages such as Java, Python, and C++.
Migrating an existing codebase to a more modern or efficient language requires a huge amount of resources, in both money and manpower. In theory, transcompilers can help eliminate the expense of rewriting code from scratch.
However, they are difficult to build in practice because the source and target languages can differ in syntax, variable types, standard-library functions, and platform APIs.
TransCoder AI: A Code Converter For Any Programming Language
Facebook’s TransCoder AI tackles these challenges with an unsupervised learning approach: it finds previously undetected patterns in data sets without labels, with a minimal amount of human intervention. According to the researchers, it outperforms rule-based baselines by a “significant” margin.
The neural transcompiler can map pieces of code that express the same instructions to the same representation, irrespective of the programming language used.
The best part about Facebook TransCoder is that, according to the researchers, it can be easily generalized to any programming language and does not require expert knowledge to convert code from one programming language to another.
Accuracy of TransCoder AI
Facebook researchers trained TransCoder AI on publicly available code from more than 2.8 million open-source repositories on GitHub, focusing on code translation at the function level.
To evaluate the performance of TransCoder AI, they chose 852 parallel functions in C++, Java, and Python from GeeksforGeeks, a popular platform that teaches coding through problems and offers solutions in several programming languages.
Using this test set, they introduced a new metric called “computational accuracy” that checks whether a translated function generates the same outputs as the source-language function when given the same inputs.
Here are the accuracy levels the AI achieved when converting code for each language pair:
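The idea behind such a metric can be illustrated with a minimal Python sketch. This is not the researchers’ evaluation code: in the study the source and translated functions are written in different languages, whereas here both are plain Python callables so the example stays self-contained. All names (computationally_equivalent, computational_accuracy, the sum_squares_* functions, TEST_INPUTS) are hypothetical.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Assumption: a "translation" is modelled as a Python callable, so both the
# reference and the candidate can be executed in the same process.
Func = Callable[..., Any]


def computationally_equivalent(
    reference: Func, candidate: Func, test_inputs: List[Sequence[Any]]
) -> bool:
    """Return True if candidate matches reference on every test input."""
    for args in test_inputs:
        expected = reference(*args)
        try:
            actual = candidate(*args)
        except Exception:
            # A translation that crashes counts as incorrect.
            return False
        if actual != expected:
            return False
    return True


def computational_accuracy(
    cases: List[Tuple[Func, Func, List[Sequence[Any]]]]
) -> float:
    """Fraction of (reference, candidate, inputs) cases that are equivalent."""
    if not cases:
        return 0.0
    passed = sum(computationally_equivalent(r, c, t) for r, c, t in cases)
    return passed / len(cases)


def sum_squares_reference(values):
    """Stand-in for the function in the source language."""
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_good(values):
    """Stand-in for a correct translation: different form, same behaviour."""
    return sum(v * v for v in values)


def sum_squares_bad(values):
    """Stand-in for a faulty translation: off-by-one, skips the last item."""
    return sum(values[i] ** 2 for i in range(len(values) - 1))


# Each entry is one argument tuple passed to the functions under test.
TEST_INPUTS = [([1, 2, 3],), ([],), ([4],)]

if __name__ == "__main__":
    score = computational_accuracy(
        [
            (sum_squares_reference, sum_squares_good, TEST_INPUTS),
            (sum_squares_reference, sum_squares_bad, TEST_INPUTS),
        ]
    )
    print(f"computational accuracy: {score:.1%}")
```

Note that a check like this compares behaviour rather than text, so a translation that looks nothing like a reference solution still counts as correct if its outputs match.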
- C++ to Java: 74.8%
- C++ to Python: 67.2%
- Java to C++: 91.6%
- Java to Python: 68.7%
- Python to Java: 56.1%
- Python to C++: 57.8%
The researchers say that TransCoder AI has exhibited an understanding of the syntax of each language (Java, Python, and C++) along with their data structures. It even managed to correctly align libraries across languages while adapting to small modifications, such as a renamed variable in the input.
TransCoder isn’t perfect, though: it failed to account for certain variable types during code generation. Even so, it outperformed frameworks that rely on rewrite rules built manually from the knowledge of human experts.