Hardcoded subtitles (also called hard subtitles) are text that has been burned into the video image itself, so editing or playback software cannot remove them. Extracting and translating such subtitles and then producing a new video requires a combination of technologies: OCR, video restoration, translation, sentence simplification, video layout calculation, and audio and video processing.

With AI becoming so popular in 2023, is there a product that integrates these technologies and can extract and translate built-in subtitles directly from a video? In this article, I analyze the difficulties involved and recommend a product that actually solves the problem of translating hardcoded subtitles in videos.

This article is divided into five parts:

Definition of hard subtitles in videos
Technical principles and difficulties of translating hard subtitles in videos
Translation effect of hard subtitles in videos
Tutorial on translating hard subtitles in videos
How to modify translated hard subtitles?

1. Definition of hard subtitles in videos

Hard subtitles are also known as embedded subtitles, internal subtitles, or built-in subtitles. The subtitle text is already embedded in the picture, so it is no longer text but part of the image. There is no separate subtitle file, and users cannot turn these subtitles off, change them, or completely delete them with editing or playback tools.
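One practical consequence of this definition is that soft subtitles show up as separate subtitle streams in the video container, while hard subtitles do not. The sketch below is a rough illustration, assuming the ffprobe command-line tool is installed; the function names are made up for this example. An empty result only means the file has no soft subtitle track. It does not prove that hard subtitles are present in the picture.

```python
import json
import subprocess


def list_subtitle_streams(ffprobe_json):
    """Return the codec names of the subtitle streams in ffprobe's JSON output."""
    data = json.loads(ffprobe_json)
    return [stream.get("codec_name", "unknown") for stream in data.get("streams", [])]


def probe_subtitle_streams(path):
    """Ask ffprobe (assumed to be on PATH) for the soft subtitle streams in `path`."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "s",                      # subtitle streams only
            "-show_entries", "stream=index,codec_name",
            "-of", "json",
            path,
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return list_subtitle_streams(result.stdout)
```

If probe_subtitle_streams returns an empty list but subtitles are still visible during playback, they are most likely hard subtitles and have to go through the OCR route described in the next section.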

2. Technical principles and difficulties of translating hard subtitles in videos

The technical process of extracting and translating hard subtitles in videos is as follows:

Video analysis and subtitle extraction: Use video parsing tools or open source libraries to decode the video and extract the frames that contain subtitles. Hard subtitles are stored as pixel information in the video frames, so they cannot be turned off or hidden, and there is no subtitle file to read. The frames are sent to an OCR engine or API, which converts the pixel information into text.
Recognize text with OCR technology: The OCR engine recognizes the text in the subtitle area and outputs it in text form. It relies on a subtitle dataset that covers the special cases likely to appear in subtitles, such as punctuation marks, special symbols, and capital letters, along with style attributes such as font color. One of the difficulties here is style extraction.
Translate subtitles: Use a language model such as ChatGPT, or a machine translation API, for automated translation. The application passes the extracted subtitle text to the translation engine and receives the translated text back. One of the difficulties here is translation accuracy.
Subtitle replacement and integration: Place the translated subtitles on the video timeline and generate a new subtitle file (in formats such as .srt or .ass). One of the difficulties here is video restoration: the original hard subtitles have to be removed from the frames before the new ones are added.
Matching subtitles with the audio: Use audio processing tools to synchronize the subtitles with the audio track so that they stay consistent and any time differences are eliminated. Placing the translated text on screen brings a further difficulty: spatial and layout calculation and multi-font composition.
Generate the translated video file: Recombine the synchronized audio and video, and output the translated video file.
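The extraction and integration steps above can be illustrated with a small sketch. It assumes the OCR stage has already produced one text string per sampled frame (an empty string meaning no subtitle was found), merges consecutive frames showing the same text into timed cues, and writes them in .srt format. The Cue class, the function names, and the fixed sampling interval are assumptions made for this example, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def frames_to_cues(frames, frame_interval):
    """Merge consecutive sampled frames with identical OCR text into cues.

    `frames` is a list of (timestamp_seconds, text) pairs sorted by time.
    `frame_interval` is the sampling interval in seconds.
    """
    cues = []
    for timestamp, text in frames:
        text = text.strip()
        if not text:
            continue  # no subtitle detected in this frame
        last = cues[-1] if cues else None
        # Extend the previous cue only when the text matches and the frame is adjacent.
        if last and last.text == text and timestamp - last.end < frame_interval / 2:
            last.end = timestamp + frame_interval
        else:
            cues.append(Cue(timestamp, timestamp + frame_interval, text))
    return cues


def format_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp used by .srt files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Serialize cues as .srt text: index, time range, subtitle text, blank line."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        time_range = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{index}\n{time_range}\n{cue.text}\n")
    return "\n".join(blocks)
```

Real OCR output is noisier than this: the same subtitle may be read slightly differently from frame to frame, so a production system would probably compare texts with a similarity threshold rather than exact equality.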

The entire technical process can be automated, with an OCR engine handling text recognition and the ChatGPT API handling translation.
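As a minimal sketch of what such automation might look like, the function below chains an OCR step and a translation step over a list of frames. The recognize and translate callables are placeholders that the caller supplies (for example, wrappers around an OCR engine and a translation API); they are assumptions of this example and do not refer to a specific library. Each distinct subtitle line is translated only once, since the same line usually appears in many consecutive frames.

```python
from typing import Callable, Dict, List, Tuple

Frame = Tuple[float, bytes]    # (timestamp in seconds, raw image bytes)
Subtitle = Tuple[float, str]   # (timestamp in seconds, translated text)


def translate_hard_subtitles(
    frames: List[Frame],
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
) -> List[Subtitle]:
    """Run OCR and then translation on each frame, caching repeated lines."""
    cache: Dict[str, str] = {}
    results: List[Subtitle] = []
    for timestamp, image in frames:
        source = recognize(image).strip()
        if not source:
            continue  # nothing recognized in this frame
        if source not in cache:
            # Call the (possibly slow or paid) translation step once per distinct line.
            cache[source] = translate(source)
        results.append((timestamp, cache[source]))
    return results
```

Passing the two stages in as functions keeps the pipeline independent of any one OCR or translation provider, and it makes the flow easy to try out with simple stand-in functions before connecting real services.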