Hi everyone, I'm niumo and I'm developing a fully automated AI product that can translate video subtitles and voiceovers into another language.

However, I've run into a difficulty: how to keep the translated subtitles, the new voiceover, and the visuals aligned.

As every localization professional knows, the same text can have different lengths in different languages. For instance, an English text and its Italian translation can differ in length by up to 30%. This means that during playback, the original speech and the translated speech may fall significantly out of sync. To avoid this, the two speech streams need to be synchronized.

To solve this problem, I use different techniques to process text, voice, and video separately for synchronization. Here is what I have done:

I simplify sentences to shorten the translated text, speed up the new audio, and slow down the video. Combined, these steps bring the new video's sound, picture, and subtitles into alignment.
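
To make the trade-off concrete, here is a minimal sketch in Python of how the mismatch for one subtitle segment could be split between the three techniques. It is not the product's actual code: the name plan_segment, the 1.25x audio speed-up cap, and the 0.8x video slow-down cap are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SegmentPlan:
    audio_speed: float          # > 1.0 means the dubbed audio is sped up
    video_speed: float          # < 1.0 means the video is slowed down
    needs_simplification: bool  # True if the text must be shortened as well


def plan_segment(original_s: float, dubbed_s: float,
                 max_audio_speed: float = 1.25,   # assumed cap
                 min_video_speed: float = 0.8     # assumed cap
                 ) -> SegmentPlan:
    """Split the length mismatch of one segment between audio and video.

    original_s: duration of the original video segment, in seconds
    dubbed_s:   duration of the synthesized translated voiceover, in seconds
    """
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")

    # The dub already fits: nothing to change.
    if dubbed_s <= original_s:
        return SegmentPlan(1.0, 1.0, False)

    # Step 1: speed up the audio, but no further than the cap.
    audio_speed = min(dubbed_s / original_s, max_audio_speed)
    audio_after = dubbed_s / audio_speed

    # Step 2: slow the video to cover what is left, again within a cap.
    video_speed = min(1.0, max(original_s / audio_after, min_video_speed))
    video_after = original_s / video_speed

    # Step 3: if the audio is still longer, the sentence must be simplified.
    needs_simplification = audio_after > video_after + 1e-9
    return SegmentPlan(audio_speed, video_speed, needs_simplification)


if __name__ == "__main__":
    # A 10 s segment whose translated voiceover runs 13 s (30% longer).
    print(plan_segment(10.0, 13.0))
    # A much longer dub that the two caps cannot absorb on their own.
    print(plan_segment(10.0, 18.0))
```

With these assumed caps, a voiceover that is 30% longer is handled by speeding up the audio to the cap and slowing the video slightly, while an 80% longer one still overflows and is flagged for sentence simplification. Tightening either cap reduces Issue 2 or Issue 3 below, but pushes more segments toward Issue 1.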

However, these processing methods may also produce other issues, such as:

Issue 1: Sentence simplification may lead to inaccurate translation;

Issue 2: Audio acceleration may make it sound too fast;

Issue 3: Video deceleration may make the picture look unnaturally slow.

How do you deal with these issues? Note that even with manual translation and video editing, I would run into similar issues. Not every video has these problems to begin with, but the methods I use to ensure audio and video synchronization can make them worse.

So, my question is: weighed against the need for audio and video synchronization, which of these issues do you think is the most serious? Feel free to leave a comment below. We will carefully review your feedback to improve our product.