Hi everyone, I'm niumo and I'm developing a fully automated AI product that can translate video subtitles and voiceovers into another language.
However, I've run into some difficulties, specifically with aligning the translated subtitles, the new voiceover, and the visuals.
As every localization professional knows, the same text can have different lengths in different languages. For instance, an English text and its Italian translation can differ in length by up to 30%. This means that during playback, the original speech and the translated speech may fall significantly out of sync. To avoid this problem, we need to synchronize the two speech streams.
To solve this problem, I use different techniques to process text, voice, and video separately for synchronization. Here is what I have done:
Text processing: By reducing unnecessary pauses between words and phrases, we can often recover enough time to keep things in sync. If the translated subtitle is still too long and cannot be aligned with the video even after the voice is accelerated, we use an AI sentence simplification algorithm to shorten the lengthy text. Here's an example:
Original sentence: Every localization professional knows that the same text can have a different length in different languages.
Simplified sentence: Localization professionals know text length varies by language.
Voice processing: To make the sound match the video, we can use the subtitle timings to generate audio of the desired duration. However, if the voices still don’t sync up, the algorithm will accelerate the speech rate, but only up to a fixed limit.
Video processing: If the video still cannot be aligned after sentence simplification and audio acceleration, we extract the overlong audio, its subtitles, and the corresponding video clip, and then slow down that designated part of the video so everything lines up.
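To make the rate limit concrete, here is a minimal sketch of how the speed-up factor could be computed. The function name, the field names, and the 1.3x cap are all assumptions for illustration; they are not the actual values or code used in the product.

```python
from dataclasses import dataclass


@dataclass
class RateResult:
    rate: float      # playback rate to apply to the synthesized speech
    overflow: float  # seconds the audio still overruns its slot after speeding up


def speech_rate(tts_duration: float, slot_duration: float,
                max_rate: float = 1.3) -> RateResult:
    """Pick the smallest speed-up that fits the audio into its subtitle slot.

    max_rate is a hypothetical cap; the right value depends on the voice
    and on how fast listeners can comfortably follow.
    """
    if tts_duration <= 0 or slot_duration <= 0:
        raise ValueError("durations must be positive")
    needed = tts_duration / slot_duration       # rate that would fit exactly
    rate = min(max(needed, 1.0), max_rate)      # never slow down, never exceed the cap
    overflow = max(tts_duration / rate - slot_duration, 0.0)
    return RateResult(rate=rate, overflow=overflow)


if __name__ == "__main__":
    print(speech_rate(6.0, 5.0))  # fits with a mild speed-up
    print(speech_rate(8.0, 5.0))  # hits the cap, some overflow remains
```

Any overflow that remains after the cap is what the later steps (simplification or slowing the video) have to absorb.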
By simplifying sentences, accelerating audio, and slowing down videos, we can truly achieve alignment of the new video's sound, picture, and subtitles.
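The three steps can be sketched as one fallback chain per subtitle segment. This is only an illustration under stated assumptions: `plan_alignment`, `AlignmentPlan`, the 1.3x audio cap, and the 0.8x video floor are hypothetical, and the durations of the full and simplified voiceover are assumed to have been measured already.

```python
from dataclasses import dataclass


@dataclass
class AlignmentPlan:
    use_simplified: bool  # whether the simplified sentence is used
    audio_rate: float     # >1.0 means the voiceover is sped up
    video_rate: float     # <1.0 means the video clip is slowed down
    aligned: bool         # False if even the slowest allowed video is not enough


def plan_alignment(slot: float, full_duration: float, simplified_duration: float,
                   max_audio_rate: float = 1.3,
                   min_video_rate: float = 0.8) -> AlignmentPlan:
    """Choose the mildest intervention that fits the voiceover into its slot.

    slot: duration of the original video segment, in seconds.
    full_duration / simplified_duration: voiceover length for the full and
    the simplified translation at normal speed (assumed inputs).
    """
    # Step 1: keep the full translation and only speed up the audio.
    rate = max(full_duration / slot, 1.0)
    if rate <= max_audio_rate:
        return AlignmentPlan(False, rate, 1.0, True)

    # Step 2: fall back to the simplified sentence, again with audio speed-up.
    rate = max(simplified_duration / slot, 1.0)
    if rate <= max_audio_rate:
        return AlignmentPlan(True, rate, 1.0, True)

    # Step 3: cap the audio rate and slow the video clip to cover the rest.
    audio_length = simplified_duration / max_audio_rate
    video_rate = slot / audio_length
    aligned = video_rate >= min_video_rate
    return AlignmentPlan(True, max_audio_rate, max(video_rate, min_video_rate), aligned)


if __name__ == "__main__":
    print(plan_alignment(5.0, 6.0, 4.5))   # audio speed-up alone is enough
    print(plan_alignment(5.0, 8.0, 6.0))   # needs the simplified sentence
    print(plan_alignment(5.0, 10.0, 7.8))  # also needs the video slowed down
```

Ordering the steps this way keeps the translation intact whenever acceleration alone is enough, and touches the video only as a last resort.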
However, these processing methods may also produce other issues, such as:
Issue 1: Sentence simplification may lead to inaccurate translation;
Issue 2: Audio acceleration may make it sound too fast;
Issue 3: Video deceleration may make playback look noticeably slow.
How do you deal with these issues? Keep in mind that even with manual translation and video editing, I would run into similar issues. Moreover, not all videos have these problems, but the methods I use to ensure audio and video synchronization will exacerbate them where they occur.
So, my question is: weighed against the need for audio and video synchronization, which of these issues do you think is the most serious? Feel free to leave a comment below. We will carefully review your feedback to improve our product.