This tutorial deals with audio extraction from video using GPU-accelerated libraries supported by FFmpeg on Ubuntu. The full code is available in this GitHub repository. For similar posts about video processing, please refer to Resizing a video is unbelievably fast by GPU acceleration and GPU-based video rotation FFmpeg.


FFmpeg is one of the most popular multimedia frameworks and is widely used for processing video. Encoding video requires a video encoder. Recent NVIDIA GPUs contain a hardware-based video encoder called NVENC, which is much faster than traditional software (CPU-based) encoders. To use this GPU-accelerated encoder, FFmpeg must be built with NVENC support. The full documentation of FFmpeg integrated with NVIDIA can be found here, the documentation on NVENC can be found here, and the NVENC programming guide can be found here.

The main goal of this tutorial is to show how to extract audio from video with GPU-accelerated libraries on Linux. Rather than typing FFmpeg commands directly in the terminal, we use Python to run them. This is done with the subprocess Python module, which spawns and manages external commands and is intended to replace older functions such as os.system. Its basic usage is explained in this tutorial; please refer to this documentation for further details.
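To illustrate how subprocess differs from os.system, here is a minimal sketch of running an external command with subprocess.run (the capture_output argument needs Python 3.7 or later). The helper name run_command is hypothetical. Passing the command as a list means each element reaches the program as exactly one argument, and check=True turns a non-zero exit status into an exception, whereas os.system hands a single string to the shell and only returns the exit status.

```python
import shutil
import subprocess
import sys


def run_command(args):
    """Run a command given as a list of arguments and return its standard output.

    Each list element is passed to the program as one argument, so no shell
    quoting is needed. check=True raises CalledProcessError on a non-zero exit.
    """
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


# Portable demo: use the running Python interpreter as the "external" program.
print(run_command([sys.executable, '-c', 'print("hello from a subprocess")']).strip())

# If ffmpeg is on the PATH, show the first line of its version banner.
if shutil.which('ffmpeg'):
    print(run_command(['ffmpeg', '-version']).splitlines()[0])
```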

This tutorial assumes that FFmpeg is already installed with NVENC support. The installation guide can be found in the FFMPEG WITH NVIDIA ACCELERATION ON UBUNTU LINUX documentation provided by NVIDIA.
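Before running anything, it can be worth confirming that the installed FFmpeg build actually includes NVENC. The following is a minimal sketch, assuming the usual layout of the ffmpeg -encoders listing (a flags column followed by the encoder name); the helper list_nvenc_encoders is hypothetical.

```python
import shutil
import subprocess


def list_nvenc_encoders(encoders_output):
    """Return the encoder names containing 'nvenc' from `ffmpeg -encoders` output."""
    names = []
    for line in encoders_output.splitlines():
        parts = line.split()
        # Encoder rows look like: " V....D h264_nvenc   NVIDIA NVENC H.264 encoder ..."
        if len(parts) >= 2 and 'nvenc' in parts[1]:
            names.append(parts[1])
    return names


if shutil.which('ffmpeg'):
    result = subprocess.run(['ffmpeg', '-hide_banner', '-encoders'],
                            capture_output=True, text=True)
    print(list_nvenc_encoders(result.stdout) or 'No NVENC encoders in this FFmpeg build')
else:
    print('ffmpeg was not found on the PATH')
```

On a build with NVENC support the list typically includes entries such as h264_nvenc and hevc_nvenc; an empty list suggests FFmpeg was built without it.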

Audio Extraction from Video
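The script in this section reads a .txt file containing the absolute path of one video per line. If such a file does not exist yet, it could be generated with a sketch like the following, which assumes the videos are .mp4 files somewhere under a single root folder; the function write_video_list and its parameters are hypothetical. Note that the extension match is case-sensitive on Linux.

```python
from pathlib import Path


def write_video_list(root_dir, textfile_path, extension='.mp4'):
    """Write the absolute path of every video under root_dir to textfile_path.

    Paths are written one per line, sorted, and the list of paths is returned.
    """
    root = Path(root_dir).resolve()
    # rglob searches root_dir and all of its subfolders recursively.
    paths = sorted(str(p) for p in root.rglob('*' + extension) if p.is_file())
    Path(textfile_path).write_text(''.join(p + '\n' for p in paths))
    return paths
```

For example, write_video_list('/path/to/videos', 'absolute/path/to/videos.txt') would produce a file in the format the script expects.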

From now on, the assumption is that a .txt file listing the absolute paths of the videos, one per line, is ready and well formatted. The Python script for processing the videos is as follows:

import os
import subprocess

# Pre...
textfile_path = 'absolute/path/to/videos.txt'
# Read the text file: each line holds the absolute path of one video
with open(textfile_path) as f:
    content = f.readlines()
# Remove whitespace characters like `\n` at the end of each line and skip empty lines
files_list = [x.strip() for x in content if x.strip()]
# Extract audio from each video.
# The .wav file is saved next to the input, under the name defined by file_path_output.
for file_num, file_path_input in enumerate(files_list, start=1):
    file_name = os.path.basename(file_path_input)
    # Skip files whose name contains 'mouthcropped'
    if 'mouthcropped' not in file_name:
        # Get the file name without extension
        raw_file_name = os.path.splitext(file_name)[0]
        file_dir = os.path.dirname(file_path_input)
        file_path_output = os.path.join(file_dir, raw_file_name + '.wav')
        print('processing file: %s' % file_path_input)
        # Run: ffmpeg -i <input> -codec:a pcm_s16le -ac 1 <output>
        subprocess.call(
            ['ffmpeg', '-i', file_path_input, '-codec:a', 'pcm_s16le', '-ac', '1', file_path_output])
        print('file %s saved' % file_path_output)
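The same loop can be restructured into functions, which makes it easier to test and to preview the commands before running them. This is a minimal sketch, not the author's code: build_ffmpeg_command and extract_all are hypothetical names, and the dry-run mode uses shlex.join (Python 3.8 or later) to print the equivalent terminal command, quoting paths that contain spaces.

```python
import os
import shlex
import subprocess


def build_ffmpeg_command(file_path_input):
    """Return (command, output_path): a mono 16-bit PCM .wav saved next to the input."""
    raw_file_name = os.path.splitext(os.path.basename(file_path_input))[0]
    file_path_output = os.path.join(os.path.dirname(file_path_input), raw_file_name + '.wav')
    command = ['ffmpeg', '-i', file_path_input,
               '-codec:a', 'pcm_s16le', '-ac', '1', file_path_output]
    return command, file_path_output


def extract_all(files_list, dry_run=False):
    """Extract audio from each file and return the inputs for which ffmpeg failed.

    With dry_run=True, only print the equivalent terminal command for each file.
    """
    failed = []
    for file_path_input in files_list:
        # Same filter as the script above: skip 'mouthcropped' files.
        if 'mouthcropped' in os.path.basename(file_path_input):
            continue
        command, file_path_output = build_ffmpeg_command(file_path_input)
        if dry_run:
            # Each list element becomes one (quoted if needed) word on the command line.
            print(shlex.join(command))
            continue
        # A non-zero return code means ffmpeg reported an error for this file.
        if subprocess.run(command).returncode != 0:
            failed.append(file_path_input)
        else:
            print('file %s saved' % file_path_output)
    return failed
```

For example, extract_all(files_list, dry_run=True) prints the commands for the list read from videos.txt without running FFmpeg. Note that FFmpeg asks before overwriting an existing output file; adding its -y (overwrite) or -n (never overwrite) option to the command avoids that prompt in unattended runs.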

Overall Code Description

The videos.txt file is stored at an absolute path. The code reads the .txt file and stores each line as an item of a list called files_list; each line holds the absolute path of one video. The loop then processes each file in turn: it finds the folder of the input file and saves the output in the same directory, with the same base name but a .wav extension, as described by the comments in the code. Files whose names contain mouthcropped are skipped. Each element of the command list in Python corresponds to one space-separated argument in the terminal. For example, the equivalent shell command is:

for i in **/*.mp4; do base="${i%.mp4}"; ffmpeg -i "$base.mp4" -codec:a pcm_s16le -ac 1 "$base.wav"; done

Note that in Bash the recursive **/*.mp4 pattern only searches subfolders after enabling it with shopt -s globstar.

FFmpeg Encoder

The options passed to FFmpeg deserve a short explanation. Each element starting with - is an option, and the value that follows it tells FFmpeg how to apply that option. For example, -i specifies the input file, -codec:a selects the audio codec (here pcm_s16le), and -ac 1 sets the number of output audio channels to one (mono). Its video counterpart, -vcodec, is where an NVENC encoder such as h264_nvenc would be selected. More details can be found in the FFmpeg Filters Documentation. The following table summarizes the options:

[Table image: FFmpeg Encoder Arguments necessary for audio extraction]

The pcm_s16le audio codec (signed 16-bit little-endian PCM) is suitable for .wav files.

Code Execution

To run the Python file, go to the terminal and execute the following, where <script_name> is the name of the file containing the script above:

python <script_name>.py

Note that if you are working in a specific virtual environment, it must be activated first.
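Once the script has run, the output can be checked against the options described above. This is a minimal sketch using Python's standard wave module, which reads the header of a PCM .wav file: -ac 1 should give one channel, and pcm_s16le should give a sample width of 2 bytes (16 bits). The helper names describe_wav and is_mono_s16 are hypothetical.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_in_bytes, frame_rate) of a PCM .wav file."""
    with wave.open(path, 'rb') as wav_file:
        return wav_file.getnchannels(), wav_file.getsampwidth(), wav_file.getframerate()


def is_mono_s16(path):
    """True if the file matches what -codec:a pcm_s16le -ac 1 should produce."""
    channels, sample_width, _ = describe_wav(path)
    # One channel (mono) and 2-byte (16-bit) samples.
    return channels == 1 and sample_width == 2
```

For example, is_mono_s16(file_path_output) should return True for each file saved by the script.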


This tutorial demonstrated how to extract audio from a video using FFmpeg together with NVIDIA's GPU-accelerated NVENC library. The advantage of the Python interface is that it makes it easy to parse the .txt file and loop through all the files. Moreover, it allows more complex logic, such as filtering files by name, that would be cumbersome to express directly in the terminal.
