pytsmod

pytsmod.ola(x, s, win_type='hann', win_size=1024, syn_hop_size=512)

Modify the length of the audio sequence using the OLA algorithm. WSOLA with zero tolerance behaves the same as OLA.

Parameters:

x : numpy.ndarray [shape=(channel, num_samples) or (num_samples)]: the input audio sequence to modify.
s : number > 0 [scalar] or numpy.ndarray [shape=(2, num_points)]: the time stretching factor. Either a constant value (alpha) or a 2 x n array of anchor points, with the sample positions in the input signal in the first row and the corresponding sample positions in the output signal in the second row.
win_type : str: type of the window function. hann and sin are available.
win_size : int > 0 [scalar]: size of the window function.
syn_hop_size : int > 0 [scalar]: hop size of the synthesis window. Usually half of the window size.

Returns:

y : numpy.ndarray [shape=(channel, num_samples) or (num_samples)]: the modified output audio sequence.
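
To make the two forms of s concrete, here is a minimal NumPy sketch of overlap-add time stretching for a mono signal. It is an illustration under stated assumptions, not pytsmod's implementation: the name ola_sketch, the frame placement (frames start at their position rather than being centered), and the output length convention are all choices made for this example.

```python
import numpy as np


def ola_sketch(x, s, win_size=1024, syn_hop_size=512):
    """Illustrative mono OLA time stretch; not pytsmod's implementation."""
    x = np.asarray(x, dtype=float)
    if np.isscalar(s):
        # A constant factor is treated as two anchor points: start and end.
        out_last = int(np.ceil(s * len(x))) - 1
        anchors = np.array([[0, len(x) - 1], [0, out_last]], dtype=float)
    else:
        # Row 0: input sample positions, row 1: output sample positions.
        anchors = np.asarray(s, dtype=float)

    out_len = int(anchors[1, -1]) + 1
    win = np.hanning(win_size)

    # Frames are placed at a fixed hop in the output; the matching input
    # positions come from interpolating between the anchor points.
    syn_pos = np.arange(0, out_len, syn_hop_size)
    ana_pos = np.round(np.interp(syn_pos, anchors[1], anchors[0])).astype(int)

    x_pad = np.pad(x, (0, win_size))   # keep the last frames complete
    y = np.zeros(out_len + win_size)
    norm = np.zeros(out_len + win_size)
    for a, p in zip(ana_pos, syn_pos):
        y[p:p + win_size] += x_pad[a:a + win_size] * win
        norm[p:p + win_size] += win
    norm[norm < 1e-8] = 1.0            # avoid dividing by zero
    return (y / norm)[:out_len]
```

With an anchor array such as np.array([[0, 4000, 8000], [0, 8000, 10000]]), the first 4000 input samples are stretched to 8000 output samples and the next 4000 are compressed to 2000. An array of the same shape is what the s parameter above accepts in its anchor point form.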

pytsmod.wsola(x, s, win_type='hann', win_size=1024, syn_hop_size=512, tolerance=512)

Modify the length of the audio sequence using the WSOLA algorithm.

Parameters:

x : numpy.ndarray [shape=(channel, num_samples) or (num_samples)]: the input audio sequence to modify.
s : number > 0 [scalar] or numpy.ndarray [shape=(2, num_points)]: the time stretching factor. Either a constant value (alpha) or a 2 x n array of anchor points, with the sample positions in the input signal in the first row and the corresponding sample positions in the output signal in the second row.
win_type : str: type of the window function. hann and sin are available.
win_size : int > 0 [scalar]: size of the window function.
syn_hop_size : int > 0 [scalar]: hop size of the synthesis window. Usually half of the window size.
tolerance : int >= 0 [scalar]: maximum number of samples by which the window positions in the input signal may be shifted to avoid phase discontinuities when overlap-adding them to form the output signal.

Returns:

y : numpy.ndarray [shape=(channel, num_samples) or (num_samples)]: the modified output audio sequence.
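
The role of tolerance can be sketched as a small search step. In the usual description of WSOLA, each new frame is compared against the "natural progression" of the previously chosen frame, and the window position is shifted within plus or minus tolerance samples to where the cross-correlation is largest. The helper below is a hypothetical illustration of that search only (the name best_shift and its arguments are assumptions, and it is not taken from pytsmod's code); it assumes the search region lies fully inside x.

```python
import numpy as np


def best_shift(x, ana_pos, target, tolerance):
    """Return the shift in [-tolerance, tolerance] that best matches target.

    x         : mono input signal
    ana_pos   : nominal start of the next analysis frame in x
    target    : samples the next frame should resemble (natural progression)
    tolerance : maximum shift in samples (0 disables the search)
    """
    n = len(target)
    start = ana_pos - tolerance
    # Region that covers every candidate frame position.
    region = x[start:start + n + 2 * tolerance]
    # One correlation value per candidate shift (2 * tolerance + 1 values).
    corr = np.correlate(region, target, mode="valid")
    return int(np.argmax(corr)) - tolerance
```

With tolerance=0 there is only one candidate, so the shift is always 0 and the frame positions are the same as in plain OLA.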

pytsmod.phase_vocoder(x, s, win_type='sin', win_size=2048, syn_hop_size=512, zero_pad=0, restore_energy=False, fft_shift=False, phase_lock=False)

Modify the length of the audio sequence using the Phase Vocoder algorithm.

Parameters:

x : numpy.ndarray [shape=(channel, num_samples) or (num_samples)]: the input audio sequence to modify.
s : number > 0 [scalar] or numpy.ndarray [shape=(2, num_points)]: the time stretching factor. Either a constant value (alpha) or a 2 x n array of anchor points, with the sample positions in the input signal in the first row and the corresponding sample positions in the output signal in the second row.
win_type : str: type of the window function for the STFT. hann and sin are available.
win_size : int > 0 [scalar]: size of the window function.
syn_hop_size : int > 0 [scalar]: hop size of the synthesis window. Usually half of the window size.
zero_pad : int >= 0 [scalar]: the size of the zero pad in the window function.
restore_energy : bool: tries to compensate for potential energy loss.
fft_shift : bool: apply circular shift to STFT and ISTFT.
phase_lock : bool: apply phase locking.

Returns:

y : numpy.ndarray [shape=(channel, num_samples) or (num_samples)]: the modified output audio sequence.
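
As background on what a phase vocoder does between analysis and synthesis, the following sketch shows the textbook phase propagation step for one pair of STFT frames: estimate each bin's instantaneous frequency from the phase advance over the analysis hop, then advance the synthesis phase by that frequency over the synthesis hop. The function name and arguments are hypothetical, and this is the standard formulation rather than a transcription of pytsmod's code; phase locking and energy restoration are not covered.

```python
import numpy as np


def propagate_phase(phase_prev, phase_curr, syn_phase_prev,
                    ana_hop, syn_hop, n_fft):
    """Textbook phase-vocoder phase update for one frame (all bins at once)."""
    k = np.arange(len(phase_curr))
    omega = 2 * np.pi * k / n_fft          # nominal bin frequency, rad/sample
    # Deviation of the measured phase advance from the nominal advance,
    # wrapped to [-pi, pi).
    dphi = phase_curr - phase_prev - omega * ana_hop
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi
    inst_freq = omega + dphi / ana_hop     # instantaneous frequency estimate
    # Advance the previous synthesis phase over the synthesis hop.
    return syn_phase_prev + inst_freq * syn_hop
```

The estimate is only unambiguous when a bin's true frequency is close enough to its nominal frequency that the deviation stays within half a cycle over the analysis hop.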

pytsmod.phase_vocoder_int(x, s, win_type='hann', win_size=2048, syn_hop_size=512, zero_pad=None, restore_energy=False, fft_shift=True)

Modify the length of the audio sequence using the Phase Vocoder algorithm. Works especially well for integer stretching factors.

Parameters:

x : numpy.ndarray [shape=(channel, num_samples) or (num_samples)]: the input audio sequence to modify.
s : int > 0 [scalar]: the time stretching factor (alpha). Only an integer value greater than 0 is allowed.
win_type : str: type of the window function for the STFT. hann and sin are available.
win_size : int > 0 [scalar]: size of the window function.
syn_hop_size : int > 0 [scalar]: hop size of the synthesis window. Usually half of the window size.
zero_pad : int > 0 [scalar]: the size of the zero pad in the window function.
restore_energy : bool: tries to compensate for potential energy loss.
fft_shift : bool: apply circular shift to STFT and ISTFT.

Returns:

y : numpy.ndarray [shape=(channel, num_samples) or (num_samples)]: the modified output audio sequence.
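
One general reason integer factors are convenient in phase-based processing, offered as background and not as a description of pytsmod's internals: multiplying a wrapped phase by an integer gives the same complex value as multiplying the true, unwrapped phase, so no phase unwrapping is needed. The hypothetical helper scale_phase below demonstrates this property.

```python
import numpy as np


def scale_phase(X, factor):
    """Keep the magnitude of X and multiply its wrapped phase by factor."""
    return np.abs(X) * np.exp(1j * factor * np.angle(X))


theta = 7.0                      # a "true" phase larger than 2*pi
X = np.exp(1j * theta)           # np.angle(X) only sees theta - 2*pi

# Integer factor: the 2*pi ambiguity cancels, result matches the true phase.
ok_int = np.isclose(scale_phase(X, 2), np.exp(1j * 2 * theta))

# Non-integer factor: the ambiguity does not cancel, result differs.
ok_frac = np.isclose(scale_phase(X, 1.5), np.exp(1j * 1.5 * theta))
```

Here ok_int is True and ok_frac is False.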

pytsmod.hptsm(x, s, hp_kernel_size=31, hp_power=2.0, hp_mask=False, hp_margin=1.0, pv_win_type='hann', pv_win_size=2048, pv_syn_hop_size=512, pv_zero_pad=0, pv_restore_energy=False, pv_fft_shift=False, pv_phase_lock=True, ola_win_type='hann', ola_win_size=256, ola_syn_hop_size=128)

Modify the length of the audio sequence using both the Phase Vocoder and OLA: the Phase Vocoder is applied to the harmonic component and OLA to the percussive component. A median-filter-based algorithm is used for harmonic-percussive source separation (HPSS).

Parameters:

x : numpy.ndarray [shape=(channel, num_samples) or (num_samples)]: the input audio sequence to modify.
s : number > 0 [scalar] or numpy.ndarray [shape=(2, num_points)]: the time stretching factor. Either a constant value (alpha) or a 2 x n array of anchor points, with the sample positions in the input signal in the first row and the corresponding sample positions in the output signal in the second row.
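hp_ : parameters for HPSS (the arguments prefixed with hp_).
pv_ : parameters for the phase vocoder (the arguments prefixed with pv_).
ola_ : parameters for OLA (the arguments prefixed with ola_).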

Returns:

y : numpy.ndarray [shape=(channel, num_samples) or (num_samples)]: the modified output audio sequence.
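
The median-filter idea behind HPSS can be sketched with NumPy alone. Harmonic content forms horizontal lines in a magnitude spectrogram (stable over time) and percussive content forms vertical lines (broad in frequency), so a median filter along time keeps the former and a median filter along frequency keeps the latter. The code below is a common soft-mask formulation written for illustration: kernel_size and power play roles analogous to hp_kernel_size and hp_power, but the function names are hypothetical and the handling of hp_mask and hp_margin in pytsmod is not shown here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_1d(a, kernel_size, axis):
    """Median filter along one axis with reflect padding (odd kernel_size)."""
    pad = kernel_size // 2
    pad_width = [(0, 0)] * a.ndim
    pad_width[axis] = (pad, pad)
    padded = np.pad(a, pad_width, mode="reflect")
    windows = sliding_window_view(padded, kernel_size, axis=axis)
    return np.median(windows, axis=-1)


def hpss_masks(mag, kernel_size=31, power=2.0):
    """Soft harmonic and percussive masks for mag of shape (freq, time)."""
    harm = median_filter_1d(mag, kernel_size, axis=1)   # smooth along time
    perc = median_filter_1d(mag, kernel_size, axis=0)   # smooth along freq
    h_p, p_p = harm ** power, perc ** power
    total = h_p + p_p + np.finfo(float).eps
    return h_p / total, p_p / total
```

Multiplying a complex spectrogram by each mask gives the two components, which can then be time-stretched separately (phase vocoder for the harmonic part, OLA for the percussive part) and summed.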

pytsmod.tdpsola(x, sr, src_f0, tgt_f0=None, alpha=1, beta=None, win_type='hann', p_hop_size=441, p_win_size=1470)

Modify the length and pitch of the audio sequence using the TD-PSOLA algorithm.

Parameters:

x : numpy.ndarray [shape=(channel, num_samples) or (num_samples)]: the input audio sequence to modify.
sr : int > 0 [scalar]: sample rate of the input audio sequence.
src_f0 : numpy.ndarray [shape=(channel, num_freqs) or (num_freqs)]: the fundamental frequency contour of the input audio sequence.
tgt_f0 : numpy.ndarray [shape=(channel, num_freqs) or (num_freqs)]: the target fundamental frequency contour that the input audio sequence should be modified to follow. Should not be used together with beta.
alpha : number > 0 [scalar]: time stretching factor.
beta : number > 0 [scalar]: the pitch shifting factor. Should not be used together with tgt_f0.
win_type : str: type of the window function. hann and sin are available.
p_hop_size : int > 0 [scalar]: the hop size of src_f0 (in samples).
p_win_size : int > 0 [scalar]: the window size of the pitch tracking algorithm used to obtain src_f0 (in samples).

Returns:

y : numpy.ndarray [shape=(channel, num_samples) or (num_samples)]: the modified output audio sequence.
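
Preparing the arguments for this function mostly involves unit conversions. The helpers below are hypothetical conveniences, not part of pytsmod. They assume that a constant pitch shifting factor corresponds to scaling every f0 value by that factor (the usual meaning of the term), and that a pitch tracker reports its hop and window in seconds.

```python
import numpy as np


def semitones_to_beta(semitones):
    """Pitch shifting factor for a shift given in semitones."""
    return 2.0 ** (semitones / 12.0)


def scaled_contour(src_f0, beta):
    """Target contour obtained by scaling the source contour by beta.

    Frames with f0 == 0 (commonly used for unvoiced frames) stay at 0.
    """
    return np.asarray(src_f0, dtype=float) * beta


def seconds_to_samples(seconds, sr):
    """Convert a duration in seconds to a whole number of samples."""
    return int(round(seconds * sr))
```

For example, seconds_to_samples(0.01, 44100) gives 441, so a tracker with a 10 ms hop on 44.1 kHz audio matches p_hop_size=441. Per the parameter descriptions above, pass either beta or tgt_f0, not both.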