utils

pytsmod.utils.win(win_type='hann', win_size=4096, zero_pad=0)

Generate a window function of the given type.

Parameters:

win_type : str: the type of window function. Currently, Hann and Sin are supported.
win_size : int > 0 [scalar]: the size of the window function. It does not include the length of the zero padding.
zero_pad : int >= 0 [scalar]: the total length of the zero padding. Zeros are distributed equally to the left and right of the window.

Returns:

win : numpy.ndarray [shape=(win_size)]: the generated window function.
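
The zero-padding behavior described above can be sketched in plain NumPy. This is only an illustration, not pytsmod's implementation: `make_window` is a hypothetical name, the periodic (rather than symmetric) window form is an assumption, and the sketch returns the padded array, so its length is `win_size` plus the padding.

```python
import numpy as np

def make_window(win_type="hann", win_size=4096, zero_pad=0):
    """Hypothetical stand-in for a window generator; not pytsmod's code."""
    n = np.arange(win_size)
    if win_type == "hann":
        # Periodic Hann window: 0.5 * (1 - cos(2*pi*n / N)). Assumed form.
        w = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / win_size)
    elif win_type == "sin":
        # Periodic sine window: sin(pi*n / N). Assumed form.
        w = np.sin(np.pi * n / win_size)
    else:
        raise ValueError("win_type must be 'hann' or 'sin'")
    # Split the total zero padding equally between the left and right sides.
    return np.pad(w, zero_pad // 2, mode="constant")

w = make_window("hann", win_size=8, zero_pad=4)
print(w.shape)  # (12,)
```

With `zero_pad=4`, two zeros land on each side of the 8-sample window.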

pytsmod.utils.stft(x, ana_hop=2048, win_type='hann', win_size=4096, zero_pad=0, sr=44100, fft_shift=0, time_frequency_out=False)

Short-Time Fourier Transform (STFT) of an audio signal. This function is used by the phase vocoder.

Parameters:

x : numpy.ndarray [shape=(num_samples)]: the input audio sequence. Should be a single channel.
ana_hop : int > 0 [scalar] or numpy.ndarray [shape=(num_frames)]: either an analysis hop size (scalar) or the analysis window positions (array).
win_type : str: type of the window function for the STFT. hann and sin are available.
win_size : int > 0 [scalar]: size of the window function.
zero_pad : int >= 0 [scalar]: the size of the zero pad in the window function.
sr : int > 0 [scalar]: the sample rate of the audio sequence.
fft_shift : bool: if True, apply a circular shift to the STFT.
time_frequency_out : bool: if True, also return the time and frequency axes, as (spec, t, f).

Returns:

spec : numpy.ndarray [shape=(win_size // 2 + 1, num_frames)]: the STFT result of the input audio sequence.
t : numpy.ndarray [shape=num_frames]: the timestamp of each output frame. Returned only if time_frequency_out is True.
f : numpy.ndarray [shape=win_size // 2 + 1]: frequency value for each frequency bin of the output result.
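
To make the output shapes above concrete, here is a minimal NumPy sketch of a fixed-hop STFT. It is not pytsmod's code: `stft_sketch` is a hypothetical name, and the Hann window, the half-window zero padding used to center frames, and the units of `t` (seconds) and `f` (Hz) are all assumptions of this sketch.

```python
import numpy as np

def stft_sketch(x, ana_hop=256, win_size=1024, sr=44100):
    """Minimal fixed-hop STFT; an illustration, not pytsmod's implementation."""
    n = np.arange(win_size)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / win_size)  # periodic Hann
    # Assumption: center each frame on its analysis position by padding
    # half a window of zeros on both ends of the signal.
    half = win_size // 2
    padded = np.pad(x, (half, half), mode="constant")
    positions = np.arange(0, len(x), ana_hop)  # analysis window positions
    frames = np.stack(
        [padded[p:p + win_size] * window for p in positions], axis=1
    )
    spec = np.fft.rfft(frames, axis=0)  # shape (win_size // 2 + 1, num_frames)
    t = positions / sr                  # frame times in seconds (assumed unit)
    f = np.arange(win_size // 2 + 1) * sr / win_size  # bin frequencies in Hz
    return spec, t, f

sr, win_size = 44100, 1024
f0 = 10 * sr / win_size  # a sinusoid that falls exactly on bin 10
x = np.sin(2.0 * np.pi * f0 * np.arange(4096) / sr)
spec, t, f = stft_sketch(x, ana_hop=256, win_size=win_size, sr=sr)
print(spec.shape)  # (513, 16)
```

Only the one-sided spectrum is kept, which is why the first axis has `win_size // 2 + 1` bins.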

pytsmod.utils.istft(spec, syn_hop=2048, win_type='hann', win_size=4096, zero_pad=0, num_iter=1, original_length=-1, fft_shift=False, restore_energy=False)

Inverse Short-Time Fourier Transform (ISTFT) to recover the audio signal from the spectrogram. This function is used by the phase vocoder.

Parameters:

spec : numpy.ndarray [shape=(num_bins, num_frames)]: the input complex spectrogram of the audio.
syn_hop : int > 0 [scalar]: the hop size of the synthesis window.
win_type : str: type of the window function for the ISTFT. hann and sin are available.
win_size : int > 0 [scalar]: size of the window function.
zero_pad : int >= 0 [scalar]: the size of the zero pad in the window function.
num_iter : int > 0 [scalar]: the number of iterations the algorithm should perform to adapt the phase.
original_length : int > 0 [scalar]: original length of the audio signal.
fft_shift : bool: apply circular shift to ISTFT.
restore_energy : bool: if True, try to compensate for potential energy loss.

Returns:

y : numpy.ndarray [shape=(original_length)]: the output audio sequence.

pytsmod.utils.lsee_mstft(X, syn_hop, win_type, win_size, zero_pad, fft_shift, restore_energy)

Least Squares Error Estimation from the MSTFT (Modified STFT). Griffin-Lim procedure to estimate the audio signal from the modified STFT.

Parameters:

X : numpy.ndarray [shape=(num_bins, num_frames)]: the input audio complex spectrogram.
syn_hop : int > 0 [scalar]: the hop size of the synthesis window.
win_type : str: type of the window function for the ISTFT. hann and sin are available.
win_size : int > 0 [scalar]: size of the window function.
zero_pad : int >= 0 [scalar]: the size of the zero pad in the window function.
fft_shift : bool: apply circular shift to ISTFT.
restore_energy : bool: if True, try to compensate for potential energy loss.

Returns:

x : numpy.ndarray [shape=num_samples]: the audio sequence estimated by LSEE-MSTFT.
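
The least-squares estimate of Griffin and Lim overlap-adds the windowed inverse-FFT frames and divides by the overlap-added squared window. The sketch below shows that idea in NumPy under stated assumptions: `lsee_mstft_sketch` is a hypothetical name, the window is a periodic Hann, and zero padding, FFT shifting, and energy restoration are left out. It is not pytsmod's implementation.

```python
import numpy as np

def lsee_mstft_sketch(X, syn_hop, win_size):
    """Least-squares signal estimate from a one-sided STFT (illustrative only)."""
    n = np.arange(win_size)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / win_size)  # periodic Hann
    num_frames = X.shape[1]
    out_len = (num_frames - 1) * syn_hop + win_size
    numer = np.zeros(out_len)
    denom = np.zeros(out_len)
    frames = np.fft.irfft(X, n=win_size, axis=0)  # time-domain frames
    for m in range(num_frames):
        start = m * syn_hop
        numer[start:start + win_size] += window * frames[:, m]
        denom[start:start + win_size] += window ** 2
    # Avoid dividing by (almost) zero where the windows carry no weight.
    denom[denom < 1e-3] = 1.0
    return numer / denom

# Round trip: analyze a random signal with the same window and hop,
# then check that the interior samples are recovered.
rng = np.random.default_rng(0)
win_size, syn_hop, num_frames = 64, 16, 10
x = rng.standard_normal((num_frames - 1) * syn_hop + win_size)
window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(win_size) / win_size)
X = np.stack(
    [np.fft.rfft(window * x[m * syn_hop:m * syn_hop + win_size])
     for m in range(num_frames)],
    axis=1,
)
y = lsee_mstft_sketch(X, syn_hop, win_size)
print(np.allclose(y[win_size:-win_size], x[win_size:-win_size]))  # True
```

The first and last `win_size` samples are excluded from the comparison because fewer frames overlap there and the window weight approaches zero at the edges.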

pytsmod.utils._validate_audio(audio)

Validate the input audio and reorder its axes so that the channel axis comes first.

Parameters:

audio : numpy.ndarray [shape=(channel, num_samples) or (num_samples) or (num_samples, channel)]: the input audio sequence to validate.

Returns:

audio : numpy.ndarray [shape=(channel, num_samples)]: the validated output audio sequence.
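
One way to normalize the three accepted input shapes to channel-first order is sketched below. `to_channel_first` is a hypothetical helper, and the rule it uses to tell the channel axis from the sample axis (the shorter axis holds the channels) is an assumption of this sketch, not a documented behavior of pytsmod.

```python
import numpy as np

def to_channel_first(audio):
    """Hypothetical helper: return audio shaped (channel, num_samples)."""
    audio = np.asarray(audio)
    if audio.ndim == 1:
        # Mono input: add a leading channel axis.
        return audio[np.newaxis, :]
    if audio.ndim == 2:
        # Assumption: the shorter axis is the channel axis.
        return audio.T if audio.shape[0] > audio.shape[1] else audio
    raise ValueError("audio must be a 1D or 2D array")

print(to_channel_first(np.zeros(100)).shape)       # (1, 100)
print(to_channel_first(np.zeros((100, 2))).shape)  # (2, 100)
```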

pytsmod.utils._validate_scale_factor(audio, s)

Validate the scale factor s and convert the fixed scale factor to anchor points.

Parameters:

audio : numpy.ndarray [shape=(num_channels, num_samples) or (num_samples) or (num_samples, num_channels)]: the input audio sequence.
s : number > 0 [scalar] or numpy.ndarray [shape=(2, num_points) or (num_points, 2)]: the time stretching factor. Either a constant value (alpha) or a (2 x n) (or (n x 2)) array of anchor points that contains the sample points of the input signal in the first row and the sample points of the output signal in the second row.

Returns:

anc_points : numpy.ndarray [shape=(2, num_points)]: anchor points that contain the sample points of the input signal in the first row and the sample points of the output signal in the second row.
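
As an illustration of turning a constant factor into anchor points, the sketch below maps the first and last input samples to their stretched positions. `scale_to_anchor_points` is a hypothetical helper that takes the sample count instead of the audio array, and the rounding (ceiling of alpha times the last sample index) is an assumption, not necessarily what pytsmod does.

```python
import numpy as np

def scale_to_anchor_points(num_samples, s):
    """Hypothetical helper: return anchor points shaped (2, num_points)."""
    s = np.asarray(s, dtype=float)
    if s.ndim == 0:
        if s <= 0:
            raise ValueError("scale factor must be positive")
        # Constant factor alpha: anchor the first and last input samples.
        last = num_samples - 1
        return np.array([[0, last], [0, int(np.ceil(s * last))]])
    if s.ndim == 2 and s.shape[0] != 2 and s.shape[1] == 2:
        s = s.T  # accept (num_points, 2) input as well
    if s.ndim != 2 or s.shape[0] != 2:
        raise ValueError("anchor points must have shape (2, n) or (n, 2)")
    return s.astype(int)

print(scale_to_anchor_points(1001, 1.5).tolist())  # [[0, 1000], [0, 1500]]
```

Row 0 holds input sample positions and row 1 the matching output positions, following the layout described above.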

pytsmod.utils._validate_f0(audio, f0)

Validate that the input f0 is suitable for the input audio.

Parameters:

audio : numpy.ndarray [shape=(num_channels, num_samples) or (num_samples) or (num_samples, num_channels)]: the input audio sequence.
f0 : numpy.ndarray [shape=(num_channels, num_pitches) or (num_pitches) or (num_pitches, num_channels)]: the f0 sequence that is used for TD-PSOLA. If f0 is a 1D array, the same f0 is used for all audio channels.

Returns:

f0 : numpy.ndarray [shape=(num_channels, num_freqs)]: the f0 sequence that is used for TD-PSOLA.
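
The 1D case, where a single f0 track is shared by every channel, can be sketched with `np.tile`. `broadcast_f0` is a hypothetical helper that takes the channel count directly; how pytsmod resolves ambiguous 2D shapes is not stated here, so the axis check below is an assumption.

```python
import numpy as np

def broadcast_f0(num_channels, f0):
    """Hypothetical helper: return f0 shaped (num_channels, num_pitches)."""
    f0 = np.asarray(f0, dtype=float)
    if f0.ndim == 1:
        # One f0 track: reuse it for every audio channel.
        return np.tile(f0, (num_channels, 1))
    if f0.ndim == 2:
        if f0.shape[0] == num_channels:
            return f0
        if f0.shape[1] == num_channels:
            return f0.T  # (num_pitches, num_channels) input
    raise ValueError("f0 does not match the number of audio channels")

print(broadcast_f0(2, [100.0, 110.0, 120.0]).shape)  # (2, 3)
```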