Receive audio file path and desired visualizations.
Parse command-line arguments and options.
Decode the audio file (WAV/MP3 natively, others via ffmpeg).
Extract audio features (spectrogram, mel spectrogram, chroma, etc.).
Generate the specified visualizations.
Combine visualizations into a grid if multiple are selected.
Save the output image in the specified format (JPG or PNG).
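
The control flow of the steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the tool's actual implementation: the flag names (`--visualize`, `--format`), the helper names (`needs_ffmpeg`, `grid_shape`, `plan`), the near-square grid rule, and the default output path are all assumptions made for the example. Decoding, feature extraction, and plotting are left out; the sketch only decides which decoder to use, how the grid would be laid out, and where the image would be written.

```python
import argparse
import math
from pathlib import Path

# Assumed values for illustration; the real tool may support more.
KNOWN_VISUALIZATIONS = ("spectrogram", "mel", "chroma")
NATIVE_FORMATS = {".wav", ".mp3"}   # decoded natively; anything else goes through ffmpeg
OUTPUT_FORMATS = ("jpg", "png")


def parse_args(argv=None):
    """Steps 1-2: read the audio path, the visualizations, and the output format."""
    parser = argparse.ArgumentParser(description="Render audio visualizations to an image.")
    parser.add_argument("audio", type=Path, help="path to the input audio file")
    parser.add_argument("-v", "--visualize", nargs="+",
                        choices=KNOWN_VISUALIZATIONS, default=["spectrogram"])
    parser.add_argument("-f", "--format", choices=OUTPUT_FORMATS, default="png")
    return parser.parse_args(argv)


def needs_ffmpeg(path):
    """Step 3: WAV and MP3 are handled natively, other formats via ffmpeg."""
    return path.suffix.lower() not in NATIVE_FORMATS


def grid_shape(n):
    """Step 6: choose a near-square (rows, cols) layout for n panels (assumed rule)."""
    if n < 1:
        raise ValueError("at least one visualization is required")
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    return rows, cols


def plan(argv=None):
    """Summarize what the pipeline would do, without touching any audio."""
    args = parse_args(argv)
    return {
        "input": args.audio,
        "decoder": "ffmpeg" if needs_ffmpeg(args.audio) else "native",
        "visualizations": list(args.visualize),
        "grid": grid_shape(len(args.visualize)),
        # Step 7: assumed default of writing the image next to the input file.
        "output": args.audio.with_suffix("." + args.format),
    }
```

For example, `plan(["song.flac", "-v", "mel", "chroma", "-f", "jpg"])` would select the ffmpeg decoder, lay the two panels out in a 1 x 2 grid, and target `song.jpg` as the output file.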