Audio & Video Processing Guide¶

Overview¶

File Organizer provides advanced audio and video processing capabilities for intelligent file organization. This guide covers:

Audio Transcription: Speech-to-text using Faster-Whisper models
Audio Classification: Automatic categorization (music, podcast, audiobook, etc.)
Audio Metadata: ID3 tags, duration, bitrate, and quality analysis
Video Processing: Scene detection and keyframe extraction (covered in separate section)

All features run 100% locally with no cloud dependencies, preserving your privacy.

Audio Transcription¶

Overview¶

Audio transcription converts speech in audio files to text using state-of-the-art Whisper models via the faster-whisper library. This enables:

Content-based organization: Organize by spoken content, not just filenames
Searchable audio: Find audio files by what's said inside them
Classification accuracy: Better categorization using transcribed content
Metadata extraction: Extract speaker names, topics, and keywords

System Requirements¶

Component	Requirement	Notes
Python	3.11+	Required
FFmpeg	Latest	Required for audio processing
RAM	4-8 GB	Depends on model size
Storage	1-10 GB	For downloaded models
GPU	Optional	CUDA/ROCm for acceleration

Installing FFmpeg¶

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

Windows: Download from ffmpeg.org or use:

choco install ffmpeg

Installation¶

Install the audio processing dependencies:

# From your File Organizer directory
pip install -e ".[audio]"

This installs: - faster-whisper>=1.0.0 - Whisper transcription engine - torch>=2.1.0 - GPU acceleration support - mutagen>=1.47.0 - Audio metadata extraction - tinytag>=1.10.0 - Lightweight metadata fallback - pydub>=0.25.0 - Audio manipulation utilities

Verify Installation¶

python -c "from faster_whisper import WhisperModel; print('✓ Audio transcription ready')"

If successful, you're ready to transcribe audio files.

Model Sizes¶

Faster-Whisper supports multiple model sizes with different speed/accuracy tradeoffs:

Model	Size	VRAM	Speed	Accuracy	Use Case
`tiny`	75 MB	~1 GB	Very Fast	Fair	Quick previews, low-resource systems
`base`	150 MB	~1 GB	Fast	Good	General use, balanced performance
`small`	500 MB	~2 GB	Moderate	Very Good	Recommended for most users
`medium`	1.5 GB	~5 GB	Slow	Excellent	High accuracy needs
`large-v2`	3 GB	~10 GB	Very Slow	Best	Maximum accuracy
`large-v3`	3 GB	~10 GB	Very Slow	Best	Latest Whisper version

Recommendation: Start with small for a good balance of speed and accuracy.

Compute Types¶

Control precision and performance with compute types:

Type	Precision	Speed	VRAM	Supported Hardware
`float32`	Full	Slow	High	CPU, GPU
`float16`	Half	Fast	Medium	GPU only (CUDA, ROCm)
`int8`	8-bit	Very Fast	Low	CPU, GPU
`int8_float16`	Mixed	Very Fast	Low	GPU only

GPU Users: Use float16 or int8_float16 for best performance. CPU Users: Use int8 to reduce memory usage.

Basic Usage¶

Programmatic API¶

from pathlib import Path
from file_organizer.services.audio.transcriber import AudioTranscriber, ModelSize, ComputeType

# Initialize transcriber
transcriber = AudioTranscriber(
    model_size=ModelSize.SMALL,
    compute_type=ComputeType.FLOAT16,  # Use INT8 for CPU
    device="cuda"  # or "cpu"
)

# Transcribe audio file
audio_file = Path("~/Downloads/podcast-episode.mp3")
result = transcriber.transcribe(audio_file)

# Access results
print(f"Language: {result.language} ({result.language_confidence:.2%})")
print(f"Duration: {result.duration:.1f} seconds")
print(f"Text: {result.text}")

# Access segments with timestamps
for segment in result.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")

Advanced Options¶

from file_organizer.services.audio.transcriber import (
    AudioTranscriber,
    TranscriptionOptions,
    ModelSize,
    ComputeType
)

# Configure advanced options
options = TranscriptionOptions(
    language="en",  # Force English (None for auto-detect)
    word_timestamps=True,  # Enable word-level timestamps
    beam_size=5,  # Beam search size (higher = slower, more accurate)
    best_of=5,  # Number of candidates (higher = slower, more accurate)
    temperature=0.0,  # Sampling temperature (0 = deterministic)
    vad_filter=True,  # Voice Activity Detection (removes silence)
    initial_prompt="This is a technical podcast about AI and machine learning."
)

transcriber = AudioTranscriber(
    model_size=ModelSize.MEDIUM,
    compute_type=ComputeType.INT8_FLOAT16
)

result = transcriber.transcribe("interview.wav", options=options)

# Word-level timestamps
for segment in result.segments:
    if segment.words:
        for word in segment.words:
            print(f"{word.word} [{word.start:.2f}s] (confidence: {word.probability:.2%})")

Language Support¶

Whisper supports 100+ languages with automatic detection:

Auto-Detection (Recommended):

# Language is detected automatically
result = transcriber.transcribe("audio.mp3")
print(f"Detected: {result.language}")

Manual Language Selection:

options = TranscriptionOptions(language="es")  # Spanish
result = transcriber.transcribe_with_options("audio.mp3", options)

Supported Languages: - English (en), Spanish (es), French (fr), German (de) - Mandarin (zh), Japanese (ja), Korean (ko) - Arabic (ar), Russian (ru), Portuguese (pt) - Italian (it), Dutch (nl), Polish (pl) - And 90+ more...

Supported Audio Formats¶

Audio transcription supports the following file formats:

Format	Extension	Notes
MP3	`.mp3`	Most common, widely supported
WAV	`.wav`	Uncompressed, highest quality
FLAC	`.flac`	Lossless compression
M4A	`.m4a`	Apple/iTunes format
Ogg Vorbis	`.ogg`	Open-source format

Requirements: FFmpeg must be installed for format conversion and preprocessing.

Verification: Check if a file is supported:

from file_organizer.core.types import AUDIO_EXTENSIONS

file_path = "my-file.mp3"
is_supported = any(file_path.endswith(ext) for ext in AUDIO_EXTENSIONS)
print(f"Supported: {is_supported}")

Performance Optimization¶

GPU Acceleration¶

Check GPU Availability:

import torch

print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

Optimize for GPU:

transcriber = AudioTranscriber(
    model_size=ModelSize.SMALL,
    compute_type=ComputeType.FLOAT16,  # GPU-optimized
    device="cuda",
    num_workers=4  # Parallel processing
)

CPU Optimization¶

Optimize for CPU:

transcriber = AudioTranscriber(
    model_size=ModelSize.TINY,  # Smaller model
    compute_type=ComputeType.INT8,  # Quantized precision
    device="cpu",
    num_workers=1  # Limit workers to avoid thrashing
)

Batch Processing¶

Process multiple files efficiently:

from pathlib import Path

audio_files = list(Path("~/Podcasts").glob("*.mp3"))

for audio_file in audio_files:
    try:
        result = transcriber.transcribe(audio_file)

        # Save transcription
        output_file = audio_file.with_suffix(".txt")
        output_file.write_text(result.text)

        print(f"✓ {audio_file.name}: {result.language} ({len(result.text)} chars)")
    except Exception as e:
        print(f"✗ {audio_file.name}: {e}")

Integration with File Organization¶

Organize by Transcribed Content¶

from file_organizer.services.audio.organizer import AudioOrganizer
from file_organizer.services.audio.classifier import AudioClassifier
from file_organizer.services.audio.metadata_extractor import AudioMetadataExtractor

# Extract metadata
metadata_extractor = AudioMetadataExtractor()
metadata = metadata_extractor.extract("podcast.mp3")

# Transcribe audio
transcriber = AudioTranscriber(model_size=ModelSize.SMALL)
transcription = transcriber.transcribe("podcast.mp3")

# Classify audio type
classifier = AudioClassifier()
classification = classifier.classify(
    metadata=metadata,
    transcription=transcription
)

print(f"Type: {classification.audio_type}")
print(f"Confidence: {classification.confidence:.2%}")
print(f"Reasoning: {classification.reasoning}")

# Organize with AudioOrganizer
organizer = AudioOrganizer()
plan = organizer.preview_organization(
    files=[(Path("podcast.mp3"), classification.audio_type, metadata)],
    base_path=Path("~/Audio").expanduser(),
)
print(f"Planned moves: {len(plan.planned_moves)}")

TUI Integration¶

View audio files and transcriptions in the Terminal UI:

file-organizer tui

Press 5 to access the Audio view, which displays: - Discovered audio files in current directory - Metadata (title, artist, album, duration, bitrate) - Classification results (music, podcast, audiobook, etc.) - Transcription preview (if available)

Troubleshooting¶

"FFmpeg not found"¶

Error:

FileNotFoundError: FFmpeg not found

Solution:

# Verify FFmpeg installation
ffmpeg -version

# If not installed, see "Installing FFmpeg" section above

Out of Memory¶

Error:

RuntimeError: CUDA out of memory

Solutions:

# 1. Use smaller model
transcriber = AudioTranscriber(model_size=ModelSize.TINY)

# 2. Use quantized compute type
transcriber = AudioTranscriber(compute_type=ComputeType.INT8)

# 3. Switch to CPU
transcriber = AudioTranscriber(device="cpu")

Poor Transcription Quality¶

Solutions: 1. Use larger model: medium or large-v3 2. Specify language: language="en" instead of auto-detect 3. Add context: initial_prompt="Technical discussion about..." 4. Enable VAD: vad_filter=True to remove silence 5. Increase beam size: beam_size=10 for better accuracy

Best Practices¶

Model Selection¶

Quick previews: tiny or base
General use: small (recommended)
High accuracy: medium or large-v3
Non-English: medium or larger for best results

Compute Type Selection¶

GPU with 6+ GB VRAM: float16
GPU with <6 GB VRAM: int8_float16
CPU: int8
Development/debugging: float32

Processing Strategy¶

Start small: Test with tiny or base model first
Validate quality: Check a few transcriptions before batch processing
Monitor resources: Watch RAM/VRAM usage during processing
Save incrementally: Save results after each file in batch jobs
Handle errors: Wrap transcription calls in try/except blocks

Configuration¶

Configure transcription settings in ~/.config/file-organizer/config.yaml:

audio:
  transcription:
    enabled: true
    model_size: small
    compute_type: float16
    device: cuda  # or cpu
    language: null  # null for auto-detect
    word_timestamps: false
    vad_filter: true
    beam_size: 5
    best_of: 5
    temperature: 0.0

Video Analysis¶

Overview¶

Video analysis provides advanced scene detection and metadata extraction capabilities for intelligent video file organization. This enables:

Scene Detection: Automatically detect scene changes and transitions
Keyframe Extraction: Extract representative frames from each scene
Content-based Organization: Organize videos by visual content, not just filenames
Metadata Extraction: Resolution, codec, duration, bitrate, and creation date
Screen Recording Detection: Identify and categorize screen recordings

All features run 100% locally using OpenCV and PySceneDetect with no cloud dependencies.

System Requirements¶

Component	Requirement	Notes
Python	3.11+	Required
OpenCV	4.8.0+	Core video processing
FFmpeg	Latest	Metadata extraction (optional)
RAM	2-4 GB	Depends on video resolution
Storage	Minimal	No models to download

Installing FFmpeg (Optional)¶

FFmpeg is optional but recommended for richer metadata extraction.

macOS:

brew install ffmpeg

Ubuntu/Debian:

sudo apt update
sudo apt install ffmpeg

Windows: Download from ffmpeg.org or use:

choco install ffmpeg

Installation¶

Install the video processing dependencies:

# From your File Organizer directory
pip install -e ".[video]"

This installs: - opencv-python>=4.8.0 - Video frame processing and analysis - scenedetect[opencv]>=0.6.0 - Advanced scene detection algorithms

Verify Installation¶

python -c "import cv2; import scenedetect; print('✓ Video analysis ready')"

If successful, you're ready to analyze video files.

Detection Methods¶

PySceneDetect supports multiple scene detection algorithms with different characteristics:

Method	Algorithm	Speed	Accuracy	Use Case
`content`	Content-aware analysis	Moderate	Excellent	General use (recommended)
`threshold`	Simple pixel difference	Fast	Good	Quick previews, low-resource systems
`adaptive`	Adaptive threshold	Slow	Very Good	Variable lighting, complex scenes
`histogram`	Color histogram comparison	Moderate	Very Good	Color-based transitions

Recommendation: Start with content for best balance of speed and accuracy.

Detection Thresholds¶

Control sensitivity with threshold parameters:

Threshold	Sensitivity	Scene Count	Use Case
`15.0`	Very High	Many scenes	Subtle transitions, slow-paced content
`27.0`	High	Moderate	Default - recommended for most videos
`40.0`	Medium	Fewer scenes	Action videos, music videos
`60.0`	Low	Minimal scenes	Only major scene changes

Lower threshold = more sensitive = more scenes detected

Basic Usage¶

Programmatic API¶

from pathlib import Path
from file_organizer.services.video.scene_detector import SceneDetector, DetectionMethod

# Initialize detector
detector = SceneDetector(
    method=DetectionMethod.CONTENT,
    threshold=27.0,
    min_scene_length=1.0  # seconds
)

# Detect scenes in video
video_file = Path("~/Videos/movie.mp4")
result = detector.detect_scenes(video_file)

# Access results
print(f"Video: {result.video_path.name}")
print(f"Duration: {result.total_duration:.1f} seconds")
print(f"FPS: {result.fps:.2f}")
print(f"Detected {len(result.scenes)} scenes")

# Access individual scenes
for scene in result.scenes:
    print(f"Scene {scene.scene_number}: {scene.start_time:.2f}s - {scene.end_time:.2f}s "
          f"({scene.duration:.2f}s, {scene.frame_count} frames)")

Advanced Options¶

from file_organizer.services.video.scene_detector import SceneDetector, DetectionMethod

# High-sensitivity detection for subtle transitions
detector = SceneDetector(
    method=DetectionMethod.ADAPTIVE,
    threshold=15.0,  # Lower = more sensitive
    min_scene_length=0.5  # Allow shorter scenes
)

result = detector.detect_scenes("interview.mp4")

# Override method/threshold per video
result = detector.detect_scenes(
    "action-movie.mp4",
    method=DetectionMethod.CONTENT,
    threshold=40.0  # Less sensitive for fast-paced content
)

Extract Scene Thumbnails¶

from pathlib import Path
from file_organizer.services.video.scene_detector import SceneDetector

detector = SceneDetector()
result = detector.detect_scenes("video.mp4")

# Extract thumbnail for each scene
output_dir = Path("~/Videos/thumbnails")
SceneDetector.extract_scene_thumbnails(
    video_path="video.mp4",
    result=result,
    output_dir=output_dir,
    frame_offset=0.5  # Extract frame 0.5s into each scene
)

print(f"Saved {len(result.scenes)} thumbnails to {output_dir}")

Save Scene List¶

from file_organizer.services.video.scene_detector import SceneDetector

detector = SceneDetector()
result = detector.detect_scenes("video.mp4")

# Save scene list as CSV
SceneDetector.save_scene_list(result, "scenes.csv")

# CSV format:
# Scene,Start Time,End Time,Duration,Start Frame,End Frame,Frame Count,Score
# 1,0.00,5.23,5.23,0,157,157,0.850
# 2,5.23,12.45,7.22,157,373,216,0.920

Video Metadata Extraction¶

Extract Metadata¶

from pathlib import Path
from file_organizer.services.video.metadata_extractor import VideoMetadataExtractor

# Initialize extractor
extractor = VideoMetadataExtractor()

# Extract metadata
video_file = Path("~/Videos/movie.mp4")
metadata = extractor.extract(video_file)

# Access metadata
print(f"File: {metadata.file_path.name}")
print(f"Size: {metadata.file_size / 1024 / 1024:.1f} MB")
print(f"Format: {metadata.format}")
print(f"Duration: {metadata.duration:.1f} seconds")
print(f"Resolution: {metadata.width}x{metadata.height}")
print(f"FPS: {metadata.fps:.2f}")
print(f"Codec: {metadata.codec}")
print(f"Bitrate: {metadata.bitrate / 1000:.0f} kbps")

Resolution Classification¶

from file_organizer.services.video.metadata_extractor import resolution_label

# Classify resolution
label = resolution_label(1920, 1080)
print(label)  # "1080p"

label = resolution_label(3840, 2160)
print(label)  # "4k"

label = resolution_label(1280, 720)
print(label)  # "720p"

Batch Processing¶

Process multiple videos efficiently:

from pathlib import Path
from file_organizer.services.video.scene_detector import SceneDetector

detector = SceneDetector()
video_files = list(Path("~/Videos").glob("*.mp4"))

# Batch detect scenes
results = detector.detect_scenes_batch(video_files)

# Process results
for result in results:
    print(f"{result.video_path.name}: {len(result.scenes)} scenes")

    # Save scene list for each video
    output_csv = result.video_path.with_suffix(".scenes.csv")
    SceneDetector.save_scene_list(result, output_csv)

Supported Video Formats¶

Core Formats (Recognized by File Organizer)¶

These formats are explicitly supported by File Organizer's VIDEO_EXTENSIONS:

Format	Extension	Notes
MP4	`.mp4`	Most common, recommended
MKV	`.mkv`	High-quality container
AVI	`.avi`	Legacy Windows format
MOV	`.mov`	QuickTime format
WMV	`.wmv`	Windows Media Video

Verification: Check if a file is recognized:

from file_organizer.core.types import VIDEO_EXTENSIONS

file_path = "my-video.mp4"
is_recognized = any(file_path.endswith(ext) for ext in VIDEO_EXTENSIONS)
print(f"Recognized: {is_recognized}")

Additional Formats (OpenCV/FFmpeg Runtime Support)¶

Depending on your OpenCV and FFmpeg installation, these formats may also work for scene detection: - WebM (.webm) - Web-optimized format - FLV (.flv) - Flash video - MPEG (.mpeg, .mpg) - MPEG-1/2 format - M4V (.m4v) - iTunes video format - 3GP (.3gp) - Mobile video format

Note: Additional formats may work for scene detection via OpenCV, but are not recognized by File Organizer's file organization logic (filtering, categorization). For best compatibility, use core formats.

Integration with File Organization¶

Organize by Scene Count¶

from pathlib import Path
from file_organizer.services.video.organizer import VideoOrganizer
from file_organizer.services.video.scene_detector import SceneDetector
from file_organizer.services.video.metadata_extractor import VideoMetadataExtractor

# Extract metadata
metadata_extractor = VideoMetadataExtractor()
metadata = metadata_extractor.extract("video.mp4")

# Detect scenes
detector = SceneDetector()
scene_result = detector.detect_scenes("video.mp4")

# Organize based on video characteristics
if len(scene_result.scenes) > 50:
    category = "long-form"
elif scene_result.total_duration < 60:
    category = "short-clips"
else:
    category = "standard"

print(f"Category: {category}")
print(f"Scenes: {len(scene_result.scenes)}")
print(f"Duration: {scene_result.total_duration:.1f}s")

Screen Recording Detection¶

from file_organizer.services.video.organizer import is_screen_recording

# Detect screen recordings by filename
if is_screen_recording("Screen Recording 2025-01-15 at 3.45.22 PM.mp4"):
    print("macOS QuickTime screen recording detected")

if is_screen_recording("2025-01-15 14-05-32.mp4"):
    print("OBS Studio recording detected")

# Supports patterns from:
# - macOS QuickTime
# - Windows Snipping Tool
# - OBS Studio
# - Xbox Game Bar
# - Camtasia

Troubleshooting¶

"OpenCV not found"¶

Error:

ImportError: No module named 'cv2'

Solution:

# Verify installation
python -c "import cv2; print(cv2.__version__)"

# If not installed
pip install opencv-python>=4.8.0

"scenedetect not found"¶

Error:

ImportError: No module named 'scenedetect'

Solution:

pip install scenedetect[opencv]>=0.6.0

Platform-Specific OpenCV Issues¶

macOS - Conflicting Installations:

# Remove brew version if conflicts occur
brew uninstall opencv

# Use pip version only
pip install opencv-python

Linux - Missing System Libraries:

# Ubuntu/Debian
sudo apt install libgl1-mesa-glx libglib2.0-0

# Fedora/RHEL
sudo dnf install mesa-libGL glib2

Windows - Binary Compatibility:

# Use prebuilt binaries
pip install opencv-python

# Avoid building from source unless necessary

Failed to Open Video¶

Error:

ValueError: Failed to open video: video.mp4

Solutions: 1. Check file exists: Verify path is correct 2. Check format support: Try with .mp4 file first 3. Install FFmpeg: Some codecs require FFmpeg 4. Test with OpenCV:

import cv2
cap = cv2.VideoCapture("video.mp4")
print(f"Opened: {cap.isOpened()}")
cap.release()

Too Many/Few Scenes Detected¶

Solutions:

Too many scenes:

# Increase threshold (less sensitive)
detector = SceneDetector(threshold=40.0)

# Increase minimum scene length
detector = SceneDetector(min_scene_length=2.0)  # 2 seconds minimum

Too few scenes:

# Decrease threshold (more sensitive)
detector = SceneDetector(threshold=15.0)

# Try different detection method
detector = SceneDetector(method=DetectionMethod.ADAPTIVE)

Best Practices¶

Detection Method Selection¶

General videos: content method (recommended)
Fast previews: threshold method
Variable lighting: adaptive method
Color-based transitions: histogram method

Threshold Selection¶

Subtle transitions (interviews, documentaries): 15.0 - 20.0
General content (movies, TV shows): 27.0 (default)
Fast-paced (action, music videos): 40.0 - 50.0
Major changes only: 60.0+

Processing Strategy¶

Start with defaults: Test with content method and threshold 27.0
Validate on sample: Check scene detection quality on representative video
Adjust parameters: Tune threshold based on content type
Batch process: Use detect_scenes_batch() for multiple files
Save results: Export scene lists to CSV for review
Extract thumbnails: Visual verification of scene boundaries

Performance Optimization¶

Fast Processing¶

# Use threshold method for speed
detector = SceneDetector(
    method=DetectionMethod.THRESHOLD,
    threshold=30.0
)

# Skip scene extraction, metadata only
from file_organizer.services.video.metadata_extractor import VideoMetadataExtractor
extractor = VideoMetadataExtractor()
metadata = extractor.extract("video.mp4")  # Much faster than scene detection

High-Quality Processing¶

# Use adaptive method for accuracy
detector = SceneDetector(
    method=DetectionMethod.ADAPTIVE,
    min_scene_length=0.5  # Detect shorter scenes
)

# Extract high-resolution thumbnails
SceneDetector.extract_scene_thumbnails(
    video_path="video.mp4",
    result=result,
    output_dir="thumbnails",
    frame_offset=1.0  # Use frame 1s into scene
)

Configuration¶

Configure video analysis settings in ~/.config/file-organizer/config.yaml:

video:
  scene_detection:
    enabled: true
    method: content  # content, threshold, adaptive, histogram
    threshold: 27.0
    min_scene_length: 1.0
  metadata:
    use_ffprobe: true  # Prefer ffprobe over OpenCV
    extract_thumbnails: false
  organization:
    detect_screen_recordings: true
    short_clip_threshold: 60.0  # seconds

Verification¶

This section provides comprehensive tests to verify your audio and video processing setup is working correctly.

System Dependencies¶

Verify all required system dependencies are installed:

# Check Python version (requires 3.11+)
python3 --version

# Check FFmpeg installation
ffmpeg -version

# Check pip installation
pip --version

Expected Output: - Python 3.11.0 or higher - FFmpeg version information (any recent version) - pip 23.0 or higher

Audio Processing Verification¶

1. Verify Audio Dependencies¶

# Test faster-whisper installation
python -c "from faster_whisper import WhisperModel; print('✓ faster-whisper installed')"

# Test torch installation
python -c "import torch; print(f'✓ PyTorch {torch.__version__} installed')"

# Test audio metadata libraries
python -c "import mutagen; import tinytag; print('✓ Audio metadata libraries installed')"

# Check GPU availability (if applicable)
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

Expected Output:

✓ faster-whisper installed
✓ PyTorch 2.1.0+ installed
✓ Audio metadata libraries installed
CUDA available: True  # or False if no GPU

2. Test Audio Transcription¶

Create a test script to verify transcription works:

# Create test script
cat > test_audio.py << 'EOF'
from file_organizer.services.audio.transcriber import AudioTranscriber, ModelSize, ComputeType
from pathlib import Path
import sys

try:
    # Initialize with smallest model for quick test
    print("Initializing transcriber...")
    transcriber = AudioTranscriber(
        model_size=ModelSize.TINY,
        compute_type=ComputeType.INT8,
        device="cpu"
    )
    print("✓ Transcriber initialized successfully")

    # Test with a sample file if provided
    if len(sys.argv) > 1:
        audio_file = Path(sys.argv[1])
        if audio_file.exists():
            print(f"Transcribing {audio_file.name}...")
            result = transcriber.transcribe(audio_file)
            print(f"✓ Transcription completed")
            print(f"  Language: {result.language}")
            print(f"  Duration: {result.duration:.1f}s")
            print(f"  Text preview: {result.text[:100]}...")
        else:
            print(f"✗ File not found: {audio_file}")
            sys.exit(1)
    else:
        print("✓ Ready to transcribe (pass audio file path as argument)")

except Exception as e:
    print(f"✗ Error: {e}")
    sys.exit(1)
EOF

# Run test (without audio file)
python test_audio.py

# Or test with your own audio file
# python test_audio.py ~/path/to/your/audio.mp3

Expected Output:

Initializing transcriber...
✓ Transcriber initialized successfully
✓ Ready to transcribe (pass audio file path as argument)

3. Test Audio Metadata Extraction¶

# Create metadata test script
cat > test_audio_metadata.py << 'EOF'
from file_organizer.services.audio.metadata_extractor import AudioMetadataExtractor
from pathlib import Path
import sys

if len(sys.argv) < 2:
    print("Usage: python test_audio_metadata.py <audio_file>")
    print("Note: Metadata extraction requires an actual audio file")
    sys.exit(0)

try:
    extractor = AudioMetadataExtractor()
    audio_file = Path(sys.argv[1])

    print(f"Extracting metadata from {audio_file.name}...")
    metadata = extractor.extract(audio_file)

    print("✓ Metadata extraction successful")
    print(f"  Title: {metadata.title or 'N/A'}")
    print(f"  Artist: {metadata.artist or 'N/A'}")
    print(f"  Duration: {metadata.duration:.1f}s")
    print(f"  Bitrate: {metadata.bitrate / 1000:.0f} kbps")
    print(f"  Format: {metadata.format}")

except Exception as e:
    print(f"✗ Error: {e}")
    sys.exit(1)
EOF

# This requires an actual audio file to test
echo "✓ Metadata test script created (run with: python test_audio_metadata.py <audio_file>)"

Video Processing Verification¶

1. Verify Video Dependencies¶

# Test OpenCV installation
python -c "import cv2; print(f'✓ OpenCV {cv2.__version__} installed')"

# Test PySceneDetect installation
python -c "import scenedetect; print(f'✓ PySceneDetect {scenedetect.__version__} installed')"

# Test video capture capability
python -c "import cv2; cap = cv2.VideoCapture(); print('✓ Video capture available')"

Expected Output:

✓ OpenCV 4.8.0+ installed
✓ PySceneDetect 0.6.0+ installed
✓ Video capture available

2. Test Video Scene Detection¶

Create a test script to verify scene detection works:

# Create test script
cat > test_video.py << 'EOF'
from file_organizer.services.video.scene_detector import SceneDetector, DetectionMethod
from pathlib import Path
import sys

try:
    # Initialize detector
    print("Initializing scene detector...")
    detector = SceneDetector(
        method=DetectionMethod.CONTENT,
        threshold=27.0
    )
    print("✓ Scene detector initialized successfully")

    # Test with a sample file if provided
    if len(sys.argv) > 1:
        video_file = Path(sys.argv[1])
        if video_file.exists():
            print(f"Detecting scenes in {video_file.name}...")
            result = detector.detect_scenes(video_file)
            print(f"✓ Scene detection completed")
            print(f"  Duration: {result.total_duration:.1f}s")
            print(f"  FPS: {result.fps:.2f}")
            print(f"  Scenes detected: {len(result.scenes)}")
            if result.scenes:
                print(f"  First scene: {result.scenes[0].start_time:.2f}s - {result.scenes[0].end_time:.2f}s")
        else:
            print(f"✗ File not found: {video_file}")
            sys.exit(1)
    else:
        print("✓ Ready to detect scenes (pass video file path as argument)")

except Exception as e:
    print(f"✗ Error: {e}")
    sys.exit(1)
EOF

# Run test (without video file)
python test_video.py

# Or test with your own video file
# python test_video.py ~/path/to/your/video.mp4

Expected Output:

Initializing scene detector...
✓ Scene detector initialized successfully
✓ Ready to detect scenes (pass video file path as argument)

3. Test Video Metadata Extraction¶

# Create metadata test script
cat > test_video_metadata.py << 'EOF'
from file_organizer.services.video.metadata_extractor import VideoMetadataExtractor, resolution_label
from pathlib import Path
import sys

if len(sys.argv) < 2:
    print("Usage: python test_video_metadata.py <video_file>")
    print("Note: Metadata extraction requires an actual video file")
    sys.exit(0)

try:
    extractor = VideoMetadataExtractor()
    video_file = Path(sys.argv[1])

    print(f"Extracting metadata from {video_file.name}...")
    metadata = extractor.extract(video_file)

    print("✓ Metadata extraction successful")
    print(f"  Size: {metadata.file_size / 1024 / 1024:.1f} MB")
    print(f"  Format: {metadata.format}")
    print(f"  Duration: {metadata.duration:.1f}s")
    print(f"  Resolution: {metadata.width}x{metadata.height} ({resolution_label(metadata.width, metadata.height)})")
    print(f"  FPS: {metadata.fps:.2f}")
    print(f"  Codec: {metadata.codec}")
    print(f"  Bitrate: {metadata.bitrate / 1000:.0f} kbps")

except Exception as e:
    print(f"✗ Error: {e}")
    sys.exit(1)
EOF

# This requires an actual video file to test
echo "✓ Metadata test script created (run with: python test_video_metadata.py <video_file>)"

Integration Verification¶

Test the complete audio/video processing pipeline:

# Create integration test script
cat > test_integration.py << 'EOF'
import sys
from pathlib import Path

def test_audio_video_integration():
    """Test that audio and video processing can be imported together"""
    try:
        # Import audio components
        from file_organizer.services.audio.transcriber import AudioTranscriber
        from file_organizer.services.audio.metadata_extractor import AudioMetadataExtractor
        print("✓ Audio processing modules imported")

        # Import video components
        from file_organizer.services.video.scene_detector import SceneDetector
        from file_organizer.services.video.metadata_extractor import VideoMetadataExtractor
        print("✓ Video processing modules imported")

        # Test initialization
        audio_transcriber = AudioTranscriber()
        audio_metadata = AudioMetadataExtractor()
        video_detector = SceneDetector()
        video_metadata = VideoMetadataExtractor()
        print("✓ All processors initialized successfully")

        print("\n✓ Integration test PASSED")
        print("  Audio and video processing are ready to use")
        return True

    except Exception as e:
        print(f"\n✗ Integration test FAILED: {e}")
        return False

if __name__ == "__main__":
    success = test_audio_video_integration()
    sys.exit(0 if success else 1)
EOF

# Run integration test
python test_integration.py

Expected Output:

✓ Audio processing modules imported
✓ Video processing modules imported
✓ All processors initialized successfully

✓ Integration test PASSED
  Audio and video processing are ready to use

Quick Verification Command¶

Run all basic checks at once:

# One-line verification
python -c "
import sys
try:
    from faster_whisper import WhisperModel
    import torch, cv2, scenedetect
    from file_organizer.services.audio.transcriber import AudioTranscriber
    from file_organizer.services.video.scene_detector import SceneDetector
    print('✓ All audio/video dependencies installed')
    print(f'  - PyTorch: {torch.__version__}')
    print(f'  - OpenCV: {cv2.__version__}')
    print(f'  - PySceneDetect: {scenedetect.__version__}')
    print(f'  - CUDA: {torch.cuda.is_available()}')
except ImportError as e:
    print(f'✗ Missing dependency: {e}')
    sys.exit(1)
"

Expected Output:

✓ All audio/video dependencies installed
  - PyTorch: 2.1.0+
  - OpenCV: 4.8.0+
  - PySceneDetect: 0.6.0+
  - CUDA: True/False

Troubleshooting Verification Issues¶

If any verification step fails, refer to the troubleshooting sections in the Audio Transcription and Video Analysis sections above, or check:

Import errors: Reinstall dependencies with pip install -e ".[audio]" or pip install -e ".[video]"
FFmpeg not found: Install FFmpeg following instructions in the respective sections
CUDA errors: Update GPU drivers or use CPU mode
File not found: Ensure file paths are correct and files exist
Permission errors: Check file/directory permissions

Cleanup Test Scripts¶

After verification, remove test scripts:

rm -f test_audio.py test_audio_metadata.py test_video.py test_video_metadata.py test_integration.py

Next Steps¶

Audio Transcription: See the audio section above for speech-to-text capabilities
Audio Classification: Learn about automatic audio type detection (music, podcast, etc.)
Integration: Combine audio and video analysis in organization workflows
Advanced: Explore custom scene detection algorithms and ML-based classification

For more information, see: - User Guide - General usage patterns - Dependencies - Installation and requirements - Audio and Video Setup - This guide