
Auto-Hash Every File That Hits Your Folder

May 7, 2026 · python · automation · tutorial · filesystem

I needed a way to hash files the second they hit my machine. Not when I remember to do it. Not after I've already shared them. The moment they land.

The pattern's simple: watch a folder, hash anything new, store the hash. Later you can anchor those hashes to a blockchain for timestamps. But first you need the hashes captured automatically.

Here's the Python script that does it.

The Watchdog + Hash Pipeline

Two pieces: watchdog monitors filesystem events, verify-proof handles the SHA-256 hashing.

pip install watchdog verify-proof

The watchdog library fires events when files get created, modified, or deleted. We catch the "created" event and immediately hash the new file:

import os
import sys
import json
import time
from datetime import datetime

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from verify_proof import hash_file

class HashHandler(FileSystemEventHandler):
    def __init__(self, watch_folder):
        self.watch_folder = watch_folder
        self.hash_log = os.path.join(watch_folder, '.file_hashes.json')
        self.hashes = self.load_existing_hashes()
    
    def load_existing_hashes(self):
        if os.path.exists(self.hash_log):
            with open(self.hash_log, 'r') as f:
                return json.load(f)
        return {}
    
    def save_hashes(self):
        with open(self.hash_log, 'w') as f:
            json.dump(self.hashes, f, indent=2)
    
    def on_created(self, event):
        if not event.is_directory and not event.src_path.endswith('.file_hashes.json'):
            self.hash_new_file(event.src_path)
    
    def hash_new_file(self, filepath):
        try:
            file_hash = hash_file(filepath)
            relative_path = os.path.relpath(filepath, self.watch_folder)
            
            self.hashes[relative_path] = {
                'sha256': file_hash,
                'hashed_at': datetime.now().isoformat(),
                'size_bytes': os.path.getsize(filepath)
            }
            
            self.save_hashes()
            print(f"Hashed: {relative_path} -> {file_hash[:16]}...")
            
        except Exception as e:
            print(f"Failed to hash {filepath}: {e}")

The handler stores everything in a .file_hashes.json file inside the watched folder. Each file gets its SHA-256, timestamp, and size recorded.
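
If you'd rather not pull in verify-proof just for the hashing, a plain hashlib version works as a drop-in. This is a sketch that assumes hash_file simply streams the file through SHA-256 and returns the hex digest:

import hashlib

def hash_file(filepath, chunk_size=65536):
    """Stream the file through SHA-256 in 64 KB chunks so large
    files never need to fit in memory."""
    sha256 = hashlib.sha256()
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            sha256.update(chunk)
    return sha256.hexdigest()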

Running the Watcher

The main loop starts the observer and keeps it running:

def start_watching(folder_path):
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)
        print(f"Created watch folder: {folder_path}")
    
    handler = HashHandler(folder_path)
    observer = Observer()
    observer.schedule(handler, folder_path, recursive=True)
    
    observer.start()
    print(f"Watching {folder_path} for new files...")
    print("Press Ctrl+C to stop")
    
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
        print("\nStopping watcher...")
    
    observer.join()

if __name__ == "__main__":
    watch_folder = sys.argv[1] if len(sys.argv) > 1 else "./watched_files"
    start_watching(watch_folder)

Run it with:

python file_watcher.py /path/to/your/folder

Drop a file into the folder and you'll see:

Hashed: photo_001.jpg -> a1b2c3d4e5f67890...

The script runs until you Ctrl+C it. Because the observer is scheduled with recursive=True, files dropped into nested subdirectories get hashed too.
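
One caveat: on most platforms the created event fires the instant the file appears, which for large copies can be before the last byte is written. If you see hashes of half-written files, poll the size until it stabilizes before hashing. wait_until_stable is a helper I'm sketching here, not something watchdog provides:

import os
import time

def wait_until_stable(filepath, interval=0.5, timeout=30):
    """Block until the file's size stops changing between polls,
    so we don't hash a file that's still being copied in."""
    deadline = time.time() + timeout
    last_size = -1
    while time.time() < deadline:
        try:
            size = os.path.getsize(filepath)
        except OSError:  # file may be locked or mid-rename
            size = -1
        if size == last_size and size >= 0:
            return True
        last_size = size
        time.sleep(interval)
    return False

Call it at the top of hash_new_file and skip the file if it returns False.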

What the Hash Log Looks Like

After dropping a few files, your .file_hashes.json looks like this:

{
  "photo_001.jpg": {
    "sha256": "a1b2c3d4e5f67890abcdef1234567890abcdef1234567890abcdef1234567890",
    "hashed_at": "2026-05-07T14:30:15.123456",
    "size_bytes": 2048576
  },
  "documents/contract.pdf": {
    "sha256": "b2c3d4e5f67890a1abcdef1234567890abcdef1234567890abcdef1234567890",
    "hashed_at": "2026-05-07T14:31:22.789012",
    "size_bytes": 1024000
  }
}

Each entry records the file's hash, size, and the local time it was hashed. That local timestamp is only as trustworthy as your machine's clock; the hash itself is what you'd anchor to a blockchain later for a verifiable one.

Verification Against Existing Proofs

If you've already got blockchain timestamp proofs for some files, you can verify them against your hash log:

import json

from verify_proof import load_proof, verify_proof

def verify_existing_proof(hash_log_path, proof_path):
    with open(hash_log_path, 'r') as f:
        hashes = json.load(f)
    
    proof = load_proof(proof_path)
    
    # Find the file that matches this proof
    for filepath, data in hashes.items():
        if data['sha256'] == proof.get('hash'):
            result = verify_proof(data['sha256'], proof)
            
            if result['verified']:
                print(f"✓ {filepath} verified against blockchain proof")
                print(f"  Anchored: {result['anchored_at']}")
                print(f"  Chain: {result['blockchain']}")
            else:
                print(f"✗ {filepath} failed verification: {result.get('error')}")
            return
    
    print("No matching file found for this proof")

This connects your local hash log with blockchain timestamp proofs you've collected.
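
Calling it just takes the two paths; the proof file location here is only an example:

verify_existing_proof(
    './watched_files/.file_hashes.json',
    './proofs/contract.pdf.proof'  # wherever your proof file lives
)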

What's Next

This script handles the "capture everything" part. For the blockchain anchoring, you'd take the hashes from your log file and submit them through a timestamping service.
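
The exact submission step depends on the service, so treat this as a shape rather than a recipe: load the log, pull out the hashes, and send each one to whatever anchoring endpoint you use. The endpoint URL and payload format below are placeholders, not a real API:

import json
import requests  # third-party: pip install requests

def submit_hashes(hash_log_path, endpoint):
    """Send every logged SHA-256 to a timestamping service.
    The endpoint and payload shape are hypothetical -- adapt
    them to the service you actually use."""
    with open(hash_log_path, 'r') as f:
        hashes = json.load(f)
    for filepath, data in hashes.items():
        resp = requests.post(endpoint, json={'sha256': data['sha256']})
        resp.raise_for_status()
        print(f"Submitted {filepath}: {data['sha256'][:16]}...")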

The pattern scales: point this at your photo import folder, your Git hooks directory, or anywhere files show up that you need timestamped later. The hashes get captured immediately, even if you're offline when the files arrive.

You can extend it to filter file types, skip temporary files, or trigger different actions based on file extensions. The core loop stays the same: watch, hash, log.
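
Filtering by extension, for instance, is a small change to on_created. ALLOWED_EXTENSIONS and IGNORED_SUFFIXES are names I'm inventing for the sketch:

ALLOWED_EXTENSIONS = {'.jpg', '.png', '.pdf'}  # only hash these types
IGNORED_SUFFIXES = ('.tmp', '.part', '.crdownload')  # skip in-progress files

def on_created(self, event):
    if event.is_directory or event.src_path.endswith('.file_hashes.json'):
        return
    if event.src_path.endswith(IGNORED_SUFFIXES):
        return
    if os.path.splitext(event.src_path)[1].lower() not in ALLOWED_EXTENSIONS:
        return
    self.hash_new_file(event.src_path)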