Commit Graph

29 Commits

Author SHA1 Message Date
gilesb
d685518c4c Remove Redis fallbacks - database only, no silent failures
- Database is the ONLY source of truth for cache_id -> ipfs_cid
- Removed Redis caching layer entirely
- Failures will raise exceptions instead of warning and continuing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 04:23:28 +00:00
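A minimal sketch of the database-only lookup described in d685518c4c. It assumes a `cache_items` table with `cache_id` and `ipfs_cid` columns; that column layout is an assumption, since the real schema is not shown here. The standard-library `sqlite3` module stands in for PostgreSQL, and `CacheIndex`/`CacheIndexError` are hypothetical names. A missing mapping raises instead of returning `None`.

```python
import sqlite3


class CacheIndexError(LookupError):
    """Raised when a cache_id has no recorded IPFS CID."""


class CacheIndex:
    """cache_id -> ipfs_cid mapping with the database as the only store."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS cache_items ("
                "cache_id TEXT PRIMARY KEY, ipfs_cid TEXT NOT NULL)"
            )

    def record(self, cache_id: str, ipfs_cid: str) -> None:
        # Commits on success; on any database error it rolls back and re-raises.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO cache_items (cache_id, ipfs_cid) VALUES (?, ?)",
                (cache_id, ipfs_cid),
            )

    def resolve(self, cache_id: str) -> str:
        row = self.conn.execute(
            "SELECT ipfs_cid FROM cache_items WHERE cache_id = ?", (cache_id,)
        ).fetchone()
        if row is None:
            # No Redis or in-memory fallback: a miss is raised, not logged and skipped.
            raise CacheIndexError(cache_id)
        return row[0]
```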
gilesb
529c173722 Use database for cache_id -> ipfs_cid mapping
- Database (cache_items table) is now source of truth
- Redis used as fast cache on top
- Mapping persists across restarts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 04:22:28 +00:00
gilesb
d7d7cd28c2 Store cache items by IPFS CID, index by cache_id
- Files in /data/cache/nodes/ are now stored by IPFS CID only
- cache_id parameter creates index from cache_id -> IPFS CID
- Removed deprecated node_id parameter behavior
- get_by_cid(cache_id) still works via index lookup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 04:20:34 +00:00
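For d7d7cd28c2, a sketch of the storage layout, assuming each item is a single file at `{cache_dir}/nodes/{ipfs_cid}` and that the index is a plain mapping from cache_id to CID. `CidStore` is a hypothetical class; only the name `get_by_cid` comes from the commit, and its signature here is illustrative.

```python
from __future__ import annotations

import shutil
from pathlib import Path
from typing import MutableMapping, Optional


class CidStore:
    """Content stored once under its IPFS CID; cache_ids only index into it."""

    def __init__(self, cache_dir: Path, index: MutableMapping[str, str]) -> None:
        self.nodes = cache_dir / "nodes"
        self.nodes.mkdir(parents=True, exist_ok=True)
        self.index = index  # cache_id -> IPFS CID

    def put(self, source: Path, ipfs_cid: str, cache_id: Optional[str] = None) -> Path:
        dest = self.nodes / ipfs_cid
        shutil.copyfile(source, dest)
        if cache_id is not None and cache_id != ipfs_cid:
            self.index[cache_id] = ipfs_cid  # an index entry, not a second copy
        return dest

    def get_by_cid(self, key: str) -> Optional[Path]:
        # Resolve a cache_id through the index; anything else is treated as a CID.
        cid = self.index.get(key, key)
        path = self.nodes / cid
        return path if path.is_file() else None
```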
gilesb
c46fcd2308 Make IPFS upload failures fatal - no local hash fallback
IPFS CIDs are the primary identifiers. If IPFS upload fails,
the operation must fail rather than silently using local hashes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 04:17:34 +00:00
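The rule in c46fcd2308 amounts to letting the upload error propagate. In this sketch `ipfs_add` is a hypothetical callable that returns a CID (for example, a wrapper around an IPFS node's add API); `store_content` and `IPFSUploadError` are likewise illustrative names.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


class IPFSUploadError(RuntimeError):
    """Raised when content cannot be added to IPFS."""


def store_content(path: Path, ipfs_add: Callable[[Path], str]) -> str:
    """Return the IPFS CID for `path`; never fall back to a local hash."""
    try:
        cid = ipfs_add(path)
    except Exception as exc:
        # Surface the failure instead of substituting a locally computed hash.
        raise IPFSUploadError(f"IPFS upload failed for {path}") from exc
    if not cid:
        raise IPFSUploadError(f"IPFS returned no CID for {path}")
    return cid
```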
gilesb
6c4b850487 Add COMPOUND node handling and fix cache lookups by code-addressed hash
- Add COMPOUND node handling in execute_recipe for collapsed effect chains
- Index cache entries by node_id (cache_id) when different from IPFS CID
- Fix test_cache_manager.py to unpack put() tuple returns

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-13 01:47:20 +00:00
gilesb
3e3df6ff2a Code-addressed node IDs and remove JSON index files
- Compiler now generates SHA3-256 hashes for node IDs
- Each hash includes type, config, and input hashes (Merkle tree)
- Same plan = same hashes = automatic cache reuse

Cache changes:
- Remove index.json - filesystem IS the index
- Files at {cache_dir}/{hash}/output.* are source of truth
- Per-node metadata.json for optional stats (not an index)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 22:38:50 +00:00
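A sketch of the Merkle-style node IDs from 3e3df6ff2a using `hashlib.sha3_256`. Serializing type, config and input IDs as canonical JSON (sorted keys, compact separators) is an assumption; the compiler may use a different encoding. Because each ID covers its inputs' IDs, changing any upstream node changes every downstream ID, while an identical plan reproduces identical IDs. `node_id` is a hypothetical helper name.

```python
from __future__ import annotations

import hashlib
import json


def node_id(node_type: str, config: dict, input_ids: list[str]) -> str:
    """SHA3-256 over a node's type, config and the IDs of its inputs."""
    payload = json.dumps(
        {"type": node_type, "config": config, "inputs": input_ids},
        sort_keys=True,          # dict key order must not affect the hash
        separators=(",", ":"),   # no incidental whitespace
    )
    return hashlib.sha3_256(payload.encode("utf-8")).hexdigest()
```

Input order is deliberately kept as given, since for most effects the order of inputs is meaningful.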
gilesb
faf794ef35 Use IPFS as universal fallback for content lookup
When content isn't found in local cache, fetch directly from IPFS
using the CID. IPFS is the source of truth for all content-addressed data.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 21:20:28 +00:00
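The lookup order in faf794ef35 (local cache first, then IPFS by CID) can be sketched with injected callables. `local_lookup` and `ipfs_fetch` are hypothetical hooks; a real `ipfs_fetch` might call an IPFS node's HTTP RPC (Kubo exposes `/api/v0/cat`). Writing the fetched bytes back to `cache_dir` is an assumption about how the local copy is warmed.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Optional


def get_content(
    cid: str,
    local_lookup: Callable[[str], Optional[Path]],
    ipfs_fetch: Callable[[str], bytes],
    cache_dir: Path,
) -> Path:
    """Return a local path for `cid`, fetching from IPFS on a local miss."""
    path = local_lookup(cid)
    if path is not None:
        return path
    # IPFS is treated as the source of truth; fetch errors propagate.
    data = ipfs_fetch(cid)
    dest = cache_dir / cid
    dest.write_bytes(data)  # keep a local copy so the next lookup hits
    return dest
```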
gilesb
a4bf0eae24 Add filesystem fallback when artdag Cache lookup fails
The artdag Cache object doesn't persist state across process restarts,
so cache.get(node_id) returns None even when files exist on disk.

Now we check the filesystem directly at {cache_dir}/nodes/{node_id}/output.*
when the in-memory cache lookup fails but we have a valid node_id from
the Redis index.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 21:18:20 +00:00
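The filesystem fallback in a4bf0eae24 reduces to a glob over the node directory. `find_node_output` is a hypothetical helper; the `{cache_dir}/nodes/{node_id}/output.*` pattern comes from the commit message, and picking the first match in sorted order is an assumption for the case where several extensions exist.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def find_node_output(cache_dir: Path, node_id: str) -> Optional[Path]:
    """Return {cache_dir}/nodes/{node_id}/output.* if such a file exists."""
    node_dir = cache_dir / "nodes" / node_id
    if not node_dir.is_dir():
        return None
    # The extension varies by media type, so match any output.* file.
    matches = sorted(p for p in node_dir.glob("output.*") if p.is_file())
    return matches[0] if matches else None
```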
gilesb
f67aacdceb Add detailed logging to cache_manager put and get_by_cid
Debug why recipes are not found in cache after upload.
Logs now show each step of put() and get_by_cid().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 21:07:09 +00:00
gilesb
8bf6f87c2a Implement ownership model for all cached content deletion
- cache_service.delete_content: Remove user's ownership link first,
  only delete actual file if no other owners remain

- cache_manager.discard_activity_outputs_only: Check if outputs and
  intermediates are used by other activities before deleting

- run_service.discard_run: Now cleans up run outputs/intermediates
  (only if not shared by other runs)

- home.py clear_user_data: Use ownership model for effects and media
  deletion instead of directly deleting files

The ownership model ensures:
1. Multiple users can "own" the same cached content
2. Deleting removes the user's ownership link (item_types entry)
3. Actual files only deleted when no owners remain (garbage collection)
4. Shared intermediates between runs are preserved

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 20:02:27 +00:00
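A minimal sketch of the ownership model from 8bf6f87c2a as a set of owners per CID. The in-memory `owners` mapping stands in for the `item_types` ownership links, and `OwnershipRegistry` and the `delete_file` hook are hypothetical. The point is that a file is removed only when its last owner is.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


class OwnershipRegistry:
    """Tracks which users own each CID; files are collected only when unowned."""

    def __init__(self, delete_file: Callable[[str], None]) -> None:
        self.owners: dict[str, set[str]] = defaultdict(set)
        self.delete_file = delete_file

    def add_owner(self, cid: str, user: str) -> None:
        self.owners[cid].add(user)

    def delete_content(self, cid: str, user: str) -> bool:
        """Drop `user`'s link; return True only if the file itself was removed."""
        owners = self.owners.get(cid)
        if owners is None:
            return False
        owners.discard(user)
        if owners:
            return False  # other owners remain, so the file stays
        del self.owners[cid]
        self.delete_file(cid)  # last owner gone: garbage-collect the file
        return True
```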
gilesb
c5c7e5e162 Fix file_hash called after move in cache_manager.put
The dual-indexing code was calling file_hash(source_path) after
cache.put(move=True) had already moved the file, causing
"No such file or directory" errors on upload.

Now computes local_hash before the move operation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 19:07:45 +00:00
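The ordering fix in c5c7e5e162, sketched: hash the source before moving it. `file_hash` is named in the commit, but this body (chunked SHA3-256) and `put_with_move` are assumptions for illustration.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def file_hash(path: Path) -> str:
    """SHA3-256 of a file's contents, read in 64 KiB chunks."""
    h = hashlib.sha3_256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def put_with_move(source: Path, dest_dir: Path) -> tuple[Path, str]:
    # Hash first: once the file is moved, `source` no longer exists and
    # hashing it would fail with "No such file or directory".
    local_hash = file_hash(source)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = Path(shutil.move(str(source), dest_dir / source.name))
    return dest, local_hash
```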
gilesb
ee8719ac0b Fix media friendly names, metadata display, output recording, and plan display
- Add friendly name display to media detail and list pages
- Unpack nested meta fields to top level for template access
- Fix output_cid mismatch: use IPFS CID consistently between cache and database
- Add dual-indexing in cache_manager to map both IPFS CID and local hash
- Fix plan display: accept IPFS CIDs (Qm..., bafy...) not just 64-char hashes
- Add friendly names to recipe listing
- Add recipe upload button and handler to recipes list
- Add debug logging to recipe listing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 14:21:39 +00:00
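The plan-display fix in ee8719ac0b depends on telling the identifier formats apart. Below is a shape-only heuristic, assuming local hashes are 64 lowercase hex characters (SHA3-256) and CIDs are either CIDv0 (`Qm` plus 44 base58btc characters) or base32 CIDv1 starting with `bafy`. It does not decode or verify the multihash, and `classify_identifier` is a hypothetical name.

```python
import re

# 64 lowercase hex characters, e.g. a SHA3-256 hexdigest.
LOCAL_HASH = re.compile(r"[0-9a-f]{64}")
# CIDv0: "Qm" followed by 44 base58btc characters (46 in total).
CID_V0 = re.compile(r"Qm[1-9A-HJ-NP-Za-km-z]{44}")
# CIDv1, base32 multibase ("b" prefix, lowercase a-z and 2-7), e.g. "bafy...".
CID_V1_B32 = re.compile(r"bafy[a-z2-7]+")


def classify_identifier(value: str) -> str:
    """Return "hash", "cid" or "unknown" based on the string's shape only."""
    if LOCAL_HASH.fullmatch(value):
        return "hash"
    if CID_V0.fullmatch(value) or CID_V1_B32.fullmatch(value):
        return "cid"
    return "unknown"
```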
gilesb
98ca2a6c81 Fix recipe listing, effects count, and add nav counts to all pages
- Fix list_by_type to return node_id (IPFS CID) instead of local hash
- Fix effects count on home page (count from _effects/ directory)
- Add nav_counts to all page templates (recipes, effects, runs, media, storage)
- Add editable metadata section to cache/media detail page
- Show more metadata on recipe detail page (ID, IPFS CID, step count)
- Update tests for new list_by_type behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 13:30:11 +00:00
gilesb
9f8aa54e2b Fix duplicate get_by_cid method shadowing recipe lookup
Bug: Two get_by_cid methods existed in L1CacheManager. The second
definition shadowed the first, breaking recipe lookup because the
comprehensive method (using find_by_cid) was hidden.

- Remove duplicate get_by_cid method (lines 470-494)
- Add regression test to ensure only one get_by_cid exists
- Add tests for template variables and recipe visibility

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 12:14:59 +00:00
gilesb
f333eeb1e6 Fix infinite recursion in get_by_cid
Remove self-recursive call that caused infinite loop when looking up IPFS CIDs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 08:43:06 +00:00
gilesb
92d26b2b72 Rename content_hash/output_hash to cid throughout
Refactor to use IPFS CID as the primary content identifier:
- Update database schema: content_hash -> cid, output_hash -> output_cid
- Update all services, routers, and tasks to use cid terminology
- Update HTML templates to display CID instead of hash
- Update cache_manager parameter names
- Update README documentation

This completes the transition to CID-only content addressing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 08:02:44 +00:00
gilesb
494a2a8650 Add IPFS CID support for asset lookup
- Upload endpoint returns both CID and content_hash
- Cache manager handles both SHA3-256 hashes and IPFS CIDs
- get_by_cid() fetches from IPFS if not cached locally
- Execute tasks support :cid in addition to :hash

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-12 07:36:18 +00:00
giles
854396680f Refactor storage: remove Redis duplication, use proper data tiers
- Recipes: Now content-addressed only (cache + IPFS), removed Redis storage
- Runs: Completed runs stored in PostgreSQL, Redis only for task_id mapping
- Add list_runs_by_actor() to database.py for paginated run queries
- Add list_by_type() to cache_manager for filtering by node_type
- Fix upload endpoint to return size and filename fields
- Fix recipe run endpoint with proper DAG input binding
- Fix get_run_service() dependency to pass database module

Storage architecture:
- Redis: Ephemeral only (sessions, task mappings with TTL)
- PostgreSQL: Permanent records (completed runs, metadata)
- Cache: Content-addressed files (recipes, media, outputs)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-11 14:05:31 +00:00
gilesb
43788108c0 Fix Celery workers to use Redis for shared cache index
The get_cache_manager() singleton wasn't initializing with Redis,
so workers couldn't see files uploaded via the API server.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 11:25:38 +00:00
gilesb
a0a4c08b9a Use Redis for cache indexes - enables multi-worker scaling
The cache_manager now uses Redis hashes for the content_index and
ipfs_cids mappings. This allows multiple uvicorn workers to share
state, so files added by one worker are immediately visible to all
others.

- Added redis_client parameter to L1CacheManager
- Index lookups check Redis first, then fall back to in-memory
- Index updates go to both Redis and JSON file (backup)
- Migrates existing JSON indexes to Redis on first load
- Re-enabled workers=4 in uvicorn

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-09 04:19:00 +00:00
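A sketch of the shared index from a0a4c08b9a: Redis is read first, the process-local dict is the fallback, writes go to both Redis and the JSON backup, and existing JSON entries are migrated on first load. `redis_client` is assumed to be a redis-py client created with `decode_responses=True` (so `hget` returns `str`); `SharedIndex` and the `content_index` key are illustrative. Using `hsetnx` keeps the migration from overwriting entries another worker already wrote.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Optional


class SharedIndex:
    """content_index in a Redis hash, with a per-process dict and JSON backup."""

    KEY = "content_index"

    def __init__(self, redis_client, json_path: Path) -> None:
        self.redis = redis_client
        self.json_path = json_path
        self.local: dict[str, str] = {}
        if json_path.exists():
            self.local = json.loads(json_path.read_text())
            for key, value in self.local.items():
                # Migration: copy JSON entries into Redis without clobbering newer ones.
                self.redis.hsetnx(self.KEY, key, value)

    def get(self, key: str) -> Optional[str]:
        value = self.redis.hget(self.KEY, key)  # visible to every worker
        if value is None:
            value = self.local.get(key)  # per-process fallback
        return value

    def set(self, key: str, value: str) -> None:
        self.redis.hset(self.KEY, key, value)
        self.local[key] = value
        self.json_path.write_text(json.dumps(self.local, sort_keys=True))  # backup
```

Concurrent workers rewriting the same JSON file can still clobber each other's backup; later commits in this log remove the JSON index entirely.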
gilesb
ba244b9ebc Add PostgreSQL + IPFS backend, rename configs to recipes
- Add PostgreSQL database for cache metadata storage with schema for
  cache_items, item_types, pin_reasons, and l2_shares tables
- Add IPFS integration as durable backing store (local cache as hot storage)
- Add postgres and ipfs services to docker-compose.yml
- Update cache_manager to upload to IPFS and track CIDs
- Rename all config references to recipe throughout server.py
- Update API endpoints: /configs/* -> /recipes/*
- Update models: ConfigStatus -> RecipeStatus, ConfigRunRequest -> RecipeRunRequest
- Update UI tabs and pages to show Recipes instead of Configs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 14:58:29 +00:00
gilesb
4639a98231 Lists of shares. Job deletion only deletes outputs
2026-01-08 03:38:14 +00:00
gilesb
4a99866602 Add config/recipe support for DAG-based jobs
- Add PyYAML dependency for parsing config files
- Add Pydantic models: VariableInput, FixedInput, ConfigStatus, ConfigRunRequest
- Add Redis storage functions for configs
- Add config YAML parsing with variable and fixed input detection
- Add config API endpoints: upload, list, get, delete, run
- Add config UI: Configs tab, list page, detail page with run form
- Add HTMX endpoints for config operations
- Add pinning on publish: configs and their fixed inputs are pinned
  when runs from configs are published to L2
- Clean up debug logging in cache_manager

Config YAML format supports:
- Fixed inputs: resolve asset hashes from registry
- Variable inputs: marked with `input: true`, filled at run time
- DAG definition with nodes and edges
- Registry of assets and effects

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 03:17:50 +00:00
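A sketch of the variable/fixed input split from 4a99866602, operating on a config already loaded from YAML (for example with `yaml.safe_load`). The key names (`inputs`, `asset`, `registry.assets.<name>.hash`) are assumptions; only the `input: true` marker comes from the commit message. `split_inputs` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def split_inputs(config: dict[str, Any]) -> tuple[dict[str, dict], dict[str, str]]:
    """Split a parsed config's inputs into variable specs and fixed asset hashes."""
    assets = config.get("registry", {}).get("assets", {})
    variable: dict[str, dict] = {}
    fixed: dict[str, str] = {}
    for name, spec in config.get("inputs", {}).items():
        if spec.get("input") is True:
            variable[name] = spec  # supplied by the caller at run time
            continue
        asset = spec.get("asset", name)
        if asset not in assets:
            raise KeyError(f"fixed input {name!r} refers to unknown asset {asset!r}")
        fixed[name] = assets[asset]["hash"]  # resolved now from the registry
    return variable, fixed
```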
gilesb
f23a721816 Use local pinned metadata for deletion checks instead of L2 API
- Add is_pinned(), pin(), _load_meta(), _save_meta() to L1CacheManager
- Update can_delete() and can_discard_activity() to check local pinned status
- Update run deletion endpoints (API and UI) to check pinned metadata
- Remove L2 shared check fallback from run deletion
- Fix L2SharedChecker to return True on error (safer - prevents accidental deletion)
- Update tests for new pinned behavior

When items are published to L2, the publish flow marks them as pinned
locally. This ensures items remain non-deletable even if L2 is unreachable,
and both outputs AND inputs of published runs are protected.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 02:44:18 +00:00
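A sketch of the local pin check from f23a721816. The method names `pin`, `is_pinned`, `_load_meta` and `_save_meta` come from the commit, but their signatures, the one-JSON-file-per-item layout and the `mark_published` helper are assumptions. Because `can_delete` reads only local files, it keeps answering when L2 is unreachable.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable


class PinStore:
    """Pinned flags kept in local metadata files, one JSON file per item."""

    def __init__(self, meta_dir: Path) -> None:
        self.meta_dir = meta_dir
        self.meta_dir.mkdir(parents=True, exist_ok=True)

    def _meta_path(self, cid: str) -> Path:
        return self.meta_dir / f"{cid}.json"

    def _load_meta(self, cid: str) -> dict:
        path = self._meta_path(cid)
        return json.loads(path.read_text()) if path.exists() else {}

    def _save_meta(self, cid: str, meta: dict) -> None:
        self._meta_path(cid).write_text(json.dumps(meta))

    def pin(self, cid: str, reason: str) -> None:
        meta = self._load_meta(cid)
        meta["pinned"] = True
        meta["pin_reason"] = reason
        self._save_meta(cid, meta)

    def is_pinned(self, cid: str) -> bool:
        return bool(self._load_meta(cid).get("pinned"))

    def can_delete(self, cid: str) -> bool:
        # Local state only: no call to L2, so an L2 outage cannot unpin anything.
        return not self.is_pinned(cid)

    def mark_published(self, inputs: Iterable[str], outputs: Iterable[str]) -> None:
        # Publishing a run protects what it consumed as well as what it produced.
        for cid in inputs:
            self.pin(cid, "input_of_published_run")
        for cid in outputs:
            self.pin(cid, "published")
```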
gilesb
01f6db0621 Add debug logging to cache lookup
2026-01-08 02:11:37 +00:00
gilesb
3e12596dbf Fix cache lookup to work across processes
cache_manager.py:
- get_by_content_hash() now tries direct cache.get(content_hash)
  since uploads use content_hash as node_id
- This works even if cache index hasn't been reloaded

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 02:02:47 +00:00
gilesb
034c7542c4 Fix cache listing to include files from new structure
- Update list_all() to scan cache_dir for legacy files directly
  (old files stored as CACHE_DIR/{hash}, not CACHE_DIR/legacy/)
- Update cache listing endpoints to use cache_manager.list_all()
  instead of iterating CACHE_DIR.iterdir() directly
- This ensures uploaded files appear in the cache UI regardless
  of whether they're in the old or new cache structure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 01:37:17 +00:00
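A sketch of `list_all()` after 034c7542c4, covering both layouts: legacy files stored directly as `{cache_dir}/{hash}` and newer entries under `nodes/`. Treating only 64-character hex filenames as legacy items (so stray files such as an index are skipped) is an assumption of this sketch, as is the `output.*` check for node directories.

```python
from __future__ import annotations

import re
from pathlib import Path

LEGACY_NAME = re.compile(r"[0-9a-f]{64}")


def list_all(cache_dir: Path) -> list[str]:
    """Sorted IDs of cached items found in the legacy and nodes/ layouts."""
    ids: set[str] = set()
    # Legacy layout: plain files named by content hash directly in cache_dir.
    for entry in cache_dir.iterdir():
        if entry.is_file() and LEGACY_NAME.fullmatch(entry.name):
            ids.add(entry.name)
    # New layout: one directory per node holding an output.* file.
    nodes = cache_dir / "nodes"
    if nodes.is_dir():
        for node_dir in nodes.iterdir():
            if node_dir.is_dir() and any(node_dir.glob("output.*")):
                ids.add(node_dir.name)
    return sorted(ids)
```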
gilesb
f8ec42b445 Refactor cache access to use cache_manager consistently
- Remove symlink hack from cache_file() - no longer needed
- Add get_cache_path() helper for content_hash lookups
- Update all CACHE_DIR / content_hash patterns to use cache_manager
- Fix cache_manager.get_by_content_hash() to check path.exists()
- Fix legacy path lookup (cache_dir not legacy_dir)
- Update upload endpoint to use cache_manager.put()

This ensures cache lookups work correctly for both legacy files
(stored directly in CACHE_DIR) and new files (stored in nodes/).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 01:21:11 +00:00
gilesb
e4fd5eb010 Integrate artdag cache with deletion rules
- Add cache_manager.py with L1CacheManager wrapping artdag Cache
- Add L2SharedChecker for checking published status via L2 API
- Update server.py to use cache_manager for storage
- Update DELETE /cache/{content_hash} to enforce deletion rules
- Add DELETE /runs/{run_id} endpoint for discarding runs
- Record activities when runs complete for deletion tracking
- Add comprehensive tests for cache manager

Deletion rules enforced:
- Cannot delete items published to L2
- Cannot delete inputs/outputs of runs
- Can delete orphaned items
- Runs can only be discarded if no items are shared

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-08 00:51:18 +00:00
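The four deletion rules in e4fd5eb010 can be expressed as two predicates. `DeletionPolicy` and `Activity` are hypothetical. `published` stands in for whatever L2SharedChecker reports, and "shared" is read here as "published to L2".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Activity:
    """A completed run: the items it read and the items it produced."""
    run_id: str
    inputs: set[str]
    outputs: set[str]


@dataclass
class DeletionPolicy:
    published: set[str] = field(default_factory=set)
    activities: dict[str, Activity] = field(default_factory=dict)

    def can_delete(self, item: str) -> bool:
        if item in self.published:
            return False  # published to L2
        # Inputs and outputs of recorded runs are protected; orphans are not.
        return not any(
            item in a.inputs or item in a.outputs for a in self.activities.values()
        )

    def can_discard_activity(self, run_id: str) -> bool:
        act = self.activities[run_id]
        # A run may be discarded only if none of its items are published.
        return not ((act.inputs | act.outputs) & self.published)
```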