Facility 001228 - Facility Scrapers

Stale Data Warning: This facility has not been successfully scraped in 76 days (threshold: 3 days). Data may be outdated.

Facility Information (active)

Facility ID: 001228
Name: Copley Mini Storage
URL: http://www.copleystorageunits.com/copy-of-home

Address: 1020 Jacoby Rd, Copley, OH 44321, USA
Platform: custom_facility_001228
Parser File: src/parsers/custom/facility_001228_parser.py

Last Scraped: 2026-03-27 13:49:53.399410
Created: 2026-03-14 16:21:53.706708
Updated: 2026-03-27 13:49:53.430938
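
The stale-data warning at the top of this page follows from the Last Scraped timestamp and the 3-day threshold. A minimal sketch of that check is below. The function names and the viewing time are assumptions made for illustration; the dashboard's actual implementation (for example, whether it compares with > or >=) is not shown on this page.

```python
from datetime import datetime, timedelta


def days_since(last_scraped: datetime, now: datetime) -> int:
    """Whole days elapsed between the last successful scrape and now."""
    return (now - last_scraped).days


def is_stale(last_scraped: datetime, now: datetime, threshold_days: int = 3) -> bool:
    """True when the last successful scrape is older than the threshold."""
    return now - last_scraped > timedelta(days=threshold_days)


# Last Scraped value copied from this page.
last_scraped = datetime.fromisoformat("2026-03-27 13:49:53.399410")

# Hypothetical viewing time, chosen so the gap matches the 76 days in the warning.
now = datetime.fromisoformat("2026-06-11 14:00:00")

print(days_since(last_scraped, now), is_stale(last_scraped, now))
```

With these inputs the gap is 76 whole days, well past the 3-day threshold, so the facility is flagged even though the parser itself reports a working status.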

Parser & Healing Diagnosis (working)

Parser Status: ✓ Working
Status Reason: N/A

Last Healing Attempt: Not attempted

Parser Source (src/parsers/custom/facility_001228_parser.py)

"""Parser for Copley Mini Storage."""

from __future__ import annotations

import re

from bs4 import BeautifulSoup

from src.parsers.base import BaseParser, ParseResult, UnitResult


class Facility001228Parser(BaseParser):
    """Extract storage units from Copley Mini Storage.

    Units are displayed in a photo gallery on the /copy-of-home page.
    Each gallery item has a caption-title (size like '5x5') and
    caption-text (description with sq ft). No prices are listed online.
    """

    platform = "custom_facility_001228"

    _SQFT_RE = re.compile(r"(\d+)\s*sq\s*ft", re.IGNORECASE)

    def parse(self, html: str, url: str = "") -> ParseResult:
        soup = BeautifulSoup(html, "lxml")
        result = ParseResult(platform=self.platform, parser_name=self.__class__.__name__)

        # Find the "Unit Sizes and Description" photo gallery
        # It uses the older gallery format with <ul> lists
        gallery = None
        for g in soup.find_all("div", class_="dmPhotoGallery"):
            # The unit gallery has caption-titles like "5x5", "10x10"
            titles = g.find_all("h3", class_="caption-title")
            for t in titles:
                text = t.get_text(strip=True)
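                # Accept the same separators as the per-unit check below
                # ("x", "X", "×") so a gallery titled "5×5" is not missed.
                if re.match(r"\d+[xX×]\d+", text):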
                    gallery = g
                    break
            if gallery:
                break

        if not gallery:
            result.warnings.append("No unit gallery found")
            return result

        seen: set[str] = set()

        for thumb in gallery.find_all("li", class_="photoGalleryThumbs"):
            title_tag = thumb.find("h3", class_="caption-title")
            if not title_tag:
                continue

            size_text = title_tag.get_text(strip=True)
            if not re.match(r"\d+[xX×]\d+", size_text):
                continue

            # Normalize the separator first (e.g., "15X30" -> "15x30") so that
            # "5X5", "5x5" and "5×5" all deduplicate to the same key.
            normalized_size = re.sub(r"[xX×]", "x", size_text)

            dedup_key = normalized_size.lower()
            if dedup_key in seen:
                continue
            seen.add(dedup_key)

            unit = UnitResult()
            unit.size = normalized_size

            w, ln, sq = self.normalize_size(normalized_size)
            if w is not None:
                unit.metadata = {"width": w, "length": ln, "sqft": sq}

            # Extract description from caption-text
            caption_div = thumb.find("div", class_="caption-text")
            if caption_div:
                desc = caption_div.get_text(strip=True)
                unit.description = desc

                # Try to extract sq ft from description
                sqft_match = self._SQFT_RE.search(desc)
                if sqft_match and unit.metadata:
                    unit.metadata["sqft_listed"] = int(sqft_match.group(1))

            result.units.append(unit)

        if not result.units:
            result.warnings.append("No units found in gallery")

        return result
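
The caption handling in the parser above can be tried without the project's BaseParser or BeautifulSoup. The sketch below is a standard-library approximation, not the parser's own code: parse_caption is a hypothetical helper, the width-times-length arithmetic stands in for BaseParser.normalize_size (whose implementation is not shown here), and the sample captions are made up for illustration.

```python
from __future__ import annotations

import re

# Same separators the parser accepts: "x", "X" and the multiplication sign.
SIZE_RE = re.compile(r"(\d+)[xX×](\d+)")
SQFT_RE = re.compile(r"(\d+)\s*sq\s*ft", re.IGNORECASE)


def parse_caption(title: str, text: str = "") -> dict | None:
    """Turn one gallery caption into a dict, or None if the title is not a size."""
    match = SIZE_RE.match(title.strip())
    if not match:
        return None

    width, length = int(match.group(1)), int(match.group(2))
    unit = {
        "size": f"{width}x{length}",  # normalized, e.g. "15X30" -> "15x30"
        "width": width,
        "length": length,
        "sqft": width * length,  # assumption: stands in for normalize_size
    }

    # Keep the square footage quoted in the description, if there is one.
    listed = SQFT_RE.search(text)
    if listed:
        unit["sqft_listed"] = int(listed.group(1))
    return unit


# Made-up captions: two sizes and one non-unit gallery item.
captions = [("5X5", "25 sq ft, closet sized"), ("10×20", ""), ("Office", "n/a")]
units = [u for u in (parse_caption(t, d) for t, d in captions) if u]
print([u["size"] for u in units])
```

Storing the computed and the listed square footage under separate keys, as the parser does with sqft and sqft_listed, makes it easy to spot captions whose stated area disagrees with the dimensions.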

Stage	Duration
Fetch	6168ms
Detect	22ms
Parse	10ms
Export	15ms


Scrape Runs (5)

Run #202 Details

Parsed Units (8)

5x5

10x15

15x30

5x10

10x20

20x30

10x10

10x30

HTML Snapshot — Run #202