Facility: 080478 · C&T Storage

- Facility ID: 080478
- Name: C&T Storage
- URL: https://www.candtstorage.com/
- Address: N/A
- Platform: custom_facility_080478
- Parser File: src/parsers/custom/facility_080478_parser.py
- Last Scraped: 2026-03-23 03:21:51.319900
- Created: 2026-03-06 23:45:35.865957
- Updated: 2026-03-23 03:21:51.319900
- Parser Status: ⚠ Needs Fix
- Status Reason: Parser returned 0 units
- Last Healing Attempt: Not attempted
Parser Source (src/parsers/custom/facility_080478_parser.py)
"""Parser for C&T Storage (Google Sites) facility.
This is a Google Sites page that lists storage unit sizes with descriptions
but no pricing. Sizes are split across separate spans inside CjVfdc containers
within h3 headings. The description for each unit appears in a sibling element.
"""
from __future__ import annotations
import re
from bs4 import BeautifulSoup
from src.parsers.base import BaseParser, ParseResult, UnitResult
# Stops that indicate the end of the unit listing section
_STOP_TEXTS = {"larger sizes or outside parking", "rv parking", "indoor & outside storage", "we're here for you"}
# Regex to detect a valid size pattern like "5 X 8" or "10 X 20"
_SIZE_PATTERN = re.compile(r"^\d+\s*X\s*\d+", re.IGNORECASE)
class Facility080478Parser(BaseParser):
"""Extract storage units from C&T Storage (candtstorage.com).
The site is built on Google Sites. Unit sizes are rendered as three
separate spans (width, "X", length) inside a ``div.CjVfdc`` container
within an h3 heading. An optional description follows in a sibling div.
No pricing data is available on this page.
"""
platform = "custom_facility_080478"
def parse(self, html: str, url: str = "") -> ParseResult:
soup = BeautifulSoup(html, "lxml")
result = ParseResult(platform=self.platform, parser_name=self.__class__.__name__)
# Each unit entry is an h3 element with class CDt4Ke that contains
# a div.CjVfdc with the size spans inside.
h3_units = soup.find_all("h3", class_="CDt4Ke")
for h3 in h3_units:
container = h3.find("div", class_="CjVfdc")
if not container:
continue
spans = container.find_all("span")
# Filter out empty spans (Google Sites adds empty decorative spans)
span_texts = [s.get_text(strip=True) for s in spans if s.get_text(strip=True)]
# Reconstruct size text from spans: expect [width, "X", length, ...]
# Filter to spans that look like a size pattern
size_text = " ".join(span_texts).strip()
# Skip non-unit headings (section titles, footer text, etc.)
if not _SIZE_PATTERN.match(size_text):
text_lower = size_text.lower()
if "rv parking" in text_lower:
# Include RV Parking as a special unit type
unit = UnitResult(
size="RV Parking",
description="RV Parking",
url=url,
)
result.units.append(unit)
continue
# Parse the size: e.g. "5 X 8 Storage" → "5x8"
size_match = re.match(r"(\d+)\s*X\s*(\d+)", size_text, re.IGNORECASE)
if not size_match:
continue
width = float(size_match.group(1))
length = float(size_match.group(2))
normalized_size = f"{int(width)}x{int(length)}"
w, ln, sq = self.normalize_size(normalized_size)
# span_texts (empty-filtered) is now [width, "X", length] or [width, "X", length, type]
# Extract unit type label (e.g. "Storage") if present as the 4th span
unit_type = span_texts[3] if len(span_texts) > 3 else ""
# Find the sibling description element.
# Structure: h3 → (grandparent chain) → unnamed div with 2 children:
# child[0] = the h3 wrapper chain, child[1] = description div
description = ""
try:
# Walk up: h3.parent (tyJCtd div) → jXK9ad-SmKAyb → hJDwNd... → oKdM2c → unnamed div
ancestor = h3.parent.parent.parent.parent.parent
sibling_children = [
c for c in ancestor.children if hasattr(c, "get_text")
]
if len(sibling_children) > 1:
description = sibling_children[1].get_text(strip=True)
except (AttributeError, IndexError):
pass
# Build the display size label
display_size = f"{int(width)}' x {int(length)}'"
if unit_type and unit_type.lower() not in ("storage",):
display_size = f"{display_size} {unit_type}"
unit = UnitResult(
size=display_size,
description=description or unit_type or None,
url=url,
metadata={
"width": w,
"length": ln,
"sqft": sq,
},
)
result.units.append(unit)
if not result.units:
result.warnings.append("No units found on page")
return result
Scrape Runs (5)

- Run #1501 · 2026-03-23 03:21:47.118892 · Facility080478Parser · exported
- Run #1008 · 2026-03-21 19:15:07.320089 · Facility080478Parser · exported
- Run #561 · 2026-03-14 16:56:38.778064 · Facility080478Parser · exported
- Run #168 · 2026-03-14 05:00:46.716489 · Facility080478Parser · exported
- Run #91 · 2026-03-14 01:02:22.953166 · Facility080478Parser · exported
Run #168 Details

- Status: exported
- Parser Used: Facility080478Parser
- Platform Detected: table_layout
- Units Found: 0
- Stage Reached: exported
- Timestamp: 2026-03-14 05:00:46.716489
Timing
| Stage | Duration |
|---|---|
| Fetch | 2144ms |
| Detect | 30ms |
| Parse | 14ms |
| Export | 3ms |
Snapshot: 080478_20260314T050048Z.html
No units found in this run.
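Zero units against a saved snapshot usually means the obfuscated Google Sites class names the selectors depend on (`CDt4Ke`, `CjVfdc`) no longer appear in the fetched HTML. A minimal stdlib-only sketch (the `check_snapshot` helper and class list are illustrative assumptions, not part of the project) can confirm this against the snapshot file:

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count occurrences of the CSS classes the parser's selectors rely on."""

    def __init__(self, targets):
        super().__init__()
        self.counts = {t: 0 for t in targets}

    def handle_starttag(self, tag, attrs):
        # The class attribute may be absent; split multi-class values.
        classes = (dict(attrs).get("class") or "").split()
        for c in classes:
            if c in self.counts:
                self.counts[c] += 1


def check_snapshot(html: str) -> dict:
    """Hypothetical diagnostic: count the selector classes in a snapshot."""
    counter = ClassCounter(["CDt4Ke", "CjVfdc"])
    counter.feed(html)
    return counter.counts


sample = '<h3 class="CDt4Ke"><div class="CjVfdc"><span>5</span></div></h3>'
print(check_snapshot(sample))  # {'CDt4Ke': 1, 'CjVfdc': 1}
```

If either count is 0 against a fresh snapshot, the class names have likely rotated and the selectors in `facility_080478_parser.py` need to be re-derived from the current markup.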
All Failures for this Facility (5)

All five failures are the same parse-stage warning from the scraper, error code `no_units_extracted`, raised as `src.reporting.failure_reporter._WarningAsException: No units extracted for 080478`:

| Run | Timestamp | Stage | Error Type | Severity |
|---|---|---|---|---|
| N/A | 2026-03-23 03:21:51.316005 | parse | _WarningAsException | warning |
| N/A | 2026-03-21 19:15:11.926773 | parse | _WarningAsException | warning |
| N/A | 2026-03-14 16:56:41.149060 | parse | _WarningAsException | warning |
| N/A | 2026-03-14 05:00:48.923804 | parse | _WarningAsException | warning |
| N/A | 2026-03-14 01:02:25.025794 | parse | _WarningAsException | warning |
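Since the failure list is five copies of one underlying problem, a report like this one could collapse repeats before display. A hedged sketch, not the reporter's actual API: `summarize_failures` is a hypothetical helper that groups records by (stage, error type, code) and keeps the count plus first/last timestamps (ISO timestamps compare correctly as strings).

```python
from collections import defaultdict


def summarize_failures(failures: list) -> list:
    """Hypothetical helper: collapse repeated failure records into one
    summary row per (stage, error_type, code)."""
    groups = defaultdict(list)
    for f in failures:
        groups[(f["stage"], f["error_type"], f["code"])].append(f["timestamp"])
    return [
        {
            "stage": stage,
            "error_type": error_type,
            "code": code,
            "count": len(stamps),
            "first_seen": min(stamps),  # ISO-8601 strings sort chronologically
            "last_seen": max(stamps),
        }
        for (stage, error_type, code), stamps in groups.items()
    ]


records = [
    {"stage": "parse", "error_type": "_WarningAsException",
     "code": "no_units_extracted", "timestamp": "2026-03-14 01:02:25"},
    {"stage": "parse", "error_type": "_WarningAsException",
     "code": "no_units_extracted", "timestamp": "2026-03-23 03:21:51"},
]
print(summarize_failures(records))
# [{'stage': 'parse', ..., 'count': 2,
#   'first_seen': '2026-03-14 01:02:25', 'last_seen': '2026-03-23 03:21:51'}]
```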