Facility: 085996

Stor-It Chicago St

Stale Data Warning: This facility has not been successfully scraped in 30 days (threshold: 3 days). Data may be outdated.
⚠ Unit Count Anomaly (Critical): Current run has 0 units, expected baseline is 21 (-100.0% change, delta: -21).
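
The two warnings above reduce to simple arithmetic. The following is a minimal sketch of how such checks could be computed, not the dashboard's actual implementation; the function names `unit_count_change` and `is_stale` are hypothetical, and only the numbers (0 units, baseline 21, 3-day threshold) come from this page.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def unit_count_change(current: int, baseline: int) -> tuple[int, float]:
    """Return (delta, percent change) of the current unit count vs. the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    delta = current - baseline
    return delta, round(100.0 * delta / baseline, 1)


def is_stale(last_success: datetime, now: datetime, threshold_days: int = 3) -> bool:
    """True when the last successful scrape is older than the threshold."""
    return now - last_success > timedelta(days=threshold_days)


# Values from the anomaly line: 0 units now, baseline of 21.
delta, pct = unit_count_change(current=0, baseline=21)
print(f"{pct:+.1f}% change, delta: {delta:+d}")  # -100.0% change, delta: -21
```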
Facility Information (status: active)
Facility ID: 085996
Name: Stor-It Chicago St
URL: https://www.stor-it.com/location/USA/ID/Caldwell/stor-it-chicago-st/
Address: N/A
Platform: custom_facility_085996
Parser File: src/parsers/custom/facility_085996_parser.py
Last Scraped: 2026-03-23 03:16:40.191537
Created: 2026-03-06 23:45:35.865957
Updated: 2026-03-23 03:16:40.191537
Parser & Healing Diagnosis (status: needs_fix)
Parser Status: ⚠ Needs Fix
Status Reason: Parser returned 0 units
Last Healing Attempt: Not attempted
Parser Source (src/parsers/custom/facility_085996_parser.py)
"""Parser for Stor-It Self Storage - Chicago St., Caldwell ID (Candee WordPress plugin).

The facility page loads unit listings via a WordPress AJAX call using the
Candee self-storage plugin combined with the WP Speed Ninja (wpspdn) script
optimiser.  The wpspdn plugin blocks all scripts until a user-interaction
event fires (mouseover, keydown, etc.), so a plain page load will only show
a spinner.  A Selenium snapshot captured *after* triggering a synthetic
interaction event will contain fully-rendered unit rows.

Unit data is embedded directly in ``<div class="unitsList lineItem unitMasterData …">``
element attributes:

  - ``data-price``   — monthly price as a decimal string (e.g. ``"68.00"``)
  - ``data-size``    — square footage (e.g. ``"350"``)
  - ``data-features`` — JSON array of feature strings (e.g.
                        ``["Parking","Ground Level","Large","Rent Now"]``)

The unit name and dimensions are extracted from the ``data-name`` attribute
on the preview anchor inside each row (e.g. ``"10x35 Parking"``).

Availability is inferred from the ``available-{N}`` CSS class on the
``.unitsCellRight`` child:

  - ``available-0``   → "Sold out"
  - ``available-N``   (1 ≤ N ≤ 3) → "Only N left"
  - ``available-999`` → available (no scarcity label)

Promotions are extracted from ``div.unitOfferContainer`` when present.

Facility URL: https://www.stor-it.com/location/USA/ID/Caldwell/stor-it-chicago-st/
Candee prop_id: 17223
"""

from __future__ import annotations

import json
import logging
import re

from bs4 import BeautifulSoup

from src.parsers.base import BaseParser, ParseResult, UnitResult

logger = logging.getLogger(__name__)

# Features that indicate unit actions rather than physical characteristics
_ACTION_FEATURES = frozenset({"Rent Now", "Move In", "Select", "Call Us"})

# Features that indicate climate control
_CLIMATE_FEATURES = frozenset({"Climate Control", "Climate Controlled", "Heated", "Cooled"})

# Features that indicate drive-up / exterior access
_DRIVEUP_FEATURES = frozenset({"Drive Up", "Drive-Up", "Exterior Door", "rollup"})

# Parking/vehicle storage features
_PARKING_FEATURES = frozenset({"Parking"})


def _parse_unit_rows(html: str, url: str) -> list[UnitResult]:
    """Extract ``UnitResult`` objects from Candee unit listing HTML.

    Each unit row is a ``<div class="unitsList lineItem unitMasterData …">``
    element.  All key data lives in HTML attributes on the row div and on
    child elements.
    """
    soup = BeautifulSoup(html, "lxml")
    units: list[UnitResult] = []

    # Match row divs on exact class tokens. A CSS selector does not depend on
    # how BeautifulSoup passes multi-valued class attributes to a callable.
    rows = soup.select("div.unitsList.lineItem")

    for row in rows:
        # --- Unit name and dimensions from the preview anchor data-name ---
        preview_anchor = row.find("a", class_="preview", attrs={"data-name": True})
        data_name: str = (preview_anchor.get("data-name", "") if preview_anchor else "").strip()

        # Parse dimensions from the name string (e.g. "10x35 Parking")
        dim_match = re.search(r"(\d+)\s*[xX]\s*(\d+)", data_name)
        if dim_match:
            width = int(dim_match.group(1))
            length = int(dim_match.group(2))
            size_str = f"{width}' x {length}'"
            sqft = width * length
        else:
            width = length = sqft = None  # type: ignore[assignment]
            # Fall back to the square footage in the data-size attribute
            raw_sqft = str(row.get("data-size", "")).strip()
            size_str = f"{raw_sqft} sqft" if raw_sqft else ""
            if raw_sqft.isdigit():
                sqft = int(raw_sqft)

        # --- Unit type description (everything after the dimensions) ---
        description: str = ""
        if data_name:
            # Strip dimension prefix to get the unit type (e.g. "Parking", "Self Storage")
            description = re.sub(r"^\d+\s*[xX]\s*\d+\s*", "", data_name).strip()

        # --- Price ---
        price: float | None = None
        raw_price = row.get("data-price", "")
        if raw_price:
            try:
                price = float(raw_price.replace(",", ""))
            except (ValueError, TypeError):
                price = None

        # --- Availability / scarcity from unitsCellRight class ---
        scarcity: str | None = None
        right_cell = row.find(class_=re.compile(r"unitsCellRight"))
        if right_cell:
            rc_classes = " ".join(right_cell.get("class", []))
            avail_match = re.search(r"available-(\d+)", rc_classes)
            if avail_match:
                avail_num = int(avail_match.group(1))
                if avail_num == 0:
                    scarcity = "Sold out"
                elif 1 <= avail_num <= 3:
                    scarcity = f"Only {avail_num} left"

        # --- Promotion text ---
        promotion: str | None = None
        offer_el = row.find(class_="unitOfferContainer")
        if offer_el:
            offer_text = offer_el.get_text(separator=" ", strip=True)
            if offer_text:
                promotion = offer_text

        # --- Features ---
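        features: list[str] = []
        feat_str = row.get("data-features", "[]")
        try:
            feat_raw = json.loads(feat_str)
        except (json.JSONDecodeError, TypeError):
            logger.debug("Unparseable data-features on unit row: %r", feat_str)
            feat_raw = []
        # Keep only string entries so the set operations below stay hashable
        if isinstance(feat_raw, list):
            features = [
                f for f in feat_raw if isinstance(f, str) and f not in _ACTION_FEATURES
            ]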

        # Derive amenity flags from features list
        climate_control = bool(set(features) & _CLIMATE_FEATURES)
        drive_up = bool(set(features) & _DRIVEUP_FEATURES)
        is_parking = bool(set(features) & _PARKING_FEATURES)

        unit = UnitResult(
            size=size_str or None,
            description=description or None,
            price=price,
            sale_price=None,
            promotion=promotion,
            scarcity=scarcity,
            url=url,
            metadata={
                "width": width,
                "length": length,
                "sqft": sqft,
                "unit_type": description or None,
                "features": features,
                "climate_control": climate_control,
                "drive_up": drive_up,
                "is_parking": is_parking,
                "property_id": row.get("data-facility"),
            },
        )
        units.append(unit)

    return units


class Facility085996Parser(BaseParser):
    """Extract storage units from Stor-It Self Storage - Chicago St. (Candee plugin).

    The page uses the Candee WordPress plugin which loads unit listings
    asynchronously via a WP AJAX call, gated behind the WP Speed Ninja
    optimiser.  The snapshot must be captured *after* triggering a synthetic
    user interaction to unblock the JavaScript and allow the AJAX response to
    populate the DOM.

    Parses ``<div class="unitsList lineItem unitMasterData …">`` elements and
    extracts pricing and availability from their HTML attributes.

    Facility URL:
        https://www.stor-it.com/location/USA/ID/Caldwell/stor-it-chicago-st/
    """

    platform = "custom_facility_085996"

    def parse(self, html: str, url: str = "") -> ParseResult:
        result = ParseResult(platform=self.platform, parser_name=self.__class__.__name__)

        units = _parse_unit_rows(html, url)

        if not units:
            result.warnings.append(
                "No unit rows found. The snapshot may have been captured before the Candee "
                "AJAX content loaded. Re-fetch with a synthetic user-interaction event to "
                "allow the WP Speed Ninja plugin to unblock the Candee scripts."
            )
            return result

        result.units = units
        return result
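
The parser's warning asks for a re-fetch after a synthetic user interaction. A minimal sketch of that fetch step follows; it is not the project's fetcher. The helper name `fetch_rendered_html`, the wake-up script, and the row regex are assumptions for illustration, and whether the optimiser accepts synthetic events is itself an assumption taken from the docstring. The `driver` argument is duck-typed: any object with `get`, `execute_script`, and `page_source` works, which a Selenium WebDriver provides.

```python
from __future__ import annotations

import re
import time
from typing import Any

# Heuristic marker for a rendered unit row. The class names come from the
# parser docstring; the regex itself is an assumption of this sketch.
_ROW_RE = re.compile(r'class="[^"]*\bunitsList\b[^"]*\blineItem\b')

# Fires the interaction events the docstring names (mouseover, keydown).
# Assumption: the script optimiser reacts to synthetic events.
_WAKE_SCRIPT = """
for (const type of ['mouseover', 'keydown']) {
  window.dispatchEvent(new Event(type, {bubbles: true}));
  document.dispatchEvent(new Event(type, {bubbles: true}));
}
"""


def fetch_rendered_html(
    driver: Any, url: str, timeout: float = 15.0, poll: float = 0.5
) -> str:
    """Load ``url``, trigger synthetic interaction, and wait for unit rows."""
    driver.get(url)
    driver.execute_script(_WAKE_SCRIPT)
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source
        if _ROW_RE.search(html):
            return html  # unit rows are present in the DOM
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no unit rows rendered within {timeout}s for {url}")
        time.sleep(poll)
```

Polling `page_source` rather than sleeping for a fixed interval keeps the snapshot from being captured while the spinner is still showing, which is the failure mode the warning message describes.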

Scrape Runs (4)

Run #954 Details

Status: exported
Parser Used: Facility085996Parser
Platform Detected: unknown
Units Found: 0
Stage Reached: exported
Timestamp: 2026-03-21 19:09:26.201702
Timing
Stage    Duration
Fetch    1868 ms
Detect   4 ms
Parse    1 ms
Export   5 ms

Snapshot: 085996_20260321T190928Z.html

No units found in this run.
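
One way to tell a parser defect from a snapshot captured too early is to count unit-row divs in the saved HTML directly. The sketch below uses only the standard library; `count_unit_rows` is a hypothetical helper, and the class names are the ones listed in the parser docstring. A count of zero on the snapshot would suggest the page was saved before the unit listing rendered.

```python
from html.parser import HTMLParser


class _RowCounter(HTMLParser):
    """Counts <div> start tags carrying both unit-row class tokens."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if "unitsList" in classes and "lineItem" in classes:
            self.rows += 1


def count_unit_rows(html: str) -> int:
    """Return the number of unit-row divs found in ``html``."""
    counter = _RowCounter()
    counter.feed(html)
    counter.close()
    return counter.rows


# Illustrative input only, not taken from the real snapshot.
sample = '<div class="unitsList lineItem unitMasterData" data-price="68.00"></div>'
print(count_unit_rows(sample))  # 1
```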

All Failures for this Facility (2)

parse _WarningAsException scraper no_units_extracted warning Run #N/A | 2026-03-23 03:16:40.185334

No units extracted for 085996

Stack trace
src.reporting.failure_reporter._WarningAsException: No units extracted for 085996

parse _WarningAsException scraper no_units_extracted warning Run #N/A | 2026-03-21 19:09:28.099438

No units extracted for 085996

Stack trace
src.reporting.failure_reporter._WarningAsException: No units extracted for 085996
