Facility 043481 - Facility Scrapers

Stale Data Warning: This facility has not been successfully scraped in 81 days (threshold: 3 days). Data may be outdated.

Facility Information (active)

Facility ID: 043481
Name: Waitsburg Storage
URL: http://www.waitsburgstorage.com/

Address: N/A
Platform: custom_facility_043481
Parser File: src/parsers/custom/facility_043481_parser.py

Last Scraped: 2026-03-23 03:17:17.826847
Created: 2026-03-06 23:45:35.865957
Updated: 2026-03-23 03:17:17.843629
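The stale-data warning at the top of this page follows from comparing the Last Scraped timestamp with the freshness threshold. A minimal sketch of such a check, assuming the age is counted in whole days; the function name `is_stale` and the reference time `now` are hypothetical and not taken from the scraper code:

```python
from datetime import datetime, timedelta


def is_stale(last_scraped: datetime, now: datetime, threshold_days: int = 3) -> bool:
    """Return True when the last successful scrape is older than the threshold."""
    return now - last_scraped > timedelta(days=threshold_days)


# Last Scraped value as shown on this page.
last_scraped = datetime.fromisoformat("2026-03-23 03:17:17.826847")
# Hypothetical "current" time, chosen so the age comes out at 81 whole days.
now = datetime(2026, 6, 12, 12, 0, 0)

age_days = (now - last_scraped).days
print(age_days, is_stale(last_scraped, now))
```

With a 3-day threshold, any scrape older than three days would be flagged under this rule, so an 81-day gap is well past the limit.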

Parser & Healing Diagnosis (working)

Parser Status: ✓ Working
Status Reason: N/A

Last Healing Attempt: Not attempted

Parser Source (src/parsers/custom/facility_043481_parser.py)

"""Parser for Waitsburg Town & Country Storage (facility 043481).

This is a simple static HTML page that lists available unit sizes as plain
text but does not publish pricing information. The parser extracts the size
offerings from the descriptive paragraph so the facility is represented in
the database even though no prices can be collected.
"""

from __future__ import annotations

import re

from bs4 import BeautifulSoup

from src.parsers.base import BaseParser, ParseResult, UnitResult


class Facility043481Parser(BaseParser):
    """Extract storage unit sizes from Waitsburg Town & Country Storage.

    The page contains a sentence such as:
        "We have storage units in the following sizes: 5 x 10, 10 x 10, 15 x 10 & 20 x 10."

    No pricing is published on the site, so ``price`` and ``sale_price`` are
    left as ``None`` and a warning is recorded.
    """

    platform = "custom_facility_043481"

    # Matches dimension tokens like "5 x 10", "10 x 10", "15 x 10", "20 x 10"
    _SIZE_RE = re.compile(
        r"(\d+(?:\.\d+)?)\s*[xX\u00d7]\s*(\d+(?:\.\d+)?)",
    )

    def parse(self, html: str, url: str = "") -> ParseResult:
        soup = BeautifulSoup(html, "lxml")
        result = ParseResult(platform=self.platform, parser_name=self.__class__.__name__)

        # Find the paragraph/cell that mentions unit sizes; skip <style>/<script> elements
        size_text: str | None = None
        for element in soup.find_all(string=re.compile(r"sizes?", re.IGNORECASE)):
            parent = element.parent
            if not parent:
                continue
            if parent.name in ("style", "script"):
                continue
            size_text = parent.get_text(separator=" ", strip=True)
            break

        if not size_text:
            result.warnings.append("Could not locate a unit-sizes description on the page")
            return result

        for match in self._SIZE_RE.finditer(size_text):
            width = float(match.group(1))
            length = float(match.group(2))
            # Use the "g" format so fractional sizes such as 7.5 are not truncated.
            size_label = f"{width:g}' x {length:g}'"

            unit = UnitResult(
                size=size_label,
                description=f"Unit size {size_label} — no pricing published on site",
                price=None,
                sale_price=None,
                metadata={
                    "width": width,
                    "length": length,
                    "sqft": width * length,
                    "no_pricing": True,
                },
                url=url or None,
            )
            result.units.append(unit)

        if result.units:
            result.warnings.append(
                "No pricing information is published on this facility's website; "
                "unit sizes were extracted but all price fields are None."
            )
        else:
            result.warnings.append("Size description found but no dimension patterns matched")

        return result
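
To see what the size pattern captures without the rest of the scraper, the extraction step can be reproduced with the standard library alone. The sketch below reuses the regular expression from `_SIZE_RE` and the sample sentence from the class docstring. The helper `extract_sizes` is a hypothetical name, and the real parser wraps each match in a `UnitResult` rather than a dict.

```python
import re

# Same pattern as Facility043481Parser._SIZE_RE: two numbers separated by
# "x", "X", or the multiplication sign, with optional whitespace.
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*[xX\u00d7]\s*(\d+(?:\.\d+)?)")


def extract_sizes(text: str) -> list[dict[str, float]]:
    """Return width, length, and square footage for every size token in text."""
    sizes = []
    for match in SIZE_RE.finditer(text):
        width = float(match.group(1))
        length = float(match.group(2))
        sizes.append({"width": width, "length": length, "sqft": width * length})
    return sizes


sentence = (
    "We have storage units in the following sizes: "
    "5 x 10, 10 x 10, 15 x 10 & 20 x 10."
)
sizes = extract_sizes(sentence)
for size in sizes:
    print(size)
```

The four matches correspond to the four parsed units listed for Run #961, with areas of 50, 100, 150, and 200 square feet.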

Stage	Duration
Fetch	3869ms
Detect	1ms
Parse	1ms
Export	7ms
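
Summing the stage durations shows where the run spends its time. A small sketch using the four values from the table above; the dict name `stage_ms` is illustrative only:

```python
# Stage durations in milliseconds, copied from the table above.
stage_ms = {"Fetch": 3869, "Detect": 1, "Parse": 1, "Export": 7}

total_ms = sum(stage_ms.values())
fetch_share = stage_ms["Fetch"] / total_ms

print(f"total: {total_ms} ms, fetch share: {fetch_share:.1%}")
# prints "total: 3878 ms, fetch share: 99.8%"
```

Fetching the page accounts for nearly all of the run time, so changes to the parser itself are unlikely to affect the total noticeably.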

Scrape Runs (4)

Run #961 Details

Parsed Units (4)

5' x 10'

10' x 10'

15' x 10'

20' x 10'

HTML Snapshot — Run #961