Facility: 043481

Waitsburg Storage

Stale Data Warning: This facility has not been successfully scraped in 30 days (threshold: 3 days). Data may be outdated.
Facility Information (active)
Facility ID
043481
Name
Waitsburg Storage
URL
http://www.waitsburgstorage.com/
Address
N/A
Platform
custom_facility_043481
Parser File
src/parsers/custom/facility_043481_parser.py
Last Scraped
2026-03-23 03:17:17.826847
Created
2026-03-06 23:45:35.865957
Updated
2026-03-23 03:17:17.843629
Parser & Healing Diagnosis (working)
Parser Status
✓ Working
Status Reason
N/A
Last Healing Attempt
Not attempted
Parser Source (src/parsers/custom/facility_043481_parser.py)
"""Parser for Waitsburg Town & Country Storage (facility 043481).

This is a simple static HTML page that lists available unit sizes as plain
text but does not publish pricing information. The parser extracts the size
offerings from the descriptive paragraph so the facility is represented in
the database even though no prices can be collected.
"""

from __future__ import annotations

import re

from bs4 import BeautifulSoup

from src.parsers.base import BaseParser, ParseResult, UnitResult


class Facility043481Parser(BaseParser):
    """Extract storage unit sizes from Waitsburg Town & Country Storage.

    The page contains a sentence such as:
        "We have storage units in the following sizes: 5 x 10, 10 x 10, 15 x 10 & 20 x 10."

    No pricing is published on the site, so ``price`` and ``sale_price`` are
    left as ``None`` and a warning is recorded.
    """

    platform = "custom_facility_043481"

    # Matches dimension tokens like "5 x 10", "10 x 10", "15 x 10", "20 x 10"
    _SIZE_RE = re.compile(
        r"(\d+(?:\.\d+)?)\s*[xX\u00d7]\s*(\d+(?:\.\d+)?)",
    )

    def parse(self, html: str, url: str = "") -> ParseResult:
        soup = BeautifulSoup(html, "lxml")
        result = ParseResult(platform=self.platform, parser_name=self.__class__.__name__)

        # Find the paragraph/cell that mentions unit sizes; skip <style>/<script> elements
        size_text: str | None = None
        for element in soup.find_all(string=re.compile(r"sizes?", re.IGNORECASE)):
            parent = element.parent
            if not parent:
                continue
            if parent.name in ("style", "script"):
                continue
            size_text = parent.get_text(separator=" ", strip=True)
            break

        if not size_text:
            result.warnings.append("Could not locate a unit-sizes description on the page")
            return result

        for match in self._SIZE_RE.finditer(size_text):
            width = float(match.group(1))
            length = float(match.group(2))
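            # ":g" drops a trailing ".0" but keeps fractional sizes such as 7.5,
            # which the regex accepts and int() would silently truncate.
            size_label = f"{width:g}' x {length:g}'"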

            unit = UnitResult(
                size=size_label,
                description=f"Unit size {size_label} — no pricing published on site",
                price=None,
                sale_price=None,
                metadata={
                    "width": width,
                    "length": length,
                    "sqft": width * length,
                    "no_pricing": True,
                },
                url=url or None,
            )
            result.units.append(unit)

        if result.units:
            result.warnings.append(
                "No pricing information is published on this facility's website; "
                "unit sizes were extracted but all price fields are None."
            )
        else:
            result.warnings.append("Size description found but no dimension patterns matched")

        return result
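
The size extraction above can be tried in isolation. The following is a minimal sketch, not part of the parser file: it copies the `_SIZE_RE` pattern and runs it on the example sentence quoted in the class docstring. The names `SIZE_RE`, `extract_sizes`, and `sample` are hypothetical and exist only for this illustration, and the sketch deliberately avoids `BaseParser`, `ParseResult`, and `UnitResult`, whose definitions are not shown here.

```python
from __future__ import annotations

import re

# Same pattern as Facility043481Parser._SIZE_RE: two numbers (optionally
# decimal) separated by "x", "X", or the multiplication sign.
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*[xX\u00d7]\s*(\d+(?:\.\d+)?)")


def extract_sizes(text: str) -> list[tuple[float, float, float]]:
    """Return (width, length, sqft) for every dimension token in ``text``."""
    sizes = []
    for match in SIZE_RE.finditer(text):
        width = float(match.group(1))
        length = float(match.group(2))
        sizes.append((width, length, width * length))
    return sizes


# Example sentence taken from the parser's class docstring.
sample = (
    "We have storage units in the following sizes: "
    "5 x 10, 10 x 10, 15 x 10 & 20 x 10."
)

for width, length, sqft in extract_sizes(sample):
    print(f"{width:g} x {length:g} -> {sqft:g} sq ft")
```

On the docstring sentence this yields four sizes, which agrees with the four units reported for Run #961. Text without any dimension token yields an empty list, the case in which the parser records its "no dimension patterns matched" warning.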

Scrape Runs (4)

Run #961 Details

Status
exported
Parser Used
Facility043481Parser
Platform Detected
table_layout
Units Found
4
Stage Reached
exported
Timestamp
2026-03-21 19:10:02.486012
Timing
Stage Duration
Fetch 3869 ms
Detect 1 ms
Parse 1 ms
Export 7 ms

Snapshot: 043481_20260321T191006Z.html

Parsed Units (4)

5' x 10'

No price

10' x 10'

No price

15' x 10'

No price

20' x 10'

No price
