Facility 090740 - Facility Scrapers

Stale Data Warning: This facility has not been successfully scraped in 81 days (threshold: 3 days). Data may be outdated.
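
The warning above compares the time since the last successful scrape with a threshold in days. The sketch below shows one way such a check could be computed. The helper names `days_since` and `is_stale` are hypothetical, and the "current" time is an assumed value chosen for illustration; the scraper's actual staleness logic is not shown on this page.

```python
from datetime import datetime, timedelta


def days_since(last_scraped: datetime, now: datetime) -> int:
    """Whole days elapsed between the last successful scrape and `now`."""
    return (now - last_scraped).days


def is_stale(last_scraped: datetime, now: datetime, threshold_days: int = 3) -> bool:
    """True when the last scrape is older than the threshold (hypothetical rule)."""
    return now - last_scraped > timedelta(days=threshold_days)


# "Last Scraped" value taken from this page.
last_scraped = datetime.fromisoformat("2026-03-23 03:21:07.768387")
# Assumed current time, picked so that 81 whole days have elapsed.
now = last_scraped + timedelta(days=81, hours=5)

print(days_since(last_scraped, now), is_stale(last_scraped, now))
```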

Facility Information (active)

Facility ID: 090740
Name: U-Haul Evansville WY
URL: https://www.uhaul.com/Locations/Self-Storage-near-Evansville-WY-82636/792078/

Address: N/A
Platform: custom_facility_090740
Parser File: src/parsers/custom/facility_090740_parser.py

Last Scraped: 2026-03-23 03:21:07.768387
Created: 2026-03-06 23:45:35.865957
Updated: 2026-03-23 03:21:07.776447

Parser & Healing Diagnosis (working)

Parser Status: ✓ Working
Status Reason: N/A

Last Healing Attempt: Not attempted

Parser Source (src/parsers/custom/facility_090740_parser.py)

"""Parser for U-Haul Storage at Hat Six (Evansville, WY).

U-Haul uses a Foundation-based layout with unit listings inside
``ul.uhjs-unit-list`` elements, one per size category (small, medium, large).
Each unit is a ``li.divider`` containing:
  - Dimensions in an ``h4 span.nowrap`` (e.g. "5' x 10' x 8'")
  - Category label (Small/Medium/Large) in the ``h4`` text
  - Amenities in a ``ul`` with ``list-style-type:disc``
  - Monthly price in ``b.text-2x`` (desktop display), with ``b.text-xl``
    (mobile display) as a fallback
  - Scarcity in ``span.text-callout.text-bold`` (e.g. "1 Unit Left!")
"""

from __future__ import annotations

import re

from bs4 import BeautifulSoup

from src.parsers.base import BaseParser, ParseResult, UnitResult

# Matches "5' x 10' x 8'" or "10' x 20' x 8'" (width x length x height)
_DIMENSIONS_RE = re.compile(
    r"(\d+(?:\.\d+)?)['\u2019\u2032]\s*[xX\u00d7]\s*"
    r"(\d+(?:\.\d+)?)['\u2019\u2032]\s*[xX\u00d7]\s*"
    r"(\d+(?:\.\d+)?)['\u2019\u2032]"
)

# Matches "1 Unit Left!" or "2 Units Left!"
_SCARCITY_RE = re.compile(r"(\d+)\s+units?\s+left", re.IGNORECASE)


class Facility090740Parser(BaseParser):
    """Extract storage units from U-Haul Storage at Hat Six (facility 090740).

    Units are listed in ``ul.uhjs-unit-list`` elements (one per size category).
    Each ``li.divider`` within that list represents a distinct unit type.
    """

    platform = "custom_facility_090740"

    def parse(self, html: str, url: str = "") -> ParseResult:
        soup = BeautifulSoup(html, "lxml")
        result = ParseResult(platform=self.platform, parser_name=self.__class__.__name__)

        unit_lists = soup.find_all("ul", class_="uhjs-unit-list")
        if not unit_lists:
            result.warnings.append("No uhjs-unit-list elements found on page")
            return result

        for ul in unit_lists:
            items = ul.find_all("li", class_="divider")
            for item in items:
                unit = self._parse_unit_item(item, url)
                if unit is not None:
                    result.units.append(unit)

        if not result.units:
            result.warnings.append("Unit lists found but no units could be extracted")

        return result

    def _parse_unit_item(self, item: BeautifulSoup, url: str) -> UnitResult | None:
        """Parse a single ``li.divider`` unit card."""

        # --- Dimensions ---
        # The h4 contains the category label (Small/Medium/Large) and a span.nowrap
        # with the dimensions string like "5' x 10' x 8'"
        dim_span = item.find("span", class_="nowrap")
        if dim_span is None:
            return None

        dim_text = dim_span.get_text(strip=True)
        dim_match = _DIMENSIONS_RE.search(dim_text)
        if dim_match is None:
            return None

        width = float(dim_match.group(1))
        length = float(dim_match.group(2))
        height = float(dim_match.group(3))

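        # Normalise to "WxL" canonical form (height is dropped here and kept in metadata).
        # ":g" keeps fractional sizes such as 7.5 instead of truncating them as int() would;
        # this assumes normalize_size accepts decimal sizes.
        size = self.normalize_size(f"{width:g}x{length:g}")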

        # --- Category label (Small / Medium / Large) ---
        h4 = item.find("h4")
        category = ""
        if h4:
            # The h4 text before the <br> / <span> is the category
            raw = h4.get_text(separator=" ", strip=True)
            # Strip the dimension part; keep only the leading word(s)
            category = re.split(r"['\d]", raw)[0].strip()

        # --- Description (from the paragraph below the heading) ---
        desc_p = item.find("p", class_="medium-collapse")
        description = ""
        if desc_p:
            spans = desc_p.find_all("span")
            for s in spans:
                txt = s.get_text(strip=True)
                if txt and "Size Guide" not in txt:
                    description = txt
                    break

        # --- Price ---
        # b.text-2x is the desktop-visible monthly price
        price_el = item.find("b", class_="text-2x")
        price: float | None = None
        if price_el:
            raw_price = price_el.get_text(strip=True)
            price = self.normalize_price(raw_price)

        if price is None:
            # Fallback: try b.text-xl (mobile display)
            price_el_xl = item.find("b", class_="text-xl")
            if price_el_xl:
                price = self.normalize_price(price_el_xl.get_text(strip=True))

        # --- Amenities ---
        amenity_ul = item.find("ul", style=re.compile(r"list-style-type\s*:\s*disc", re.I))
        amenities: list[str] = []
        if amenity_ul:
            amenities = [
                text
                for li in amenity_ul.find_all("li")
                if (text := li.get_text(strip=True))
            ]

        # Derive climate_control and drive_up from amenity list
        amenity_text = " ".join(amenities).lower()
        climate_control = "climate" in amenity_text and "no climate" not in amenity_text
        # Accept both the "drive up" and "drive-up" spellings
        drive_up = "drive up" in amenity_text or "drive-up" in amenity_text

        # --- Scarcity ---
        scarcity_span = item.find("span", class_=re.compile(r"text-callout", re.I))
        scarcity: str | None = None
        if scarcity_span:
            scarcity_text = scarcity_span.get_text(strip=True)
            if _SCARCITY_RE.search(scarcity_text):
                scarcity = scarcity_text

        return UnitResult(
            size=size,
            description=description or (f"{category} unit" if category else ""),
            price=price,
            scarcity=scarcity,
            url=url,
            metadata={
                "width": width,
                "length": length,
                "height": height,
                "sqft": width * length,
                "category": category,
                "amenities": amenities,
                "climate_control": climate_control,
                "drive_up": drive_up,
            },
        )
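
The two regular expressions in the parser source can be exercised on their own, without Beautiful Soup or the project's base classes. The sketch below copies both patterns verbatim from the source; the wrapper functions `parse_dimensions` and `units_left` are hypothetical helpers added for illustration and are not part of the parser.

```python
from __future__ import annotations

import re

# Patterns copied from the parser source above.
_DIMENSIONS_RE = re.compile(
    r"(\d+(?:\.\d+)?)['\u2019\u2032]\s*[xX\u00d7]\s*"
    r"(\d+(?:\.\d+)?)['\u2019\u2032]\s*[xX\u00d7]\s*"
    r"(\d+(?:\.\d+)?)['\u2019\u2032]"
)
_SCARCITY_RE = re.compile(r"(\d+)\s+units?\s+left", re.IGNORECASE)


def parse_dimensions(text: str) -> tuple[float, float, float] | None:
    """Return (width, length, height) in feet, or None if no match."""
    m = _DIMENSIONS_RE.search(text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2)), float(m.group(3))


def units_left(text: str) -> int | None:
    """Return the unit count from a scarcity label, or None if absent."""
    m = _SCARCITY_RE.search(text)
    return int(m.group(1)) if m else None


print(parse_dimensions("5' x 10' x 8'"))
print(units_left("1 Unit Left!"))
```

Both helpers return None instead of raising on unmatched text, which mirrors how `_parse_unit_item` skips cards it cannot read.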

Stage	Duration
Fetch	7697ms
Detect	155ms
Parse	53ms
Export	14ms
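
To see which stage dominates the run, the durations in the table above can be summed and expressed as shares of the total. This is a small arithmetic sketch that assumes the four stages run sequentially and together account for the whole run; the variable names are illustrative.

```python
# Stage durations in milliseconds, taken from the table above.
stage_ms = {"Fetch": 7697, "Detect": 155, "Parse": 53, "Export": 14}

total_ms = sum(stage_ms.values())
shares = {stage: ms / total_ms for stage, ms in stage_ms.items()}

for stage, ms in stage_ms.items():
    print(f"{stage:<7}{ms:>6} ms  {shares[stage]:6.1%}")
print(f"{'Total':<7}{total_ms:>6} ms")
```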

Facility: 090740

Scrape Runs (5)

Run #554 Details

Parsed Units (4)

Width (ft)	Length (ft)	Area (sq ft)
5.0	10.0	50.0
10.0	10.0	100.0
10.0	15.0	150.0
10.0	20.0	200.0
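
The four parsed unit rows above can be sanity-checked against the parser's `"sqft": width * length` metadata rule. The sketch assumes each row is (width, length, sqft), which matches the values but is not labeled on this page; `area_mismatches` is a hypothetical helper.

```python
import math

# Rows from Run #554, assumed to be (width_ft, length_ft, sqft).
parsed_units = [
    (5.0, 10.0, 50.0),
    (10.0, 10.0, 100.0),
    (10.0, 15.0, 150.0),
    (10.0, 20.0, 200.0),
]


def area_mismatches(rows):
    """Return rows whose third value is not width * length."""
    return [
        (w, l, sqft)
        for w, l, sqft in rows
        if not math.isclose(w * l, sqft)
    ]


print(area_mismatches(parsed_units))
```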

HTML Snapshot — Run #554