Why we built EdgeTelemetry

The AI buildout is the largest capex cycle in tech right now. The operational layer beneath it is being assembled by hand, one rack at a time. We got tired of watching that.

We didn't set out to build a product.
DeHaze Labs has been an implementation firm for years. We embed engineers into customer teams. We build production AI and data platforms — for telecom, for sports and entertainment intelligence, for smart-city perception, for industrial operations. The work is satisfying in the way that engineering work is satisfying: real systems, real customers, real metrics that improve when we ship.
But across enough infrastructure engagements, we kept noticing the same thing. Every GPU operator, every colo serving AI workloads, every hyperscaler capacity partner we worked with had built — or was in the middle of building — the same telemetry layer from scratch. Different vendor stacks, different ingestion choices, different reasoning about what counted as "ready" for a rack to come online. But the pattern was identical every time.
Heterogeneous source ingestion. Schema normalization. Validation logic. A lot of glue between systems that should never have been separate in the first place.
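To make the pattern concrete, here is a minimal sketch of that layer, with entirely hypothetical vendor payloads and field names. The point is the shape, not the specifics: per-vendor normalizers map inconsistent raw data into one schema, and validation only becomes meaningful once every source speaks it.

```python
from dataclasses import dataclass

# Hypothetical unified schema: every vendor reading is normalized into this shape.
@dataclass
class Reading:
    rack_id: str
    sensor: str   # e.g. "gpu_temp_c"
    value: float
    source: str   # which vendor stack the reading came from

def normalize_vendor_a(raw: dict) -> Reading:
    # Imagined vendor A: reports Fahrenheit under its own field names.
    return Reading(
        rack_id=raw["rackId"],
        sensor="gpu_temp_c",
        value=(raw["tempF"] - 32) * 5 / 9,
        source="vendor_a",
    )

def normalize_vendor_b(raw: dict) -> Reading:
    # Imagined vendor B: already Celsius, but nests the value differently.
    return Reading(
        rack_id=raw["rack"]["id"],
        sensor="gpu_temp_c",
        value=raw["metrics"]["gpu_temp_c"],
        source="vendor_b",
    )

def validate(reading: Reading) -> bool:
    # A readiness gate is only trustworthy once every source is normalized.
    return -10.0 <= reading.value <= 110.0

readings = [
    normalize_vendor_a({"rackId": "r17", "tempF": 165.2}),
    normalize_vendor_b({"rack": {"id": "r18"}, "metrics": {"gpu_temp_c": 71.5}}),
]
rack_ready = all(validate(r) for r in readings)
```

Every engagement we describe above rebuilt some variant of exactly this, scoped to one vendor mix.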
By the third or fourth engagement that started with "first we need to fix the telemetry," we were no longer building a custom thing. We were building roughly the same thing, slightly differently each time, scoped to whatever fragment of the problem the customer was willing to fund. Each version got 70% of the way there and shipped. Each was abandoned the moment the engagement ended, which meant the next customer started over.
That pattern is the definition of a product that should exist and doesn't.

The market is moving faster than the tooling under it

The AI buildout is moving faster than the operational tooling that supports it. That's not a controversial claim — anyone reading earnings calls or capex announcements can see it. GPU capacity is being brought online at a pace the operational layer wasn't designed for. The hardware vendors are shipping at scale. The hyperscalers are absorbing capacity as fast as it can be racked. And the layers underneath — telemetry, validation, observability, autonomous operations — are still being assembled rack-by-rack, vendor-by-vendor, customer-by-customer.
This is the gap. Capex is racing ahead; operational tooling is racing to catch up. That gap, as a market opportunity, is where DeHaze Labs has been working anyway, one customer at a time.
We made the call to productize earlier this year. The thesis was simple: the unified telemetry layer is going to get built somewhere. It's either going to get built once, well, by a team that has done it enough times to recognize the right shape — or it's going to get built badly, repeatedly, by every operator on earth. We'd rather be the team that builds it once.

What changes when telemetry is solved

The interesting part of EdgeTelemetry isn't the telemetry. It's what becomes possible once telemetry is solved.
Right now, autonomous data center operations is mostly a slide deck. Operators talk about it. Vendors pitch it. Articles get written about it. But it doesn't actually exist at any meaningful scale, and the reason it doesn't exist is structural: you can't build autonomous operations on top of fragmented, vendor-specific, schema-inconsistent data. The reasoning layer everyone wants — a system that diagnoses faults, plans remediations, executes well-defined playbooks, and escalates to humans only when it actually needs to — is architecturally downstream of unified telemetry. The substrate has to come first.
Once that substrate is in place, the reasoning layer becomes a real engineering problem rather than a research problem. Frontier LLMs with appropriate tool use can reason over unified telemetry, follow operational runbooks, hypothesize root causes, and execute remediation against a well-defined surface. The hard parts have always been underneath: getting the data clean, getting the schema consistent, getting the validation gates trustworthy enough to act on.
EdgeTelemetry exists because that substrate is the prerequisite for everything else. We didn't build it because we wanted to ship a product. We built it because the work we were doing on customer engagements kept hitting the same wall, and the right answer was to take that wall down once, for everyone.

What we're not doing

We're being deliberate about what EdgeTelemetry isn't.
It isn't a one-vendor solution. The point of unified telemetry is exactly that it isn't tied to a vendor stack — if it locked you into Nvidia or AMD or any specific cooling system manufacturer, it would defeat its own purpose. The schema is the abstraction; the connectors are the easy part.
It isn't a closed reasoning system. We expose telemetry to a reasoning layer through a clean interface, but we don't bake any single model into the platform. Frontier LLMs are improving fast, and we expect to swap the reasoning layer over time as better models ship. Customers should be able to do the same.
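One way to picture what "not baking a model in" could mean, as a hedged sketch rather than the actual EdgeTelemetry interface: the platform depends on a narrow reasoning contract, and any backend satisfying it can be swapped in. All names here are illustrative.

```python
from typing import Protocol

class ReasoningBackend(Protocol):
    """Anything that turns unified telemetry plus a runbook into an action plan."""
    def plan(self, telemetry: dict, runbook: str) -> list[str]: ...

class ThresholdBackend:
    # Trivial stand-in. A real deployment might wrap a frontier LLM with tool
    # use here, and replace it as better models ship; the contract is unchanged.
    def plan(self, telemetry: dict, runbook: str) -> list[str]:
        if telemetry.get("gpu_temp_c", 0.0) > 90.0:
            return ["throttle_rack", "escalate_to_operator"]
        return []

def remediate(backend: ReasoningBackend, telemetry: dict) -> list[str]:
    # The platform codes against the interface, never against a specific model.
    return backend.plan(telemetry, runbook="overheat-playbook")

actions = remediate(ThresholdBackend(), {"gpu_temp_c": 94.0})
```

Structural typing keeps the coupling one-directional: backends never import the platform, and customers can supply their own.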
It isn't a replacement for operational expertise. The people running data centers know more about their environment than any product can, and they're going to keep being the operators who decide what "ready" means for their racks, what remediation is acceptable, and where to escalate. EdgeTelemetry exists to give them leverage, not to replace them.
And it isn't being built in a hurry. We're working with a small number of operators on early access. We're going to take the time to get the readiness validation tier right across the diversity of vendor configurations our customers actually run, before we put anything else in the spotlight. The market is moving fast; the substrate underneath has to be the thing that doesn't.

What we'd like to hear

If you're operating GPU capacity, running a colo serving AI workloads, or building infrastructure for hyperscaler partners — and the unified telemetry problem we've described above is one you've been solving by hand, repeatedly — we'd like to talk.
Not because we want to sell you the product today. Because we want to understand whether the way we're thinking about this matches the way you're thinking about it, and because the early-access cohort we're building is the group whose problems will most directly shape what gets built next.
The longer arc of this is autonomous data center operations. The shorter arc is cutting rack onboarding from weeks to hours. We're aimed at both, in that order, because that's the order they actually have to be built in.

EdgeTelemetry is built by DeHaze Labs. Get in touch at hello@dhlabs.ai.