TCP Provenance Identifier Option

Introduction

In administrative domains (e.g., cloud platforms, enterprise networks, and data centers), TCP traffic often traverses devices such as NATs, load balancers, and service proxies that rewrite transport-layer identifiers or terminate and re-originate TCP connections. As a result, a single end-to-end exchange between two workloads may correspond to a sequence of distinct TCP connections within the domain. This document refers to that end-to-end exchange as a "logical communication".

These transformations break provenance continuity. Observations of TCP traffic at different points in the domain cannot reliably be associated with the same logical communication, and operators cannot determine which workload instance originally initiated a connection once rewriting has occurred.

This document defines an experimental TCP option that carries a small Provenance Identifier (ProvID). A ProvID is a compact value generated for the duration of a logical communication using workload-scoped attributes, such as host IP with process identifier. As illustrated in Figure 1, the ProvID enables provenance correlation across rewriting boundaries within the domain.

The ProvID option is intended for use within administrative domains and is not designed for use on the open Internet.

Figure 1: Provenance Correlation for a Logical Communication across Middleboxes

[Figure: a logical communication originated by Workload A with ProvID = P1 passes through a relay toward Workload B (labelled ProvID = P2). Observation Points A, B, and C log the entries (TCP #1, P1) and (TCP #2, P1). Logical Communication (ProvID = P1) for Workload A: { TCP #1, TCP #2 }]

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

Use Cases

Association of Traffic Across Rewriting

In many administrative domains, operators need to reconstruct the path of a logical communication for troubleshooting, incident investigation, or auditing. Flow logs and measurements collected at different observation points commonly reference different TCP connections that correspond to the same logical communication, and the originating endpoint cannot be reliably identified from transport identifiers alone.

In this use case, the originating endpoint includes the ProvID in the TCP header options. Observation points record the ProvID alongside locally observed connection identifiers. When a logical communication is realized as a series of distinct TCP connections within the domain, the ProvID provides a stable correlation handle for aggregating these disparate records. This enables end-to-end reconstruction of communication paths across middleboxes, supports cross-layer observability (for example, linking network telemetry with application or process context), and maintains provenance continuity.
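The aggregation step described above can be sketched as a simple grouping of flow records by ProvID. This is only an illustration: the record shape (observation point, connection label, ProvID) and the function name `correlate` are assumptions made for the example, and the sample values mirror the labels used in Figure 1.

```python
from collections import defaultdict


def correlate(records):
    """Group (observation_point, connection, provid) records by ProvID.

    The record layout is hypothetical; a real flow log would carry
    5-tuples and timestamps rather than short labels.
    """
    by_provid = defaultdict(set)
    for _point, connection, provid in records:
        by_provid[provid].add(connection)
    # Sort so the result is deterministic regardless of log order.
    return {provid: sorted(conns) for provid, conns in by_provid.items()}


# Hypothetical records from three observation points.
records = [
    ("A", "TCP #1", "P1"),
    ("B", "TCP #1", "P1"),
    ("B", "TCP #2", "P1"),
    ("C", "TCP #2", "P1"),
]

print(correlate(records))  # {'P1': ['TCP #1', 'TCP #2']}
```

Records that reference different TCP connections collapse into one entry per ProvID, which is the correlation handle the use case relies on.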

Process-level Origin Attribution for Remediation

An operator may detect anomalous TCP behavior at an observation point and need to remediate the issue by acting on the specific originator responsible. At that point, the original source may be obscured by address translation or connection rewriting, and even host-level attribution may be insufficient because multiple independent processes can share a host.

In this use case, the originating endpoint generates a ProvID using the pair (host IP address, process identifier) of the process that created the socket. An operator that observes abnormal traffic associated with a ProvID can map the ProvID to the initiating (host IP, process identifier) and take immediate action on that process (for example, suspend or restart the process, isolate the workload instance, or apply a narrowly scoped network policy).
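The text above does not specify how the (host IP address, process identifier) pair becomes an 8-byte ProvID. One possibility, shown here purely as an assumption, is to pack an IPv4 address (4 bytes) followed by a 32-bit process identifier (4 bytes), which makes the reverse mapping trivial. The function names `make_provid` and `resolve_provid` are hypothetical.

```python
import ipaddress
import struct
from typing import Tuple


def make_provid(host_ip: str, pid: int) -> bytes:
    """Pack (IPv4 address, PID) into 8 bytes. Assumed encoding, IPv4 only."""
    addr = ipaddress.IPv4Address(host_ip)  # raises ValueError on bad input
    if not 0 <= pid <= 0xFFFFFFFF:
        raise ValueError("PID does not fit in 32 bits")
    return addr.packed + struct.pack("!I", pid)


def resolve_provid(provid: bytes) -> Tuple[str, int]:
    """Recover (host IP, PID) from a ProvID built by make_provid."""
    if len(provid) != 8:
        raise ValueError("ProvID must be exactly 8 bytes")
    addr = ipaddress.IPv4Address(provid[:4])
    (pid,) = struct.unpack("!I", provid[4:])
    return str(addr), pid


provid = make_provid("10.0.0.5", 4242)
print(provid.hex())            # 0a00000500001092
print(resolve_provid(provid))  # ('10.0.0.5', 4242)
```

This direct packing yields the same value for every communication started by a process and exposes the pair to any observer, so a deployment might instead prefer an opaque identifier resolved through a lookup table kept by the operator.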

Option Format

The Provenance Identifier (ProvID) option uses a fixed-length experimental TCP option format. The option is identified by the experimental option kind together with the Experiment Identifier (ExID), and it always has the same fixed length.

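ProvID TCP Option Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Kind = 253   |  Length = 12  |         ExID = 0xDEE9         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                        ProvID (8 bytes)                       +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+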

Kind: The TCP option kind. The value of this field is 253.
Length: The total length of the TCP option in bytes. For the ProvID option defined in this document, the value of this field is 12.
ExID: The Experiment Identifier (ExID). This 2-byte field identifies the ProvID experiment when used with experimental TCP option kinds. The value of this field is 0xDEE9.
ProvID: The Provenance Identifier. This field is 8 bytes in length and carries a provenance identifier defined by the sender.
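The field layout above can be exercised with a short encoder and decoder. This is a minimal sketch: the constants come from the field descriptions, while the function names and the choice to raise `ValueError` on a mismatch are assumptions made for the example.

```python
import struct

KIND_EXPERIMENTAL = 253  # Kind field value
PROVID_OPTION_LEN = 12   # Kind (1) + Length (1) + ExID (2) + ProvID (8)
PROVID_EXID = 0xDEE9     # Experiment Identifier

# Network byte order: two 1-byte fields, a 2-byte ExID, 8 raw ProvID bytes.
_LAYOUT = struct.Struct("!BBH8s")


def encode_provid_option(provid: bytes) -> bytes:
    """Serialize an 8-byte ProvID into the 12-byte TCP option."""
    if len(provid) != 8:
        raise ValueError("ProvID must be exactly 8 bytes")
    return _LAYOUT.pack(KIND_EXPERIMENTAL, PROVID_OPTION_LEN, PROVID_EXID, provid)


def decode_provid_option(option: bytes) -> bytes:
    """Return the 8-byte ProvID, or raise ValueError for any other option."""
    if len(option) != PROVID_OPTION_LEN:
        raise ValueError("wrong option length")
    kind, length, exid, provid = _LAYOUT.unpack(option)
    if (kind, length, exid) != (KIND_EXPERIMENTAL, PROVID_OPTION_LEN, PROVID_EXID):
        raise ValueError("not a ProvID option")
    return provid


option = encode_provid_option(bytes(range(8)))
print(option.hex())  # fd0cdee90001020304050607
```

Because the option is 12 bytes long, it occupies a whole number of 32-bit words and needs no padding of its own in the TCP header.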

Middlebox Considerations

To maintain provenance continuity within an administrative domain, middleboxes (as defined in ) MUST handle the ProvID option according to their function.

Non-terminating Middleboxes

A non-terminating middlebox is a device that resides on the communication path but does not terminate the end-to-end TCP connection. These middleboxes MUST forward the ProvID option unmodified in any segment where it appears. If the middlebox modifies transport-layer identifiers (e.g., when performing Network Address Translation), it MUST NOT strip or alter the ProvID option.

Terminating Middleboxes

A terminating middlebox is a device that acts as the endpoint for a TCP connection (e.g., a service proxy). To maintain provenance continuity, these middleboxes MUST relay the ProvID from the incoming connection to the outgoing connection. Specifically, whenever a ProvID is observed in an incoming TCP segment, the middlebox MUST include the identical ProvID value in the corresponding segment of the outgoing connection.
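The relay behavior can be sketched as two steps on raw option bytes: walk the options area of the incoming segment to find the ProvID, then append an identical option to the outgoing segment's options. The helper names `find_provid` and `relay_provid` are hypothetical, and how a real proxy gains access to the options of each connection is outside what this document specifies.

```python
import struct

KIND_EOL, KIND_NOP, KIND_EXPERIMENTAL = 0, 1, 253
PROVID_OPTION_LEN, PROVID_EXID = 12, 0xDEE9
MAX_TCP_OPTIONS = 40  # the TCP header leaves at most 40 bytes for options


def find_provid(options: bytes):
    """Walk the TCP options area and return the 8-byte ProvID, or None."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == KIND_EOL:
            break
        if kind == KIND_NOP:  # single-byte option, no length field
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length
        if (kind == KIND_EXPERIMENTAL and length == PROVID_OPTION_LEN
                and struct.unpack_from("!H", options, i + 2)[0] == PROVID_EXID):
            return options[i + 4:i + PROVID_OPTION_LEN]
        i += length
    return None


def relay_provid(incoming_options: bytes, outgoing_options: bytes) -> bytes:
    """Copy the ProvID, if any, from incoming options to outgoing options."""
    provid = find_provid(incoming_options)
    if provid is None:
        return outgoing_options
    option = struct.pack("!BBH", KIND_EXPERIMENTAL, PROVID_OPTION_LEN,
                         PROVID_EXID) + provid
    if len(outgoing_options) + len(option) > MAX_TCP_OPTIONS:
        raise ValueError("no room for the ProvID option")
    return outgoing_options + option
```

The sketch raises an error when the outgoing options area is full; the document does not say what a middlebox should do in that case, so that choice is an assumption.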

Domain Boundary Handling

To prevent the leakage of internal network metadata, middleboxes at the boundary of the administrative domain MUST strip the ProvID option from any TCP segments exiting the domain. Similarly, any ProvID options present in traffic entering the domain from the open Internet MUST be stripped.
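One way a boundary device might strip the option is to overwrite it with No-Operation bytes (kind 1), which removes the identifier without changing the header length. This is a sketch under that assumption; the document does not prescribe a stripping method, and the function name `strip_provid` is hypothetical.

```python
KIND_EOL, KIND_NOP, KIND_EXPERIMENTAL = 0, 1, 253
PROVID_OPTION_LEN = 12
PROVID_EXID_BYTES = b"\xde\xe9"  # ExID 0xDEE9 in network byte order


def strip_provid(options: bytes) -> bytes:
    """Overwrite every ProvID option with NOPs; other options are untouched."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == KIND_EOL:
            break
        if kind == KIND_NOP:  # single-byte option, no length field
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed length
        if (kind == KIND_EXPERIMENTAL and length == PROVID_OPTION_LEN
                and out[i + 2:i + 4] == PROVID_EXID_BYTES):
            out[i:i + length] = bytes([KIND_NOP]) * length
        i += length
    return bytes(out)
```

Since the TCP checksum covers the header, a device that rewrites the options this way also has to update the checksum before forwarding the segment.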

IANA Considerations

This document's IANA considerations are to be determined and will be provided in a subsequent revision of this draft. [TODO]

Security Considerations

This document's security considerations are to be determined and will be provided in a subsequent revision of this draft. [TODO]