<?xml version="1.0" encoding="utf-8"?>
<?xml-model href="rfc7991bis.rnc"?>  <!-- Required for schema validation and schema-aware editing -->
<!-- <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?> -->
<!-- This third-party XSLT can be enabled for direct transformations in XML processors, including most browsers -->


<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<!-- If further character entities are required then they should be added to the DOCTYPE above.
     Use of an external entity file is not recommended. -->

<rfc
  xmlns:xi="http://www.w3.org/2001/XInclude"
  category="exp"
  docName="draft-liang-tcp-provenance-option-01"
  ipr="trust200902"
  obsoletes=""
  updates=""
  submissionType="IETF"
  xml:lang="en"
  version="3">

  <front>
    <title abbrev="TCP Provenance Identifier Option">TCP Provenance Identifier Option</title>
    <seriesInfo name="Internet-Draft" value="draft-liang-tcp-provenance-option-01"/>
   
    <author fullname="Bowen Liang" initials="B." surname="Liang">
      <organization>Tsinghua University</organization>
      <address>
    
        <email>liangbw25@mails.tsinghua.edu.cn</email>  

      </address>
    </author>

    <author fullname="Yang Xiang" initials="Y." surname="Xiang">
      <organization>Yunshan Networks</organization>
      <address>
    
        <email>xiangyang@yunshan.net</email>  

      </address>
    </author>

    <author fullname="Xingang Shi" initials="X." surname="Shi">
      <organization>Tsinghua University</organization>
      <address>
    
        <email>shixg@cernet.edu.cn</email>
      </address>
    </author>

    <author fullname="Xia Yin" initials="X." surname="Yin">
      <organization>Tsinghua University</organization>
      <address>
    
        <email>yinxia@tsinghua.edu.cn</email>
      </address>
    </author>
   
    <date year="2026"/>
    <!-- On draft submission:
         * If only the current year is specified, the current day and month will be used.
         * If the month and year are both specified and are the current ones, the current day will
           be used
         * If the year is not the current one, it is necessary to specify at least a month and day="1" will be used.
    -->

    <area>General</area>
    <workgroup>Internet Engineering Task Force</workgroup>
    <!-- "Internet Engineering Task Force" is fine for individual submissions.  If this element is 
          not present, the default is "Network Working Group", which is used by the RFC Editor as 
          a nod to the history of the RFC Series. -->

    <keyword>tcp</keyword>
    <keyword>provenance</keyword>
    <keyword>observability</keyword>

    <abstract>
      <t>This document describes a TCP option that carries a Provenance Identifier (ProvID) to enable 
      correlation of TCP connections when transport-layer identifiers change along the path.</t>
    </abstract>
 
  </front>

  <middle>
    
    <section>
      <name>Introduction</name>
      <t>
      In administrative domains (e.g., cloud platforms, enterprise networks, and
      data centers), TCP traffic often traverses devices such as NATs, 
      load balancers, and service proxies that rewrite transport-layer identifiers or 
      terminate and re-originate TCP connections. As a result, a single end-to-end
      exchange between two workloads may correspond to a sequence of distinct TCP
      connections within the domain. This document refers to that end-to-end exchange
      as a "logical communication".
      </t>

      <t>
      These transformations break provenance continuity, that is, the ability to
      trace traffic back to the workload that originated it. Observations of
      TCP traffic at different points in the domain cannot be reliably associated
      with the same logical communication, and once rewriting has occurred, operators
      cannot determine which workload instance originally initiated a connection.
      </t>

      <t>
      This document defines an experimental TCP option (see <xref target="RFC9293"/> and
      <xref target="RFC6994"/>) that carries a small Provenance Identifier (ProvID).
      A ProvID is a compact value that the originating endpoint generates from
      workload-scoped attributes, such as the host IP address combined with a process
      identifier, and that remains fixed for the duration of a logical communication.
      As illustrated in Figure 1, the ProvID enables provenance correlation across rewriting
      boundaries within the domain.
      </t>

      <t>
      The ProvID option is intended for use within administrative domains and is not designed for use on the open Internet.
      </t>

      <figure>
        <name>Provenance Correlation for a Logical Communication across Middleboxes</name>
        <artwork type="ascii-art">
<![CDATA[
+------------------+          +--------------+          +-------------+
| Originating Host |          |  Middlebox   |          | Destination |
|                  |          | (terminates) |          |             |
|  [Workload A]    |  TCP #1  |              |  TCP #2  |             |
|   ProvID = P1    | (P1)     |              | (P1)     |             |
|                  |--------->|   Relay P1   |--------->|             |
|  [Workload B]    |          |              |          |             |
|   ProvID = P2    |          |              |          |             |
|                  |          |              |          |             |
+------------------+          +--------------+          +-------------+
        |                             |                        |
Observation Point A          Observation Point B       Observation Point C
logs: (TCP #1, P1)           logs: (TCP #1, P1)        logs: (TCP #2, P1)
                                   (TCP #2, P1)
Logical Communication (ProvID = P1) for Workload A:
{ TCP #1, TCP #2 }
]]>

        </artwork>
      </figure>
      
      <section>
        <name>Requirements Language</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
          "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
          RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
          interpreted as described in BCP 14 <xref target="RFC2119"/>
          <xref target="RFC8174"/> when, and only when, they appear in
          all capitals, as shown here.</t>
      </section>

    </section>

    <section>
      <name>Use Cases</name>
      <section>
        <name>Association of Traffic Across Rewriting</name>
        <t>In many administrative domains, operators need to reconstruct the path of a 
        logical communication for troubleshooting, incident investigation, or auditing. 
        Flow logs and measurements collected at different observation points commonly reference 
        different TCP connections that correspond to the same logical communication, and 
        the originating endpoint cannot be reliably identified from transport identifiers alone.</t>

        <t>In this use case, the originating endpoint includes the ProvID in the TCP header options. 
        Observation points record the ProvID alongside locally observed connection identifiers. 
        When a logical communication is realized as a series of distinct TCP connections within the domain, 
        the ProvID provides a stable correlation handle for aggregating these disparate records. 
        This enables end-to-end reconstruction of communication paths across middleboxes,
        supports cross-layer observability (for example, linking network telemetry
        with application or process context), and maintains provenance continuity.</t>
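        <t>As a non-normative illustration, the following Python sketch groups flow records by ProvID 
        to recover the set of TCP connections that make up one logical communication. The record layout 
        (observation point, connection label, ProvID) and the function name are assumptions made for this 
        example; this document does not define a log format.</t>
        <sourcecode type="python">
<![CDATA[
```python
from collections import defaultdict


def correlate_by_provid(records):
    """Group (observation_point, connection, provid) records by ProvID.

    Hypothetical helper: the record layout is an assumption of this sketch.
    """
    groups = defaultdict(list)
    for _point, connection, provid in records:
        # Keep first-seen order and skip connections already listed.
        if connection not in groups[provid]:
            groups[provid].append(connection)
    return dict(groups)


# Records mirroring the three observation points shown in Figure 1.
records = [
    ("A", "TCP #1", "P1"),
    ("B", "TCP #1", "P1"),
    ("B", "TCP #2", "P1"),
    ("C", "TCP #2", "P1"),
]
logical = correlate_by_provid(records)
```
]]>
        </sourcecode>
        <t>With these records, both TCP connections are grouped under the single ProvID P1, even though 
        no observation point other than the middlebox sees both connections.</t>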

      </section>

      <section>
        <name>Process-level Origin Attribution for Remediation</name>
        <t>An operator may detect anomalous TCP behavior at an observation point and
        need to remediate the issue by acting on the specific originator responsible.
        At that point, the original source may be obscured by address translation or
        connection rewriting, and even host-level attribution may be insufficient
        because multiple independent processes can share a host.</t>

        <t>In this use case, the originating endpoint generates a ProvID using the pair (host IP address, process identifier) 
        of the process that created the socket. An operator that observes abnormal traffic associated with a ProvID can map 
        the ProvID to the initiating (host IP, process identifier) and take immediate action on that process 
        (for example, suspend or restart the process, isolate the workload instance, or apply a narrowly scoped network policy).</t>
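        <t>This document does not specify how the pair is encoded into the ProvID. As one possible, 
        non-normative sketch, an IPv4 address and a 32-bit process identifier fit exactly into the 
        8-byte ProvID field, which makes the mapping back to the originator a simple unpacking step. 
        The layout and the function names below are assumptions made for illustration only.</t>
        <sourcecode type="python">
<![CDATA[
```python
import ipaddress
import struct


def make_provid(host_ip: str, pid: int) -> bytes:
    """Pack (IPv4 address, PID) into 8 bytes.

    Assumed layout for this sketch: 4 address bytes, then a 32-bit PID,
    both in network byte order.
    """
    addr = ipaddress.IPv4Address(host_ip)
    if not 0 <= pid <= 0xFFFFFFFF:
        raise ValueError("PID does not fit in 32 bits")
    return struct.pack("!4sI", addr.packed, pid)


def parse_provid(provid: bytes):
    """Recover (host IP, PID) from a ProvID built by make_provid()."""
    packed_addr, pid = struct.unpack("!4sI", provid)
    return str(ipaddress.IPv4Address(packed_addr)), pid
```
]]>
        </sourcecode>
        <t>A reversible layout like this one lets an operator recover the originator without a lookup table, 
        at the cost of exposing the host address and process identifier to every on-path observer inside the domain.</t>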
      </section>
    </section>
    
    <section>
      <name>Option Format</name>

      <t>
        The Provenance Identifier (ProvID) option uses the shared experimental
        TCP option format defined in <xref target="RFC6994"/>. It is identified
        by an experimental option Kind together with a 16-bit Experiment
        Identifier (ExID), and it has a fixed length.
      </t>

      <figure>
        <name>ProvID TCP Option Format</name>
        <artwork type="ascii-art">
<![CDATA[
  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |     Kind      |    Length     |              ExID             |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                                                               |
 +                       ProvID (64 bits)                        +
 |                                                               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
]]>
        </artwork>
      </figure>


      <dl newline="true">
        <dt>Kind</dt>
        <dd>
          The TCP option kind. The value of this field is 253.
        </dd>

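        <dt>Length</dt>
        <dd>
          The total length of the TCP option in bytes, including the Kind and
          Length fields. For the ProvID option defined in this document, the
          value of this field is 12 (1 byte of Kind, 1 byte of Length, 2 bytes
          of ExID, and 8 bytes of ProvID).
        </dd>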

        <dt>ExID</dt>
        <dd>
          The Experiment Identifier (ExID). This 2-byte field identifies
          the ProvID experiment when used with experimental TCP option
          kinds. The value of this field is 0xDEE9.
        </dd>

        <dt>ProvID</dt>
        <dd>
          The Provenance Identifier. This field is 8 bytes in length and
          carries a provenance identifier defined by the sender.
        </dd>
      </dl>
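      <t>As a non-normative illustration of the field descriptions above, the following Python sketch 
      serializes and parses the 12-byte option. The Kind, Length, and ExID values are taken from this 
      section; the function names are hypothetical, and the ProvID is treated as 8 opaque bytes.</t>
      <sourcecode type="python">
<![CDATA[
```python
import struct

# Values taken from the field descriptions in this section.
PROVID_KIND = 253      # experimental TCP option Kind
PROVID_LENGTH = 12     # Kind (1) + Length (1) + ExID (2) + ProvID (8)
PROVID_EXID = 0xDEE9   # Experiment Identifier

# Network byte order: Kind, Length, 16-bit ExID, 8 opaque ProvID bytes.
_PROVID_STRUCT = struct.Struct("!BBH8s")


def encode_provid_option(provid: bytes) -> bytes:
    """Serialize a ProvID option (hypothetical helper)."""
    if len(provid) != 8:
        raise ValueError("ProvID must be exactly 8 bytes")
    return _PROVID_STRUCT.pack(PROVID_KIND, PROVID_LENGTH, PROVID_EXID, provid)


def decode_provid_option(option: bytes):
    """Return the 8-byte ProvID, or None if this is not a ProvID option."""
    if len(option) != PROVID_LENGTH:
        return None
    kind, length, exid, provid = _PROVID_STRUCT.unpack(option)
    if (kind, length, exid) != (PROVID_KIND, PROVID_LENGTH, PROVID_EXID):
        return None
    return provid
```
]]>
      </sourcecode>
      <t>Checking the ExID in addition to the Kind matters because other experiments can share 
      the same experimental option Kind.</t>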
    </section>

    <section>
      <name>Middlebox Considerations</name>
      <t>To maintain provenance continuity within an administrative domain, middleboxes
      (as defined in <xref target="RFC3234"/>) MUST handle the ProvID option according to their function,
      as described in the following subsections.</t>

      <section>
        <name>Non-terminating Middleboxes</name>
        <t>
        A non-terminating middlebox is a device that resides on the communication path but does not terminate the end-to-end TCP connection.
        Such a middlebox MUST forward the ProvID option unmodified in every segment in which it appears.
        In particular, a middlebox that rewrites transport-layer identifiers (for example, when performing Network Address Translation) MUST NOT strip or alter the ProvID option.
        </t>
      </section>

      <section>
        <name>Terminating Middleboxes</name>
        <t>
        A terminating middlebox is a device that acts as the endpoint for a TCP connection (e.g., a service proxy). 
        To maintain provenance continuity, these middleboxes MUST relay the ProvID from the incoming connection 
        to the outgoing connection. Specifically, whenever a ProvID is observed in an incoming TCP segment, 
        the middlebox MUST include the identical ProvID value in the corresponding segment of the outgoing connection.
        </t>
      </section>

      <section>
        <name>Domain Boundary Handling</name>
          <t>To prevent leakage of internal network metadata, middleboxes at the boundary of the administrative domain
          MUST strip the ProvID option from every TCP segment that exits the domain. Similarly, they MUST strip any ProvID option
          present in traffic that enters the domain from outside, for example from the open Internet.</t>
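          <t>As a non-normative illustration, the following Python sketch removes every ProvID option from 
          the options area of a TCP header and pads the result back to a 32-bit boundary. The function name is 
          hypothetical. The sketch assumes the caller has already isolated the options bytes, and it does not 
          show the accompanying updates to the Data Offset field and the TCP checksum that a real 
          implementation would also need.</t>
          <sourcecode type="python">
<![CDATA[
```python
PROVID_KIND = 253
PROVID_LENGTH = 12
PROVID_EXID = b"\xde\xe9"   # ExID 0xDEE9 in network byte order


def strip_provid(options: bytes) -> bytes:
    """Return the TCP options area with every ProvID option removed."""
    kept = bytearray()
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:            # End of Option List: stop parsing
            break
        if kind == 1:            # No-Operation: single byte, no length
            kept.append(kind)
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated TCP option")
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            raise ValueError("malformed TCP option length")
        option = options[i:i + length]
        is_provid = (kind == PROVID_KIND and length == PROVID_LENGTH
                     and option[2:4] == PROVID_EXID)
        if not is_provid:
            kept += option
        i += length
    # Zero bytes act as End of Option List plus padding to a 32-bit boundary.
    kept += bytes(-len(kept) % 4)
    return bytes(kept)
```
]]>
          </sourcecode>
          <t>Matching on Kind, Length, and ExID together leaves other experimental options that share 
          Kind 253 untouched.</t>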
      </section>
    </section>
    
    <section anchor="IANA">
    <!-- All drafts are required to have an IANA considerations section. See RFC 8126 for a guide.-->
      <name>IANA Considerations</name>
      <t>This document's IANA considerations are to be determined and will be
        provided in a subsequent revision of this draft. [TODO]</t>
    </section>
    
    <section anchor="Security">
      <!-- All drafts are required to have a security considerations section. See RFC 3552 for a guide. -->
      <name>Security Considerations</name>
      <t>This document's security considerations are to be determined and will be
        provided in a subsequent revision of this draft. [TODO]</t>
    </section>
    
  </middle>

  <back>
    <references>
      <name>References</name>

      <references>
        <name>Normative References</name>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6994.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9293.xml"/>
      </references>

      <references>
        <name>Informative References</name>

        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3234.xml"/>
      </references>
    </references>
    
    <section>
      <name>Appendix 1</name>
      <t>TODO</t>
    </section>

    
    
</back>
</rfc>