<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rfc [
  <!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
  <!ENTITY RFC4760 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4760.xml">
  <!ENTITY RFC4786 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4786.xml">
  <!ENTITY RFC7285 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7285.xml">
  <!ENTITY RFC8174 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8174.xml">
  <!ENTITY RFC8990 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8990.xml">

  <!ENTITY I-D.dcn-cats-req-service-segmentation SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.dcn-cats-req-service-segmentation">
  <!ENTITY I-D.ietf-cats-framework SYSTEM "http://xml2rfc.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-cats-framework">
]>

<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<rfc category="info" docName="draft-ietf-cats-usecases-requirements-14"
     ipr="trust200902" sortRefs="true" submissionType="IETF" symRefs="true"
     tocInclude="true" version="3" xmlns:xi="http://www.w3.org/2001/XInclude">

  <front>
    <title abbrev="CATS: Problem, Use Cases, Requirements">Computing-Aware
    Traffic Steering (CATS) Problem Statement, Use Cases, and
    Requirements</title>

    <seriesInfo name="Internet-Draft"
                value="draft-ietf-cats-usecases-requirements-14"/>

    <author fullname="Kehan Yao" initials="K." surname="Yao">
      <organization>China Mobile</organization>

      <address>
        <email>yaokehan@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Luis M. Contreras" initials="L. M." surname="Contreras">
      <organization>Telefonica</organization>

      <address>
        <email>luismiguel.contrerasmurillo@telefonica.com</email>
      </address>
    </author>

    <author fullname="Hang Shi" initials="H." surname="Shi">
      <organization>Huawei Technologies</organization>

      <address>
        <email>shihang9@huawei.com</email>
      </address>
    </author>

    <author fullname="Shuai Zhang" initials="S." surname="Zhang">
      <organization>China Unicom</organization>

      <address>
        <email>zhangs366@chinaunicom.cn</email>
      </address>
    </author>

    <author fullname="Qing An" initials="Q." surname="An">
      <organization>Alibaba Group</organization>

      <address>
        <email>anqing.aq@alibaba-inc.com</email>
      </address>
    </author>

    <date day="03" month="February" year="2026"/>

    <workgroup>cats</workgroup>

    <abstract>

      <t>Distributed computing enhances service response time and energy efficiency
      by utilizing diverse computing facilities for compute-intensive and delay-sensitive
      services. To optimize throughput and response time, "Computing-Aware Traffic Steering" (CATS)
      selects servers and directs traffic based on compute capabilities and resources,
      rather than on static dispatch or connectivity metrics alone. This document outlines
      the problem statement and scenarios for CATS within a single domain, and derives
      requirements for the CATS framework.</t>

    </abstract>
  </front>

  <middle>

    <section anchor="introduction">
      <name>Introduction</name>

      <t>Computing resources, particularly edge computing resources, are
      increasingly being deployed to support services that require low
      latency, high reliability, and dynamic resource scaling.</t>

      <t>Diversified service demands have brought key challenges to service deployment and traffic scheduling. A single-site service instance often lacks sufficient capacity to guarantee the required quality of service, especially during peak hours when local computing resources may fail to handle all incoming requests, leading to longer response times or even dropped requests. Regularly expanding the capacity of a single site is often neither practical nor economical. Additionally, enhancing the computing capabilities of client devices alone cannot meet the computing requirements of all applications.</t>

      <t>It is necessary to deploy services across multiple sites (either edge or central nodes) to improve availability and scalability. To this end, traffic should be steered to the "best" service instance based on factors such as the current computing load, where "best" is largely determined by application requirements.</t>

      <t>However, existing routing schemes and traffic engineering methods often fall short of addressing these challenges. The underlying networking infrastructures that include computing resources usually provide relatively static service dispatching or depend solely on connectivity metrics for traffic steering, failing to account for compute capabilities and resource status, which are critical for meeting the quality requirements of modern services.</t>

      <t>To tackle this issue, the choice of service instance and network resources should also consider compute-oriented metrics in addition to connectivity metrics. The process of selecting service instances and locations based on metrics oriented towards compute capabilities and resources, and of directing traffic to them over chosen network resources, is called "Computing-Aware Traffic Steering" (CATS). It should be noted that CATS is not limited to edge computing scenarios; however, the problem statement in <xref target="problem-statement"/> focuses on edge computing scenarios.</t>

      <t>This document describes sample usage scenarios that drive CATS
      requirements and will help to identify candidate solution architectures
      and approaches. The use cases and requirements within this document 
      are limited to single-domain scenarios.</t>
    </section>

    <section anchor="definition-of-terms">
      <name>Definition of Terms</name>

      <t>This document uses the terms defined in <xref
      target="I-D.ietf-cats-framework"/>, including service site, service instance, CATS service identifier (CS-ID), flow, and client.</t>

      <dl indent="2">

        <dt>Edge Computing:</dt>

        <dd>
          <t>Edge computing is a computing pattern that moves computing
          infrastructure, i.e., servers, away from centralized data centers
          and instead places it close to end users for low-latency
          communication.</t>

        </dd>
      </dl>

      <t>Even though this document is not a protocol specification, it makes
      use of upper case key words to define requirements unambiguously.</t>

      <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
      "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
      NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>",
      "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
      "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are
      to be interpreted as described in BCP 14 <xref target="RFC2119"/> <xref
      target="RFC8174"/> when, and only when, they appear in all capitals, as
      shown here.</t>

    </section>

    <section anchor="problem-statement">
      <name>Problem Statement</name>

      <section anchor="multi-deployment-of-edge-sites-and-service">
        <name>Multi-Site Deployment of Edge Services</name>

        <t>In edge computing environments, service instances typically adopt a multi-site deployment model. 
        It should be clarified that specific service instance deployment strategies are not within the 
        scope of CATS. However, there is a close correlation between service 
        instance deployment and traffic scheduling, especially in the definition and selection of core metrics 
        such as computing capabilities and resources. This dual applicability allows a common set of metrics 
        to inform both traffic steering and higher-level service management decisions, without requiring 
        CATS to define orchestration behavior.</t>

        <t>Therefore, to present a clear and comprehensive problem statement, 
        it is necessary to first introduce the relevant considerations for multi-edge 
        service site deployment. This premise can better support the subsequent 
        elaboration on CATS requirements and solutions.</t>

        <t>Before deploying edge service sites, the following factors need to 
        be considered:</t>
       
        <ul spacing="normal">
          <li>
            <t>Geographic location, including the number of users, the mix of service types, and the number of connection requests from users. More service replicas can be deployed at edge service sites located in densely populated areas, which have large numbers of users and service requests, than in other areas.</t>
          </li>

          <li>
            <t>The type, scale, and usage frequency of required computing resources. For example, distributed AI inference services require the deployment of more GPU resources.</t>
          </li>

          <li>
            <t>The status of network resources associated with computing resources, such as network topology, network access methods, connectivity, link bandwidth, and path protection or redundancy information.</t>
          </li>
        </ul>

        <t>To improve the overall quality of service, during the service deployment phase it is necessary to analyze the approximate network and computing resource requirements of the service, form a reasonable combined network and computing resource topology, and clarify the location, overall distribution, and relative position of computing resources in the network topology. This process relies on a commonly agreed, standardized set of metrics for computing and network resources, which is also the aspect most closely related to the problem space addressed by CATS traffic steering.</t>
      </section>

      <section anchor="traffic-steering-among-edges-sites-and-service-instances">
        <name>Traffic Steering among Edge Service Sites and Service Instances</name>

        <t>This section describes how existing edge computing systems do not
        provide all of the support needed for real-time or near-real-time
        services, and why it is necessary to steer traffic to different sites
        considering changes in client distribution, different time slots, events, server
        loads, network capabilities, and other factors that might not be
        directly measured, such as properties of edge service sites (e.g.,
        geographical location).</t>

        <t>It is assumed that service instances are deployed at multiple sites
        and are reachable through a network infrastructure.</t>

        <t>When a client issues a service request for a required service, the
        request is steered to one of the available service instances. Each
        service instance may act as a client towards another service, thereby
        seeing its own outbound traffic steered to a suitable service instance
        of the requested service and so on, achieving service composition and
        chaining as a result.</t>

        <t>The aforementioned selection of a service instance from the set of candidates is performed using traffic steering methods.</t>

        <t>In edge computing, traffic is typically steered to the "closest" edge service site, or to one of a few "close"
        sites using load balancing. Such traffic steering can be initiated either by the application layer or by the network layer:
        the application layer may actively query for the optimal node and guide traffic using mechanisms such as the ALTO protocol <xref target="RFC7285"/>,
        while the network layer may leverage anycast routing <xref target="RFC4786"/>,
        where the routing system automatically distributes traffic according to routing tables in an application-transparent manner.
        However, regardless of whether the steering is performed by the application or by the network, the criteria for selecting the "closest" or "close" sites often rely solely on communication metrics
        (such as physical distance, hop count, or network latency). This decision logic can easily lead to suboptimal choices, meaning that the "closest" site is not always the "best" one.
        This is because the computing resources and state of edge service sites differ and can change in real time:</t>

        <ul spacing="normal">
          <li>
            <t>The closest site may not have sufficient resources.</t>
          </li>

          <li>
            <t>The closest site may not have the specific computing resources required.</t>
          </li>
        </ul>

        <t>To address these issues, traffic steering mechanisms need to be enhanced to direct traffic to sites that can adequately support the requested services.
        Steering decisions may take into account more complex and possibly
        dynamic metric information, such as the load of service instances or
        the latency experienced, in order to select a more suitable
        service instance.</t>
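
        <t>The two shortcomings listed above can be illustrated with a minimal
        Python sketch that contrasts connectivity-only selection with a
        computing-aware variant. The <tt>ServiceSite</tt> record, the
        <tt>max_load</tt> threshold of 0.8, and the capability label
        <tt>"gpu"</tt> are hypothetical assumptions for illustration; this
        document does not define these metrics or this selection rule.</t>

        <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceSite:
    """Hypothetical per-site view combining network and compute metrics."""
    name: str
    network_delay_ms: float   # path delay from the client to the site
    load: float               # fraction of compute capacity in use (0.0-1.0)
    capabilities: frozenset   # e.g., frozenset({"gpu"})


def closest_site(sites):
    """Connectivity-only selection: the smallest network delay wins."""
    return min(sites, key=lambda s: s.network_delay_ms)


def computing_aware_site(sites, required=frozenset(), max_load=0.8):
    """Hypothetical computing-aware selection.

    Discards sites that lack a required capability or whose load exceeds
    max_load, then picks the closest of the remaining sites. Returns None
    if no site qualifies.
    """
    eligible = [s for s in sites
                if required <= s.capabilities and s.load <= max_load]
    return min(eligible, key=lambda s: s.network_delay_ms, default=None)


sites = [
    ServiceSite("site-a", 2.0, 0.95, frozenset({"gpu"})),  # closest, overloaded
    ServiceSite("site-b", 3.0, 0.40, frozenset()),          # no GPU
    ServiceSite("site-c", 6.0, 0.30, frozenset({"gpu"})),   # farther, suitable
]

print(closest_site(sites).name)                              # site-a
print(computing_aware_site(sites, frozenset({"gpu"})).name)  # site-c
```
]]></sourcecode>

        <t>In this sketch the closest site is rejected because it is
        overloaded and the next one because it lacks the required hardware,
        which mirrors the two cases listed above.</t>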

        <t>It is important to note that clients may move. This means that the
        service instance that was "best" at one moment might no longer be best
        when a new service request is issued. This creates a (physical)
        dynamicity that will need to be catered for in addition to the changes
        in server and network load. From a routing perspective, CATS is an application-transparent routing mechanism that can provide scheduling for both stateful and stateless services.
        However, in scenarios where clients move and the service is stateful, CATS requires the application to explicitly indicate whether it allows the routing system to apply CATS steering to its traffic.
        Otherwise, mid-session rescheduling triggered by CATS may cause inconsistent application context across service sites, or even service interruption.</t>
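
        <t>The stateful-service constraint described above can be sketched as a
        flow-affinity table at a CATS-capable ingress. This minimal Python
        illustration rests on stated assumptions: the <tt>FlowTable</tt> class
        and the <tt>migration_allowed</tt> flag are hypothetical, standing in
        for whatever signal the application uses to indicate that mid-session
        re-steering is acceptable; this document does not specify that signal.</t>

        <sourcecode type="python"><![CDATA[
```python
class FlowTable:
    """Hypothetical flow-affinity table kept by a CATS-capable ingress.

    Maps a flow identifier to the service instance the flow is bound to.
    """

    def __init__(self):
        self.bindings = {}

    def steer(self, flow_id, best_instance, stateful, migration_allowed):
        """Return the instance that should receive the flow's next request.

        best_instance: the "best" instance according to current metrics.
        stateful: whether the service keeps per-session context.
        migration_allowed: assumed application signal permitting
            mid-session re-steering of a stateful flow.
        """
        current = self.bindings.get(flow_id)
        if current is not None and stateful and not migration_allowed:
            # Keep the session where its application context lives.
            return current
        # New flow, stateless service, or stateful service whose
        # application has consented: follow the current best instance.
        self.bindings[flow_id] = best_instance
        return best_instance


table = FlowTable()
# A stateful flow starts at site-1; the client then moves and site-2
# becomes "best", but the flow stays pinned without application consent.
print(table.steer("flow-1", "site-1", stateful=True, migration_allowed=False))
print(table.steer("flow-1", "site-2", stateful=True, migration_allowed=False))
# With consent, the same situation results in re-steering.
print(table.steer("flow-2", "site-1", stateful=True, migration_allowed=True))
print(table.steer("flow-2", "site-2", stateful=True, migration_allowed=True))
```
]]></sourcecode>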

        <t><xref target="fig-edge-site-deployment"/> shows a common way to
        deploy edge service sites in a metro area. Edge service sites are connected to Provider
        Edge (PE) routers. An edge data center in the metro area has abundant
        computing resources and serves a larger number of User
        Equipment (UEs) (UE1 to UEn) during working hours, because more office buildings are
        located in the metro area. There are also some remote edge service sites that
        have limited computing resources and serve the UEs close to them (UEa, UEb).</t>

        <t>Applications that meet service demands could be deployed both in the
        edge data center in the metro area and at the remote edge service sites. In this
        case, service requests and resources are well matched, and traffic steering
        may only be needed for special service requests or minor scheduling
        demands.</t>

        <figure anchor="fig-edge-site-deployment"
                title="Common Deployment of Edge Service Sites">
          <artwork>
     +----------------+    +---+                  +------------+
   +----------------+ |- - |UE1|                +------------+ |
   | +-----------+  | |    +---+             +--|    Edge    | |
   | |Edge server|  | |    +---+       +- - -|PE|            | |
   | +-----------+  | |- - |UE2|       |     +--|   Site 1   |-+
   | +-----------+  | |    +---+                +------------+
   | |Edge server|  | |     ...        |            |
   | +-----------+  | +--+         Potential      +---+ +---+
   | +-----------+  | |PE|- - - - - - -+          |UEa| |UEb|
   | |Edge server|  | +--+         Steering       +---+ +---+
   | +-----------+  | |    +---+       |                  |
   | +-----------+  | |- - |UE3|                  +------------+
   | |  ... ...  |  | |    +---+       |        +------------+ |
   | +-----------+  | |     ...              +--|    Edge    | |
   |                | |    +---+       +- - -|PE|            | |
   |Edge data center|-+- - |UEn|             +--|   Site 2   |-+
   +----------------+      +---+                +------------+
   High computing resource              Limited computing resource
   and more UE at metro area            and less UE at remote area</artwork>
        </figure>

        <t><xref target="fig-edge-mobility"/> shows that during non-working
        hours, for example at night or at weekends, more UEs move to the
        remote area, either close to their homes or to attend weekend events.
        As a result, there will be more service requests in the remote area, where
        computing resources are limited, while the abundant computing resources in
        the metro area may be underused because fewer UEs are there. Moreover, as
        people move from the metro area to the remote area, the edge service sites
        that serve common services will also change, so it may be necessary to
        steer some traffic back to the metro data center.</t>

        <figure anchor="fig-edge-mobility"
                title="Steering Traffic among Edge Service Sites">
          <artwork>
     +----------------+                           +------------+
   +----------------+ |                         +------------+ |
   | +-----------+  | |  Steering traffic    +--|    Edge    | |
   | |Edge server|  | |          +-----------|PE|            | |
   | +-----------+  | |          |           +--|   Site 1   |-+
   | +-----------+  | |- - - - - - - -+         +-+----------+
   | |Edge server|  | |          |    |           |          |
   | +-----------+  | +--+       |  +---+ +---+ +---+ +---+ +---+
   | +-----------+  | |PE|-------+  |UEa| |UEb| |UE1| |...| |UEn|
   | |Edge server|  | +--+       |  +---+ +---+ +---+ +---+ +---+
   | +-----------+  | |          |          |           |
   | +-----------+  | |- - - - - - - - - - -+           +------+
   | |  ... ...  |  | |          |              +------------+ |
   | +-----------+  | |          |           +--|    Edge    | |
   |                | |          +-----------|PE|            | |
   |Edge data center|-+  Steering traffic    +--|   Site 2   |-+
   +----------------+                           +------------+
   High computing resource              Limited computing resource
   and less UE at metro area            and more UE at remote area</artwork>
        </figure>

        <t>Network and computing resource status also varies for UEs that are
        not moving, which may occasionally experience poor latency. This can be
        caused by other UEs moving, by large numbers of requests for temporary
        events such as concerts or shopping festivals, or by normal changes in
        network and computing resource status. Therefore, traffic from
        stationary UEs is also expected to be steered dynamically to appropriate sites.</t>

        <t>These problems indicate that traffic needs to be steered among
        different edge service sites because of UE mobility and the normal
        variability of network and computing resources. Moreover, some use cases
        in the following section require both low latency and high computing
        resource usage or specific computing hardware capabilities (such as
        local GPUs); hence, joint optimization of network and computing resources
        is needed to guarantee the Quality of Experience (QoE).</t>
      </section>

    </section>

    <section anchor="use-cases">
      <name>Use Cases</name>

      <section anchor="Overview-of-Use-Cases">
        <name>Overview of Use Cases</name>

        <t>The five use cases outlined in the sections below serve as examples
        to show the need for CATS. In particular, while these use cases may be
        solved in a simplistic way with current tools, CATS adds the ability to
        make dynamic selections between service sites and service instances that
        take account of network capabilities and status, compute capabilities and
        current load, and to achieve load balancing.</t>

        <t>Considering that these use cases are enough to derive common requirements, 
        this document only includes these five use cases in the main body, although 
        there have been more similar use cases proposed in the CATS working group
        (e.g., <xref target="I-D.dcn-cats-req-service-segmentation"/>). The applicability of CATS 
        may be further extended in future use cases brought to the working group and 
        possibly arising from work in other standards bodies such as ETSI and 3GPP, 
        but it is believed that the five use cases presented here are sufficient to 
        drive the requirements expressed in this document and future applicability.</t>

        <t>If new use cases do raise additional requirements, they will need to be documented
        separately and might necessitate modifications to the CATS framework 
        <xref target="I-D.ietf-cats-framework"/>.</t>

        <t>Further potential use cases are described in Appendix A of this document.</t>
      </section>

      <section anchor="example-1-computing-aware-ar-or-vr">
        <name>Example 1: Computing-aware AR or VR</name>

        <t>Cloud Virtual Reality (VR) and Augmented Reality (AR) introduce
        the concept of cloud computing to the rendering of audiovisual assets
        in such applications. Here, the edge
        cloud helps encode/decode and render content. The edge cloud refers to cloud
        computing located at the edge of the network, closer to users and applications.
        The client device usually only uploads posture or control information to the edge cloud,
        and the VR/AR content is then rendered in the edge cloud. The video and audio outputs
        generated in the edge cloud are encoded, compressed, and transmitted
        back to the client device or further transmitted to a central data center
        via high-bandwidth networks.</t>

        <t>A Cloud VR service is delay-sensitive and influenced by both
        network and computing resources. Therefore, the edge service site that
        executes the service has to be carefully selected to make sure it has
        sufficient computing resources and good network conditions to guarantee
        the end-to-end service delay. For example, for entry-level cloud VR
        (panoramic 8K 2D video) with 110-degree Field of View (FOV)
        transmission, the typical network requirements are 40 Mbps of bandwidth,
        20 ms motion-to-photon latency, and a packet loss rate of 2.4E-5; the
        typical computing requirements are real-time 8K H.265 decoding and
        real-time 2K H.264 encoding. The 20 ms latency budget can be broken
        down as follows:</t>

        <ol spacing="normal" type="(%i)">
          <li>
            <t>Sensor sampling delay (client): less than 1.5 ms, including an
            extra 0.5 ms for digitization and client device processing, which
            is considered imperceptible by users.</t>
          </li>

          <li>
            <t>Display refresh delay (client): 7.9 ms, based on a 144 Hz
            display refresh rate (about 6.9 ms per frame) plus 1 ms of extra
            delay to light up the display.</t>
          </li>

          <li>
            <t>Image/frame rendering delay (server): can be reduced to
            5.5 ms.</t>
          </li>

          <li>
            <t>Round-trip network delay: the remaining latency budget, i.e.,
            20 - 1.5 - 7.9 - 5.5 = 5.1 ms.</t>
          </li>
        </ol>
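
        <t>The budget arithmetic above can be checked with a short Python
        sketch. It assumes, as a reading of the text, that the display refresh
        delay is one frame period at 144 Hz plus the 1 ms needed to light up
        the display; the constant and function names are illustrative only.</t>

        <sourcecode type="python"><![CDATA[
```python
# Motion-to-photon latency budget for entry-level cloud VR, in milliseconds.
TOTAL_BUDGET_MS = 20.0

SENSOR_SAMPLING_MS = 1.5          # client; includes 0.5 ms digitization/processing
FRAME_PERIOD_MS = 1000.0 / 144    # one refresh period at 144 Hz (about 6.94 ms)
LIGHT_UP_MS = 1.0                 # extra delay to light up the display
DISPLAY_REFRESH_MS = round(FRAME_PERIOD_MS + LIGHT_UP_MS, 1)  # 7.9 ms
RENDERING_MS = 5.5                # server-side image/frame rendering


def network_budget_ms():
    """Round-trip network budget left after client and server delays."""
    remaining = (TOTAL_BUDGET_MS - SENSOR_SAMPLING_MS
                 - DISPLAY_REFRESH_MS - RENDERING_MS)
    return round(remaining, 1)


print(DISPLAY_REFRESH_MS)    # 7.9
print(network_budget_ms())   # 5.1
```
]]></sourcecode>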

        <t>The budgets for server (computing) delay and network delay are
        therefore almost equal, so it makes sense to consider both computing
        delay and network delay. Optimizing only the network or only the
        computing resources may fail to meet the total delay requirement or to
        find the best choice.</t>

        <t>Based on this analysis, consider the further assumptions shown in <xref
        target="Computing-Aware-AR-VR"/>: the client can request any
        service instance among three edge service sites. The client-side delay is the
        same for all of them, while the edge service sites and their corresponding network paths
        have different delays:</t>

        <ul spacing="normal">
          <li>
            <t>Edge service site 1: computing delay = 4 ms due to a light load, and
            the corresponding network delay = 9 ms due to heavy traffic.</t>
          </li>

          <li>
            <t>Edge service site 2: computing delay = 10 ms due to a heavy load,
            and the corresponding network delay = 4 ms due to light
            traffic.</t>
          </li>

          <li>
            <t>Edge service site 3: computing delay = 5 ms due to a
            normal load, and the corresponding network delay = 5 ms due to
            normal traffic.</t>
          </li>
        </ul>

        <t>In this case, the lowest total network and computing
        delay cannot be achieved if the site is chosen based on either computing
        status or network status alone:</t>

        <ul spacing="normal">
          <li>
            <t>Selecting the site with the best computing delay yields
            edge service site 1, with an end-to-end (E2E) delay of 9.4 (client) + 4 + 9 = 22.4 ms.</t>
          </li>

          <li>
            <t>Selecting the site with the best network delay yields
            edge service site 2, with an E2E delay of 9.4 + 10 + 4 = 23.4 ms.</t>
          </li>

          <li>
            <t>Selecting the site based on both computing and network status yields
            edge service site 3, with an E2E delay of 9.4 + 5 + 5 = 19.4 ms.</t>
          </li>
        </ul>

        <t>The best choice to meet the E2E delay requirement is therefore edge service
        site 3, whose E2E delay of 19.4 ms is below 20 ms. The E2E delays of the three
        sites differ by only 3 to 4 ms, yet only one of them meets the application
        demand.</t>
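
        <t>The comparison above can be reproduced with a short Python sketch
        that uses only the delay values given in the text and in <xref
        target="Computing-Aware-AR-VR"/> (a client delay of 1.5 + 7.9 = 9.4 ms).
        The dictionary layout and helper names are illustrative; the joint
        policy simply minimizes the sum of client, computing, and network delay.</t>

        <sourcecode type="python"><![CDATA[
```python
# Client-side delay shared by all candidates: sensor sampling + display refresh.
CLIENT_DELAY_MS = 1.5 + 7.9          # 9.4 ms
E2E_REQUIREMENT_MS = 20.0

# (computing delay, network delay) per edge service site, in milliseconds.
SITES = {
    "site 1": (4.0, 9.0),    # light load, heavy traffic
    "site 2": (10.0, 4.0),   # heavy load, light traffic
    "site 3": (5.0, 5.0),    # normal load, normal traffic
}


def e2e_delay(site):
    """End-to-end delay = client delay + computing delay + network delay."""
    computing, network = SITES[site]
    return round(CLIENT_DELAY_MS + computing + network, 1)


by_computing = min(SITES, key=lambda s: SITES[s][0])
by_network = min(SITES, key=lambda s: SITES[s][1])
joint = min(SITES, key=e2e_delay)

for policy, site in [("computing only", by_computing),
                     ("network only", by_network),
                     ("joint", joint)]:
    delay = e2e_delay(site)
    verdict = "meets" if delay <= E2E_REQUIREMENT_MS else "misses"
    print(f"{policy:<14} -> {site}: {delay} ms ({verdict} the 20 ms target)")
```
]]></sourcecode>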

        <t>In conclusion, AR/VR clients are increasingly produced as low-end devices 
        with reduced compute capability, while the AR/VR services required are ever 
        more complex needing more computation. It makes sense, therefore, to perform 
        at least some of the computation on specialized servers across the network. 
        As the computation work gets larger, it may make sense to break it into 
        components that are processed at different and more specialized sites. All of 
        the computation must, however, be performed in a way that enables the resulting 
        streams to be delivered in a timely way. Thus, it is necessary to select service 
        sites that can cooperate, can perform the correct work, are not already overloaded, 
        and have sufficiently good network connectivity with the client. This needs to be 
        coordinated through a CATS system.</t>

        <figure anchor="Computing-Aware-AR-VR"
                title="Computing-Aware AR or VR">
          <artwork>     Light Load          Heavy Load           Normal load
   +------------+      +------------+       +------------+
   |    Edge    |      |    Edge    |       |    Edge    |
   |   Site 1   |      |   Site 2   |       |   Site 3   |
   +-----+------+      +------+-----+       +------+-----+
computing|delay(4ms)          |           computing|delay(5ms)
         |           computing|delay(10ms)         |
    +----+-----+        +-----+----+         +-----+----+
    |  Egress  |        |  Egress  |         |  Egress  |
    | Router 1 |        | Router 2 |         | Router 3 |
    +----+-----+        +-----+----+         +-----+----+
  network|delay(9ms)   network|delay(4ms)   network|delay(5ms)
         |                    |                    |
         |           +--------+--------+           |
         +-----------|  Infrastructure |-----------+
                     +--------+--------+
                              |
                         +----+----+
                         | Ingress |
         +---------------|  Router |--------------+
         |               +----+----+              |
         |                    |                   |
      +--+--+              +--+---+           +---+--+
    +------+|            +------+ |         +------+ |
    |Client|+            |Client|-+         |Client|-+
    +------+             +------+           +------+
                   client delay=1.5+7.9=9.4ms</artwork>
        </figure>

        <t>Furthermore, specific techniques may be employed to divide the
        overall rendering into base assets that are common across a number of
        clients participating in the service and additional assets that are
        rendered from client-specific input data. When delivered to the
        client, these two kinds of assets are combined into the overall
        content consumed by the client. Client input data and requests for
        base assets may differ in which service instances can serve them:
        base assets may be served from any nearby service instance (since
        they can be served without maintaining cross-request state), while
        client-specific input data is processed by a stateful service instance
        that changes only slowly over time, if at all, because of the
        stickiness created by the client-specific data. Other splits of
        rendering and input tasks can be found in <xref target="TR22.874"/> for
        further reading.</t>
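
        <t>The different serving constraints for base assets and client-specific
        input can be sketched as a per-request-type dispatch rule. This Python
        sketch is a hypothetical illustration: the <tt>Instance</tt> record,
        the request-type labels, and the choice of the currently nearest
        instance for base assets are assumptions made here, not mechanisms
        defined by this document.</t>

        <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    delay_ms: float   # current network delay from the client (assumed metric)


class SplitRenderingDispatcher:
    """Hypothetical dispatcher for split rendering.

    Base-asset requests carry no cross-request state, so they go to the
    currently nearest instance. Client-input requests are pinned to the
    stateful instance chosen for that client's first client-input request.
    """

    def __init__(self, instances):
        self.instances = instances
        self.stateful_home = {}   # client id -> instance name

    def dispatch(self, client_id, request_type):
        nearest = min(self.instances, key=lambda i: i.delay_ms).name
        if request_type == "base_asset":
            return nearest
        if request_type == "client_input":
            # setdefault keeps an existing binding and creates one otherwise.
            return self.stateful_home.setdefault(client_id, nearest)
        raise ValueError(f"unknown request type: {request_type!r}")


edge_a, edge_b = Instance("edge-a", 3.0), Instance("edge-b", 6.0)
dispatcher = SplitRenderingDispatcher([edge_a, edge_b])
print(dispatcher.dispatch("client-1", "client_input"))  # edge-a (binding created)
edge_a.delay_ms = 9.0                                   # edge-a path degrades
print(dispatcher.dispatch("client-1", "base_asset"))    # edge-b (nearest now)
print(dispatcher.dispatch("client-1", "client_input"))  # edge-a (sticky)
```
]]></sourcecode>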

        <t>Service instances themselves may be
        instantiated on demand, e.g., driven by network or client demand
        metrics, while resources may also be released, e.g., after an idle
        timeout, to free up resources for other services. Depending on the
        node technologies used, the lifetime of such "function as a
        service" instances may range from many minutes down to milliseconds.
        Therefore, computing resources across participating edges exhibit a
        distributed (in terms of location) as well as dynamic (in terms of
        resource availability) nature. To achieve satisfactory
        service quality for end users, a service request needs to be sent
        to and served by an edge with sufficient computing resources and a good
        network path.</t>
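
        <t>The on-demand lifecycle described above, and the resulting
        fluctuation in the number of available instances, can be sketched as
        follows. The <tt>OnDemandPool</tt> class, its reuse policy, and the use
        of explicit timestamps in seconds are hypothetical choices made to keep
        the Python sketch small and deterministic.</t>

        <sourcecode type="python"><![CDATA[
```python
class OnDemandPool:
    """Hypothetical pool of on-demand ("function as a service") instances.

    An instance is created when a request arrives and no warm instance
    exists; instances idle for longer than idle_timeout seconds are released.
    Time is passed in explicitly so that the behaviour is deterministic.
    """

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.last_used = {}   # instance id -> time of its last request
        self._next_id = 0

    def reap(self, now):
        """Release instances idle for longer than the timeout."""
        expired = [i for i, t in self.last_used.items()
                   if now - t > self.idle_timeout]
        for instance in expired:
            del self.last_used[instance]
        return expired

    def handle(self, now):
        """Serve one request at time `now`, instantiating if necessary."""
        self.reap(now)
        if self.last_used:
            # Simple policy: reuse the most recently used warm instance.
            instance = max(self.last_used, key=self.last_used.get)
        else:
            instance = f"fn-{self._next_id}"
            self._next_id += 1
        self.last_used[instance] = now
        return instance

    def available(self):
        """Number of warm instances (could feed a CATS metric; assumption)."""
        return len(self.last_used)


pool = OnDemandPool(idle_timeout=0.5)
print(pool.handle(0.0), pool.handle(0.2))  # fn-0 fn-0
print(pool.reap(1.0), pool.available())    # ['fn-0'] 0
print(pool.handle(1.1))                    # fn-1
```
]]></sourcecode>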
      </section>

      <section anchor="example-2-computing-aware-intelligent-transportation">
        <name>Example 2: Computing-aware Intelligent Transportation</name>

        <t>Urban intelligent transportation relies on a large number of high-quality
        video capture devices and light detection and ranging (LiDAR) devices, whose data needs to be processed at edge service sites
        (e.g., for pedestrian flow statistics and vehicle tracking). This imposes stringent
        requirements on the computing capabilities of edge service sites and on network
        performance, including high throughput for concurrent video stream decoding
        and AI inference, as well as low latency for real-time decision-making. CATS
        can help meet these requirements by coordinating network and computing resources.</t>

        <t>In assisted driving scenarios (for example, "Extended Electronic
        Horizon" <xref target="HORITA"/>), edge service sites collect road and
        traffic data via V2X to address blind-spot and collision risks, and
        provide real-time warnings and manoeuvre guidance. Requests are
        typically sent preferentially to the closest edge node. However, if
        that node becomes overloaded, the resulting response delays may create
        safety risks, which calls for CATS traffic steering.</t>

        <t>Specifically, delay-insensitive services (e.g., in-vehicle
        entertainment) can be offloaded via CATS to edge service sites with
        lighter loads (even if they are farther away), while delay-sensitive
        assisted driving services are preferentially processed at local
        service sites. As mentioned in the problem statement section, CATS is
        an application-transparent network-layer solution. Unlike ALTO <xref
        target="RFC7285"/>, it enables coordinated scheduling of network and
        computing resources without requiring application modifications. For
        moving vehicles, CATS supports smooth and proactive context migration
        between edge nodes, provided that the application allows it, to
        maintain service continuity. In addition, vehicle speed is a key
        factor: faster movement requires more frequent metric updates (as
        detailed in the requirements section) to ensure that CATS steering
        decisions remain valid as vehicles hand over between base stations or
        edge service sites.</t>

        <t>In video recognition scenarios, traffic surges (e.g., during rush hours or weekends) 
        can easily overload the closest edge service sites. CATS addresses this scalability challenge 
        by steering excess service requests to other appropriate sites, ensuring that processing 
        capacity matches user demand.</t>
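        <t>The steering behaviour described in this subsection can be
        sketched as follows. This is a minimal, non-normative Python sketch:
        the site attributes, the 0.8 overload threshold, and the function
        name select_site are hypothetical and are not defined by CATS.
        Delay-sensitive requests go to the nearest site that is not
        overloaded, while delay-insensitive requests go to the least-loaded
        site.</t>

        <sourcecode type="python"><![CDATA[
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    distance_km: float  # proximity to the vehicle (hypothetical metric)
    load: float         # compute utilization in [0, 1]

OVERLOAD = 0.8  # hypothetical load above which a site counts as overloaded

def select_site(sites, delay_sensitive):
    """Pick an edge service site for one request (illustrative policy only)."""
    usable = [s for s in sites if s.load < OVERLOAD]
    if not usable:
        return None  # no site has spare capacity
    if delay_sensitive:
        # Assisted driving: nearest site that is not overloaded.
        return min(usable, key=lambda s: s.distance_km)
    # Entertainment and other delay-insensitive traffic: lightest load,
    # even if the site is farther away.
    return min(usable, key=lambda s: s.load)

sites = [Site("near", 1.0, 0.85), Site("mid", 5.0, 0.60), Site("far", 20.0, 0.10)]
print(select_site(sites, delay_sensitive=True).name)   # mid
print(select_site(sites, delay_sensitive=False).name)  # far
]]></sourcecode>

        <t>Because the closest site ("near") is overloaded, even the
        delay-sensitive request spills over to the next-closest site, which
        matches the overflow behaviour described above for rush-hour video
        recognition.</t>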
      </section>

      <section anchor="example-3-computing-aware-digital-twin">
        <name>Example 3: Computing-aware Digital Twin</name>

        <t>A number of industry associations, such as the Industrial Digital
        Twin Association or the Digital Twin Consortium
        (https://www.digitaltwinconsortium.org/), have been founded to promote
        the concept of the Digital Twin (DT) in a number of use case areas,
        such as smart cities, transportation, and industrial control. The core
        concept of the DT is the "administrative shell" <xref
        target="Industry4.0"/>, which serves as a digital representation of
        the information and technical functionality pertaining to the "assets"
        (such as industrial machinery, a transportation vehicle, or an object
        in a smart city) that are intended to be managed, controlled, and
        actuated.</t>

        <t>As an example for industrial control, the programmable logic
        controller (PLC) may be virtualized and the functionality aggregated
        across a number of physical assets into a single administrative shell
        for the purpose of managing those assets. PLCs may be virtualized in
        order to move the PLC capabilities from the physical assets to the
        edge cloud. Several PLC instances may exist to enable load balancing
        and fail-over capabilities, while also enabling physical mobility of
        the asset and the connection to a suitable "nearby" PLC instance. With
        this, traffic dynamicity may be similar to that observed in the
        connected car scenario in the previous subsection. High availability
        and bounded latency are crucial here, since a failure of the (overall)
        PLC functionality may lead to a production line stop, while violations
        of the latency bound may lead to losing synchronization with other
        processes and, ultimately, to production faults, tool failures, or
        similar.</t>

        <t>Particular attention in Digital Twin scenarios is given to the
        problem of data storage. Here, decentralization, not only driven by
        the scenario (such as outlined in the connected car scenario for cases
        of localized reasoning over data originating from driving vehicles)
        but also through proposed platform solutions, such as those in <xref
        target="GAIA-X"/>, plays an important role. With decentralization,
        endpoint relations between clients and (storage) service instances
        may change frequently.</t>

        <t>In this use case, CATS is required for selecting the optimal PLC instance 
        and storage node, ensuring low latency and reliability for data processing in 
        industrial scenarios, as well as low latency for data reading/writing during 
        twin control processes.</t>
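        <t>A minimal sketch of such a selection is shown below, assuming that
        each candidate PLC instance or storage node reports its health and a
        measured latency. The tuple format, the 5 ms bound, and the function
        name select_instance are hypothetical. The same function can be
        applied to PLC instances and to storage nodes.</t>

        <sourcecode type="python"><![CDATA[
def select_instance(candidates, latency_bound_ms):
    """Return the healthy candidate with the lowest latency within the bound.

    candidates: list of (name, healthy, latency_ms) tuples (hypothetical
    format). Returns None when no candidate meets the bound, so that the
    caller can raise an alarm instead of silently violating it.
    """
    eligible = [c for c in candidates if c[1] and c[2] <= latency_bound_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c[2])[0]

plcs = [("plc-a", False, 1.2), ("plc-b", True, 3.5), ("plc-c", True, 7.0)]
print(select_instance(plcs, latency_bound_ms=5.0))  # plc-b (plc-a has failed)
]]></sourcecode>

        <t>The failed instance is skipped (fail-over), and the instance that
        would violate the latency bound is never chosen.</t>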
      </section>

      <section anchor="example-4-computing-aware-sd-wan">
        <name>Example 4: Computing-aware SD-WAN</name>

        <t>Software-defined Wide-area Network (SD-WAN) is an overlay connectivity service that optimizes the transport of IP packets over one or more underlay connectivity services by recognizing applications and determining forwarding behavior through the application of policies <xref target="MEF70.2"/>. SD-WAN can be deployed by both service providers and enterprises to support connectivity across branch sites, data centers, and cloud environments. Applications or services may be deployed at multiple locations to achieve performance, resiliency, or cost objectives.</t>

        <t>In current SD-WAN deployments, forwarding decisions are primarily based on network-related metrics such as available bandwidth, latency, packet loss, or path availability. However, these decisions typically lack visibility into the computing resources available at the destination sites, such as CPU or GPU utilization, memory pressure, or other composite cost metrics.</t>

        <t>CATS metrics can complement existing SD-WAN network metrics by providing information about the availability and condition of computing resources associated with service instances at edge or cloud sites. Such metrics may be consumed by a centralized SD-WAN controller when deriving policies or computing preferred paths, and/or by SD-WAN edge devices to make distributed, real-time traffic steering decisions among already-deployed service instances. In both cases, the goal is to enable application traffic to be steered towards service instances and sites that best satisfy application requirements by jointly considering network and computing conditions.</t>

        <t>For the scenario of enterprises deploying applications in the
        cloud, SD-WAN provides enterprises with centralized control over
        Customer-Premises Equipment (CPE) in branch offices and virtualized
        CPEs (vCPEs) in the clouds. The CPEs connect the clients in branch
        offices with the application servers in the clouds. Each copy of the
        same application server deployed in a different cloud is called an
        application instance. Different application instances may have
        different computing resources.</t>

        <t>Through the vCPEs, the SD-WAN is made aware of the computing
        resources of the applications deployed in the clouds, and it selects
        the application instance for a client to access according to both the
        computing power and the network state of the WAN.</t>

        <t>Additionally, in order to provide cost-effective solutions, the
        SD-WAN may also consider cost, e.g., in terms of energy prices
        incurred or energy source used, when selecting a specific application
        instance over another. For this, suitable metric information would
        need to be exposed, e.g., by the cloud provider, in terms of utilized
        energy or incurred energy costs per computing resource.</t>

        <t><xref target="Computing-Aware-SD-WAN"/> below illustrates
        Computing-aware SD-WAN for enterprise cloudification.</t>

<figure anchor="Computing-Aware-SD-WAN"
                title="Illustration of Computing-aware SD-WAN for Enterprise Cloudification">
            <artwork align="center">                                                    +---------------+
   +-------+                      +----------+      |    Cloud1     |
   |Client1|            /---------|   WAN1   |------|  vCPE1  APP1  |
   +-------+           /          +----------+      +---------------+
     +-------+        +-------+
     |Client2| ------ |  CPE  |
     +-------+        +-------+                     +---------------+
   +-------+           \          +----------+      |    Cloud2     |
   |Client3|            \---------|   WAN2   |------|  vCPE2  APP1  |
   +-------+                      +----------+      +---------------+
</artwork>
          </figure>

        <t>The current computing load status of the application APP1 in
        Cloud1 and Cloud2 is as follows: each application instance uses 6
        vCPUs; the load of the instance in Cloud1 is 50%, and the load of the
        instance in Cloud2 is 20%. The computing resource information of APP1
        is collected by vCPE1 and vCPE2, respectively. Client1 and Client2 are
        accessing APP1 in Cloud1. WAN1 and WAN2 have the same network states.
        Because the instance in Cloud2 is more lightly loaded, the SD-WAN
        selects APP1 in Cloud2 for Client3 in the branch office. The traffic
        of Client3 follows the path: Client3 -&gt; CPE -&gt; WAN2 -&gt;
        Cloud2 vCPE2 -&gt; Cloud2 APP1.</t>
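        <t>The selection in this example can be reproduced with the following
        minimal sketch. It assumes, hypothetically, that the SD-WAN controller
        derives the idle vCPU capacity of each instance from the load reported
        by its vCPE and, since the WAN states are equal, picks the instance
        with the most idle capacity. The data structure, the equal 20 ms WAN
        delay standing in for "the same network states", and the tie-break on
        WAN delay are illustrative assumptions.</t>

        <sourcecode type="python"><![CDATA[
# Computing status of APP1 as reported by vCPE1 and vCPE2 in the example.
# The WAN delay values are hypothetical and equal, as in the example.
instances = {
    "Cloud1": {"vcpus": 6, "load": 0.50, "wan": "WAN1", "wan_delay_ms": 20},
    "Cloud2": {"vcpus": 6, "load": 0.20, "wan": "WAN2", "wan_delay_ms": 20},
}

def idle_vcpus(inst):
    """Idle computing capacity in vCPUs."""
    return inst["vcpus"] * (1.0 - inst["load"])

def select_cloud(instances):
    # Prefer the most idle computing capacity; break ties on lower WAN delay.
    return max(instances,
               key=lambda c: (idle_vcpus(instances[c]),
                              -instances[c]["wan_delay_ms"]))

choice = select_cloud(instances)
print(choice, instances[choice]["wan"])  # Cloud2 WAN2
]]></sourcecode>

        <t>Cloud1 has 3 idle vCPUs and Cloud2 has 4.8, so Client3 is steered
        via WAN2 to Cloud2.</t>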
      </section>

      <section anchor="example-5-computing-aware-distributed-ai-training-and-inference">
        <name>Example 5: Computing-aware Distributed AI Training and
        Inference</name>

        <t>The term "AI large model" refers to Artificial Intelligence (AI)
        models that are characterized by their large size, high complexity,
        and high computational requirements. AI large models have become
        increasingly important in various fields, such as natural language
        processing for text classification, computer vision for image
        classification and object detection, and speech recognition.</t>

        <t>The lifecycle of an AI large model comprises two key phases:
        training and inference. Training refers to the process of developing
        an AI model by feeding it large amounts of data and optimizing it to
        learn and improve its performance. Inference, on the other hand, is
        the process of using the trained AI model to make predictions or
        decisions based on new input data.</t>

        <section anchor="distributed-ai-inference">
          <name>Distributed AI Inference</name>

          <t>With the rapid development of large language models,
          increasingly lightweight models can be deployed at edge service
          sites. <xref target="fig5"/> shows the potential deployment of this
          case.</t>

          <t>AI inference consists of two major steps: prefilling and
          decoding. Prefilling processes a user's prompt to generate the
          first token of the response in one step. Decoding then sequentially
          generates subsequent tokens, step by step, until the termination
          token is produced. Both stages consume significant computing
          resources. Important metrics for AI inference include the processor
          cores that transform prompts into tokens and the memory used to
          store key-value (KV) caches and tokens. The rate at which tokens are
          generated and processed indicates the service capability of an AI
          inference system. A single-site deployment of prefilling and
          decoding might not provide enough resources when many clients send
          requests (prompts) to the AI inference service.</t>
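          <t>To illustrate why memory is a key metric for decoding, the
          following sketch estimates the KV cache footprint per request using
          the common approximation 2 x layers x KV heads x head dimension x
          sequence length x bytes per element (one key and one value vector
          per token and layer). It then derives how many concurrent requests
          a site's free memory can hold. The model dimensions are hypothetical
          and do not describe any particular model, and real systems also
          need memory for weights and activations.</t>

          <sourcecode type="python"><![CDATA[
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2, batch=1):
    """Approximate KV cache size: one key and one value vector per token per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem * batch

def max_concurrent_requests(free_mem_bytes, **model):
    """How many requests of the given context length fit in the free memory."""
    per_request = kv_cache_bytes(batch=1, **model)
    return free_mem_bytes // per_request

# Hypothetical model dimensions, FP16 (2 bytes per element).
model = dict(layers=32, kv_heads=8, head_dim=128, seq_len=4096)
per_req = kv_cache_bytes(**model)
print(per_req / 2**30)                               # 0.5 (GiB per request)
print(max_concurrent_requests(16 * 2**30, **model))  # 32
]]></sourcecode>

          <t>A site advertising 16 GiB of free accelerator memory could thus
          serve roughly 32 such requests concurrently; further prompts would
          need another site.</t>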

          <t>More generally, cost information, specifically the cost of the
          energy expended on AI inference for the overall AI-based service,
          could also serve as a criterion for steering traffic. Here, we
          envision (AI) service tiers being exposed to end users, allowing
          them to prioritize, e.g., 'greener energy costs' as a key criterion
          for service fulfilment. For this, the system would use metric
          information on, e.g., the energy mix utilized at the AI inference
          sites and the costs of energy to prioritize a 'greener' site over
          another, while providing similar response times.</t>
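          <t>One way to read "prioritize a 'greener' site while providing
          similar response times" is sketched below. Among the sites whose
          expected response time is within a tolerance of the fastest one,
          the sketch picks the site with the lowest carbon intensity. The
          tolerance, the metric names, and the values are assumptions for
          illustration only.</t>

          <sourcecode type="python"><![CDATA[
def greenest_similar_site(sites, tolerance_ms=10.0):
    """sites: dict name -> (response_ms, grams_co2_per_kwh); hypothetical metrics."""
    best = min(rt for rt, _ in sites.values())
    similar = {n: v for n, v in sites.items() if v[0] <= best + tolerance_ms}
    # Among sites with similar response times, prefer the lowest carbon
    # intensity, then the faster response time as a tie-breaker.
    return min(similar, key=lambda n: (similar[n][1], similar[n][0]))

sites = {"edge-a": (40.0, 450.0), "edge-b": (47.0, 120.0), "edge-c": (90.0, 20.0)}
print(greenest_similar_site(sites))  # edge-b
]]></sourcecode>

          <t>"edge-c" is the greenest site but is excluded because its
          response time is not similar. A premium "green" tier could use a
          larger tolerance.</t>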

          <figure anchor="fig5">
            <name>Illustration of Computing-aware AI large model
            inference</name>

            <artwork align="center">
       +----------------------------------------------------------+
       |  +--------------+  +--------------+   +--------------+   |
       |  |     Edge     |  |     Edge     |   |     Edge     |   |
       |  | +----------+ |  | +----------+ |   | +----------+ |   |
       |  | |  Prefill | |  | |  Prefill | |   | |  Prefill | |   |
       |  | +----------+ |  | +----------+ |   | +----------+ |   |
       |  | +----------+ |  | +----------+ |   | +----------+ |   |
       |  | |  Decode  | |  | |  Decode  | |   | |  Decode  | |   |
       |  | +----------+ |  | +----------+ |   | +----------+ |   |
       |  +--------------+  +--------------+   +--------------+   |
       +----------+-----------------------------+-----------------+
                  | Prompt                      | Prompt
                  |                             |
             +----+-----+                     +-+--------+
             | Client_1 |           ...       | Client_2 |
             +----------+                     +----------+
 </artwork>
          </figure>
        </section>

        <section anchor="distributed-ai-training">
          <name>Distributed AI Training</name>

          <t>Although large language models are nowadays typically trained
          in very large data centers with extensive, often GPU-based,
          computational resources, platforms for federated or distributed
          training are being proposed, specifically ones employing edge
          computing resources <xref
          target="Cost-Aware-Federated-Learning-in-Mobile-Edge-Networks"/>.</t>

          <t>While those approaches apply their own (collective) communication
          approach to steer the training and gradient data towards the various
          (often edge) computing sites, we also see a case for CATS traffic
          steering here. For this, the training clusters themselves may be
          multi-site, i.e., combining resources from more than one site, but
          acting as service instances in a CATS sense, i.e., providing the
          respective training round as a service to the overall
          distributed/federated learning platform with the CATS system responsible 
          for selecting service instances and steering traffic to them.</t>

          <t>One (cluster) site can be selected over another based on
          compute, network, or cost metrics, or a combination thereof. For
          instance, training may be constrained by network resources to
          ensure timely delivery of the required training and gradient
          information to the cluster site. Computational load may also be
          considered, particularly when the cluster sites are multi-tenant
          (i.e., host more than one application) and may therefore become
          (temporarily) overloaded. As in the inference use case in the
          previous section, the overall training service may also be
          constrained by cost, specifically energy aspects, e.g., when the
          service utilizing the trained model advertises its 'green'
          credentials to end users. For this, costs based on energy pricing
          (over time) as well as the energy mix may be considered. One could
          foresee, for instance, coupling surplus energy from renewable
          sources to a cost metric, so that traffic is preferably steered to
          cluster sites that consume only surplus energy rather than grid
          energy.</t>
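          <t>A minimal sketch of combining compute, network, and cost
          metrics into a single score is shown below. Energy drawn from
          surplus renewable generation is priced lower than grid energy. The
          weights, the linear combination, and the field names are
          hypothetical; CATS does not define such a formula.</t>

          <sourcecode type="python"><![CDATA[
from dataclasses import dataclass

@dataclass
class ClusterSite:
    name: str
    gpu_load: float       # fraction of accelerators busy, 0..1
    transfer_s: float     # time to deliver one round of gradients
    grid_price: float     # price per kWh bought from the grid
    surplus_share: float  # fraction of demand covered by surplus renewables

def energy_cost(site, surplus_price=0.0):
    # Surplus energy is assumed to be (nearly) free; the rest comes from the grid.
    return site.surplus_share * surplus_price + (1 - site.surplus_share) * site.grid_price

def score(site, w_compute=1.0, w_network=1.0, w_cost=1.0):
    """Lower is better. Purely illustrative linear combination."""
    return (w_compute * site.gpu_load
            + w_network * site.transfer_s
            + w_cost * energy_cost(site))

sites = [
    ClusterSite("dc-1", gpu_load=0.3, transfer_s=0.2, grid_price=0.30, surplus_share=0.0),
    ClusterSite("dc-2", gpu_load=0.4, transfer_s=0.3, grid_price=0.30, surplus_share=1.0),
]
print(min(sites, key=score).name)  # dc-2
]]></sourcecode>

          <t>With the cost weight set to zero, the less loaded and better
          connected "dc-1" would win instead. This shows how the operator's
          choice of weights determines the trade-off.</t>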

          <t>Storage is also necessary for performing distributed/federated
          learning, for several key reasons. First, it is needed to store
          model checkpoints produced throughout the training process, allowing
          for progress tracking and recovery in case of interruptions.
          Additionally, storage is used to keep samples of the dataset used to
          train the model, which often come from distributed sensors such as
          cameras, microphones, etc. Furthermore, storage is required to hold
          the models themselves, which can be very large and complex. Knowing
          the storage performance metrics is also important. For instance,
          understanding the I/O transfer rate of the storage helps in
          determining the latency of accessing data from disk. Additionally,
          knowing the size of the storage is relevant to understand how many
          model checkpoints can be stored or the maximum size of the model
          that can be locally stored.</t>
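          <t>The two storage metrics mentioned above translate directly into
          simple estimates. The time to read a checkpoint from disk is at
          least its size divided by the I/O transfer rate, and the number of
          checkpoints that fit is the free capacity divided by the checkpoint
          size. The sketch below uses hypothetical figures.</t>

          <sourcecode type="python"><![CDATA[
def read_time_s(size_gb, io_rate_gb_per_s):
    """Lower bound on the time to load data from storage (ignores seek/queueing)."""
    return size_gb / io_rate_gb_per_s

def checkpoints_that_fit(free_capacity_gb, checkpoint_gb):
    """Number of whole checkpoints that fit in the free capacity."""
    return int(free_capacity_gb // checkpoint_gb)

# Hypothetical site: 2000 GB free, 2 GB/s disks, 140 GB checkpoints.
print(read_time_s(140, 2.0))            # 70.0
print(checkpoints_that_fit(2000, 140))  # 14
]]></sourcecode>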
        </section>
      </section>

    </section>

    <section anchor="requirements">
      <name>Requirements</name>

      <t>In the following, we outline the requirements for the CATS system to
      overcome the observed problems in the realization of the use cases
      above.</t>

      <section anchor="multi-access">
        <name>Support Dynamic and Effective Selection among Multiple Service
        Instances</name>

        <t>The basic requirement of CATS is to support dynamic access to
        different service instances residing in multiple computing sites
        while being aware of their status; this is also the fundamental model
        for enabling traffic steering and for further optimizing network and
        computing services. A specific service is identified by a CATS
        service identifier (CS-ID). All instances of a specific service use
        the same CS-ID, no matter at which edge service site they are
        located. The CS-ID is unique to the service so that it unambiguously
        identifies the service. Mapping this CS-ID to a network locator is
        fundamental to steering traffic to any of the service instances
        deployed in various edge service sites.</t>

        <t>Moreover, according to the CATS use cases, some applications
        require low end-to-end (E2E) latency, which warrants a quick mapping
        of the service identifier to the network locator. This naturally
        leads to in-band methods, which involve using metrics that are
        oriented towards compute capabilities and resources, and their
        correlation with services. Therefore, a desirable system</t>

        <t>R1: <bcp14>MUST</bcp14> provide a dynamic discovery and resolution
        method for mapping CS-ID to one or more current service
        instance addresses, based on up-to-date system state, assuming the CS-ID is valid.</t>

        <t>R2: <bcp14>MUST</bcp14> provide a method to dynamically assess the
        availability of service instances, based on up-to-date status metrics
        (e.g., health, load, reachability).</t>

        <t>Note: The term "up-to-date" herein refers to the latest metric information 
        collected by the system in accordance with the preset metric update cycle. 
        The principle for setting the cycle is generally pre-determined by the network. 
        For example, based on historical statistical data, a relatively appropriate update 
        cycle (either second-level or millisecond-level) is selected for a specific type or 
        certain types of services.</t>
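        <t>A minimal, non-normative sketch of R1 and R2 follows. A table maps
        each CS-ID to its currently known service instances and is refreshed
        once per metric update cycle. Resolution returns only the instances
        whose status metrics indicate that they are available. The CS-ID
        string, the addresses (taken from the 192.0.2.0/24 documentation
        prefix), the health/load fields, and the 0.9 load limit are
        hypothetical.</t>

        <sourcecode type="python"><![CDATA[
from dataclasses import dataclass, field

@dataclass
class Instance:
    address: str
    healthy: bool = True
    load: float = 0.0  # 0..1, as last reported in the current update cycle

@dataclass
class Resolver:
    table: dict = field(default_factory=dict)  # CS-ID -> list of Instance

    def update(self, cs_id, instance):
        """Called once per metric update cycle with an instance's latest status."""
        entries = self.table.setdefault(cs_id, [])
        entries[:] = [i for i in entries if i.address != instance.address]
        entries.append(instance)

    def resolve(self, cs_id, max_load=0.9):
        """Return addresses of available instances, least loaded first (R1, R2)."""
        if cs_id not in self.table:
            raise KeyError(f"unknown CS-ID {cs_id!r}")
        ok = [i for i in self.table[cs_id] if i.healthy and i.load < max_load]
        return [i.address for i in sorted(ok, key=lambda i: i.load)]

r = Resolver()
r.update("svc-ar", Instance("192.0.2.10", load=0.7))
r.update("svc-ar", Instance("192.0.2.20", load=0.2))
r.update("svc-ar", Instance("192.0.2.30", healthy=False))
print(r.resolve("svc-ar"))  # ['192.0.2.20', '192.0.2.10']
]]></sourcecode>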
      </section>

      <section anchor="support-agreement-on-metric-representation-and-definition">
        <name>Support Agreement on Metric Representation and Definition</name>

        <t>Computing metrics can have many different semantics, particularly
        since they can be service-specific. Even the notion of a "computing
        load" metric could be represented in many different ways, e.g., as
        percentile-quantified metrics across various categories (such as
        latency or throughput). Such a representation may carry information
        on the semantics of the metric, or it may be purely one or more
        semantic-free numerals. Agreement on the chosen representation among
        all service and network elements participating in the service
        instance selection decision is important. Therefore, a desirable
        system</t>

        <t>R3: The implementations <bcp14>MUST</bcp14> agree on using metrics
        that are oriented towards compute capabilities and resources and their
        representation among service instances in the participating edges, at
        both design time and runtime.</t>

        <t>To better understand the meaning of different metrics and to better
        support appropriate use of metrics,</t>

        <t>R4: An information model of the compute and network resources
        <bcp14>MUST</bcp14> be defined. Such a model <bcp14>MUST</bcp14>
        characterize how metrics are abstracted out from the compute and
        network resources. We refer to this information model as the Resource Model.</t>

        <t>R5: The Resource Model <bcp14>MUST</bcp14> be implementable in an
        interoperable manner. That is, metrics generated by this resource model 
        <bcp14>MUST</bcp14> be understood and interoperable across independent CATS implementations.</t>

        <t>R6: It <bcp14>MUST</bcp14> be possible to implement the Resource Model in a scalable 
        manner. That is, the Resource Model <bcp14>MUST</bcp14> be capable of scaling in memory, 
        energy, and processing no worse than linearly with an increase in the amount 
        of CATS metrics and CATS service instances it supports.</t>

        <t>We recognize that different network nodes, e.g., routers, switches,
        etc., may have diversified capabilities even in the same routing
        domain, let alone in different administrative domains and from
        different vendors. Therefore, to work properly in a CATS system,</t>

        <t>R7: CATS systems <bcp14>MUST</bcp14> support staleness handling for CATS metrics 
        and provide indications of when metrics should be refreshed, so that CATS components 
        can know if a metric value is valid or not.</t>
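        <t>A minimal sketch of R7 is shown below. It assumes that each metric
        carries the time at which it was measured and a validity period set
        by its producer, and that producer and consumer share a common clock.
        The field names and the 2-second validity period are
        hypothetical.</t>

        <sourcecode type="python"><![CDATA[
import time
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    value: float
    measured_at: float  # seconds, from a clock shared by producer and consumer
    valid_for: float    # seconds after which the value must be refreshed

    def is_valid(self, now=None):
        now = time.time() if now is None else now
        return now - self.measured_at <= self.valid_for

    def refresh_due_in(self, now=None):
        """Seconds until the value becomes stale (negative if already stale)."""
        now = time.time() if now is None else now
        return self.measured_at + self.valid_for - now

m = Metric("cpu_load", 0.42, measured_at=100.0, valid_for=2.0)
print(m.is_valid(now=101.5), m.refresh_due_in(now=101.5))  # True 0.5
print(m.is_valid(now=103.0))                               # False
]]></sourcecode>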

        <t>R8: All metric information used in CATS <bcp14>MUST</bcp14> be
        produced and encoded in a standardised format that is understood by all participating
        CATS components. For metrics that CATS components do not understand or
        support, CATS components will ignore them.</t> 
        

        <t>R9: CATS components <bcp14>SHOULD</bcp14> support a mechanism to
        advertise or negotiate supported metric types and encodings to ensure
        compatibility across implementations.</t>
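        <t>R8 and R9 can be illustrated with a small sketch. Two components
        advertise the metric types they support and agree on the
        intersection. A receiver then ignores any metric outside that set
        instead of failing. The metric type names are hypothetical
        placeholders, not registered CATS metrics.</t>

        <sourcecode type="python"><![CDATA[
def negotiate(local_supported, peer_supported):
    """Agree on the metric types both sides understand (R9)."""
    return set(local_supported) & set(peer_supported)

def accept_metrics(received, understood):
    """Keep only understood metrics; ignore the rest instead of failing (R8)."""
    return {k: v for k, v in received.items() if k in understood}

agreed = negotiate({"cpu_load", "gpu_load", "delay"}, {"cpu_load", "delay", "energy"})
print(sorted(agreed))                                             # ['cpu_load', 'delay']
print(accept_metrics({"cpu_load": 0.3, "energy": 12.0}, agreed))  # {'cpu_load': 0.3}
]]></sourcecode>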

        <t>R10: The computation and use of metrics in CATS <bcp14>MUST</bcp14>
        be designed to avoid introducing routing loops or path oscillations
        when metrics are distributed and used for path selection.</t>

        <t>Compute metrics can change rapidly, which may lead to path
        oscillation if metrics are updated too frequently, or to stale
        metrics if they are updated too infrequently. R9 ensures that CATS
        components can negotiate metric types for consistent interpretation,
        while R10 requires that metrics be used in a way that avoids routing
        loops and path instability. Together, they balance responsiveness
        with stability.</t>
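        <t>One common damping technique is hysteresis. It is shown here only
        as an example, not as a mechanism mandated by CATS. The current
        service instance is kept unless an alternative is better by more than
        a margin, so small metric fluctuations do not make traffic flap
        between instances. The 20% margin is an assumption.</t>

        <sourcecode type="python"><![CDATA[
def choose_with_hysteresis(current, costs, margin=0.2):
    """Keep `current` unless another instance is cheaper by more than `margin`.

    costs: dict instance -> cost (lower is better); `current` may be None.
    """
    best = min(costs, key=costs.get)
    if current is None or current not in costs:
        return best
    if costs[best] < costs[current] * (1 - margin):
        return best
    return current

print(choose_with_hysteresis("A", {"A": 1.00, "B": 0.90}))  # A (only 10% better)
print(choose_with_hysteresis("A", {"A": 1.00, "B": 0.70}))  # B (30% better)
]]></sourcecode>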
      </section>

      <section anchor="use-of-cats-metrics">
        <name>Use of CATS Metrics</name>

        <t>Network path costs in the current routing system usually do not
        change very frequently. Network traffic engineering metrics (such as
        available bandwidth) may change more frequently as traffic demands
        fluctuate, but distribution of these changes is normally damped so
        that only significant changes cause routing protocol messages.</t>

        <t>However, metrics that are oriented towards compute capabilities and
        resources in general can be highly dynamic, e.g., changing rapidly
        with the number of sessions, the CPU/GPU utilization and the memory
        consumption, etc. Service providers must determine at what interval or
        based on what events such information needs to be distributed. Overly
        frequent distribution with more accurate synchronization may result in
        unnecessary overhead in terms of signaling.</t>
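        <t>The damping already used for traffic engineering metrics could be
        applied to compute metrics as well. The sketch below advertises a new
        value only when it differs from the last advertised value by more than
        a threshold. It also advertises when a maximum interval has elapsed,
        so receivers can still detect staleness. The threshold and interval
        values are hypothetical.</t>

        <sourcecode type="python"><![CDATA[
class DampedAdvertiser:
    def __init__(self, threshold=0.1, max_interval_s=30.0):
        self.threshold = threshold  # minimum significant change (absolute)
        self.max_interval_s = max_interval_s
        self.last_value = None
        self.last_sent = None

    def should_advertise(self, value, now):
        if self.last_value is None:
            send = True  # always advertise the first value
        else:
            significant = abs(value - self.last_value) > self.threshold
            expired = now - self.last_sent >= self.max_interval_s
            send = significant or expired
        if send:
            self.last_value, self.last_sent = value, now
        return send

adv = DampedAdvertiser()
samples = [(0, 0.50), (1, 0.55), (2, 0.70), (3, 0.72), (40, 0.72)]
print([adv.should_advertise(v, t) for t, v in samples])
# [True, False, True, False, True]
]]></sourcecode>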

        <t>Moreover, depending on the service-related decision logic, one or
        more metrics need to be conveyed in a CATS domain (that is, between
        the clients, services, decision-making points, and traffic steering
        elements cooperating to perform the CATS function). The issues to be
        addressed here include the frequency of such conveyance and which
        CATS component acts as the decision maker for service instance
        selection. Therefore, choosing appropriate protocols for conveying
        CATS metrics is important. Existing protocols, for example, BGP
        extensions <xref target="RFC4760"/> and GRASP <xref
        target="RFC8990"/>, may serve as a baseline for signaling metrics;
        such protocols may be more suitable for distributed systems. For
        centralized approaches to selecting CATS service instances, other
        means of conveying the metrics can equally be chosen and realized,
        for example, a RESTful API for publishing CATS metrics to a
        centralized decision maker. Specifically, a desirable system,</t>

        <t>R11: <bcp14>MUST</bcp14> provide mechanisms for metric
        collection, including specifying the responsible entity for collection.</t>

        <t>Collecting metrics from all of the service instances may incur
        significant overhead for decision makers. Hierarchical aggregation
        helps reduce this burden by consolidating metrics at intermediate
        nodes, providing a more scalable and efficient view of resource
        conditions.</t>
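        <t>A minimal sketch of hierarchical aggregation is shown below. It
        assumes an aggregator per edge service site that summarises its
        instances into one record (total and available instance counts, and
        total spare capacity) before passing it upwards. The summary fields
        and the 0.9 load limit are hypothetical choices, and other summaries
        are equally possible.</t>

        <sourcecode type="python"><![CDATA[
def aggregate_site(instance_loads, max_load=0.9):
    """Summarise per-instance loads (0..1) into one site-level record."""
    available = [l for l in instance_loads if l < max_load]
    return {
        "instances": len(instance_loads),
        "available": len(available),
        "spare_capacity": sum(1.0 - l for l in available),
    }

def aggregate_region(site_summaries):
    """A further level: the decision maker only sees one record per region."""
    return {
        "available": sum(s["available"] for s in site_summaries),
        "spare_capacity": sum(s["spare_capacity"] for s in site_summaries),
    }

site_a = aggregate_site([0.5, 0.95, 0.25])
site_b = aggregate_site([0.0, 0.5])
print(site_a["available"], site_a["spare_capacity"])  # 2 1.25
print(aggregate_region([site_a, site_b]))  # {'available': 4, 'spare_capacity': 2.75}
]]></sourcecode>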

        <t>CATS components do not need to be aware of how metrics are
        collected behind the aggregator. The decision point may not be
        directly connected with service instances or metric collectors,
        therefore,</t>

        <t>R12: <bcp14>MUST</bcp14> provide mechanisms to distribute the
        metrics.</t>

        <t>There may be various update frequencies for different computing
        metrics. Some of the metrics may be more dynamic, while others are
        relatively static. Accordingly, different distribution methods may
        need to be chosen with respect to different update frequencies of
        different metrics. Therefore, a system,</t>
      
        <t>R13: <bcp14>MUST</bcp14> continue to operate (even if sub-optimally) if metric updates 
       are delayed by low frequency updates or by problems with the mechanisms used to 
       distribute the metrics.</t>

        <t>For example, in highly mobile scenarios, such as the fast-moving
        vehicles mentioned in <xref
        target="example-2-computing-aware-intelligent-transportation"/>,
        compute metrics can quickly become outdated as the UE moves across
        base stations and edge service sites, potentially requiring more
        frequent updates. However, updates should remain stable and avoid
        excessive overhead.</t>
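        <t>As a rough, hypothetical illustration of this trade-off, the
        sketch below derives a metric update interval from the time a vehicle
        is expected to remain within the coverage of one edge service site.
        The interval is clamped between a floor, which bounds signaling
        overhead, and a ceiling. The 200 m coverage radius, the 20 updates
        per dwell time, and the bounds are assumptions.</t>

        <sourcecode type="python"><![CDATA[
def update_interval_s(speed_kmh, coverage_m=200.0, updates_per_dwell=20,
                      min_s=0.1, max_s=5.0):
    """Suggest a metric update interval for a client moving at speed_kmh.

    Dwell time ~ coverage diameter / speed; aim for `updates_per_dwell`
    updates within it, but never faster than min_s or slower than max_s.
    """
    if speed_kmh <= 0:
        return max_s  # stationary client: slowest allowed update rate
    speed_ms = speed_kmh / 3.6
    dwell_s = 2 * coverage_m / speed_ms
    return min(max_s, max(min_s, dwell_s / updates_per_dwell))

for v in (5, 50, 120):
    print(v, round(update_interval_s(v), 2))
# 5 5.0
# 50 1.44
# 120 0.6
]]></sourcecode>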

      </section>

      <section anchor="session-continuity">
        <name>Support Instance Affinity</name>

        <t>In the CATS system, a service may be provided by one or more
        service instances deployed at different locations in the network.
        Each instance provides equivalent service functionality to its
        respective clients. The instance selection logic operates on
        packet-level communication, and packets are forwarded based on the
        operating status of both network and computing resources. This
        resource status will likely change over time, potentially leading to
        individual packets being sent to different network locations,
        possibly segmenting individual service transactions and breaking
        service-level semantics. Moreover, when a client moves, its access
        point might change, which may subsequently lead to the migration of
        service instances. If execution changes from one (e.g., virtualized)
        service instance to another, state/context needs to be transferred to
        the new instance. Such a required transfer of state/context makes it
        desirable to have instance affinity as the default, removing the need
        for explicit context transfer, while also supporting an explicit
        state/context transfer (e.g., when metrics change significantly).</t>

        <t>The nature of this relationship, referred to here as 'instance
        affinity', is highly dependent on the nature of the service. The
        minimal affinity, that of a single request, represents a stateless
        service, where each service request may be responded to without any
        state being held at the service instance for fulfilling the
        request.</t>

        <t>Providing any necessary information/state in-band as part of the
        service request, e.g., in the form of a multipart form body in an HTTP
        request or through the URL provided as part of the request, is one
        way to achieve such a stateless nature.</t>

        <t>Alternatively, the affinity to a particular service instance may
        span more than one request, as in the AR/VR use case, where the
        previous client input is needed to render subsequent frames.</t>

        <t>However, a client, e.g., a mobile UE, may have many applications
        running. If all, or the majority, of the applications request
        CATS-based services, then the runtime states that need to be created
        and maintained accordingly would require high granularity. In the
        extreme scenario, this granular requirement could reach the level of
        per-UE, per-APP, and per-(sub)flow with regard to a service instance,
        where a 'flow' is a logical grouping of packets during a time
        interval, identified by some fields from the packet header, such as
        the 5-tuple transport coordinates (source and destination addresses,
        source and destination port numbers, and protocol) (see also <xref
        target="I-D.ietf-cats-framework"/>). Evidently, these fine-grained
        runtime states can potentially place a heavy burden on network
        devices if they have to dynamically create and maintain them. At the
        same time, it is not appropriate to place the state-keeping task on
        the clients themselves.</t>

        <t>Besides, the UE may move to a new (access) network, or the service
        instance may be migrated to another cloud, which may make the
        original service instance unreachable or inconvenient to use.
        Therefore, UE and service instance mobility also need to be
        considered.</t>

        <t>Therefore, a desirable system,</t>

        <t>R14: CATS systems <bcp14>MUST</bcp14> maintain instance affinity
        for stateful sessions and transactions on a per-flow basis.</t>

        <t>R15: <bcp14>MUST</bcp14> avoid maintaining per-flow state for
        specific applications in network nodes in order to provide instance
        affinity.</t>

        <t>R16: <bcp14>SHOULD</bcp14> support service continuity in the
        presence of UE or service instance mobility.</t>
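
        <t>One way to reconcile R14 and R15 is to make instance selection a
        deterministic function of the flow identifier, so that any
        CATS-Forwarder recomputes the same choice instead of storing it. The
        following Python sketch uses rendezvous (highest-random-weight)
        hashing over the 5-tuple. This technique is chosen here only for
        illustration and is not specified by this document or by <xref
        target="I-D.ietf-cats-framework"/>. The FiveTuple type, the function
        names, and the instance labels are hypothetical, and the candidate set
        is assumed to have already been filtered using CATS metrics.</t>

        <sourcecode type="python"><![CDATA[
```python
import hashlib
from typing import NamedTuple, Sequence


class FiveTuple(NamedTuple):
    """Transport coordinates identifying a flow (illustrative type)."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g., 6 (TCP) or 17 (UDP)


def _weight(flow: FiveTuple, instance: str) -> int:
    """Deterministic pseudo-random weight for a (flow, instance) pair."""
    key = "|".join(str(field) for field in flow) + "|" + instance
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def select_instance(flow: FiveTuple, candidates: Sequence[str]) -> str:
    """Rendezvous (highest-random-weight) hashing.

    Every forwarder that sees the same flow and the same candidate set
    computes the same answer, so no per-flow table has to be kept.
    """
    if not candidates:
        raise ValueError("no candidate service instance")
    return max(candidates, key=lambda inst: _weight(flow, inst))


if __name__ == "__main__":
    flow = FiveTuple("192.0.2.10", "198.51.100.20", 49152, 443, 6)
    print(select_instance(flow, ["instance-a", "instance-b", "instance-c"]))
```
]]></sourcecode>

        <t>In this sketch, affinity holds only while the candidate set is
        stable. Removing an instance remaps only the flows that were mapped to
        it, and adding one moves only the flows that the new instance now
        wins, which limits disruption compared with a plain hash-modulo
        scheme. Mobility events covered by R16 would still need separate
        handling.</t>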
      </section>

      <section anchor="preserve-communication-confidentiality">
        <name>Preserve Communication Confidentiality</name>

        <t>Exposing CATS metrics to the network may leak privacy-sensitive
        information about applications. To prevent this, methods for handling
        sensitive information need to be considered. Examples include general
        anonymization methods, such as hiding the key information that
        identifies devices, using an index to represent the service level of
        computing resources, or applying customized information exposure
        strategies according to specific application or network scheduling
        requirements. At the same time, while anonymity is achieved, the
        exposed computing information needs to remain sufficient to enable
        effective traffic steering. Therefore, a CATS system:</t>

        <t>R17: <bcp14>MUST</bcp14> preserve the confidentiality of the
        communication relation between a user and a service provider by
        minimizing the exposure of user-relevant information according to the
        user's demands, while accommodating the regulatory requirements of the
        environment where CATS is deployed. See also <xref
        target="security-considerations"/> for a discussion of
        confidentiality.</t>
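
        <t>As an illustration of the index-based approach mentioned above,
        the following Python sketch replaces raw measurements and device
        identifiers with an opaque site label and a coarse service level. The
        thresholds, field names, and number of levels are hypothetical and
        would be a matter of operator policy. The sketch does not claim to
        provide formal anonymity guarantees.</t>

        <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass

# Hypothetical delay thresholds (ms) separating service levels 0..3.
DELAY_BOUNDS_MS = (5.0, 20.0, 50.0)
# Hypothetical utilization above which a site is reported at the worst level.
SATURATION = 0.95


@dataclass(frozen=True)
class RawSiteMetrics:
    device_id: str             # internal identifier; never exposed
    cpu_utilization: float     # fraction in [0.0, 1.0]
    processing_delay_ms: float


@dataclass(frozen=True)
class ExposedMetric:
    site_label: str     # opaque label assigned by the operator
    service_level: int  # 0 = best; len(DELAY_BOUNDS_MS) = worst


def service_level(m: RawSiteMetrics) -> int:
    """Map raw measurements to a coarse index instead of exposing them."""
    if m.cpu_utilization >= SATURATION:
        return len(DELAY_BOUNDS_MS)
    # Count how many thresholds the measured delay reaches or exceeds.
    return sum(1 for bound in DELAY_BOUNDS_MS if m.processing_delay_ms >= bound)


def anonymize(m: RawSiteMetrics, site_label: str) -> ExposedMetric:
    """Drop the device identifier and raw values; keep only the index."""
    return ExposedMetric(site_label=site_label, service_level=service_level(m))
```
]]></sourcecode>

        <t>The trade-off discussed above is visible in the choice of levels:
        fewer levels expose less information but give the traffic steering
        function less resolution to distinguish between sites.</t>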
      </section>

      <section anchor="correlation-between-use-cases-and-requirements">
        <name>Correlation between Use Cases and Requirements</name>

        <t><xref target="fig6"/> illustrates the correlation between CATS use
        cases and requirements. An 'X' indicates that the requirement can be
        derived from the corresponding use case.</t>

        <figure anchor="fig6">
          <name>Mapping between CATS Use Cases and Requirements</name>

          <artwork>
            +-------------------------------------------------+
            |                |           Use cases            |
            +--Requirements--+-----+-----+------+------+------+
            |                |AR/VR| ITS |  DT  |SD-WAN|  AI  |
            +-----------+----+-----+-----+------+------+------+
            | Instance  | R1 |  X  |  X  |  X   |  X   |  X   |
            | Selection +----+-----+-----+------+------+------+
            |           | R2 |  X  |  X  |  X   |  X   |  X   |
            +-----------+----+-----+-----+------+------+------+
            |           | R3 |  X  |  X  |  X   |  X   |  X   |
            |           +----+-----+-----+------+------+------+
            |           | R4 |  X  |  X  |  X   |  X   |  X   |
            |           +----+-----+-----+------+------+------+
            |  Metric   | R5 |  X  |  X  |  X   |  X   |  X   |
            |Definition +----+-----+-----+------+------+------+
            |           | R6 |  X  |  X  |  X   |  X   |  X   |
            |           +----+-----+-----+------+------+------+
            |           | R7 |  X  |  X  |  X   |  X   |  X   |
            |           +----+-----+-----+------+------+------+
            |           | R8 |  X  |  X  |  X   |  X   |  X   |
            |           +----+-----+-----+------+------+------+
            |           | R9 |  X  |  X  |  X   |  X   |  X   |
            |           +----+-----+-----+------+------+------+
            |           | R10|  X  |  X  |  X   |  X   |  X   |
            +-----------+----+-----+-----+------+------+------+
            |           | R11|  X  |  X  |  X   |  X   |  X   |
            |           +----+-----+-----+------+------+------+
            |  Use of   | R12|  X  |  X  |  X   |  X   |  X   |
            |  Metrics  +----+-----+-----+------+------+------+
            |           | R13|  X  |  X  |  X   |  X   |  X   |
            +-----------+----+-----+-----+------+------+------+
            |           | R14|  X  |  X  |  X   |  X   |  X   |
            | Instance  +----+-----+-----+------+------+------+
            | Affinity  | R15|  X  |  X  |  X   |  X   |  X   |
            |           +----+-----+-----+------+------+------+
            |           | R16|  X  |  X  |      |      |  X   |
            +-----------+----+-----+-----+------+------+------+
            | Confiden- | R17|  X  |  X  |  X   |  X   |  X   |
            | -tiality  |    |     |     |      |      |      |
            +-----------+----+-----+-----+------+------+------+ </artwork>
        </figure>
      </section>
    </section>

    <section anchor="security-considerations">
      <name>Security Considerations</name>

      <t>CATS decision-making relies on real-time computing and network status as well as service information, requiring robust security safeguards to mitigate risks associated with dynamic service and resource scheduling, and cross-node data transmission.</t>

      <t>The core security risks and the associated requirements are as follows:</t>

      <t>* User Privacy Leakage Risk</t>
      <t>Description: CATS involves user-related data (e.g., access patterns, service requests) across edge service sites. Unauthorized disclosure of user identifiers or per-user behavior tracking risks profiling or identity theft, especially in use cases with personal/context-rich data (e.g., AR/VR, vehicle trajectories, AI prompts), violating regulations and eroding trust.</t>
      <t>R19: User activity privacy <bcp14>MUST</bcp14> be preserved by anonymizing identifying information. Per-user behavior pattern tracking is prohibited.</t>

      <t>* Service Instance Identity Spoofing and Traffic Hijacking</t>
      <t>Description: Attackers may spoof legitimate service instance identities or tamper with "CS-ID-instance address" mappings (per R1), diverting traffic to malicious nodes. This undermines CATS' core scheduling logic, causing service disruptions, data leaks, and potential physical harm in safety-critical scenarios.</t>
      <t>R20: Service instances <bcp14>MUST</bcp14> be authenticated, and
      digital signatures <bcp14>SHOULD</bcp14> be used to provide proof of authentication.
      "CS-ID-instance address" mapping results <bcp14>MUST</bcp14> be encrypted.</t>

      <t>* Tampering and False Reporting of CATS Metrics</t>
      <t>Description: Attackers may tamper with core scheduling metrics or submit false data (see R3-R13), misleading traffic steering decisions. This leads to node overload, link congestion, or "resource exhaustion attacks", directly degrading Quality of Experience (QoE).</t>
      <t>R21: Metric collection and distribution <bcp14>MUST</bcp14> employ integrity checks and encryption. Mechanisms for secondary validation and traceability of abnormal metrics <bcp14>MUST</bcp14> be supported, avoiding over-reliance on single-node reports.</t>
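
      <t>One possible form of the secondary validation required by R21 is to cross-check a site's self-reported metric against independent measurements, for example, measurements taken by other CATS-Forwarders. The Python sketch below, which uses only the standard library, flags reports that deviate from the median of such observations. The tolerance value, the data layout, and the choice of the median are assumptions for illustration only.</t>

      <sourcecode type="python"><![CDATA[
```python
from statistics import median
from typing import Dict, List


def flag_abnormal(reported: Dict[str, float],
                  observed: Dict[str, List[float]],
                  tolerance: float = 0.5) -> List[str]:
    """Return sites whose self-reported metric deviates from the median of
    independent observations by more than `tolerance` (relative error).

    `reported` maps site -> value the site announced (e.g., delay in ms);
    `observed` maps site -> values measured from other vantage points.
    Sites without independent observations are flagged for follow-up,
    since their reports would otherwise rest on a single node.
    """
    flagged = []
    for site, value in reported.items():
        samples = observed.get(site, [])
        if not samples:
            flagged.append(site)  # cannot cross-check a single-node report
            continue
        reference = median(samples)
        if abs(value - reference) > tolerance * abs(reference):
            flagged.append(site)
    return flagged
```
]]></sourcecode>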

      <t>* Security of Cross-Node Context Migration Data</t>
      <t>Description: During user or terminal mobility, session states and computing context (e.g., AR rendering progress, vehicle status) may be intercepted or tampered with during cross-node migration (see R16). This impairs service continuity, leaks sensitive data, or causes state inconsistency.</t>
      <t>R22: Migration data <bcp14>MUST</bcp14> be protected with end-to-end encryption, for example, using Authenticated Encryption with Associated Data (AEAD), so that it is accessible only to authorized target instances. Migration instructions <bcp14>MUST</bcp14> include integrity check codes.</t>
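
      <t>R22 combines confidentiality (AEAD) with integrity protection of migration instructions. The Python sketch below, using only the standard library, illustrates just the integrity part: an HMAC-SHA256 check value computed over a canonical serialization of a hypothetical migration instruction. It provides no confidentiality; an AEAD cipher such as AES-GCM from a cryptographic library would still be needed for the migration data itself. Key distribution between source and target instances is assumed to be handled elsewhere, and the message field names are hypothetical.</t>

      <sourcecode type="python"><![CDATA[
```python
import hashlib
import hmac
import json
from typing import Any, Dict


def _canonical(payload: Dict[str, Any]) -> bytes:
    """Serialize deterministically so sender and receiver MAC the same bytes."""
    return json.dumps(payload, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")


def protect(payload: Dict[str, Any], key: bytes) -> Dict[str, Any]:
    """Attach an HMAC-SHA256 integrity check value (ICV) to an instruction."""
    icv = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "icv": icv}


def verify(message: Dict[str, Any], key: bytes) -> bool:
    """Recompute the ICV and compare it in constant time."""
    expected = hmac.new(key, _canonical(message["payload"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["icv"])
```
]]></sourcecode>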
    </section>

    <section anchor="iana-considerations">
      <name>IANA Considerations</name>

      <t>This document makes no requests for IANA action.</t>
    </section>
  </middle>

  <back>
    <references anchor="sec-combined-references">
      <name>References</name>

      <references anchor="sec-normative-references">
        <name>Normative References</name>

          &RFC2119;
          &RFC4760;
          &RFC4786;
          &RFC7285;
          &RFC8174;
          &RFC8990;
          &I-D.ietf-cats-framework;

      </references>

      <references anchor="sec-informative-references">
        <name>Informative References</name>
           &I-D.dcn-cats-req-service-segmentation;

        <reference anchor="TR22.874">
          <front>
            <title>Study on traffic characteristics and performance
            requirements for AI/ML model transfer in 5GS (Release 18)</title>

            <author>
              <organization>3GPP</organization>
            </author>

            <date year="2021"/>
          </front>
        </reference>


        <reference anchor="HORITA">
          <front>
            <title>Extended electronic horizon for automated driving</title>

            <author fullname="Y. Horita" initials="Y." surname="Horita">
              <organization/>
            </author>

            <date year="2015"/>
          </front>

          <refcontent>Proceedings of 14th International Conference on ITS
          Telecommunications (ITST)</refcontent>
        </reference>

        <reference anchor="Industry4.0">
          <front>
            <title>Details of the Asset Administration Shell, Part 1 &amp;
            Part 2</title>

            <author>
              <organization>Industry4.0</organization>
            </author>

            <date year="2020"/>
          </front>
        </reference>

        <reference anchor="GAIA-X">
          <front>
            <title>GAIA-X: A Federated Data Infrastructure for Europe</title>

            <author fullname="Gaia-X" initials="" surname="Gaia-X">
              <organization/>
            </author>

            <date year="2021"/>
          </front>
        </reference>

      <reference anchor="MEF70.2">
        <front>
          <title>SD-WAN Service Attributes and Service Framework</title>
          <author fullname="MEF" role="editor"/>
          <date year="2023"/>
        </front>
      </reference>

      <reference anchor="Cost-Aware-Federated-Learning-in-Mobile-Edge-Networks">
        <front>
          <title>Cost-Aware Federated Learning in Mobile Edge Networks</title>
          <author initials="Q." surname="Gu" />
          <author initials="K." surname="Jiang" />
          <author initials="L." surname="Zhao" />
          <author initials="H." surname="Zhou" />
          <author initials="T." surname="Jiang" />
          <date year="2024"/>
        </front>
        <seriesInfo name="Publisher:" value="Association for Computing Machinery" />
        <seriesInfo name="Available:" value="https://dl.acm.org/doi/10.1145/3694908.3696173" /> 
      </reference>

      </references>
    </references>

    <section anchor="appendix-a">
      <name>Appendix A</name>

      <t>This section presents an additional CATS use case that is not
      included in the main body of this document, because it may bring new
      requirements that were not considered in the initial charter of the CATS
      working group. These requirements affect the design of the CATS
      framework and may call for modifications or enhancements to the initial
      CATS framework, which serves all the existing use cases listed in the
      main body. Nevertheless, the ISAC use case is promising and has gained
      industry consensus. Therefore, this use case may be considered in future
      work of the CATS working group.</t>

      <section anchor="isac-uc"
               title="Integrated Sensing and Communications (ISAC)">
        <t>Integrated Sensing and Communications (ISAC) enables wireless
        networks to perform simultaneous data transmission and environmental
        sensing. In a distributed sensing scenario, multiple network nodes,
        such as base stations, access points, or edge devices, collect raw
        sensing data from the environment. This data can include radio
        frequency (RF) reflections, Doppler shifts, channel state information
        (CSI), or other physical-layer features that provide insights into
        object movement, material composition, or environmental conditions. To
        extract meaningful information, the collected raw data must be
        aggregated and processed by a designated computing node with
        sufficient computational resources. This requires efficient
        coordination between sensing nodes and computing resources to ensure
        timely and accurate analysis, making it a relevant use case for
        Computing-Aware Traffic Steering (CATS) in the IETF.</t>

        <t>This use case aligns with ongoing efforts in standardization bodies
        such as the ETSI ISAC Industry Specification Group (ISG), particularly
        Work Item #5 (WI#5), titled 'Integration of Computing with ISAC'. WI#5
        focuses on exploring different forms of computing integration within
        ISAC systems, including sensing combined with computing,
        communications combined with computing, and the holistic integration
        of ISAC with computing. The considerations outlined in this document
        complement ETSI's work by examining how computing-aware networking
        solutions, as developed within CATS, can optimize the processing and
        routing of ISAC sensing data.</t>

        <t>As an example, consider a network domain with multiple sites
        capable of hosting the ISAC computing "service", each with potentially
        different connectivity and computing characteristics. <xref
        target="isac"/> shows an exemplary scenario. Considering the
        connectivity and computing latencies (used here only as example
        metrics), the best service site in the figure is #N-1. Note that the
        figure still uses the old terminology, in which ICR denotes the Ingress
        CATS-Forwarder <xref target="I-D.ietf-cats-framework"/> and ECR denotes
        the Egress CATS-Forwarder.</t>

        <figure anchor="isac" title="Exemplary ISAC Scenario">
          <artwork>
                            _______________
                           (     --------  )
                          (     |        |  )
                         (     --------  |   )
   ________________     (     |        | |   )     ________________
  (      --------  )    (    --------- | |   )    (      --------  )
 (      |        |  )   (   |service | |-    )   (      |        |  )
(      --------  |   )  (   |contact | |     )  (      --------  |  )
(     |        | |   )  (   |instance|-      )  (     |        | |  )
(    --------  | |   )   (   ---------       )  (    --------  | |  )
(   |service | |-    )    ( Serv. site #N-1 )   (   |service | |-   )
(   |contact | |     )     -------+---------    (   |contact | |   )
(   |instance|-     )   Computing  \             (  |instance|-    )
 (   --------      )    delay:4ms   \             (  --------      )
  ( Serv. site #1 )            ------+--           ( Serv. site #N )
   -------+-------        ----| ECR#N-1 |----       ---------+-----
           \  Computing --     ---------      --  Computing  /
            \ delay:10ms      Networking          delay:5ms /
           --+----            delay:7ms               -----+-
        ( | ECR#1 |            //                    | ECR#N | )
       (   -------            //                      -------   )
      ( Networking           //                       Networking )
     (  delay:5ms           //                         delay:15ms )
    (                      //                                      )
    (                     //                                       )
     (                   //                                       )
      (                 //                                       )
       (               //                                       )
        (        -------                       -------         )
         -------| ICR#1 |---------------------| ICR#2 |--------
                 -------            __         -------
                (.)   (.)        / (  )          (.)
               (.)    -----    -  (    )         (.)
              (.)    | UE2 | /     (__) \        (.)
             (.)      -----     /         -    -----
            (.)               /  (sensing) \  | UE3 |
           -----    -----------                -----
          | UE1 | /
           -----
</artwork>
        </figure>
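
        <t>The selection of site #N-1 in <xref target="isac"/> can be
        reproduced with simple arithmetic. The following Python sketch assumes,
        purely for illustration, that networking and computing delays are
        additive and are the only metrics considered. The dictionary layout
        and function names are hypothetical; the delay values are those shown
        in the figure.</t>

        <sourcecode type="python"><![CDATA[
```python
from typing import Dict, Tuple

# (networking delay, computing delay) in ms, as shown in the ISAC figure.
SITES: Dict[str, Tuple[float, float]] = {
    "#1": (5.0, 10.0),
    "#N-1": (7.0, 4.0),
    "#N": (15.0, 5.0),
}


def total_delay_ms(site: str) -> float:
    """Assumed additive combination of networking and computing delay."""
    networking, computing = SITES[site]
    return networking + computing


def best_site() -> str:
    """Pick the site with the lowest combined delay."""
    return min(SITES, key=total_delay_ms)


if __name__ == "__main__":
    # Totals: #1 -> 15 ms, #N-1 -> 11 ms, #N -> 20 ms; best is #N-1.
    for name in SITES:
        print(name, total_delay_ms(name))
    print("best:", best_site())
```
]]></sourcecode>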

        <t>In the distributed sensing use case, the sensed data collected by
        multiple nodes must be efficiently routed to a computing node capable
        of processing it. The choice of the computing node depends on several
        factors, including computational load, network congestion, and latency
        constraints. CATS mechanisms can optimize the selection of the
        processing node by dynamically steering the traffic based on computing
        resource availability and network conditions. Additionally, as sensing
        data is often time-sensitive, CATS can ensure low-latency paths while
        balancing computational demands across different processing entities.
        This capability is essential for real-time applications such as
        cooperative perception for autonomous systems, industrial monitoring,
        and smart city infrastructure.</t>

        <section anchor="isac-uc-reqs" title="Requirements">
          <t>In addition to some of the requirements already identified for
          CATS in the main body of this document, there are several additional
          challenges and requirements that need to be addressed for efficient
          distributed sensing in ISAC-enabled networks:</t>

          <t>CATS systems should be able to select an instance to which
          multiple nodes can steer traffic simultaneously, ensuring that
          packets arrive within a maximum time period. This is needed because,
          in some distributed tasks, multiple nodes act as sensors producing
          sensing data that then has to be processed by a sensing processing
          function, typically hosted at the edge. The resulting traffic is
          multipoint-to-point, with associated connectivity and computing
          requirements that can be very strict for some types of sensing
          schemes.</t>

          <t>CATS systems should provide mechanisms that implement per-node
          or per-flow security and privacy policies, adapted to the nature of
          the sensitive information that might be exchanged in a sensing
          task.</t>
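
          <t>The multipoint-to-point requirement above can be read as a
          constrained selection problem. The following Python sketch assumes
          that per-sensor network delays and per-site computing delays are
          available as CATS metrics. It keeps only the sites that every sensor
          can reach within the deadline, then prefers the earliest worst-case
          completion. The function and parameter names, and the ranking rule
          itself, are illustrative assumptions rather than a mechanism defined
          by CATS.</t>

          <sourcecode type="python"><![CDATA[
```python
from typing import Dict, Optional


def select_sensing_site(
    arrival_ms: Dict[str, Dict[str, float]],  # site -> sensor -> network delay
    compute_ms: Dict[str, float],             # site -> computing delay
    deadline_ms: float,
) -> Optional[str]:
    """Choose a processing site for multipoint-to-point sensing traffic.

    A site is feasible only if data from every sensor arrives within
    deadline_ms; among feasible sites, the one with the earliest
    worst-case completion (last arrival + computing delay) is chosen.
    Returns None when no site satisfies the deadline.
    """
    best: Optional[str] = None
    best_completion = float("inf")
    for site, per_sensor in arrival_ms.items():
        last_arrival = max(per_sensor.values())
        if last_arrival > deadline_ms:
            continue  # data from some sensor would arrive too late
        completion = last_arrival + compute_ms[site]
        if completion < best_completion:
            best, best_completion = site, completion
    return best
```
]]></sourcecode>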
        </section>
      </section>
    </section>

    <section anchor="Acknowledgements" numbered="false">
      <name>Acknowledgements</name>

      <t>The authors would like to thank Adrian Farrel, Peng Liu, Joel
      Halpern, Jim Guichard, Cheng Li, Luigi Iannone, Christian Jacquenet,
      Xiaodong Duan, Yuexia Fu, Huijuan Yao, Zongpeng Du, Jing Wang, Erum Welling, Ines Robles, Linda Dunbar, Jim Reid, 
      Zaheduzzaman Sarker, Tim Bray, Samier Barguil, Daniel Migault, Roni Even, Roman Danyliw, 
      Gorry Fairhurst, Ketan Talaulikar, Andy Newton, Deb Cooley, Erik Kline, and Paul Wouters 
      for their valuable suggestions to this document.</t>

      <t>The authors would like to thank Yizhou Li for her early IETF work on
      Compute First Network (CFN) and Dynamic Anycast (Dyncast), which inspired
      the CATS work.</t>
    </section>

    <section anchor="contributors" numbered="false" removeInRFC="false"
             toc="include">
      <name>Contributors</name>

      <t>The following people have substantially contributed to this
      document:</t>

      <contact fullname="Yizhou Li" initials="Y." surname="Li">
        <organization>Huawei Technologies</organization>

        <address>
          <email>liyizhou@huawei.com</email>
        </address>
      </contact>

      <contact fullname="Dirk Trossen" initials="D." surname="Trossen">
        <organization/>

        <address>
          <email>dirk@trossen.tech</email>
        </address>
      </contact>

      <contact fullname="Mohamed Boucadair" initials="M." surname="Boucadair">
        <organization>Orange</organization>

        <address>
          <email>mohamed.boucadair@orange.com</email>
        </address>
      </contact>

      <contact fullname="Carlos J. Bernardos" initials="CJ."
               surname="Bernardos">
        <organization>UC3M</organization>

        <address>
          <email>cjbc@it.uc3m.es</email>
        </address>
      </contact>

      <contact fullname="Peter Willis" initials="P." surname="Willis">
        <organization/>

        <address>
          <email>pjw7904@rit.edu</email>
        </address>
      </contact>

      <contact fullname="Philip Eardley" initials="P." surname="Eardley">
        <organization/>

        <address>
          <email>ietf.philip.eardley@gmail.com</email>
        </address>
      </contact>

      <contact fullname="Tianji Jiang" initials="T." surname="Jiang">
        <organization>China Mobile</organization>

        <address>
          <email>tianjijiang@chinamobile.com</email>
        </address>
      </contact>

      <contact fullname="Minh-Ngoc Tran" initials="N." surname="Tran">
        <organization>ETRI</organization>

        <address>
          <email>mipearlska@etri.re.kr</email>
        </address>
      </contact>

      <contact fullname="Markus Amend" initials="M." surname="Amend">
        <organization>Deutsche Telekom</organization>

        <address>
          <email>Markus.Amend@telekom.de</email>
        </address>
      </contact>

      <contact fullname="Guangping Huang" initials="G." surname="Huang">
        <organization>ZTE</organization>

        <address>
          <email>huang.guangping@zte.com.cn</email>
        </address>
      </contact>

      <contact fullname="Dongyu Yuan" initials="D." surname="Yuan">
        <organization>ZTE</organization>

        <address>
          <email>yuan.dongyu@zte.com.cn</email>
        </address>
      </contact>

      <contact fullname="Xinxin Yi" initials="X." surname="Yi">
        <organization>China Unicom</organization>

        <address>
          <email>yixx3@chinaunicom.cn</email>
        </address>
      </contact>

      <contact fullname="Tao Fu" initials="T." surname="Fu">
        <organization>CAICT</organization>

        <address>
          <email>futao@caict.ac.cn</email>
        </address>
      </contact>

      <contact fullname="Jordi Ros-Giralt" initials="J." surname="Ros-Giralt">
        <organization>Qualcomm Europe, Inc.</organization>

        <address>
          <email>jros@qti.qualcomm.com</email>
        </address>
      </contact>

      <contact fullname="Jaehoon Paul Jeong" initials="J. P." surname="Jeong">
        <organization>Sungkyunkwan University</organization>

        <address>
          <email>pauljeong@skku.edu</email>
        </address>
      </contact>

      <contact fullname="Yan Wang" initials="Y." surname="Wang">
        <organization>Migu Culture Technology Co.,Ltd</organization>

        <address>
          <email>wangyan_hy1@migu.chinamobile.com</email>
        </address>
      </contact>
    </section>
  </back>

</rfc>
