Blob Blame History Raw
From 48dfb211ef1ecbcee39df4647d905a924b95fcb3 Mon Sep 17 00:00:00 2001
From: Numan Siddique <numans@ovn.org>
Date: Sat, 13 Feb 2021 12:38:37 +0530
Subject: [PATCH] controller: Fix toggling ct zone ids.

ovn-controller maintains a shash of pending ct zone entries
to flush the ct zone ids and to store/remove the allocated zone id
in/from the OVS bridge.external_ids.  While adding an entry to
the shash, the function 'add_pending_ct_zone_entry()' doesn't
check for existing entry for the same key in shash.  If suppose
there are multiple entries for the samestring key, then it results in
an infinite loop of adding and deleting the key entries in
OVS bridge.external_ids.

The pending ct zone entries are deleted from the shash when they
reach the state - CT_ZONE_DB_SENT and when
ovsdb_idl_loop_commit_and_wait(ovsidl) returns 1.  In a highly loaded
compute node this loop gets triggered when this function doesn't return
1 and there are duplicate ct zone entries.

These duplicate entries are mostly observed for logical ports of
type 'virtual' when this virtual port keeps moving from one chassis
to another.  But this scenario can get triggered for other logical ports
too.

*********************
2021-02-12T17:25:56.974Z|04363|jsonrpc|DBG|unix:/run/openvswitch/db.sock: send request, method="transact", params=["Open_vSwitch",{"mutations":[["external_ids","delete",["set",["ct-zone-313f5395-5170-4c4c-8820-e6288a765d27","ct-zone-efa8530b-93d5-45b3-b16f-edfe1856233c"]]],["external_ids","insert",["map",[["ct-zone-da4a2f90-aef9-4fea-ad6d-aed364fa9988","63"],["ct-zone-7a97e014-a753-473b-a0f3-b2ec6e9f11d6","71"]]]]],"where":[["_uuid","==",["uuid","653e7315-47b3-4c39-a5f9-665aa3dddb9e"]]],"op":"mutate","table":"Bridge"},{"comment":"ovn-controller\novn-controller: modifying OVS tunnels 'e3af60d7-3942-4aa2-84ad-e02dcd3b183d'","op":"comment"}], id=4336
2021-02-12T17:25:56.979Z|04364|jsonrpc|DBG|unix:/run/openvswitch/db.sock: received notification, method="update3", params=[["monid","Open_vSwitch"],"00000000-0000-0000-0000-000000000000",{"Bridge":{"653e7315-47b3-4c39-a5f9-665aa3dddb9e":{"modify":{"external_ids":["map",[["ct-zone-313f5395-5170-4c4c-8820-e6288a765d27","71"],["ct-zone-7a97e014-a753-473b-a0f3-b2ec6e9f11d6","71"],["ct-zone-da4a2f90-aef9-4fea-ad6d-aed364fa9988","63"],["ct-zone-efa8530b-93d5-45b3-b16f-edfe1856233c","63"]]]}}}}]
2021-02-12T17:25:56.988Z|04365|jsonrpc|DBG|unix:/run/openvswitch/db.sock: received reply, result=[{"count":1},{}], id=4336
2021-02-12T17:25:57.006Z|04366|jsonrpc|DBG|unix:/run/openvswitch/db.sock: send request, method="transact", params=["Open_vSwitch",{"mutations":[["external_ids","delete",["set",["ct-zone-da4a2f90-aef9-4fea-ad6d-aed364fa9988","ct-zone-7a97e014-a753-473b-a0f3-b2ec6e9f11d6"]]],["external_ids","insert",["map",[["ct-zone-313f5395-5170-4c4c-8820-e6288a765d27","71"],["ct-zone-efa8530b-93d5-45b3-b16f-edfe1856233c","63"]]]]],"where":[["_uuid","==",["uuid","653e7315-47b3-4c39-a5f9-665aa3dddb9e"]]],"op":"mutate","table":"Bridge"},{"comment":"ovn-controller\novn-controller: modifying OVS tunnels 'e3af60d7-3942-4aa2-84ad-e02dcd3b183d'","op":"comment"}], id=4337
2021-02-12T17:25:57.011Z|04367|jsonrpc|DBG|unix:/run/openvswitch/db.sock: received notification, method="update3", params=[["monid","Open_vSwitch"],"00000000-0000-0000-0000-000000000000",{"Bridge":{"653e7315-47b3-4c39-a5f9-665aa3dddb9e":{"modify":{"external_ids":["map",[["ct-zone-313f5395-5170-4c4c-8820-e6288a765d27","71"],["ct-zone-7a97e014-a753-473b-a0f3-b2ec6e9f11d6","71"],["ct-zone-da4a2f90-aef9-4fea-ad6d-aed364fa9988","63"],["ct-zone-efa8530b-93d5-45b3-b16f-edfe1856233c","63"]]]}}}}]
...
...
*************************

This patch fixes this issue by using shash_replace() when adding the
entry to the shash.

Note: I was not able to reproduce the issue with a test setup and
hence couldn't add test cases.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1903210
Signed-off-by: Numan Siddique <numans@ovn.org>
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Numan Siddique <numans@ovn.org>

(cherry-picked from master commit e7788554a7f5e824fc0d8afc6cbf20e94fe4245f)
---
 controller/ovn-controller.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 5599ea4d9..fc2a7673c 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -587,7 +587,18 @@ add_pending_ct_zone_entry(struct shash *pending_ct_zones,
     pending->state = state; /* Skip flushing zone. */
     pending->zone = zone;
     pending->add = add;
-    shash_add(pending_ct_zones, name, pending);
+
+    /* Its important that we add only one entry for the key 'name'.
+     * Replace 'pending' with 'existing' and free up 'existing'.
+     * Otherwise, we may end up in a continuous loop of adding
+     * and deleting the zone entry in the 'external_ids' of
+     * integration bridge.
+     */
+    struct ct_zone_pending_entry *existing =
+        shash_replace(pending_ct_zones, name, pending);
+    if (existing) {
+        free(existing);
+    }
 }
 
 static void
-- 
2.29.2