Date: 2-Jan-2025
Write-up for the Clock Changes Proposed for QFX5240-64OD
Problem Statement
All server-facing ports flap at random across multiple Juniper QFX5240-64OD access switches.
Description
There are multiple fronts for the issue of troubleshooting ranging from temperature to power to internal clocks. The depth of individual testing might be time-consuming and hence we propose a preventive measure by isolating a source of clock from the critical path.
The plan is to roll out the changes in 5 devices where we have seen flaps recently and observe if there are any undesirable issues. Based on the outcome, we can roll out the same changes to other switches in DH3.
Proposed Rollout Plan
Phase 1 – DH3 through Scripts
- Day 0: 5 devices
- Day 1: Additional 25 devices
- Day 2: Entire data hall
- Monitor: Overall DH3 health for a week
- Caveats: Script requires execution after every power cycle/reboot
Phase 2 – Software Patch
- All data halls
- Targeted timeline: 01/08/2025
Phase 3 – Software Upgrade
- Upcoming deployments
- D44 image will also have enhancements for other xAI asks
- Targeted timeline: 4th week of January
Detailed Description
The QFX5240 uses a clock synthesizer to generate TH5 reference clocks. This synthesizer has its own crystal and can generate TH5 clocks independently, but it can also synchronize its clock output to another clock source. On the QFX5240, the other clock source comes from a second clocking device used for synchronous ethernet and PTP boundary clocking.
Since these modes are not in use, the first change is to have the clock synthesizer generate the TH5 clock on its own rather than synchronizing to the second device. The second change involves updating a synthesizer parameter recommended by the vendor to improve stability.
Commands
To Configure
i2cset -y 1 9 0xfd 0x5; i2cset -y 1 9 0x4 0x62
i2cset -y 1 9 0xfd 0x2; i2cset -y 1 9 0x40 0x18
To Verify
i2cset -y 1 9 0xfd 0x5; i2cget -y 1 9 0x4
# Expected: 0x62 (verifies last written)
i2cset -y 1 9 0xfd 0x2; i2cget -y 1 9 0x40
# Expected: 0x18 (verifies last written)
Rollback Strategies
Option 1 – Power Cycle (Preferred)
- i2c registers reset after a power cycle.
- A normal reboot will not reset these values.
Option 2 – Unset i2c values via command
i2cset -y 1 9 0xfd 0x5; i2cset -y 1 9 0x4 0x66
i2cset -y 1 9 0xfd 0x2; i2cset -y 1 9 0x40 0x78
⚠️ This option puts the isolated clock back in the critical path and may lead to interface flaps. Contact JTAC if undesired behavior occurs.
JSU Installation and Options
Image location:
/volume/evoimages/release/evo/rel/23.4X100-D40-J1/rel_23.4X100-D40-J1.1
Apply on top of 23.4X100-D40.7-EVO.
Test Scenarios
Scenario 1: System already configured with i2cset commands
Do not reboot after JSU installation. Stage JSU and let the system run for a day, then power-cycle and verify:
request system software add /var/tmp/junos-evo-install-qfx-ms-x86-64-23.4X100-D40-J1.1-EVO.iso
Verify:
i2cset -y 1 9 0xfd 0x5; i2cget -y 1 9 0x4
i2cset -y 1 9 0xfd 0x2; i2cget -y 1 9 0x40
show trace application clockd | grep RS32312
Scenario 2: System not configured with i2cset commands
Use reboot option:
request system software add /var/tmp/junos-evo-install-qfx-ms-x86-64-23.4X100-D40-J1.1-EVO.iso reboot
Scenario 3: Apply JSU with restart option
Recommended for X.AI if systems are on UTC:
request system software add /var/tmp/junos-evo-install-qfx-ms-x86-64-23.4X100-D40-J1.1-EVO.iso restart
Juniper Business Use Only