VTUNE and VPU hotspot analysis

I'm using VTUNE to look at the hotspots in my code. I'm down into the vector instructions and confused by a few things. I understand that VTUNE isn't a cycle-accurate simulator, but why do I see results like the following:

   ...
   vmovaps %zmm8, %k0, %zmm0                324.718 ms
   vmovaps %zmm0, %k0, %zmm8                 84.007 ms
   vpslld $0x1f, %zmm2, %k0, %zmm1           96.931 ms
   vpsrld $0x01, %zmm2, %k0, %zmm2          134.087 ms
   vpord %zmm2, %zmm1, %k0, %zmm1           143.781 ms
   vmovaps %zmm9, %k0, %zmm2                245.558 ms
   vmovaps %zmm21, %k0, %zmm9                75.929 ms
   ...

As far as I can tell, there are no data dependencies on prior instructions.

Q1: Why are the vmovaps times all over the map?

Q2: Why is vpslld roughly 30% different from vpsrld?

Q3: Why is there no indication of a pipeline stall on the vpord, given that it depends on the preceding vpslld/vpsrld instructions?

Q4: Though I can't show it here, all my "CPU time" histogram bars in the source and assembly windows are red, indicating "poor". How can a single vmovaps be deemed "poor" (as opposed to Idle, Ok, Ideal, or Over)?

My application is compiled with icc -g -debug extended -debug inline-debug-info -debug expr-source-pos -std=c9x -O3 -Wall -openmp -offload ...

Extra credit: It appears that the compiler (icc) is reluctant to rearrange vector instructions to avoid data-dependency pipeline stalls, especially if the instructions come from different source expressions (i.e. different source lines). Is there any information on how aggressive I should expect that compiler optimization to be?
