The latest Intel processors developed for tablets and large phones have a TDP of 4.5 W and still almost beat overclocked desktop processors that draw 150 W. Here are the results of the same linear algebra tests we posted here a year ago.
The first table shows the times for the matrix algebra tests you have seen here before. They were run on what was almost the fastest processor on the planet a year ago. It is probably still near the top in single-core performance because it is overclocked to 4.5 GHz (the fastest today reach about 4.9-5.0 GHz with water cooling).
i7 4770k 4.5GHz overclocked, 4 cores/8 threads
matrix size -->    1000   2000   3000   4000
--------------------------------------------
Dense/Block        2.22   30.4   127.   297.
Dense/Block Tr.    0.20   2.06   7.36   17.5
SSE                0.12   1.81   6.70   16.2
LAIPE              0.09   0.75   2.44   5.90
And here is the Lenovo Yoga 3 Pro, the thinnest and lightest convertible 13.3" tablet-laptop:
Core M-5Y70 1.1-2.6 GHz (turbo), 2 cores/4 threads, Lenovo Yoga 3 Pro tablet
matrix size -->    1000   2000   3000   4000   5000   6000
--------------------------------------------------------------
Dense/Block        1.9    23.5   112    335.   xxxx   xxxx
Dense/Block Tr.    0.94   7.5    26.7   65.7   128.   xxxx
SSE                0.27   2.9    8.8    20.7   42.9   73.7
LAIPE              0.2    2.1    7.0    22.1   50.1   90.4
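To put the two machines side by side, one can divide each tablet time by the matching desktop time. Below is a minimal Python sketch of that arithmetic. The numbers are copied from the two tables above; the names `desktop`, `tablet` and `slowdown` are just my own labels for illustration, not part of any of the tested libraries.

```python
# Timings copied from the two tables above. Only sizes 1000..4000 are used,
# because the desktop table stops at 4000.
SIZES = [1000, 2000, 3000, 4000]

desktop = {  # i7 4770k overclocked to 4.5 GHz
    "Dense/Block":     [2.22, 30.4, 127.0, 297.0],
    "Dense/Block Tr.": [0.20, 2.06, 7.36, 17.5],
    "SSE":             [0.12, 1.81, 6.70, 16.2],
    "LAIPE":           [0.09, 0.75, 2.44, 5.90],
}

tablet = {  # Lenovo Yoga 3 Pro
    "Dense/Block":     [1.9, 23.5, 112.0, 335.0],
    "Dense/Block Tr.": [0.94, 7.5, 26.7, 65.7],
    "SSE":             [0.27, 2.9, 8.8, 20.7],
    "LAIPE":           [0.2, 2.1, 7.0, 22.1],
}


def slowdown(tablet_times, desktop_times):
    """Return tablet time divided by desktop time for each matrix size."""
    return [t / d for t, d in zip(tablet_times, desktop_times)]


# A ratio above 1 means the tablet is slower than the desktop.
ratios = {method: slowdown(tablet[method], desktop[method]) for method in desktop}

print(f"{'matrix size -->':<16}" + "".join(f"{n:8d}" for n in SIZES))
for method, row in ratios.items():
    print(f"{method:<16}" + "".join(f"{r:8.2f}" for r in row))
```

By this measure the tablet is actually a little faster than the desktop for plain Dense/Block up to size 3000, about 1.3 times slower for SSE at size 4000, and about 3.7 times slower for LAIPE at size 4000.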
And look at that: for the longer runs (matrix sizes of 4000 and up) DavidB's SSE method beats parallel LAIPE! Thermal throttling is possibly the reason, since the two cores cannot work in parallel at full speed for long. CPU-Z shows the multiplier gradually dropping from 26 to 20 in the SSE case (that is, from 2.6 GHz to 2.0 GHz), while in the LAIPE case it drops from 26 to 16.
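The multiplier readings translate into clock speeds as follows. This is only a small sketch: it assumes a 100 MHz base clock, which is what the "26 means 2.6 GHz" reading above implies, and the function names are made up for the example.

```python
# Assumption: 100 MHz base clock (BCLK), consistent with multiplier 26 = 2.6 GHz.
BCLK_MHZ = 100.0


def freq_ghz(multiplier):
    """Core clock in GHz for a given CPU-Z multiplier reading."""
    return multiplier * BCLK_MHZ / 1000.0


def drop_percent(start_mult, end_mult):
    """Relative clock loss in percent when the multiplier falls."""
    return 100.0 * (start_mult - end_mult) / start_mult


# Multiplier ranges reported above for the two cases.
for name, start, end in [("SSE", 26, 20), ("LAIPE", 26, 16)]:
    print(f"{name:<6} {freq_ghz(start):.1f} GHz -> {freq_ghz(end):.1f} GHz "
          f"({drop_percent(start, end):.0f}% lower)")
```

So the clock ends up at 2.0 GHz in the SSE case and 1.6 GHz in the LAIPE case, which is roughly a 23% loss against a 38% loss.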
By the way, Intel planned to add more instructions to its extended instruction sets, such as fused multiply-add.