Parallel processing performance penalties can be pretty painful. Any developer who has tried to mix code in the single-precision (SP) and double-precision (DP) pipelines of the Intel Xeon Phi coprocessor knows this firsthand. For good advice on how to avoid such mistakes, check out the recently published report “Intel Xeon Phi Coprocessor Vector Architecture”. It will help you better understand how SP instructions can come to your rescue and deliver good performance.
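To make the mixing problem concrete, here is a minimal sketch in C of one common way it sneaks in. It is not taken from the report, and the function names `scale_mixed` and `scale_sp` are made up for illustration. In C, a floating-point literal without a suffix is a double, so multiplying a float by `0.5` quietly promotes the arithmetic to double precision and converts the result back. Since a 512-bit vector register holds 16 SP values but only 8 DP values, that promotion can cut the work done per vector instruction, although what actually gets generated depends on your compiler and its options.

```c
#include <stddef.h>

/* Hypothetical example: mixed precision by accident.
   The literal 0.5 has type double, so x[i] is converted to double,
   multiplied in double precision, then converted back to float. */
void scale_mixed(float *x, size_t n) {
    for (size_t i = 0; i < n; i++) {
        x[i] = x[i] * 0.5;
    }
}

/* Hypothetical example: pure single precision.
   The f suffix makes the literal a float, so the whole expression
   stays in single precision. */
void scale_sp(float *x, size_t n) {
    for (size_t i = 0; i < n; i++) {
        x[i] = x[i] * 0.5f;
    }
}
```

Both functions produce the same values here, because halving a float is exact either way. The difference is only in which pipeline does the work, which is why this kind of slip is easy to miss until you look at the generated code or the timings.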
(Read my complete blog post on the topic at GoParallel.)