Floating license usage of IPP....

February 19, 2009, 10:33 am

Latest and popular articles on Intel Technologies

This is perhaps a pre-sales question, but an answer may be useful to others as well.

I have a use case where we may need several floating licenses of IPP, but we'd like to use them with an existing Visual Studio 2008 code base in which we haven't adopted Intel's compiler.

A few pieces of online documentation hint that the only valid use case for IPP floating licenses is in combination with Intel's compiler.

Ergo, if we want a floating license of IPP but never intend to use Intel's compiler, do we need to buy Intel's compiler anyway?

Thanks.

↧

Recommended deployment methods for IPP 6.0

February 18, 2009, 8:38 pm

Hi,

Version 5.3 used to include an installer that placed all required binaries in the system32 folder and performed all other required updates (such as environment variables, if I am not mistaken).

However, IPP 6.0 does not include this installer.
I searched the IPP documentation but failed to find a deployment recommendation.

- Could you describe your recommended deployment method(s)?
- I found the installer (which had a silent feature) most convenient for client-site deployment. Why did Intel give up on it?

Regards, Hagay

↧

mkl_set_num_threads() or MKL_Set_Num_Threads()?

December 1, 2008, 6:51 pm

I need to set MKL_NUM_THREADS dynamically in my program. The MKL user manual tells me to use mkl_set_num_threads(), but this function always crashes my program. I found another function in the MKL header file, MKL_Set_Num_Threads(). When I use the latter instead, it works.

↧

IPP Virtual Machine (VM) component

November 29, 2008, 3:18 pm

I was looking for a simple file I/O implementation
for the C language to speed up my development.
Browsing the IPP samples at
.\ipp-samples\audio-video-codecs\core\vm\src\
I looked at the vm_file_win.c file.

It is described as "VM 64-bits buffered file operations library Windows implementation".

The implementation is very simple and clear. I would like to find more
information about it, but there is no documentation for it in IPP.

Thanks in advance,

Constantine

↧

Linux: IA32: 5.3.1.062 ipp, 5.3.072 samples, gcc4 Compile Failure

December 20, 2007, 3:51 pm

Hi,
I am getting the following error when I try to compile the 5.3 audio-video-codecs samples.

Linux: IA32: 5.3.1.062 ipp, 5.3.072 samples, gcc4

src/alsa_audio_render.cpp:58: error: 'snd_pcm_hw_params_set_rate_resample' was not declared in this scope
src/alsa_audio_render.cpp: In member function 'virtual UMC::Status UMC::ALSAAudioRender::SendFrame(UMC::MediaData*)':

thanks,
SAI

↧

Does IPP support MS GSM 6.10

October 24, 2008, 3:22 pm

Hi,

I downloaded the latest IPP library, v5.3. As for speech codecs, does it support MS GSM 6.10? I built the sample speech codec app, usc_speech_codec, that came with the library. First, I ran the encoder; the IPP GSMFR output seems to consist of pure 260-bit frames. Then I ran the decoder, and it works fine.

However, if I try to decode an MS GSM 6.10 file with the sample app (I guess the format differs from a standard GSM 6.10 file), the decoder fails.

Can someone help me?

Thanks

↧

ipView_8u_C3R

October 7, 2008, 1:58 pm

I'm trying out a code sample and found a function called "ipView_8u_C3R". I don't know where it is defined or how to use it to display an image. Could anybody please shed some light?

Regards,

↧

Need help with resizing video frame

September 29, 2008, 12:34 pm

I'm capturing a YUY2 640x480 video frame from a camera in DirectShow. I have a transform filter, and all I want to do is downsize the 640x480 frame to a 320x240 frame. I have no clue which functions I should use for allocation and for the actual downsizing. Here are the functions I am using, which are not giving the desired results:

IppiSize roiInputSize = { 640, 480};

IppiSize roiOutputSize = { 320, 240 };

IppiRect roi = {0, 0, 640, 480};

ImageYUY2 = ippiMalloc_8u_C2( roiOutputSize.width, roiOutputSize.height, &stepYUY2 );
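For reference, a 2:1 downsize of packed YUY2 data can also be written by hand. The following is a minimal sketch in plain C that averages 2x2 pixel neighborhoods; it does not use IPP, the function name yuy2_halve is hypothetical, and it assumes the source width is a multiple of 4 and the height a multiple of 2 (both true for 640x480). YUY2 stores two pixels in four bytes (Y0 U Y1 V), so each output pair is built from four source pixels on each of two source rows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Halve a packed YUY2 image in both dimensions by 2x2 averaging.
   Steps are row strides in bytes. Returns 0 on success, -1 if the
   dimensions or strides are not supported by this sketch. */
int yuy2_halve(const uint8_t *src, int srcStep, int srcW, int srcH,
               uint8_t *dst, int dstStep)
{
    assert(src != NULL && dst != NULL);
    if (srcW <= 0 || srcH <= 0 || srcW % 4 != 0 || srcH % 2 != 0)
        return -1;
    if (srcStep < srcW * 2 || dstStep < srcW)   /* 2 bytes per pixel */
        return -1;

    const int dstW = srcW / 2;
    const int dstH = srcH / 2;

    for (int y = 0; y < dstH; ++y) {
        const uint8_t *r0 = src + (size_t)(2 * y) * (size_t)srcStep;
        const uint8_t *r1 = r0 + srcStep;
        uint8_t *out = dst + (size_t)y * (size_t)dstStep;

        /* One iteration writes one output pair (Y0 U Y1 V) from
           8 source bytes on each row: Y0 U0 Y1 V0 Y2 U1 Y3 V1. */
        for (int x = 0; x < dstW; x += 2) {
            const uint8_t *a = r0 + 4 * x;
            const uint8_t *b = r1 + 4 * x;
            out[2 * x + 0] = (uint8_t)((a[0] + a[2] + b[0] + b[2] + 2) / 4); /* Y0 */
            out[2 * x + 1] = (uint8_t)((a[1] + a[5] + b[1] + b[5] + 2) / 4); /* U  */
            out[2 * x + 2] = (uint8_t)((a[4] + a[6] + b[4] + b[6] + 2) / 4); /* Y1 */
            out[2 * x + 3] = (uint8_t)((a[3] + a[7] + b[3] + b[7] + 2) / 4); /* V  */
        }
    }
    return 0;
}
```

For the frame sizes in the question, the call would be yuy2_halve(src, srcStep, 640, 480, dst, dstStep), with a destination buffer holding 320x240 pixels at 2 bytes per pixel.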

↧

Access violation in release code

November 5, 2005, 8:21 pm

We are using the IPP JPEG library as a replacement for the IJG JPEG library. The application is experiencing failures on random image files when processing 180,000+ images.

This failure does not occur on the same image file twice. It happens only with the release build of the IPP JPEG library and apparently not with the debug build. At least we have been getting the failure on every run with the release library and not yet with the debug build.

↧

IPP under QNX

October 27, 2004, 8:21 am

Hi there,

Does anybody have experience running IPP under QNX? Is there a chance to compile it and get it to work?

Greetings ... Guybrush

↧

Intel® MKL Sparse BLAS Overview

July 16, 2013, 10:49 pm

Sparse BLAS routines can be useful to implement iterative methods for solving large sparse systems of equations or eigenvalue problems.

Intel MKL provides Sparse BLAS Level 2 and Level 3 routines with a typical (or conventional) interface.

Please find additional details in the overview training material, "Intel MKL Sparse_Blas_overview.pdf", in the attachment.
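As a rough illustration of what a Sparse BLAS Level 2 operation computes, the sketch below multiplies a sparse matrix stored in compressed sparse row (CSR) form by a dense vector in plain C. It is not the Intel MKL interface: the function name csr_matvec, its parameter order, and the zero-based indexing are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x for an m-row sparse matrix A in zero-based CSR form.
   val    : nonzero values, stored row by row
   col    : column index of each nonzero
   rowptr : m + 1 offsets; row i owns val[rowptr[i] .. rowptr[i+1] - 1] */
void csr_matvec(int m, const double *val, const int *col,
                const int *rowptr, const double *x, double *y)
{
    assert(m >= 0 && rowptr != NULL && x != NULL && y != NULL);
    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
            sum += val[k] * x[col[k]];   /* only stored nonzeros are visited */
        y[i] = sum;
    }
}
```

Iterative solvers call a product like this once or more per iteration, which is why the cost of the sparse matrix-vector product tends to dominate their run time.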

↧

Intel® AVX2 optimization in Intel® MKL

July 16, 2013, 10:49 pm

Haswell is the code name for the next-generation x86 processor microarchitecture (a "tock"). This architecture is expected in 2013. Haswell's new instructions accelerate a broad category of applications and usage models. Download the full Intel® Advanced Vector Extensions Programming Reference (319433-011). This new instruction set builds upon the instructions of the Intel® microarchitecture code-named Ivy Bridge, including the digital random number generator, half-float (float16) accelerators, and an extended set of Intel® Advanced Vector Extensions (Intel® AVX) instructions.
The instructions fit into the following categories:

AVX2 - Integer data types expanded to 256-bit SIMD. AVX2's integer support is particularly useful for processing visual data commonly encountered in consumer imaging and video processing workloads. With Haswell, we have both Intel® Advanced Vector Extensions (Intel® AVX) for floating point data types as well as AVX2 for integer data types.

Bit manipulation instructions are useful for compressed databases, hashes, large number arithmetic, and a variety of general purpose codes.

Gather instructions are useful for vectorized code that accesses non-adjacent data elements. Haswell gather operations are mask-based for safety (like the conditional loads and stores introduced in Intel® AVX), which suits them to code that clips values, clamps boundaries, or performs similar conditional operations.

Any-to-Any permutes are incredibly useful shuffle operations. Haswell adds support for DWORD and QWORD granularity and allows permutes across an entire 256-bit register.

Vector-Vector Shifts are added to shift vectors where the amount of shift is controlled by a vector. These are critical in vectorized loops with variable shifts.

Floating Point Multiply Accumulate - Our floating-point multiply accumulate significantly increases peak flops and provides improved precision to further improve transcendental mathematics. They are broadly usable in high performance computing, professional quality imaging, and face detection. They operate on scalars, 128-bit packed single and double precision data types, and 256-bit packed single and double-precision data types. [These instructions were described previously, in the initial Intel® AVX specification].
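To make the mask-based gather behavior in the list above concrete, here is a scalar model in plain C. It is only a sketch of the semantics, not intrinsics code: the function name masked_gather_i32 is hypothetical, and the lane count is a parameter (a 256-bit register would hold eight 32-bit lanes). A lane is loaded only when the most significant bit of its mask element is set; an inactive lane performs no memory access and keeps its previous destination value, which is what makes an out-of-range index harmless in a masked-off lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a masked 32-bit gather: dst[i] = base[index[i]]
   for every lane whose mask element has its sign bit set. */
void masked_gather_i32(int32_t *dst, const int32_t *base,
                       const int32_t *index, const int32_t *mask,
                       size_t lanes)
{
    assert(dst != NULL && base != NULL && index != NULL && mask != NULL);
    for (size_t i = 0; i < lanes; ++i) {
        if (mask[i] < 0) {               /* sign bit set: lane is active */
            dst[i] = base[index[i]];
        }
        /* inactive lane: base is never read, dst[i] is left unchanged */
    }
}
```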

Intel MKL 11.0 fully supports AVX2; additional optimizations are available in the following functions.

Basic Linear Algebra Subprograms (BLAS)

• xGEMM

• xTRSM

• xTRMM

• xSYRK

• xSYMM

• xHERK

• xHEMM

• xHER2K

LAPACK:

• DGETRF

• DPOTRF

• DGEQRF

Discrete Fourier transform (DFT):

• 1D, power-of-2

• 2D, power-of-2

• 3D, power-of-2

• 1D, non-power-of-2

• 2D, non-power-of-2

• 3D, non-power-of-2

Sparse BLAS

• dcsrmm

• scsrmm

• dcoomm

• scoomm

Vector Statistical Library (VSL)
• MRG32k3a

Reference:

Intel® AVX optimization in Intel® MKL

Haswell New Instruction Descriptions Now Available!

Intel® Advanced Vector Extensions Programming Reference

↧

Intel® Integrated Performance Primitives (Intel® IPP) Functions Optimized for Intel® Advanced Vector Extensions (Intel® AVX)

July 16, 2013, 10:49 pm

The table below reflects the Intel AVX support provided in the Intel IPP 7.0.2 library release.
Intel AVX optimized code is available in both the 32-bit and 64-bit editions of the 7.0 library.
There is very limited support for Intel AVX in the 6.1 library; if you plan to use Intel IPP on an Intel AVX platform you should upgrade to the 7.0 version of the Intel IPP library.

Intel® AVX (Intel® Advanced Vector Extensions) is a 256-bit instruction set extension to SSE designed to provide even higher performance for applications that are floating-point intensive. Intel AVX adds new functionality to the existing Intel SIMD instruction set (based on SSE) and includes a more compact SIMD encoding format. A large number (200+) of Intel SSEx instructions have been "upgraded" in AVX to take advantage of features like a distinct destination operand and flexible memory alignment. Approximately 100 of the legacy 128-bit Intel SSEx instructions have been promoted to process 256-bit vector data. In addition, approximately 100 new data processing and arithmetic operations, not present in the legacy Intel SSEx SIMD instruction set, have been added.

The primary benefits of Intel AVX are:

Support for wider vector data (up to 256-bit).
Efficient instruction encoding scheme that supports 3 and 4 operand instruction syntaxes.
Flexible programming environment, ranging from branch handling to relaxed memory alignment requirements.
New data manipulation and arithmetic compute primitives, including broadcast, permute, fused-multiply-add, etc.

ippGetCpuFeatures() reports information regarding the SIMD features available to your processor. Alternatively, ippGetCpuType() detects the processor type in your system. A return value of ippCpuAVX means your processor supports the Intel AVX instruction set. These functions are declared in ippcore.h.

Mask the value returned by ippGetCpuFeatures() with ippCPUID_AVX (0x0100) to determine if the Intel AVX SIMD instructions are supported by your processor (ippGetCpuFeatures() & ippCPUID_AVX is TRUE). To determine if your operating system also supports the Intel AVX instructions (saves the extended SIMD registers), mask the returned value from ippGetCpuFeatures() with ippAVX_ENABLEDBYOS (0x0200). Both conditions (i.e., CPU and OS support) must be met before your application can utilize the Intel AVX SIMD instructions.
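The two-bit test described above can be sketched without linking against the library. The constants below simply copy the values quoted in the text (0x0100 and 0x0200); their names and the helper avx_usable are hypothetical. In a real program the mask would come from ippGetCpuFeatures() and the flag names from the Intel IPP headers.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins using the values quoted in the text; real code should
   use ippCPUID_AVX and ippAVX_ENABLEDBYOS from the IPP headers instead. */
#define FEATURE_AVX            0x0100u  /* CPU implements Intel AVX      */
#define FEATURE_AVX_OS_ENABLED 0x0200u  /* OS saves the extended SIMD registers */

/* Returns 1 only when both the CPU bit and the OS bit are set. */
int avx_usable(uint64_t features)
{
    const uint64_t need = FEATURE_AVX | FEATURE_AVX_OS_ENABLED;
    /* Guard against editing the constants so that they overlap. */
    assert((FEATURE_AVX & FEATURE_AVX_OS_ENABLED) == 0u);
    return (features & need) == need;
}
```

Testing both bits in one comparison avoids the common mistake of checking only CPU support and then faulting on an operating system that does not preserve the 256-bit registers.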

The Intel IPP library has been optimized for a variety of SIMD instruction sets. Automatic "dispatching" detects the SIMD instruction set that is available on the running processor and selects the optimal SIMD instructions for that processor. Please review Understanding CPU Dispatching in the Intel® IPP Library for more information regarding dispatching.

Intel AVX optimization in the Intel IPP library consists of "hand-optimized" and "compiler-tuned" functions – code that has been directly optimized for the Intel AVX instruction set. Given the large number of primitives in the Intel IPP library, it is impossible to directly optimize every Intel IPP function for the large set of new instructions represented by the Intel AVX instruction set within the period of a single product release or update (processor-specific optimizations may also take into consideration cache size and number of cores/threads). Therefore, the functions in the table below represent those that either receive the greatest benefit from the new Intel AVX instructions or are the most widely used by Intel IPP customers.

If you have some specific Intel IPP functions that are not listed in the following table, and would like to see them added to the priority list for future AVX optimization, please create a thread on the IPP forum stating which functions you would like to see added to the AVX optimization priority list.

Functions directly optimized for Intel AVX are added to the table below as they become available with each new release or update of the library.

The following conventions are used in the table below to allow multiple similar functions to be denoted on a single line:

{x} - Braces enclose a required (function name) element.
[x] - Square brackets enclose an optional (function name) element.
| - A vertical line indicates an exclusive choice within a set of optional or required elements.
{x|y|z} - Example of three mutually exclusive choices within a required element in the function name.
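[x|y|z] - Example of three mutually exclusive choices within an optional element in the function name.
For example, ippsAbs_{16s|32s|32f|64f}[_I] denotes eight functions: ippsAbs_16s, ippsAbs_32s, ippsAbs_32f, and ippsAbs_64f, each with and without the _I suffix.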

Signal Processing

ippsAbs_{16s|32s|32f|64f}[_I] 
ippsAdd_{32f|32fc|64f|64fc}[_I] 
ippsAddC_{32f|64f}[_I] 
ippsAddProductC_32f 
ippsAddProduct_{32fc|64f|64fc} 
ippsAutoCorr_{32f|64f}
ippsConv_32f 
ippsConvert_{8s|8u|16s|16u|32s|64f}32f 
ippsConvert_{32s|32f}64f 
ippsConvert_32f{8s|8u|16s|16u}_Sfs 
ippsConvert_64f32s_Sfs 
ippsCopy_{16s|32s|32f|64f} 
ippsCrossCorr_{32f|64f} 
ippsDFTFwd_CToC_{32f|32fc|64f|64fc} 
ippsDFTFwd_RTo{CCS|Pack|Perm}_{32f|64f} 
ippsDFTInv_CCSToR_{32f|64f} 
ippsDFTInv_CToC_{32f|32fc|64f|64fc} 
ippsDFTInv_{Pack|Perm}ToR_{32f|64f} 
ippsDFTOutOrd{Fwd|Inv}_CToC_{32fc|64fc} 
ippsDiv[C]_32f[_I] 
ippsDotProd_32f64f 
ippsFFTFwd_CToC_{32f|32fc|64f|64fc}[_I] 
ippsFFTFwd_RTo{CCS|Pack|Perm}_{32f|64f}[_I] 
ippsFFTInv_CCSToR_{32f|64f}[_I] 
ippsFFTInv_CToC_{32f|32fc|64f|64fc}[_I] 
ippsFFTInv_{Pack|Perm}ToR_{32f|64f}[_I] 
ippsFIR64f_32f[_I] 
ippsFIR64fc_32fc[_I] 
ippsFIRLMS_32f 
ippsFIR_{32f|32fc|64f|64fc}[_I] 
ippsIIR32fc_16sc_[I]Sfs 
ippsIIR64fc_32fc[_I] 
ippsIIR_32f[_I] 
ippsLShiftC_16s_I 
ippsMagnitude_16sc_Sfs 
ipps{Min|Max}Indx_{32f|64f} 
ippsMul_32fc[_I] 
ippsMul[C]_{32f|32fc|64f|64fc}[_I] 
ippsMulC_64f64s_ISfs 
ipps{Not|Or}_8u 
ippsPhase_{16s|16sc|32sc}_Sfs 
ippsPowerSpectr_{32f|32fc} 
ippsRShiftC_16u_I 
ippsSet_{8u|16s|32s} 
ippsSqr_{8u|16s|16u|16sc}_[I]Sfs 
ippsSqr_{32f|32fc|64f|64fc}[_I] 
ippsSqrt_32f[_I] 
ippsSub_{32f|32fc|64f|64fc}[_I] 
ippsSubC_{32f|32fc|64f|64fc}[_I] 
ippsSubCRev_{32f|32fc|64f|64fc}[_I] 
ippsSum_{32f|64f} 
ippsThreshold_{32f|GT_32f|LT_32f}[_I]
ippsThreshold_{GT|LT}Abs_{32f|64f}[_I] 
ippsThreshold_GTVal_32f[_I] 
ippsWinBartlett_{32f|32fc|64f|64fc}[_I] 
ippsWinBlackman_{32f|64f|64fc}[_I] 
ippsWinBlackmanOpt_{32f|64f|64fc}[_I] 
ippsWinBlackmanStd_{32f|64f|64fc}[_I] 
ippsWinKaiser_{32f|64f|64fc}[_I] 
ippsZero_{8u|16s|32f}

SPIRAL (GEN) Functions

ippgDFTFwd_CToC_8_64fc
ippgDFTFwd_CToC_12_64fc
ippgDFTFwd_CToC_16_{32fc|64fc}
ippgDFTFwd_CToC_20_64fc
ippgDFTFwd_CToC_24_64fc
ippgDFTFwd_CToC_28_64fc 
ippgDFTFwd_CToC_32_{32fc|64fc}
ippgDFTFwd_CToC_36_64fc
ippgDFTFwd_CToC_40_64fc
ippgDFTFwd_CToC_44_64fc 
ippgDFTFwd_CToC_48_{32fc|64fc}
ippgDFTFwd_CToC_52_64fc 
ippgDFTFwd_CToC_56_64fc 
ippgDFTFwd_CToC_60_64fc 
ippgDFTFwd_CToC_64_{32fc|64fc} 
ippgDFTInv_CToC_8_64fc 
ippgDFTInv_CToC_12_64fc 
ippgDFTInv_CToC_16_{32fc|64fc} 
ippgDFTInv_CToC_20_64fc 
ippgDFTInv_CToC_24_64fc 
ippgDFTInv_CToC_28_64fc 
ippgDFTInv_CToC_32_{32fc|64fc} 
ippgDFTInv_CToC_36_64fc 
ippgDFTInv_CToC_40_64fc 
ippgDFTInv_CToC_44_64fc 
ippgDFTInv_CToC_48_{32fc|64fc} 
ippgDFTInv_CToC_52_64fc 
ippgDFTInv_CToC_56_64fc 
ippgDFTInv_CToC_60_64fc 
ippgDFTInv_CToC_64_{32fc|64fc}

Audio Coding

ippsDeinterleave_32f

Speech Coding

ippsAdaptiveCodebookSearch_RTA_32f
ippsFixedCodebookSearch_RTA_32f
ippsFixedCodebookSearchRandom_RTA_32f
ippsHighPassFilter_RTA_32f
ippsLSPQuant_RTA_32f
ippsLSPToLPC_RTA_32f
ippsPostFilter_RTA_32f_I
ippsQMFDecode_RTA_32f
ippsSynthesisFilter_G729_32f

Color Conversion

ippiRGBToHLS_8u_AC4R
ippiRGBToHLS_8u_C3R

Realistic Rendering

ipprCastEye_32f
ipprCastShadowSO_32f
ipprDot_32f_P3C1M
ipprHitPoint3DEpsM0_32f_M
ipprHitPoint3DEpsS0_32f_M
ipprMul_32f_C1P3IM

Computer Vision

ippiEigenValsVecs_[8u]32f_C1R 
ippiFilterGaussBorder_32f_C1R 
ippiMinEigenVal_[8u]32f_C1R 
ippiNorm_Inf_{8u|8s|16u|32f}_C{1|3C}MR 
ippiNorm_L1_{8u|8s|16u|32f}_C{1|3C}MR 
ippiNorm_L2_{8u|8s|16u|32f}_C{1|3C}MR 
ippiNormRel_L2_32f_C3CMR 
ippiUpdateMotionHistory_[8u|16u]32f_C1IR

Image Processing

ippiAddC_32f_C1[I]R 
ippiConvert_32f* 
ippiCopy_16s* 
ippiCopy_8u* 
ippiConvFull_32f_{AC4|C1|C3}R 
ippiConvValid_32f_{AC4|C1|C3}R 
ippiCrossCorrFull_NormLevel_16u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_NormLevel_32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_NormLevel_64f_C1R 
ippiCrossCorrFull_NormLevel_8s32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_NormLevel_8u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_NormLevel_8u_{AC4|C1|C3|C4}RSfs 
ippiCrossCorrFull_Norm_16u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_Norm_32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_Norm_8s32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_Norm_8u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_Norm_8u_{AC4|C1|C3|C4}RSfs 
ippiCrossCorrSame_NormLevel_16u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_NormLevel_32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_NormLevel_8s32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_NormLevel_8u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_NormLevel_8u_{AC4|C1|C3|C4}RSfs 
ippiCrossCorrSame_Norm_16u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_Norm_32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_Norm_8s32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_Norm_8u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_Norm_8u_{AC4|C1|C3|C4}RSfs 
ippiCrossCorrValid_{8u32f|8s32f|16u32f|32f}_C1R 
ippiCrossCorrValid_NormLevel_16u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_NormLevel_32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_NormLevel_64f_C1R 
ippiCrossCorrValid_NormLevel_8s32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_NormLevel_8u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_NormLevel_8u_{AC4|C1|C3|C4}RSfs 
ippiCrossCorrValid_Norm_16u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_Norm_32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_Norm_8s32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_Norm_8u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_Norm_8u_{AC4|C1|C3|C4}RSfs 
ippiDCT8x8FwdLS_8u16s_C1R 
ippiDCT8x8Fwd_16s_C1[I|R] 
ippiDCT8x8Fwd_32f_C1[I] 
ippiDCT8x8Fwd_8u16s_C1R 
ippiDCT8x8InvLSClip_16s8u_C1R 
ippiDCT8x8Inv_16s8u_C1R 
ippiDCT8x8Inv_16s_C1[I|R] 
ippiDCT8x8Inv_2x2_16s_C1[I] 
ippiDCT8x8Inv_32f_C1[I] 
ippiDCT8x8Inv_4x4_16s_C1[I] 
ippiDCT8x8Inv_A10_16s_C1[I] 
ippiDCT8x8To2x2Inv_16s_C1[I] 
ippiDCT8x8To4x4Inv_16s_C1[I] 
ippiDFTFwd_CToC_32fc_C1[I]R 
ippiDFTFwd_RToPack_32f_{AC4|C1|C3|C4}[I]R 
ippiDFTFwd_RToPack_8u32s_{AC4|C1|C3|C4}RSfs 
ippiDFTInv_CToC_32fc_C1[I]R 
ippiDFTInv_PackToR_32f_{AC4|C1|C3|C4}[I]R 
ippiDFTInv_PackToR_32s8u_{AC4|C1|C3|C4}RSfs 
ippiDilate3x3_32f_C1[I]R 
ippiDilate3x3_64f_C1R 
ippiDivC_32f_C1[I]R 
ippiDiv_32f_{C1|C3}[I]R 
ippiDotProd_32f64f_{C1|C3}R 
ippiErode3x3_64f_C1R 
ippiFFTFwd_CToC_32fc_C1[I]R 
ippiFFTFwd_RToPack_32f_{AC4|C1|C3|C4}[I]R 
ippiFFTFwd_RToPack_8u32s_{AC4|C1|C3|C4}RSfs 
ippiFFTInv_CToC_32fc_C1[I]R 
ippiFFTInv_PackToR_32f_{AC4|C1|C3|C4}[I]R 
ippiFFTInv_PackToR_32s8u_{AC4|C1|C3|C4}RSfs 
ippiFilter_32f_{C1|C3|C4}R 
ippiFilter_32f_AC4R 
ippiFilter_64f_{C1|C3}R 
ippiFilter32f_{8s|8u|16s|16u|32s}_C{1|3|4}R 
ippiFilter32f_{8u|16s|16u}_AC4R 
ippiFilter32f_{8s|8u}16s_C{1|3|4}R 
ippiFilterBox_8u_{C1|C3}R 
ippiFilterBox_32f_{C1|C4|AC4}R 
ippiFilterColumn32f_{8u|16s|16u}_{C1|C3|C4|AC4}R 
ippiFilterColumn_32f_{C1|C3|C4|AC4}R 
ippiFilterGauss_32f_{C1|C3}R 
ippiFilterHipass_32f_{C1|C3|C4|AC4}R 
ippiFilterLaplace_32f_{C1|C3|C4|AC4}R 
ippiFilterLowpass_32f_{C1|C3|AC4}R 
ippiFilterMax_32f_{C1|C3|C4|AC4}R 
ippiFilterMedian_32f_C1R 
ippiFilterMin_32f_{C1|C3|C4|AC4}R 
ippiFilterRow_32f_{C1|C3|C4|AC4}R 
ippiFilterRow32f_{8u|16s|16u}_{C1|C3|C4|AC4}R 
ippiFilterSobelHoriz_32f_{C1|C3}R 
ippiFilterSobelVert_32f_{C1|C3}R 
ippiMean_32f_{C1|C3}R 
ippiMulC_32f_C1[I]R 
ippiMul_32f_{C1|C3|C4}[I]R 
ippiResizeSqrPixel_{32f|64f}_{C1|C3|C4|AC4}R 
ippiResizeSqrPixel_{32f|64f}_{P3|P4}R 
ippiSqrDistanceFull_Norm_16u32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceFull_Norm_32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceFull_Norm_8s32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceFull_Norm_8u32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceFull_Norm_8u_{AC4|C1|C3|C4}RSfs 
ippiSqrDistanceSame_Norm_16u32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceSame_Norm_32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceSame_Norm_8s32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceSame_Norm_8u32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceSame_Norm_8u_{AC4|C1|C3|C4}RSfs 
ippiSqrDistanceValid_Norm_16u32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceValid_Norm_32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceValid_Norm_8s32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceValid_Norm_8u32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceValid_Norm_8u_{AC4|C1|C3|C4}RSfs 
ippiSqrt_32f_C1R 
ippiSqrt_32f_C3IR 
ippiSubC_32f_C1[I]R 
ippiSub_32f_{C1|C3|C4}[I]R 
ippiSum_32f_C{1|3}R 
ippiTranspose_32f_C1R

Image Compression

ippiPCTFwd_JPEGXR_32f_C1IR 
ippiPCTFwd16x16_JPEGXR_32f_C1IR 
ippiPCTFwd8x16_JPEGXR_32f_C1IR 
ippiPCTFwd8x8_JPEGXR_32f_C1IR 
ippiPCTInv_JPEGXR_32f_C1IR_128 
ippiPCTInv16x16_JPEGXR_32f_C1IR 
ippiPCTInv8x16_JPEGXR_32f_C1IR 
ippiPCTInv8x8_JPEGXR_32f_C1IR

Those functions that have not been directly optimized for AVX (i.e., functions that do not appear in the table) have been compiled using the Intel Compiler "xG" switch (enable AVX optimization). Additional performance improvements are achieved by adherence to an AVX ABI (application binary interface) feature that inserts the special AVX "vzeroupper" instruction after any function with AVX code to eliminate any AVX to SSE transition penalties.

For those functions that are not directly optimized for AVX, the g9/e9 library utilizes optimizations from prior compatible SSE optimizations, such as those tuned for the p8/y8 libraries and preceding SIMD optimizations (e.g., SSE4.x, AES-NI and SSE2/3). Thus, functions not listed above will include the highest level of directly optimized code based on the AES-NI, SSE4.x, SSSE3, SSE3 and SSE2 SIMD instruction sets, wherever applicable.

For more information about the g9/e9 optimization layer and Intel AVX in the Intel IPP library, please refer to the Intel Integrated Performance Primitives for Windows* OS on Intel® 64 Architecture 'User's Guide'.

Review How to Compile for Intel® AVX for more information and check out the Intel Parallel Studio web site where you can learn more about the tools available to develop, debug, and tune your multi-threaded applications.

↧

JPEG new threading model in IPP

July 16, 2013, 10:49 pm

JPEG old threading model - IPP 6.1 and earlier

Based on parallel processing, with each thread handling one row of MCUs (Minimum Coded Units).
Each thread performs the JPEG steps on its own MCU row (CC, SS, DCT) in parallel, except for the VLC (Variable Length Coding) step. (CC: Color Conversion; SS: Subsampling; DCT: Discrete Cosine Transform.)
VLC can only be done serially because of the data dependency between MCU blocks in this operation.
- VLC is the main challenge in parallel JPEG processing.

JPEG new threading model in IPP 7.0

The JPEG standard allows the data stream to be split into independently processed segments called Restart Intervals (RSTI).
- Each restart interval contains a fixed number of MCUs.
- RSTIs are separated by restart markers (RSTm).
The new threading model is based on parallel processing of these RSTIs.
- Using RSTIs removes the main bottleneck of the old model: the serial VLC part of the JPEG pipeline.
- This is possible because of the main property of RSTIs: the MCU blocks of one RSTI are independent of the MCU blocks of any other RSTI.
- This property allows all JPEG operations (CC, SS, DCT, and VLC) to be performed for each RSTI by threads in parallel.
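To illustrate why restart intervals can be processed independently, here is a minimal sketch in plain C. It models only one of the dependencies that restart markers break: the DC coefficient prediction, which baseline JPEG resets to zero at the start of every restart interval. The function names and the flat array of DC differences are assumptions for this example; this is not the Intel IPP implementation, and a real decoder must also locate each interval's bitstream by scanning for the RSTm markers.

```c
#include <assert.h>
#include <stddef.h>

/* Number of restart intervals needed to cover totalMcus when each
   interval holds `ri` MCUs (the last interval may be shorter). */
size_t rsti_count(size_t totalMcus, size_t ri)
{
    return ri == 0 ? 0 : (totalMcus + ri - 1) / ri;
}

/* Rebuild the DC values of restart interval `k` only.
   diff[i] is the coded DC difference of MCU i; dc[i] receives its value.
   Nothing outside MCUs [k*ri, (k+1)*ri) is read or written, so separate
   intervals can be handed to separate threads in any order. */
void rsti_decode_dc(const int *diff, int *dc,
                    size_t totalMcus, size_t ri, size_t k)
{
    assert(diff != NULL && dc != NULL);
    const size_t first = k * ri;
    const size_t last = (first + ri < totalMcus) ? first + ri : totalMcus;
    int pred = 0;                 /* predictor restarts at every interval */
    for (size_t i = first; i < last; ++i) {
        pred += diff[i];
        dc[i] = pred;
    }
}
```

Without restart intervals the predictor would carry over from the first MCU to the last, so MCU i could not be reconstructed before MCU i - 1; that chain is the serial dependency the old model could not remove.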

Performance Comparison (IPP 7.0 vs. 6.1)

↧

India Webinar Series - Intel Parallel Studio XE 2013

July 16, 2013, 10:49 pm

Title: What's New in Intel® VTune™ Amplifier XE 2013

Abstract: Intel® VTune™ Amplifier XE 2013 is the premier profiler for C, C++, C#, FORTRAN, Assembly and Java*. This presentation/demo will help you understand new features such as call counts, hardware stack sampling, better bandwidth analysis, Java profiling, tuning for Intel® Xeon Phi™ products, user tasks, DirectX* frames, power analysis and more. This session is intended for users who have some experience with Intel® VTune™ Amplifier XE 2011, but new users should also benefit from the presentation.

Title: Using the Intel® Math Kernel Library 11.0 and Compiler to Obtain Run-to-Run Reproducible Results

Abstract: Floating-point intensive applications, from Hollywood to Wall Street, have for years been challenged to provide both great performance and exactly the same results from run to run. Factors such as run-time selectable and optimized code paths, non-deterministic parallelism, array alignment and even the underlying operating system may influence the results computed. Intel® Math Kernel Library (Intel® MKL) 11.0 now includes features that can help users obtain Conditional Numerical Reproducibility (CNR) results when calling Intel MKL functions from their applications. This presentation will educate Intel software tool users and programmers on how to use Intel MKL and the Intel compiler to balance performance with the reproducible results their applications require.

↧

Building OpenCV based embedded application using Intel® System Studio

July 16, 2013, 10:49 pm

To download this article: Building OpenCV using Intel System Studio.pdf (344.97 KB)

Introduction

This paper describes how to use Intel® System Studio to build an OpenCV*-based embedded application on Intel platforms. It takes a sample that ships with OpenCV and shows how to build it with different components of Intel® System Studio.

OpenCV is the most prominent computer vision library, and many embedded applications are built using its features. It offers functionality for tasks such as image registration, tracking, classification and much more. An easy way to improve OpenCV application performance on Intel Architecture is to rebuild it using Intel® System Studio components such as the Intel® C++ Compiler and Intel® IPP.

Intel® System Studio

Intel® System Studio, a new comprehensive integrated tool suite, provides developers with advanced system tools and technologies that help accelerate the delivery of next-generation power-efficient, high-performance, and reliable embedded and mobile devices.

To get more information about Intel® System Studio – http://software.intel.com/en-us/intel-system-studio

Intel® C++ Compiler:

Intel® C++ Compiler delivers outstanding performance for your applications as they run on systems using Intel® Atom or Intel® Core™ or Xeon® processors and IA-compatible processors. The Intel® C++ Compiler can generate code for IA-32, Intel® 64, and Intel® Many Integrated Core Architecture (Intel® MIC Architecture) applications on Intel®-based Linux* systems. IA-32 architecture applications (32-bit) can run on all Intel®-based Linux systems. Intel® 64 architecture applications can run only on Intel® 64 architecture-based Linux systems. You can use the compiler on the command line or in the Eclipse* integrated development environment.

Some of its key features are:

Parallel C/C++ language extension support
Auto-vectorization taking advantage of Intel® AVX instructions
SSSE3 for Intel® Atom™ Processor targeted applications
Compatibility with existing GNU* Compiler generated code base
Cross-build support for 1.2, and CE Linux* PR28

Refer to the Intel® C++ Compiler 13.0 reference manual for more information about new features.

http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/index.htm

Intel® Integrated Performance Primitives (Intel® IPP)

Intel® IPP is an extensive library of multicore-ready, highly optimized software functions for digital media and data-processing applications. Intel® IPP offers thousands of optimized functions covering frequently-used fundamental algorithms. Intel® IPP functions are designed to deliver performance beyond what optimized compilers alone can deliver. Intel® IPP software building blocks are highly optimized using SSE and Intel® AVX instruction sets so your application will perform faster than what an optimized compiler can produce alone. More information about Intel® IPP can be retrieved at http://software.intel.com/en-us/intel-ipp

Building OpenCV based application using Intel® System Studio

To take advantage of the Intel® System Studio components, we follow a step-by-step approach:

1. Download and install Intel® System Studio
2. Configure Intel® C++ Compiler and Intel® IPP
3. Build OpenCV sample with Intel® C++ Compiler
4. Modify 'OpenCVFindIPP.cmake' file to enable Intel® IPP
5. Build OpenCV with Intel® Compiler and Intel® IPP
6. Use Intel® Inspector 2013 for Systems to find any memory and threading issues
7. Power analysis using Intel® VTune Amplifier 2013 for Systems

This paper focuses only on building the OpenCV sample with the Intel® compiler and Intel® IPP (up to step 5).

Step 1: Download and install Intel® System Studio

Download Intel® System Studio from – http://software.intel.com/en-us/intel-system-studio

The default installation directory:

/opt/intel/system_studio_2013.0.xxxx/

Step 2: Configure Intel® C++ Compiler and Intel® IPP

Set the environment variables for a terminal window using one of the following (replace "intel64" with "ia32" if you are using a 32-bit platform).

For csh/tcsh:

$ source /opt/intel/system_studio_2013.0.xxxx/bin/iccvars.csh intel64

For bash:

$ source /opt/intel/system_studio_2013.0.xxxx/bin/iccvars.sh intel64

To invoke the installed compilers:

For C++: icpc

For C: icc

Step 3: Build OpenCV sample with Intel® C++ Compiler

$ icc morphology2.cpp `pkg-config --cflags --libs opencv` -lm -lstdc++
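One quick way to confirm that a build really went through the Intel® compiler rather than the system default is to check the predefined `__INTEL_COMPILER` macro. The sketch below is not part of the OpenCV sample; `compiler_name` and `print_compiler` are hypothetical helper names introduced here for illustration.

```cpp
#include <cstdio>

// Returns a label for the compiler that built this translation unit.
// __INTEL_COMPILER is tested first because the Intel compiler on Linux
// also defines __GNUC__ for compatibility.
const char* compiler_name() {
#if defined(__INTEL_COMPILER)
    return "Intel C++ Compiler";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#else
    return "unknown compiler";
#endif
}

// Hypothetical helper: print the label, e.g. at application start-up.
void print_compiler() {
    std::printf("Built with: %s\n", compiler_name());
}
```

Calling `print_compiler()` from `main` in a file compiled with the command above should report the Intel compiler if the environment from Step 2 is active.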

Step 4: Modify 'OpenCVFindIPP.cmake' file to enable Intel® IPP

Original file:

	if (IPP_X64)
	    if(NOT EXISTS ${IPP_ROOT_DIR}/../intel64)
	        message(SEND_ERROR "IPP EM64T libraries not found")
	    endif()
	else()
	    if(NOT EXISTS ${IPP_ROOT_DIR}../ia32)
	        message(SEND_ERROR "IPP IA32 libraries not found")
	    endif()
	endif()

Modified file:

	if (IPP_X64)
	    if(NOT EXISTS ${IPP_ROOT_DIR}/lib/intel64)
	        message(SEND_ERROR "IPP EM64T libraries not found")
	    endif()
	else()
	    if(NOT EXISTS ${IPP_ROOT_DIR}/lib/ia32)
	        message(SEND_ERROR "IPP IA32 libraries not found")
	    endif()
	endif()

Run the 'cmake' command as follows to verify Intel® IPP support:

$ cmake  -D  CMAKE_BUILD_TYPE=RELEASE  /usr/OpenCv-2.4.3

cmake reports the third-party library support as follows:

	Other third-party libraries:
	Use IPP      NO
	Use TBB      NO
	Use Cuda     NO
	Use OpenCL   NO
	Use Eigen    NO

To switch IPP support on, run the following command:

$ cmake -D WITH_IPP=ON /usr/OpenCv-2.4.3

cmake prints the third-party library support information again, and the "Use IPP" entry should now show that IPP is enabled.

Finally type "make" and it should start compiling with IPP.
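To illustrate what enabling IPP at build time means for the code, here is a minimal sketch of compile-time dispatch on a build flag. It assumes the build defines a `HAVE_IPP` macro when IPP support is on, as OpenCV's build appears to do. The functions `invert`, `invert_portable`, `invert_backend` and `invert_with_ipp` are hypothetical names used only for this example; `invert_with_ipp` is not a real IPP function and stands in for whatever direct IPP call an application would wrap.

```cpp
#include <cstddef>
#include <cstdint>

#ifdef HAVE_IPP
// Hypothetical wrapper around a direct Intel IPP call. It is NOT part of
// IPP; the application would have to provide it.
void invert_with_ipp(std::uint8_t* pixels, std::size_t count);
#endif

// Portable fallback used when the build has no IPP support.
void invert_portable(std::uint8_t* pixels, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        pixels[i] = static_cast<std::uint8_t>(255 - pixels[i]);
    }
}

// Single entry point: the implementation is selected when the code is built.
void invert(std::uint8_t* pixels, std::size_t count) {
#ifdef HAVE_IPP
    invert_with_ipp(pixels, count);
#else
    invert_portable(pixels, count);
#endif
}

// Reports which implementation was compiled in.
const char* invert_backend() {
#ifdef HAVE_IPP
    return "ipp";
#else
    return "portable";
#endif
}
```

Callers use `invert` in both configurations, so application code does not change when the library is rebuilt with or without IPP.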

Future work

The remaining steps listed in "Building OpenCV based application using Intel® System Studio" use Intel® Inspector 2013 for Systems and Intel® VTune Amplifier 2013 for Systems to analyze the Intel® IPP integrated OpenCV based samples:

Step 6: Use Intel® Inspector 2013 for Systems to find any memory and threading issues

Step 7: Power analysis using Intel® VTune Amplifier 2013 for Systems

Summary

Intel® System Studio is a comprehensive and integrated tool suite that provides developers with advanced system tools and technologies to help accelerate the delivery of the next generation power efficient, high performance, and reliable embedded and mobile devices.

In this paper, we showed how to build an OpenCV based sample using the Intel® C++ Compiler and Intel® IPP.

The Intel® C++ Compiler supports vectorization and can generate Streaming SIMD Extensions (SSE) instructions. Using such instructions through the compiler can improve application performance on Intel architectures. The Intel® IPP library provides low-level but high-performance basic image and video processing functions. The preferred way to integrate Intel® IPP with OpenCV is through the OpenCV build process and the HAVE_IPP flag, so that applications automatically benefit from the Intel® IPP code integrated into the OpenCV library. Where Intel® IPP and OpenCV offer routines with the same functionality, calling the Intel® IPP function directly gives the best performance.
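As a rough illustration of the kind of loop a vectorizing compiler can turn into SIMD instructions, consider a saturating add of two 8-bit image buffers. This is a sketch, not code from OpenCV or Intel® IPP; `add_saturate` is a hypothetical name, and whether a given compiler actually vectorizes it depends on the optimization options and target.

```cpp
#include <cstddef>
#include <cstdint>

// Adds two 8-bit buffers element by element, clamping each result to 255.
// The loop has a known trip count, walks contiguous memory, and has no
// dependence between iterations, which makes it a vectorization candidate.
void add_saturate(const std::uint8_t* a, const std::uint8_t* b,
                  std::uint8_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned sum = static_cast<unsigned>(a[i]) + b[i];
        out[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
}
```

The output buffer is assumed not to overlap the inputs; a compiler that cannot prove this may add a run-time overlap check or leave the loop scalar.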

Reference

http://software.intel.com/en-us/blogs/2010/12/29/using-the-intel-ipp-library-in-an-embedded-system-on-non-standard-operating-systems

↧

Memory profiling techniques using Intel System Studio

July 16, 2013, 10:49 pm

To download this article: memory-profiling-using-intel-system-studio.pdf (344.62 KB)

Introduction

One of the problems in developing embedded systems is detecting memory errors, such as:

Memory leaks
Memory corruption
Allocation / de-allocation API mismatches
Inconsistent memory API usage etc.

These memory errors degrade the performance of an embedded system. Designing and programming an embedded application requires great care: the application must be robust enough to handle every error that can occur, and these errors should be anticipated and handled accordingly, especially in the area of memory.
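As a small example of anticipating a memory error, the sketch below checks the result of an allocation before using it and lets a smart pointer release the block. The names `make_frame_buffer` and `process_frame` are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <new>

// Allocates a zero-initialized buffer. With std::nothrow, a failed
// allocation yields a null pointer instead of throwing std::bad_alloc.
std::unique_ptr<unsigned char[]> make_frame_buffer(std::size_t bytes) {
    return std::unique_ptr<unsigned char[]>(
        new (std::nothrow) unsigned char[bytes]());
}

// Returns false instead of dereferencing a null or empty buffer.
bool process_frame(std::size_t bytes) {
    if (bytes == 0) {
        return false;               // nothing to process
    }
    std::unique_ptr<unsigned char[]> frame = make_frame_buffer(bytes);
    if (!frame) {
        return false;               // allocation failed: report, do not crash
    }
    frame[0] = 0xFF;                // safe: buffer exists and has >= 1 byte
    return true;                    // unique_ptr frees the buffer with delete[]
}
```

Returning a status lets the caller decide how to recover, for example by dropping a frame rather than terminating.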

This article describes how to use Intel® System Studio to find memory issues in an embedded application through both dynamic and static analysis. The two approaches are complementary because no single approach can find all memory errors.

Intel® System Studio

To get more information about Intel® System Studio – http://software.intel.com/en-us/intel-system-studio

Dynamic Memory Analysis

Dynamic memory analysis is the testing and evaluation of an embedded application for any memory errors during runtime.

Advantage of dynamic memory analysis: Because the analysis is performed by executing the application, it observes the application's actual run-time behavior. For dynamic memory analysis to be effective, the target program must be executed with enough test inputs to exercise the entire program.

Intel® Inspector 2013 for Systems

The Intel® Inspector 2013 for Systems helps developers identify and resolve memory and threading correctness issues in their unmanaged C, C++ and Fortran programs as well as in the unmanaged portion of mixed managed and unmanaged programs. Additionally the tool identifies threading correctness issues in managed .NET C# programs.

Installation, Configure and Build

You need to follow certain steps to run Intel® Inspector 2013 for Systems on an embedded platform. Please refer to the article "How to use Intel® Inspector 2013 for Systems".

Intel® Inspector 2013 for Systems currently identifies the following types of dynamic memory problems.

Problem Type	Description
Incorrect memcpy call	When an application calls the memcpy function with two pointers that overlap within the range to be copied.
Invalid deallocation	When an application calls a deallocation function with an address that does not correspond to dynamically allocated memory.
Invalid memory access	When a read or write instruction references memory that is logically or physically invalid.
Invalid partial memory access	When a read or write instruction references a block (2-bytes or more) of memory where part of the block is logically invalid.
Memory growth	When a block of memory is allocated but not deallocated within a specific time segment during application execution.
Memory leak	When a block of memory is allocated, never deallocated, and not reachable at application exit (there is no pointer available to deallocate the block).
Memory not deallocated	When a block of memory is allocated, never deallocated, but still reachable at application exit (there is a pointer available to deallocate the block).
Mismatched allocation/deallocation	When a deallocation is attempted with a function that is not the logical reflection of the allocator used.
Missing allocation	When an invalid pointer is passed to a deallocation function. The invalid address may point to a previously released heap block.
Uninitialized memory access	When a read of an uninitialized memory location is reported.
Uninitialized partial memory access	When a read instruction references a block (2-bytes or more) of memory where part of the block is uninitialized.
Cross-thread stack access	When a thread accesses a different thread's stack.
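The difference between the "Memory leak" and "Memory not deallocated" rows is easiest to see in code. The sketch below follows the definitions in the table; it does not show actual tool output, and all names in it are hypothetical.

```cpp
#include <cstddef>

// Pointer kept in a global, so the block stays reachable until program exit.
static char* g_lookup_table = nullptr;

// "Memory not deallocated" pattern: if release_lookup_table() is never
// called, the block is never freed but a pointer to it still exists at exit.
void build_lookup_table(std::size_t bytes) {
    if (g_lookup_table == nullptr) {
        g_lookup_table = new char[bytes]();
    }
}

// "Memory leak" pattern: the only pointer to the block is a local variable,
// so the block becomes unreachable as soon as the function returns.
void leak_scratch_buffer(std::size_t bytes) {
    if (bytes == 0) {
        return;
    }
    char* scratch = new char[bytes]();
    scratch[0] = 1;
    // Missing: delete[] scratch;
}

// Matching release for build_lookup_table(). Memory from new[] must be
// released with delete[]; using delete or free() here would be a
// "Mismatched allocation/deallocation".
void release_lookup_table() {
    delete[] g_lookup_table;
    g_lookup_table = nullptr;
}
```

Adding the missing `delete[]` fixes the leak, and calling `release_lookup_table()` before exit clears the still-reachable block.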

For more information about each type of dynamic memory problem, including sample code and possible correction strategies, please refer to the Intel® Inspector 2013 for Systems reference manual (Problem Type Reference section).
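As one example of a correction strategy, an "Incorrect memcpy call" on overlapping ranges can usually be fixed by switching to `memmove`, which is defined for overlapping source and destination. This sketch is not taken from the reference manual; `shift_right_by_one` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstring>

// Shifts the first `count` bytes of `buffer` one position to the right.
// The caller must provide room for at least count + 1 bytes.
void shift_right_by_one(char* buffer, std::size_t count) {
    // Source [buffer, buffer + count) and destination
    // [buffer + 1, buffer + 1 + count) overlap whenever count > 1.
    //
    // Incorrect: std::memcpy(buffer + 1, buffer, count);  // undefined behavior
    std::memmove(buffer + 1, buffer, count);               // defined for overlap
}
```

For a six-byte array initialized with "abcd", `shift_right_by_one(buf, 4)` leaves "aabcd" in the buffer.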

Static Memory Analysis

Static memory analysis is the testing and evaluation of an application by examining the code without executing the application.

Advantage of static memory analysis: It examines all possible execution paths and variable values, not just those invoked during execution. Thus static memory analysis can reveal memory errors that may not manifest themselves until years after release. This aspect of static memory analysis is especially valuable in security assurance.
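The following sketch shows why examining every path matters. The first function contains a deliberate bounds violation that occurs only for one combination of inputs, so test runs that never hit that combination pass cleanly. The function names are hypothetical, and whether a particular analyzer reports this exact case is an assumption, not something stated by the source.

```cpp
#include <algorithm>

// Deliberately defective: when boost is true and (level & 3) == 3, the
// index becomes 4 and the read goes past the end of the 4-element array.
// Every other input combination behaves correctly.
int gain_for_level(int level, bool boost) {
    static const int gains[4] = {1, 2, 4, 8};
    int index = level & 3;          // always 0..3
    if (boost) {
        index += 1;                 // rare path: may reach 4
    }
    return gains[index];
}

// Corrected version: the boosted index is clamped to the last valid entry.
int gain_for_level_fixed(int level, bool boost) {
    static const int gains[4] = {1, 2, 4, 8};
    int index = level & 3;
    if (boost) {
        index = std::min(index + 1, 3);
    }
    return gains[index];
}
```

A dynamic run finds the defect only if some test calls `gain_for_level(3, true)`, while an analysis that considers all values of `level` and `boost` can flag it without running the program.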

Intel® C++ Compiler:

The Intel® C++ Compiler delivers outstanding performance for applications running on systems that use Intel® Atom, Intel® Core™, or Xeon® processors and IA-compatible processors. It can generate code for IA-32, Intel® 64, and Intel® Many Integrated Core Architecture (Intel® MIC Architecture) applications on Intel®-based Linux* systems.

Configure and Build

Intel® Static Analysis can be enabled with the Intel® C++ Compiler using the compiler option “-diag-enable sc [n]”. For more information, refer to the article ‘Developing secured embedded applications using Intel® System Studio’.

Problem Type	Description
Bounds violation	An attempt to access outside the bounds of a variable (usually an array) was found. Bounds violations can corrupt memory or read from uninitialized data, leading to unpredictable behavior.
Object size overflow	Buffer overflow error at block assignment operation. This error indicates that the destination size is too small to accommodate the data being moved to the destination.
Buffer overflow through pointer	A memory write through a pointer creates a buffer overflow.
Double free	Dynamically allocated storage, for example from new or malloc, must be freed only once. Freeing the same data twice can corrupt the heap.
Insufficient memory allocation	The size of allocated memory is less than the size of the pointed-to type of the pointer to which it is assigned. Usually, this error results from an incorrect size computation. This error often leads to a subsequent bounds violation.
Reference to freed storage	Memory is accessed after it has been deallocated.
Uses address after free	Use of pointer to deallocated storage.
Incorrect memory deallocation	An improper value was passed to a memory deallocation routine. This error indicates that the value being passed to a memory deallocation routine did not come from a call to the matching memory allocation routine.
Unchecked memory allocation	A pointer to allocated memory was not checked for null before dereference.