Readme for the Intel® IXP400 Software Linux* Performance Monitoring Module Patch

=============================================================

 

Copyright Notice

 

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS. INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life-saving, life-sustaining, critical control or safety systems, or in nuclear-facility applications.

 

Intel may make changes to specifications and product descriptions at any time, without notice.

 

Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

 

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details.

 

The Intel IXP400 Software may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

 

This ReadMe as well as the software described in it are furnished under license and may only be used or copied in accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document.

 

Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the express written consent of Intel Corporation.

 

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

 

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site.

 

BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino logo, Core Inside, FlashFile, i960, InstantIP, Intel, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Core, Intel Inside, Intel Inside logo, Intel Leap ahead, Intel Leap ahead logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.

 

*Other names and brands may be claimed as the property of others.

 

Copyright © 2007, Intel Corporation

 

 

July 17, 2007

 

 

<Introduction>

===========

 

This readme contains instructions for installing the Linux* Performance Monitoring Module for the Intel® IXP43X Product Line of Network Processors (Filename: BSD_ixp400PMU-3.0.zip).

 

The BSD_ixp400PMU-3.0.zip file contains the files for the Performance Monitoring module for Intel® IXP43X network processors.

 

Note: The Performance Monitoring Module is referred to as perfProf or perfProfAcc in this document.

 

 

<Supported Versions>

=================

 

Intel® IXP400 Software Release v3.0 and Linux kernel 2.6.20

 

 

<Prerequisites>

============

 

<Instructions for Unzip>

==================

 

1.    Unzip BSD_ixp400PMU-3.0.zip under the /src folder

Unzipping should create a perfProf folder, with the necessary files, under src/. IxPerfProf.h should be added to the src/include folder.

 

For the codelets, you will find perfProf under the src/codelets folder.

 

The following files appear under the folder \ixp400_xscale_sw\src\perfProfAcc:

The include file under \ixp400_xscale_sw\src\include is:

The codelet files under \ixp400_xscale_sw\src\codelets\perfProfAcc are:

2.    Modify the Makefile_ixp43x under folder \ixp400_xscale_sw

Modify the Makefile to include perfProfAcc under both BI_ENDIAN_COMPONENTS and BI_ENDIAN_CODELETS_COMPONENTS.

 

 

<Instructions for building the perfProf module>

==================================

 

(Note: PerfProf is supported for Linux* Big Endian only.)

 

1.    To build the perfProf module use the command:

make module COMP=perfProfAcc IX_EXTRA_WARNINGS=1 IX_TARGET=linuxbe

This will generate the ixp400_perfProfAcc.ko

 

2.    To build the perfProf codelets, use the command:

make module COMP=codelets_perfProfAcc IX_EXTRA_WARNINGS=1 IX_TARGET=linuxbe

This will generate the ixp400_codelets_perfProfAcc.ko

 

 

<Instructions on Usage>

==================

 

<XScale PMU>

 

<Event/Clock counting>

 

1.        To configure the counters, call the config function with parameters:

        ixPerfProfAccXscalePmuEventCounterConfig (

            BOOL clkCntDiv,

            UINT32 numEvents,

            IxPerfProfAccXscalePmuEvent pmuEvent1,

            IxPerfProfAccXscalePmuEvent pmuEvent2,

            IxPerfProfAccXscalePmuEvent pmuEvent3,

            IxPerfProfAccXscalePmuEvent pmuEvent4 )  

                                                   

2.    To begin counting, call the start function at the point in the code where measurements should begin:

         ixPerfProfAccXscalePmuEventCountStart (

             UINT32 eventNumber,

             IxPerfProfAccCounterValue *xscalePmuEventCountResults)

 

3.    To end the counting, call the stop function, with parameters:

          ixPerfProfAccXscalePmuEventCountStop (

            UINT32 eventNumber,

            IxPerfProfAccCounterValue  *xscalePmuEventCountResults)

     

Example 1: If you have declared a variable “IxPerfProfAccCounters xscalePmuEventCountResults”, then you can print out the results for all the counters as follows:

 

       printf("Lower 32 bits of clock count = %u\n",        

                               xscalePmuEventCountResults.clk_value);

       printf("Upper 32 bits of clock count = %u\n",  

                               xscalePmuEventCountResults.clk_samples);

       printf("Lower 32 bits of event %d count = %u\n", eventNumber, 

                               xscalePmuEventCountResults.lower32BitCounterValue);

       printf("Upper 32 bits of event  %d count = %u\n", eventNumber,  

                               xscalePmuEventCountResults.upper32BitCounterValue);

 

 

Note 1: Each start and stop can monitor only one event at a time. To monitor more than one event, use multiple start/stop pairs at the same time and select the required event for each.

Note 2: Unlike event and time sampling, event counting for the Intel XScale® Processor can monitor the same event at multiple locations in the code.
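The two 32-bit halves printed in Example 1 can be merged into one 64-bit value when further arithmetic is needed. The following is a minimal sketch rather than part of the IXP400 software: the helper names pmu_combine64 and pmu_clock_cycles are hypothetical, the upper word is assumed to count overflows of the lower 32-bit counter, and the clock counter is assumed to advance once every 64 core cycles when clkCntDiv is TRUE.

```c
#include <stdint.h>

/* Hypothetical helper, not an IXP400 API call: joins the upper and lower
 * 32-bit words of a counter into one 64-bit count. Assumes the upper word
 * holds the number of times the lower 32-bit counter overflowed. */
uint64_t pmu_combine64(uint32_t upper32, uint32_t lower32)
{
    return ((uint64_t)upper32 << 32) | (uint64_t)lower32;
}

/* Hypothetical helper: converts a raw clock count into core clock cycles.
 * Assumes the divider (clkCntDiv) makes the counter tick every 64th cycle. */
uint64_t pmu_clock_cycles(uint64_t rawClockCount, int clkCntDiv)
{
    return clkCntDiv ? rawClockCount * 64u : rawClockCount;
}
```

For instance, pmu_combine64(xscalePmuEventCountResults.upper32BitCounterValue, xscalePmuEventCountResults.lower32BitCounterValue) would give the full event count, which can be printed with the %llu format after a cast to unsigned long long.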

 

<Time-Based Sampling>

 

1.      To begin the time sampling, call the start function, with parameters:

          ixPerfProfAccXscalePmuTimeSampStart(

            UINT32 samplingRate,

            BOOL clkCntDiv)

 

2.      To end the time sampling, call the stop function, with parameters:

          ixPerfProfAccXscalePmuTimeSampStop(

            IxPerfProfAccXscalePmuEvtCnt *clkCount,

            IxPerfProfAccXscalePmuSamplePcProfile *timeProfile)

 

Example 2:  If you have declared a variable “IxPerfProfAccXscalePmuEvtCnt clkCount”, then you can print out the value of the clock counter (which indicates the number of clock cycles that elapsed) as follows:

 

        printf("\n Lower 32 bits of clock count: 0x%x", clkCount.lower32BitsEventCount);

         printf("\n Upper 32 bits of clock count: 0x%x", clkCount.upper32BitsEventCount);

 

Example 3: If you have declared an array “IxPerfProfAccXscalePmuSamplePcProfile timeProfile[IX_PERFPROF_ACC_XSCALE_PMU_MAX_PROFILE_SAMPLES]”, then you can print out the top five PC addresses in the time profile:

 

      i.    Obtain the number of clock-count samples that were taken (using clkCount from Example 2):

              clkSamples = clkCount.upper32BitsEventCount;

 

     ii.    Determine the number of elements in the timeProfile array that contain results, which is the number of unique PC addresses, by adding up the frequencies of successive elements until all samples are accounted for:

             UINT32 frequency;    /*running total of samples accounted for*/

             UINT32 test_freq;    /*frequency of the current array element*/

             UINT32 counter = 0;  /*counter to move through timeProfile array*/

             UINT32 numPc = 0;    /*number of unique PC addresses*/

 

             for (frequency=0;
                  frequency<clkSamples && counter<IX_PERFPROF_ACC_XSCALE_PMU_MAX_PROFILE_SAMPLES;
                  frequency+=test_freq)

               {

                  test_freq = timeProfile[counter].freq;

                  counter++;

                  numPc++;

               }

 

     iii.  Print out the first five elements:

             for (i=0; i<5; i++)

               {

                  printf("timeprofile element %d pc value = 0x%x\n", i, 

                  timeProfile[i].programCounter);

                  printf("timeprofile element %d freq value = %d\n", i,   

                  timeProfile[i].freq);

               }

 

These profile results show those places in your code that are most frequently being executed and that are taking up the most processor cycles.
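If the filled entries of the profile array are not already ordered by frequency, the most frequently sampled PC addresses can be found by sorting them first. A minimal sketch follows, under stated assumptions: PcSample is a hypothetical stand-in that mirrors only the programCounter and freq fields used in Example 3, and sort_by_freq_desc is not part of the IXP400 software.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for IxPerfProfAccXscalePmuSamplePcProfile; only the
 * two fields used in Example 3 are mirrored here. */
typedef struct {
    uint32_t programCounter;  /* sampled PC address */
    uint32_t freq;            /* number of samples taken at this address */
} PcSample;

/* qsort comparator: entries with a higher freq sort first. */
int cmp_freq_desc(const void *a, const void *b)
{
    const PcSample *x = (const PcSample *)a;
    const PcSample *y = (const PcSample *)b;
    return (x->freq < y->freq) - (x->freq > y->freq);
}

/* Sorts the first numPc entries in place so the hottest addresses come first. */
void sort_by_freq_desc(PcSample *profile, size_t numPc)
{
    qsort(profile, numPc, sizeof(profile[0]), cmp_freq_desc);
}
```

After sorting the numPc filled entries, the loop from step iii prints the five most frequently sampled addresses.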

 

<Event-Based Sampling>

 

1.     To begin the event sampling, call the start function, with parameters:

           ixPerfProfAccXscalePmuEventSampStart(

             UINT32 numEvents,  

             IxPerfProfAccXscalePmuEvent pmuEvent1, UINT32 eventRate1,  

             IxPerfProfAccXscalePmuEvent pmuEvent2, UINT32 eventRate2,  

             IxPerfProfAccXscalePmuEvent pmuEvent3, UINT32 eventRate3,  

             IxPerfProfAccXscalePmuEvent pmuEvent4, UINT32 eventRate4)

 

2.     To end the event sampling, call the stop function, with parameters:

           ixPerfProfAccXscalePmuEventSampStop(

             IxPerfProfAccXscalePmuSamplePcProfile *eventProfile1,

             IxPerfProfAccXscalePmuSamplePcProfile *eventProfile2,

             IxPerfProfAccXscalePmuSamplePcProfile *eventProfile3,

             IxPerfProfAccXscalePmuSamplePcProfile *eventProfile4)
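Each entry returned in eventProfile1 to eventProfile4 records how many samples landed on a given PC address. A rough event total for an address can be estimated from that sample count. The following is a minimal sketch, assuming that an event rate is the number of event occurrences between two consecutive samples (check the header file for the exact definition); estimate_event_count is a hypothetical helper, not part of the IXP400 software.

```c
#include <stdint.h>

/* Hypothetical helper: estimates how many events occurred at one PC address.
 * freq      - number of samples recorded for the address
 * eventRate - assumed number of event occurrences between two samples
 * The product is computed in 64 bits so it cannot overflow. */
uint64_t estimate_event_count(uint32_t freq, uint32_t eventRate)
{
    return (uint64_t)freq * (uint64_t)eventRate;
}
```

The result is only an estimate, because events that occurred after the last sample was taken are not reflected in freq.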

 

<Using Intel XScale processor PMU to determine Cache efficiency>

 

1.     To begin the counting, call the start function with the following parameters:

          ixPerfProfAccXscalePmuEventCounting (

              FALSE,

              2,

              IX_PERFPROF_ACC_XSCALE_PMU_EVENT_INST_EXEC,

              IX_PERFPROF_ACC_XSCALE_PMU_EVENT_CACHE_MISS,

              IX_PERFPROF_ACC_XSCALE_PMU_EVENT_MAX,

              IX_PERFPROF_ACC_XSCALE_PMU_EVENT_MAX)

 

2.     Declare a results variable:

           IxPerfProfAccXscalePmuResults results;

 

3.     To end the counting, call the stop function, with parameters:

           ixPerfProfAccXscalePmuEventCountStop (

             &results)

 

4.     Print the total value (combining the upper and lower 32 bits) of all the counters:

        printf("total clk count = 0x%x%08x\n", results.clk_samples, results.clk_value);

        printf("total event 1 count = 0x%x%08x\n", results.event1_samples, results.event1_value);

        printf("total event 2 count = 0x%x%08x\n", results.event2_samples, results.event2_value);

        printf("total event 3 count = 0x%x%08x\n", results.event3_samples, results.event3_value);

        printf("total event 4 count = 0x%x%08x\n", results.event4_samples, results.event4_value);
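With executed instructions and cache misses counted over the same interval, cache efficiency can be expressed as misses per instruction. The following is a minimal sketch; cache_miss_ratio is a hypothetical helper, not part of the IXP400 software, and it assumes each total has already been formed as a 64-bit value from its upper and lower 32-bit words.

```c
#include <stdint.h>

/* Hypothetical helper: cache misses per executed instruction.
 * instExec  - 64-bit total of executed instructions
 * cacheMiss - 64-bit total of cache misses
 * Returns 0.0 when no instructions were counted, to avoid dividing by zero. */
double cache_miss_ratio(uint64_t instExec, uint64_t cacheMiss)
{
    if (instExec == 0) {
        return 0.0;
    }
    return (double)cacheMiss / (double)instExec;
}
```

With the configuration in step 1, event 1 would supply the executed-instruction total and event 2 the cache-miss total. A lower ratio indicates better cache efficiency.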

 

 

<Internal Bus PMU>

 

1.     To measure values for a segment of code, call the start function at the beginning of that code:

           ixPerfProfAccBusPmuStartCount(IxPerfProfAccBusPmuPECSelect * pPecCtr)

 

The pPecCtr is a pointer to an array of eight elements, which specifies the PECx and MPECx counters that are to be enabled and the events that should be monitored for counting. PECx are eight event counters associated with AHB clock source, and MPECx are counters associated with the MCU clock source. You can start all or the required counters based on the inputs specified in the pointer.

 

The start function must be called only once; calling it multiple times will reset the counters.  

 

2.     To end the measurements for a particular event, call the stop function with the event counter to be stopped

           ixPerfProfAccBusPmuStopCount(IxPerfProfBusPmuPEC PECxEvent)

 

You can stop several event counters individually by calling the stop function once for each counter.

 

Refer to section Interface for more details on the structure and its parameters. It is your responsibility to initialize the pPecCtr pointer before calling the start function.

 

3.     To stop all the event counters simultaneously, call the Stop function:

           ixPerfProfAccBusPmuStop(void)

 

This function clears the bits in the Counter Enable Mode Register to 0, which stops event counting.

 

4.     To read the results of the required PECx/MPECx counters, you can use the function:

           ixPerfProfAccBusPmuResultsGet (IxPerfProfAccBusPmuResults *BusPmuResults)

 

The results for all 16 counters are stored in the BusPmuResults structure and can be printed as shown below:

 

           printf("PEC %d lower 27 bit value = %u\n", pecCounter,

                                  BusPmuResults.statsToGetLower27Bit[pecCounter]);

           printf("PEC %d upper 32 bit value = %u\n", pecCounter,

                                  BusPmuResults.statsToGetUpper32Bit[pecCounter]);

 

Note 1:  For the ixPerfProfAccBusPmuPMSRGet() function, you can refer to the codelet for a detailed description.

Note 2: Each start and stop can monitor any one or all of the eight events simultaneously.
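The lower 27-bit and upper 32-bit values printed above can be merged into one total per counter. The following is a minimal sketch, assuming the upper word counts overflows of the 27-bit hardware counter; bus_pmu_combine and BUS_PMU_LOWER_BITS are hypothetical names, not part of the IXP400 software.

```c
#include <stdint.h>

/* Width of the lower counter word, taken from the "lower 27 bit" printout. */
#define BUS_PMU_LOWER_BITS 27u

/* Hypothetical helper: joins the upper 32-bit word and the lower 27-bit word
 * of a bus PMU counter. Assumes the upper word counts 27-bit overflows.
 * Any bits above bit 26 in lower27 are masked off before combining. */
uint64_t bus_pmu_combine(uint32_t upper32, uint32_t lower27)
{
    uint64_t lowMask = ((uint64_t)1 << BUS_PMU_LOWER_BITS) - 1u;
    return ((uint64_t)upper32 << BUS_PMU_LOWER_BITS) | ((uint64_t)lower27 & lowMask);
}
```

For a given counter index, the two arguments would be BusPmuResults.statsToGetUpper32Bit[pecCounter] and BusPmuResults.statsToGetLower27Bit[pecCounter].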

 

<Xcycle (Idlecycle counter)>

 

1.   Before creating any other threads, perform calibration and obtain the baseline (that is, the total available cycles in the period of time specified) when there is no load:

           ixPerfProfAccXcycleBaselineRun (UINT32 *numBaselineCycle)

 

2.   Create a thread that runs the code to be monitored.  To begin the Xcycle measurements, call the start function, with parameter:

           ixPerfProfAccXcycleStart(UINT32 numMeasurementsRequested)

 

3.   If ixPerfProfAccXcycleStart() is called with an input of zero, measurements run continuously. In this case, stop the measurements by calling the stop function:

           ixPerfProfAccXcycleStop(void)

 

      As it takes some time for the measurements to complete, you should call the following function to determine if any measurements are still running:

           ixPerfProfAccXcycleInProgress(void);

 

4.   To obtain the results of the measurements made, you should call the results function with parameter:

           ixPerfProfAccXcycleResultsGet(IxPerfProfAccXcycleResults *xcycleResult)

 

Example 6:  If you have declared a pointer “IxPerfProfAccXcycleResults *xcycleResult”, then you can print out the results of the Xcycle measurements as follows: 

 

           printf("Maximum percentage of idle cycles = %f\n",

                                           xcycleResult->maxIdlePercentage);

           printf("Minimum percentage of idle cycles = %f\n",

                                           xcycleResult->minIdlePercentage);

           printf("Average percentage of idle cycles = %f\n",

                                           xcycleResult->aveIdlePercentage);

           printf("Total number of measurements = %u\n", xcycleResult->totalMeasurements);
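The idle percentages reported by Xcycle can be turned into a utilization figure by subtracting them from 100. The following is a minimal sketch; busy_percentage is a hypothetical helper, not part of the IXP400 software, and the clamping to the 0 to 100 range is an added safeguard rather than documented behavior.

```c
/* Hypothetical helper: converts an idle percentage into a busy (utilization)
 * percentage. Out-of-range inputs are clamped so the result stays in 0..100. */
double busy_percentage(double idlePercentage)
{
    double busy = 100.0 - idlePercentage;

    if (busy < 0.0) {
        busy = 0.0;
    }
    if (busy > 100.0) {
        busy = 100.0;
    }
    return busy;
}
```

For example, passing xcycleResult->aveIdlePercentage gives the average processor utilization over the measurement period.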