Thursday, March 20, 2014

Pipeline


Samsung Galaxy Pocket

ARM11

Example ARM11 system: Hisilicon SD5113 (ARM11) at 530 MHz with 16-bit DDR2-667, as used in the Huawei EchoLife HG8245 GPON terminal.
  • ARMv6 architecture.
  • L1 Data cache = 16 KB. 32 B/line, 4-WAY.
  • L1 Instruction cache = 16 KB. 32 B/line, 4-WAY.
  • L1 TLB size = 10 items (Micro-TLB), fully associative.
  • L2 TLB size = 64 items (Main TLB), 2-WAY.
  • Single-issue out-of-order-completion CPU.
  • Dynamic prediction: BTAC (Branch Target Address Cache): 128-entry, direct-mapped, 2-bit saturating prediction history scheme. A BTAC hit enables branch prediction with zero cycle delay (a C sketch of this counter scheme, the static rule below, and the return stack follows this list).
  • Static branch prediction: The processor predicts that all forward conditional branches are not taken and all backward branches are taken.
  • Return stack: 3-entry circular buffer used for the prediction of procedure calls and procedure returns. Only unconditional procedure returns are predicted.
  • Hit-under-miss: a data-cache miss is treated as a non-blocking operation. The cache is told to fetch the missing data while the pipeline keeps executing, as long as the following instructions do not depend on that data; a later load that hits in the cache can even complete under the outstanding miss (a hit-under-miss). Only after three successive data misses does the pipeline stall (a toy model of this behaviour is also sketched after this list).
  • The execution of an ALU or MAC instruction is not delayed by a pending load/store (LS) instruction.
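
To make the prediction scheme above concrete, here is a minimal C sketch of a 128-entry direct-mapped BTAC with 2-bit saturating counters, the backward-taken/forward-not-taken static fallback, and the 3-entry circular return stack. Only the sizes and policies come from the list above; the data layout, function names, and counter-initialisation details are illustrative assumptions, not the actual ARM11 hardware.

    /* Sketch of the ARM11 front-end prediction schemes listed above.
     * Sizes/policies (128-entry direct-mapped BTAC, 2-bit counters, BTFN static
     * rule, 3-entry circular return stack) are from the list; everything else
     * is an assumption made for illustration. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define BTAC_ENTRIES 128            /* direct-mapped */
    #define RET_STACK    3              /* circular buffer */

    typedef struct {
        uint32_t tag;                   /* branch instruction address */
        uint32_t target;                /* predicted target address   */
        uint8_t  counter;               /* 2-bit saturating: 0..3, >= 2 means "taken" */
        bool     valid;
    } btac_entry_t;

    static btac_entry_t btac[BTAC_ENTRIES];
    static uint32_t ret_stack[RET_STACK];
    static unsigned ret_top;

    /* Dynamic prediction: on a BTAC hit the 2-bit counter decides taken / not taken. */
    static bool predict_dynamic(uint32_t pc, uint32_t *target, bool *hit)
    {
        btac_entry_t *e = &btac[(pc >> 2) % BTAC_ENTRIES];
        *hit = e->valid && e->tag == pc;
        if (!*hit)
            return false;
        *target = e->target;
        return e->counter >= 2;
    }

    /* Static fallback on a BTAC miss: backward branches taken, forward branches not taken. */
    static bool predict_static(uint32_t pc, uint32_t target)
    {
        return target < pc;
    }

    /* On branch resolution the counter saturates at 0 and 3. */
    static void update(uint32_t pc, uint32_t target, bool taken)
    {
        btac_entry_t *e = &btac[(pc >> 2) % BTAC_ENTRIES];
        if (!e->valid || e->tag != pc) {            /* (re)allocate the entry */
            e->valid = true; e->tag = pc; e->target = target;
            e->counter = taken ? 2 : 1;             /* assumed initial bias   */
            return;
        }
        if (taken && e->counter < 3)  e->counter++;
        if (!taken && e->counter > 0) e->counter--;
    }

    /* 3-entry circular return stack: calls push, returns pop; overflow silently
     * overwrites the oldest entry, so deeply nested returns may mispredict. */
    static void push_return(uint32_t lr)  { ret_stack[ret_top] = lr; ret_top = (ret_top + 1) % RET_STACK; }
    static uint32_t predict_return(void)  { ret_top = (ret_top + RET_STACK - 1) % RET_STACK; return ret_stack[ret_top]; }

    int main(void)
    {
        /* A backward loop branch at 0x8040 targeting 0x8000, taken three times then falling through. */
        uint32_t pc = 0x8040, tgt = 0x8000, predicted_tgt;
        for (int i = 0; i < 4; i++) {
            bool hit, predicted_taken = predict_dynamic(pc, &predicted_tgt, &hit);
            if (!hit)
                predicted_taken = predict_static(pc, tgt);   /* first encounter: static rule */
            bool actually_taken = (i < 3);
            printf("iteration %d: predicted %-10s actual %s\n", i,
                   predicted_taken ? "taken," : "not taken,",
                   actually_taken ? "taken" : "not taken");
            update(pc, tgt, actually_taken);
        }
        /* Return stack demo: a call pushes its link address, the return pops it. */
        push_return(0x9004);
        printf("predicted return address: 0x%x\n", (unsigned)predict_return());
        return 0;
    }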
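
The hit-under-miss behaviour can be illustrated with a toy model: execution continues past outstanding data-cache misses and stalls only when a third successive miss is encountered. The instruction sequence, the two-outstanding-miss limit before the stall, and the assumption that a stall drains one earlier miss are all invented for this sketch.

    /* Toy model of hit-under-miss: independent instructions keep executing while
     * cache misses are serviced in the background; the pipeline stalls only on a
     * third successive miss (per the description above). Everything concrete here
     * - instruction names, hit/miss pattern, drain behaviour - is made up. */
    #include <stdio.h>
    #include <stdbool.h>

    #define MAX_OUTSTANDING_MISSES 2      /* assumption: the 3rd miss causes the stall */

    typedef struct {
        const char *name;
        bool is_load;
        bool hits_cache;                  /* would this load hit in the D-cache? */
    } insn_t;

    int main(void)
    {
        insn_t program[] = {
            { "ldr r0, [heapA]", true,  false },  /* miss: refilled in the background */
            { "add r4, r5, r6",  false, false },  /* independent ALU op: continues    */
            { "ldr r1, [stack]", true,  true  },  /* hit under the outstanding miss   */
            { "ldr r2, [heapB]", true,  false },  /* second miss: still no stall      */
            { "ldr r3, [heapC]", true,  false },  /* third miss: pipeline stalls      */
        };
        int outstanding = 0;

        for (unsigned i = 0; i < sizeof program / sizeof program[0]; i++) {
            insn_t *in = &program[i];
            if (in->is_load && !in->hits_cache) {
                if (outstanding == MAX_OUTSTANDING_MISSES) {
                    printf("%-16s -> STALL until an earlier refill completes\n", in->name);
                    outstanding--;                /* one outstanding miss returns      */
                }
                outstanding++;                    /* this miss is handled non-blocking */
                printf("%-16s -> miss (%d outstanding), execution continues\n",
                       in->name, outstanding);
            } else {
                printf("%-16s -> proceeds past %d outstanding miss(es)\n",
                       in->name, outstanding);
            }
        }
        return 0;
    }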

    Pipeline

    Branch misprediction penalty = 6 cycles.
    #  | Stage | L/S  | Description
    1  | Fe1   |      | Instruction fetch + dynamic branch prediction
    2  | Fe2   |      | Instruction fetch (second stage)
    3  | De    |      | Decode + static branch prediction + Return Stack
    4  | Iss   |      | Instruction issue + Register read
    5  | Sh    | ADD  | Shifter / Address generation
    6  | ALU   | DC1  | Main integer operation calculation / First stage of data cache access
    7  | Sat   | DC2  | Saturation of integer results / Second stage of data cache access
    8  | WBex  | WBls | Write back
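
For a rough sense of what the 6-cycle misprediction penalty above costs, the sketch below folds it into an effective cycles-per-instruction figure. The branch fraction and misprediction rate are made-up example numbers, not measurements of this CPU.

    #include <stdio.h>

    int main(void)
    {
        const double base_cpi        = 1.0;   /* ideal single-issue throughput          */
        const double branch_fraction = 0.20;  /* assumed: 1 in 5 instructions branches  */
        const double mispredict_rate = 0.10;  /* assumed: 90% prediction accuracy       */
        const double penalty_cycles  = 6.0;   /* from the table above                   */

        double cpi = base_cpi + branch_fraction * mispredict_rate * penalty_cycles;
        printf("effective CPI = %.2f (ideal = %.2f)\n", cpi, base_cpi);
        /* 1.0 + 0.20 * 0.10 * 6 = 1.12, i.e. roughly 12% more cycles per instruction. */
        return 0;
    }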

Additional details

1.7. Pipeline stages

Figure 1.2 shows:
  • the two Fetch stages
  • a Decode stage
  • an Issue stage
  • the four stages of the MP11 CPU integer execution pipeline.
These eight stages make up the MP11 CPU pipeline.
Figure 1.2. MP11 CPU pipeline stages

The pipeline stages are:
  • Fe1: First stage of instruction fetch and branch prediction.
  • Fe2: Second stage of instruction fetch and branch prediction.
  • De: Instruction decode.
  • Iss: Register read and instruction issue.
  • Sh: Shifter stage.
  • ALU: Main integer operation calculation.
  • Sat: Pipeline stage to enable saturation of integer results.
  • WBex: Write back of data from the multiply or main execution pipelines.
  • MAC1: First stage of the multiply-accumulate pipeline.
  • MAC2: Second stage of the multiply-accumulate pipeline.
  • MAC3: Third stage of the multiply-accumulate pipeline.
  • ADD: Address generation stage.
  • DC1: First stage of Data Cache access.
  • DC2: Second stage of Data Cache access.
  • WBls: Write back of data from the Load Store Unit.
By overlapping the various stages of operation, the MP11 CPU maximizes the clock rate achievable to execute each instruction. It delivers a throughput approaching one instruction for each cycle.
The Fetch stages can hold up to four instructions, where branch prediction is performed on instructions ahead of execution of earlier instructions.
The Issue and Decode stages can contain any instruction in parallel with a predicted branch.
The Execute, Memory, and Write stages can contain a predicted branch, an ALU or multiply instruction, a load/store multiple instruction, and a coprocessor instruction in parallel execution.
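
The stage overlap described above can be visualised with a small simulation that advances each instruction one stage per cycle through the eight-stage integer pipeline (Fe1 to WBex). Stall-free, single-issue execution is assumed, and the parallel MAC and load/store stages are left out for brevity.

    /* Prints which instruction occupies each integer pipeline stage per cycle.
     * Assumes no stalls or mispredictions; purely an illustration of overlap. */
    #include <stdio.h>

    int main(void)
    {
        const char *stages[] = { "Fe1", "Fe2", "De", "Iss", "Sh", "ALU", "Sat", "WBex" };
        const int n_stages = 8, n_insns = 4;

        printf("cycle:");
        for (int s = 0; s < n_stages; s++)
            printf(" %5s", stages[s]);
        printf("\n");

        for (int cycle = 0; cycle < n_stages + n_insns - 1; cycle++) {
            printf("%5d:", cycle + 1);
            for (int s = 0; s < n_stages; s++) {
                int i = cycle - s;                       /* instruction currently in stage s */
                if (i >= 0 && i < n_insns) printf("   I%-2d", i + 1);
                else                       printf("     .");
            }
            printf("\n");
        }
        return 0;
    }

Once the pipeline fills, the first instruction leaves WBex in cycle 8 and each following instruction completes one cycle later, which is the "throughput approaching one instruction for each cycle" behaviour quoted above.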

References:
ARM Infocenter, ARM11 MPCore Processor Technical Reference Manual (ARM DDI 0360): http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360e/I1002919.html
7-CPU, ARM11: http://www.7-cpu.com/cpu/ARM11.html
GSMArena, Samsung Galaxy Pocket S5300: http://www.gsmarena.com/samsung_galaxy_pocket_s5300-4612.php