Pipeline
Samsung Galaxy Pocket
ARM11
Hisilicon SD5113 (ARM11) 530 MHz, 16-bit DDR2-667, Huawei EchoLife HG8245 GPON Terminal.
- ARMv6 architecture.
- L1 Data cache = 16 KB. 32 B/line, 4-WAY.
- L1 Instruction cache = 16 KB. 32 B/line, 4-WAY.
- L1 TLB size = 10 items (Micro-TLB), fully associative.
- L2 TLB size = 64 items (Main TLB), 2-WAY.
- Single-issue out-of-order-completion CPU.
- Dynamic prediction: BTAC (Branch Target Address Cache): 128-entry, direct-mapped, 2-bit saturating prediction history scheme. BTAC hits enable branch prediction with zero cycle delay.
- Static branch prediction: The processor predicts that all forward conditional branches are not taken and all backward branches are taken.
- Return stack: 3-entry circular buffer used for the prediction of procedure calls and procedure returns. Only unconditional procedure returns are predicted.
- Hit-under-miss: When an instruction requests data that is not in the cache, ARM11 treats the miss as a non-blocking operation. The cache is instructed to fetch the missing data, and pipeline execution continues as long as the following instructions do not depend on that data. Even if the next instruction is another data load, the ARM11 microarchitecture permits it to proceed if its data is in the cache (i.e. a hit-under-miss). The pipeline stalls only if three successive data misses are encountered.
- The execution of an ALU or MAC instruction will not be delayed by a waiting LS instruction.
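The cache figures in the list above (16 KB, 32 B/line, 4-way) fix how an address is split into tag, set index, and byte offset. The following is a minimal sketch of that arithmetic, assuming 32-bit addresses and plain low-order-bit indexing; the function names `cache_geometry` and `split_address` are hypothetical and not taken from any ARM documentation.

```python
def cache_geometry(size_bytes, line_bytes, ways, address_bits=32):
    """Derive set count and address field widths for a set-associative cache.

    Assumes power-of-two sizes and 32-bit addresses by default.
    """
    lines = size_bytes // line_bytes            # total number of cache lines
    sets = lines // ways                        # each set holds `ways` lines
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
    }


def split_address(addr, geom):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << geom["offset_bits"]) - 1)
    index = (addr >> geom["offset_bits"]) & ((1 << geom["index_bits"]) - 1)
    tag = addr >> (geom["offset_bits"] + geom["index_bits"])
    return tag, index, offset


# L1 data or instruction cache: 16 KB, 32 B/line, 4-way
l1 = cache_geometry(16 * 1024, 32, 4)
tag, index, offset = split_address(0x12345678, l1)
```

Under these assumptions each L1 cache has 512 lines in 128 sets, so an address carries 5 offset bits, 7 index bits, and 20 tag bits.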
Pipeline
Branch misprediction penalty = 6 cycles.
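To see how the prediction scheme and this penalty interact, here is a rough sketch of a 128-entry direct-mapped BTAC with 2-bit saturating counters that falls back to the static rule (backward taken, forward not taken) on a BTAC miss. The index function, the allocate-on-taken policy, and the initial counter value are assumptions made for illustration; the source does not specify them, and the class and function names are hypothetical.

```python
BTAC_ENTRIES = 128        # 128-entry, direct-mapped
MISPREDICT_PENALTY = 6    # cycles per mispredicted branch


class BranchPredictor:
    def __init__(self):
        # Each entry is None or a (branch_pc, counter) pair; counter is 0..3.
        self.btac = [None] * BTAC_ENTRIES

    def _index(self, pc):
        # Assumption: index with the low bits of the word address.
        return (pc >> 2) % BTAC_ENTRIES

    def predict(self, pc, target):
        entry = self.btac[self._index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1] >= 2      # dynamic: upper half of counter = taken
        return target < pc            # static: backward taken, forward not taken

    def update(self, pc, taken):
        i = self._index(pc)
        entry = self.btac[i]
        if entry is not None and entry[0] == pc:
            counter = entry[1]
            # Saturating increment/decrement keeps the counter within 0..3.
            counter = min(3, counter + 1) if taken else max(0, counter - 1)
            self.btac[i] = (pc, counter)
        elif taken:
            # Assumption: allocate on a taken branch, starting at "weakly taken".
            self.btac[i] = (pc, 2)


def count_mispredictions(predictor, pc, target, outcomes):
    """Replay the outcomes of one branch and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc, target) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A backward loop branch: taken nine times, then falls through once.
predictor = BranchPredictor()
loop_outcomes = [True] * 9 + [False]
mispredictions = count_mispredictions(predictor, 0x1000, 0x0F00, loop_outcomes)
penalty_cycles = mispredictions * MISPREDICT_PENALTY
```

In this toy run only the loop exit is mispredicted, so the loop costs one 6-cycle penalty.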
| # | Stage | L/S | Description |
|---|-------|------|-------------|
| 1 | Fe1 | | Instruction fetch + dynamic branch prediction |
| 2 | Fe2 | | Instruction fetch + dynamic branch prediction |
| 3 | De | | Decode + static branch prediction + Return Stack |
| 4 | Iss | | Instruction issue + Register read |
| 5 | Sh | ADD | Shifter / Address generation |
| 6 | ALU | DC1 | Main integer operation calculation / First stage of data cache access |
| 7 | Sat | DC2 | Saturation of integer results / Second stage of data cache access |
| 8 | WBex | WBls | Write back |
Additional details
Figure 1.2 shows:
- the two Fetch stages
- a Decode stage
- an Issue stage
- the four stages of the MP11 CPU integer execution pipeline.
These eight stages make up the MP11 CPU pipeline.
The pipeline stages are:
- Fe1: First stage of instruction fetch and branch prediction.
- Fe2: Second stage of instruction fetch and branch prediction.
- De: Instruction decode.
- Iss: Register read and instruction issue.
- Sh: Shifter stage.
- ALU: Main integer operation calculation.
- Sat: Pipeline stage to enable saturation of integer results.
- WBex: Write back of data from the multiply or main execution pipelines.
- MAC1: First stage of the multiply-accumulate pipeline.
- MAC2: Second stage of the multiply-accumulate pipeline.
- MAC3: Third stage of the multiply-accumulate pipeline.
- ADD: Address generation stage.
- DC1: First stage of Data Cache access.
- DC2: Second stage of Data Cache access.
- WBls: Write back of data from the Load Store Unit.
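The stage names above can be grouped into a shared front end followed by one of three back-end paths (integer, multiply-accumulate, load/store). The sketch below encodes that grouping as plain data; the grouping follows the stage descriptions, while the dictionary keys and the helper `stages_for` are hypothetical names chosen for illustration.

```python
# Front-end stages shared by every instruction.
FRONT_END = ["Fe1", "Fe2", "De", "Iss"]

# Back-end paths. WBex serves both the main execution and multiply pipelines.
BACK_END = {
    "alu": ["Sh", "ALU", "Sat", "WBex"],
    "mac": ["MAC1", "MAC2", "MAC3", "WBex"],
    "load_store": ["ADD", "DC1", "DC2", "WBls"],
}


def stages_for(kind):
    """Return the full stage sequence an instruction of this kind passes through."""
    return FRONT_END + BACK_END[kind]


for kind in BACK_END:
    print(kind, " -> ".join(stages_for(kind)))
```

Whichever path an instruction takes, it passes through eight stages in total.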
By overlapping the various stages of operation, the MP11 CPU maximizes the clock rate achievable to execute each instruction. It delivers a throughput approaching one instruction for each cycle.
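As a back-of-the-envelope check on "approaching one instruction for each cycle", an idealized single-issue pipeline with no stalls needs the pipeline depth in cycles to finish the first instruction and one more cycle for each instruction after it. The sketch below assumes that ideal case (no cache misses, no mispredictions, no interlocks); `ideal_cycles` is a hypothetical helper, not a measurement of real hardware.

```python
PIPELINE_DEPTH = 8  # the eight stages of the MP11 CPU pipeline


def ideal_cycles(n_instructions, depth=PIPELINE_DEPTH):
    """Cycles to complete n instructions in a stall-free single-issue pipeline."""
    if n_instructions == 0:
        return 0
    # The first instruction takes `depth` cycles; each later one retires a cycle after.
    return depth + (n_instructions - 1)


n = 1000
cycles = ideal_cycles(n)
cpi = cycles / n  # cycles per instruction tends toward 1 as n grows
```

The fill cost of the pipeline is paid once, which is why the average cost per instruction tends toward one cycle for long instruction streams.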
The Fetch stages can hold up to four instructions, so branch prediction is performed on instructions ahead of the execution of earlier instructions.
The Issue and Decode stages can contain any instruction in parallel with a predicted branch.
The Execute, Memory, and Write stages can contain a predicted branch, an ALU or multiply instruction, a load/store multiple instruction, and a coprocessor instruction in parallel execution.
References:

http://www.7-cpu.com/cpu/ARM11.html
http://www.gsmarena.com/samsung_galaxy_pocket_s5300-4612.php


