Thursday, March 20, 2014

Pipeline


Samsung Galaxy Pocket

ARM11

Example ARM11 system: Hisilicon SD5113 (ARM11) at 530 MHz with 16-bit DDR2-667, as used in the Huawei EchoLife HG8245 GPON terminal.
  • ARMv6 architecture.
  • L1 Data cache = 16 KB. 32 B/line, 4-WAY.
  • L1 Instruction cache = 16 KB. 32 B/line, 4-WAY.
  • L1 TLB size = 10 items (Micro-TLB), fully associative.
  • L2 TLB size = 64 items (Main TLB), 2-WAY.
  • Single-issue out-of-order-completion CPU.
  • Dynamic prediction: BTAC (Branch Target Address Cache): 128-entry, direct-mapped, 2-bit saturating prediction history scheme. A BTAC hit enables branch prediction with zero cycle delay (a C sketch of this counter scheme, the static rule below, and the return stack follows this list).
  • Static branch prediction: The processor predicts that all forward conditional branches are not taken and all backward branches are taken.
  • Return stack: 3-entry circular buffer used for the prediction of procedure calls and procedure returns. Only unconditional procedure returns are predicted.
  • Hit-under-miss: a data-cache miss is treated as a non-blocking operation. The cache is told to fetch the missing data while the pipeline keeps executing, as long as the following instructions do not depend on that data; a later load that hits in the cache can even complete under the outstanding miss (a hit-under-miss). Only after three successive data misses does the pipeline stall (a toy model of this behaviour is also sketched after this list).
  • The execution of an ALU or MAC instruction is not delayed by a pending load/store (LS) instruction.
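
To make the prediction scheme above concrete, here is a minimal C sketch of a 128-entry direct-mapped BTAC with 2-bit saturating counters, the backward-taken/forward-not-taken static fallback, and the 3-entry circular return stack. Only the sizes and policies come from the list above; the data layout, function names, and counter-initialisation details are illustrative assumptions, not the actual ARM11 hardware.

    /* Sketch of the ARM11 front-end prediction schemes listed above.
     * Sizes/policies (128-entry direct-mapped BTAC, 2-bit counters, BTFN static
     * rule, 3-entry circular return stack) are from the list; everything else
     * is an assumption made for illustration. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define BTAC_ENTRIES 128            /* direct-mapped */
    #define RET_STACK    3              /* circular buffer */

    typedef struct {
        uint32_t tag;                   /* branch instruction address */
        uint32_t target;                /* predicted target address   */
        uint8_t  counter;               /* 2-bit saturating: 0..3, >= 2 means "taken" */
        bool     valid;
    } btac_entry_t;

    static btac_entry_t btac[BTAC_ENTRIES];
    static uint32_t ret_stack[RET_STACK];
    static unsigned ret_top;

    /* Dynamic prediction: on a BTAC hit the 2-bit counter decides taken / not taken. */
    static bool predict_dynamic(uint32_t pc, uint32_t *target, bool *hit)
    {
        btac_entry_t *e = &btac[(pc >> 2) % BTAC_ENTRIES];
        *hit = e->valid && e->tag == pc;
        if (!*hit)
            return false;
        *target = e->target;
        return e->counter >= 2;
    }

    /* Static fallback on a BTAC miss: backward branches taken, forward branches not taken. */
    static bool predict_static(uint32_t pc, uint32_t target)
    {
        return target < pc;
    }

    /* On branch resolution the counter saturates at 0 and 3. */
    static void update(uint32_t pc, uint32_t target, bool taken)
    {
        btac_entry_t *e = &btac[(pc >> 2) % BTAC_ENTRIES];
        if (!e->valid || e->tag != pc) {            /* (re)allocate the entry */
            e->valid = true; e->tag = pc; e->target = target;
            e->counter = taken ? 2 : 1;             /* assumed initial bias   */
            return;
        }
        if (taken && e->counter < 3)  e->counter++;
        if (!taken && e->counter > 0) e->counter--;
    }

    /* 3-entry circular return stack: calls push, returns pop; overflow silently
     * overwrites the oldest entry, so deeply nested returns may mispredict. */
    static void push_return(uint32_t lr)  { ret_stack[ret_top] = lr; ret_top = (ret_top + 1) % RET_STACK; }
    static uint32_t predict_return(void)  { ret_top = (ret_top + RET_STACK - 1) % RET_STACK; return ret_stack[ret_top]; }

    int main(void)
    {
        /* A backward loop branch at 0x8040 targeting 0x8000, taken three times then falling through. */
        uint32_t pc = 0x8040, tgt = 0x8000, predicted_tgt;
        for (int i = 0; i < 4; i++) {
            bool hit, predicted_taken = predict_dynamic(pc, &predicted_tgt, &hit);
            if (!hit)
                predicted_taken = predict_static(pc, tgt);   /* first encounter: static rule */
            bool actually_taken = (i < 3);
            printf("iteration %d: predicted %-10s actual %s\n", i,
                   predicted_taken ? "taken," : "not taken,",
                   actually_taken ? "taken" : "not taken");
            update(pc, tgt, actually_taken);
        }
        /* Return stack demo: a call pushes its link address, the return pops it. */
        push_return(0x9004);
        printf("predicted return address: 0x%x\n", (unsigned)predict_return());
        return 0;
    }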
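
The hit-under-miss behaviour can be illustrated with a toy model: execution continues past outstanding data-cache misses and stalls only when a third successive miss is encountered. The instruction sequence, the two-outstanding-miss limit before the stall, and the assumption that a stall drains one earlier miss are all invented for this sketch.

    /* Toy model of hit-under-miss: independent instructions keep executing while
     * cache misses are serviced in the background; the pipeline stalls only on a
     * third successive miss (per the description above). Everything concrete here
     * - instruction names, hit/miss pattern, drain behaviour - is made up. */
    #include <stdio.h>
    #include <stdbool.h>

    #define MAX_OUTSTANDING_MISSES 2      /* assumption: the 3rd miss causes the stall */

    typedef struct {
        const char *name;
        bool is_load;
        bool hits_cache;                  /* would this load hit in the D-cache? */
    } insn_t;

    int main(void)
    {
        insn_t program[] = {
            { "ldr r0, [heapA]", true,  false },  /* miss: refilled in the background */
            { "add r4, r5, r6",  false, false },  /* independent ALU op: continues    */
            { "ldr r1, [stack]", true,  true  },  /* hit under the outstanding miss   */
            { "ldr r2, [heapB]", true,  false },  /* second miss: still no stall      */
            { "ldr r3, [heapC]", true,  false },  /* third miss: pipeline stalls      */
        };
        int outstanding = 0;

        for (unsigned i = 0; i < sizeof program / sizeof program[0]; i++) {
            insn_t *in = &program[i];
            if (in->is_load && !in->hits_cache) {
                if (outstanding == MAX_OUTSTANDING_MISSES) {
                    printf("%-16s -> STALL until an earlier refill completes\n", in->name);
                    outstanding--;                /* one outstanding miss returns      */
                }
                outstanding++;                    /* this miss is handled non-blocking */
                printf("%-16s -> miss (%d outstanding), execution continues\n",
                       in->name, outstanding);
            } else {
                printf("%-16s -> proceeds past %d outstanding miss(es)\n",
                       in->name, outstanding);
            }
        }
        return 0;
    }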

    Pipeline

    Branch misprediction penalty = 6 cycles.
    #  | Stage | L/S  | Description
    1  | Fe1   |      | Instruction fetch + dynamic branch prediction
    2  | Fe2   |      | Instruction fetch (second stage)
    3  | De    |      | Decode + static branch prediction + Return Stack
    4  | Iss   |      | Instruction issue + Register read
    5  | Sh    | ADD  | Shifter / Address generation
    6  | ALU   | DC1  | Main integer operation calculation / First stage of data cache access
    7  | Sat   | DC2  | Saturation of integer results / Second stage of data cache access
    8  | WBex  | WBls | Write back
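
For a rough sense of what the 6-cycle misprediction penalty above costs, the sketch below folds it into an effective cycles-per-instruction figure. The branch fraction and misprediction rate are made-up example numbers, not measurements of this CPU.

    #include <stdio.h>

    int main(void)
    {
        const double base_cpi        = 1.0;   /* ideal single-issue throughput          */
        const double branch_fraction = 0.20;  /* assumed: 1 in 5 instructions branches  */
        const double mispredict_rate = 0.10;  /* assumed: 90% prediction accuracy       */
        const double penalty_cycles  = 6.0;   /* from the table above                   */

        double cpi = base_cpi + branch_fraction * mispredict_rate * penalty_cycles;
        printf("effective CPI = %.2f (ideal = %.2f)\n", cpi, base_cpi);
        /* 1.0 + 0.20 * 0.10 * 6 = 1.12, i.e. roughly 12% more cycles per instruction. */
        return 0;
    }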

Additional details

1.7. Pipeline stages

Figure 1.2 shows:
  • the two Fetch stages
  • a Decode stage
  • an Issue stage
  • the four stages of the MP11 CPU integer execution pipeline.
These eight stages make up the MP11 CPU pipeline.
Figure 1.2. MP11 CPU pipeline stages

The pipeline stages are:
  • Fe1: First stage of instruction fetch and branch prediction.
  • Fe2: Second stage of instruction fetch and branch prediction.
  • De: Instruction decode.
  • Iss: Register read and instruction issue.
  • Sh: Shifter stage.
  • ALU: Main integer operation calculation.
  • Sat: Pipeline stage to enable saturation of integer results.
  • WBex: Write back of data from the multiply or main execution pipelines.
  • MAC1: First stage of the multiply-accumulate pipeline.
  • MAC2: Second stage of the multiply-accumulate pipeline.
  • MAC3: Third stage of the multiply-accumulate pipeline.
  • ADD: Address generation stage.
  • DC1: First stage of Data Cache access.
  • DC2: Second stage of Data Cache access.
  • WBls: Write back of data from the Load Store Unit.
By overlapping the various stages of operation, the MP11 CPU maximizes the clock rate achievable to execute each instruction. It delivers a throughput approaching one instruction for each cycle.
The Fetch stages can hold up to four instructions, where branch prediction is performed on instructions ahead of execution of earlier instructions.
The Issue and Decode stages can contain any instruction in parallel with a predicted branch.
The Execute, Memory, and Write stages can contain a predicted branch, an ALU or multiply instruction, a load/store multiple instruction, and a coprocessor instruction in parallel execution.
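
The stage overlap described above can be visualised with a small simulation that advances each instruction one stage per cycle through the eight-stage integer pipeline (Fe1 to WBex). Stall-free, single-issue execution is assumed, and the parallel MAC and load/store stages are left out for brevity.

    /* Prints which instruction occupies each integer pipeline stage per cycle.
     * Assumes no stalls or mispredictions; purely an illustration of overlap. */
    #include <stdio.h>

    int main(void)
    {
        const char *stages[] = { "Fe1", "Fe2", "De", "Iss", "Sh", "ALU", "Sat", "WBex" };
        const int n_stages = 8, n_insns = 4;

        printf("cycle:");
        for (int s = 0; s < n_stages; s++)
            printf(" %5s", stages[s]);
        printf("\n");

        for (int cycle = 0; cycle < n_stages + n_insns - 1; cycle++) {
            printf("%5d:", cycle + 1);
            for (int s = 0; s < n_stages; s++) {
                int i = cycle - s;                       /* instruction currently in stage s */
                if (i >= 0 && i < n_insns) printf("   I%-2d", i + 1);
                else                       printf("     .");
            }
            printf("\n");
        }
        return 0;
    }

Once the pipeline fills, the first instruction leaves WBex in cycle 8 and each following instruction completes one cycle later, which is the "throughput approaching one instruction for each cycle" behaviour quoted above.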

References:
ARM Infocenter, ARM11 MPCore Processor Technical Reference Manual (ARM DDI 0360): http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360e/I1002919.html
7-CPU, ARM11: http://www.7-cpu.com/cpu/ARM11.html
GSMArena, Samsung Galaxy Pocket S5300: http://www.gsmarena.com/samsung_galaxy_pocket_s5300-4612.php