-rw-r--r-- tcg/README 119
-rw-r--r-- tcg/TODO 31
2 files changed, 64 insertions, 86 deletions
diff --git a/tcg/README b/tcg/README
index 9764c03ff3..b03432e23a 100644
--- a/tcg/README
+++ b/tcg/README
@@ -16,14 +16,18 @@ from the host, although it is never the case for QEMU.
A TCG "function" corresponds to a QEMU Translated Block (TB).
-A TCG "temporary" is a variable only live in a given
-function. Temporaries are allocated explicitly in each function.
+A TCG "temporary" is a variable only live in a basic
+block. Temporaries are allocated explicitly in each function.
-A TCG "global" is a variable which is live in all the functions. They
-are defined before the functions defined. A TCG global can be a memory
-location (e.g. a QEMU CPU register), a fixed host register (e.g. the
-QEMU CPU state pointer) or a memory location which is stored in a
-register outside QEMU TBs (not implemented yet).
+A TCG "local temporary" is a variable only live in a function. Local
+temporaries are allocated explicitly in each function.
+
+A TCG "global" is a variable which is live in all the functions
+(equivalent of a C global variable). Globals are defined before the
+functions are defined. A TCG global can be a memory location (e.g. a QEMU
+CPU register), a fixed host register (e.g. the QEMU CPU state pointer)
+or a memory location which is stored in a register outside QEMU TBs
+(not implemented yet).
A TCG "basic block" corresponds to a list of instructions terminated
by a branch instruction.
@@ -32,11 +36,11 @@ by a branch instruction.
3.1) Introduction
-TCG instructions operate on variables which are temporaries or
-globals. TCG instructions and variables are strongly typed. Two types
-are supported: 32 bit integers and 64 bit integers. Pointers are
-defined as an alias to 32 bit or 64 bit integers depending on the TCG
-target word size.
+TCG instructions operate on variables which are temporaries, local
+temporaries or globals. TCG instructions and variables are strongly
+typed. Two types are supported: 32 bit integers and 64 bit
+integers. Pointers are defined as an alias to 32 bit or 64 bit
+integers depending on the TCG target word size.
Each instruction has a fixed number of output variable operands, input
variable operands and always constant operands.
@@ -44,14 +48,12 @@ variable operands and always constant operands.
The notable exception is the call instruction which has a variable
number of outputs and inputs.
-In the textual form, output operands come first, followed by input
-operands, followed by constant operands. The output type is included
-in the instruction name. Constants are prefixed with a '$'.
+In the textual form, output operands usually come first, followed by
+input operands, followed by constant operands. The output type is
+included in the instruction name. Constants are prefixed with a '$'.
add_i32 t0, t1, t2 (t0 <- t1 + t2)
-sub_i64 t2, t3, $4 (t2 <- t3 - 4)
-
3.2) Assumptions
* Basic blocks
@@ -62,9 +64,8 @@ sub_i64 t2, t3, $4 (t2 <- t3 - 4)
- Basic blocks start after the end of a previous basic block, at a
set_label instruction or after a legacy dyngen operation.
-After the end of a basic block, temporaries at destroyed and globals
-are stored at their initial storage (register or memory place
-depending on their declarations).
+After the end of a basic block, the content of temporaries is
+destroyed, but local temporaries and globals are preserved.
* Floating point types are not supported yet
@@ -100,7 +101,7 @@ optimizations:
is suppressed.
- A liveness analysis is done at the basic block level. The
- information is used to suppress moves from a dead temporary to
+ information is used to suppress moves from a dead variable to
another one. It is also used to remove instructions which compute
dead results. The later is especially useful for condition code
optimization in QEMU.
@@ -113,47 +114,6 @@ optimizations:
only the last instruction is kept.
-- A macro system is supported (may get closer to function inlining
- some day). It is useful if the liveness analysis is likely to prove
- that some results of a computation are indeed not useful. With the
- macro system, the user can provide several alternative
- implementations which are used depending on the used results. It is
- especially useful for condition code optimization in QEMU.
-
- Here is an example:
-
- macro_2 t0, t1, $1
- mov_i32 t0, $0x1234
-
- The macro identified by the ID "$1" normally returns the values t0
- and t1. Suppose its implementation is:
-
- macro_start
- brcond_i32 t2, $0, $TCG_COND_EQ, $1
- mov_i32 t0, $2
- br $2
- set_label $1
- mov_i32 t0, $3
- set_label $2
- add_i32 t1, t3, t4
- macro_end
-
- If t0 is not used after the macro, the user can provide a simpler
- implementation:
-
- macro_start
- add_i32 t1, t2, t4
- macro_end
-
- TCG automatically chooses the right implementation depending on
- which macro outputs are used after it.
-
- Note that if TCG did more expensive optimizations, macros would be
- less useful. In the previous example a macro is useful because the
- liveness analysis is done on each basic block separately. Hence TCG
- cannot remove the code computing 't0' even if it is not used after
- the first macro implementation.
-
3.4) Instruction Reference
********* Function call
@@ -241,6 +201,10 @@ t0=t1|t2
t0=t1^t2
+* not_i32/i64 t0, t1
+
+t0=~t1
+
********* Shifts
* shl_i32/i64 t0, t1, t2
@@ -428,3 +392,34 @@ to apply more optimizations because more registers will be free for
the generated code.
The exception model is the same as the dyngen one.
+
+6) Recommended coding rules for best performance
+
+- Use globals to represent the parts of the QEMU CPU state which are
+ often modified, e.g. the integer registers and the condition
+ codes. TCG will be able to use host registers to store them.
+
+- Avoid globals stored in fixed registers. They must be used only to
+ store the pointer to the CPU state and possibly to store a pointer
+ to a register window. The other uses are to ensure backward
+ compatibility with dyngen during the porting of a new target to TCG.
+
+- Use temporaries. Use local temporaries only when really needed,
+ e.g. when you need to use a value after a jump. Local temporaries
+ introduce a performance hit in the current TCG implementation: their
+ content is saved to memory at end of each basic block.
+
+- Free temporaries and local temporaries when they are no longer used
+ (tcg_temp_free). Since tcg_const_x() also creates a temporary, you
+ should free it after it is used. Freeing temporaries does not yield
+ better generated code, but it reduces the memory usage of TCG and
+ speeds up the translation.
+
+- Don't hesitate to use helpers for complicated or seldom used target
+ instructions. There is little performance advantage in using TCG to
+ implement target instructions taking more than about twenty TCG
+ instructions.
+
+- Use the 'discard' instruction if you know that TCG won't be able to
+ prove that a given global is "dead" at a given program point. The
+ x86 target uses it to improve the condition codes optimisation.
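
Note on the lifetime rules introduced by the README hunks above: the difference between temporaries, local temporaries and globals at a basic block boundary can be sketched with a toy model. This is only an illustration under stated assumptions. The names var_kind, struct var, var_set and end_basic_block are hypothetical and are not part of the TCG API; the model only tracks whether a variable's content is still defined, not how TCG stores it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the three variable kinds; NOT the TCG API. */
enum var_kind { VAR_TEMP, VAR_LOCAL_TEMP, VAR_GLOBAL };

struct var {
    enum var_kind kind;
    bool valid;   /* true while the variable holds a defined value */
    int value;
};

/* Write a value into a variable, which makes its content defined. */
static void var_set(struct var *v, int value)
{
    assert(v != NULL);
    v->value = value;
    v->valid = true;
}

/*
 * Model the end of a basic block: the content of plain temporaries is
 * destroyed, while local temporaries and globals are preserved.
 */
static void end_basic_block(struct var *vars, size_t n)
{
    assert(vars != NULL);
    for (size_t i = 0; i < n; i++) {
        if (vars[i].kind == VAR_TEMP) {
            vars[i].valid = false;
        }
    }
}
```

In this model, a value that must survive a jump has to live in a local temporary or a global, which matches the coding rule above about using local temporaries only when a value is needed after a jump.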
diff --git a/tcg/TODO b/tcg/TODO
index 9189926106..5ca35e9f26 100644
--- a/tcg/TODO
+++ b/tcg/TODO
@@ -1,32 +1,15 @@
-- test macro system
+- Add new instructions such as: andnot, ror, rol, setcond, clz, ctz,
+ popcnt.
-- test conditional jumps
+- See if it is worth exporting mul2, mulu2, div2, divu2.
-- test mul, div, ext8s, ext16s, bswap
-
-- generate a global TB prologue and epilogue to save/restore registers
- to/from the CPU state and to reserve a stack frame to optimize
- helper calls. Modify cpu-exec.c so that it does not use global
- register variables (except maybe for 'env').
-
-- fully convert the x86 target. The minimal amount of work includes:
- - add cc_src, cc_dst and cc_op as globals
- - disable its eflags optimization (the liveness analysis should
- suffice)
- - move complicated operations to helpers (in particular FPU, SSE, MMX).
-
-- optimize the x86 target:
- - move some or all the registers as globals
- - use the TB prologue and epilogue to have QEMU target registers in
- pre assigned host registers.
+- Support of globals saved in fixed registers between TBs.
Ideas:
- Move the slow part of the qemu_ld/st ops after the end of the TB.
-- Experiment: change instruction storage to simplify macro handling
- and to handle dynamic allocation and see if the translation speed is
- OK.
-
-- change exception syntax to get closer to QOP system (exception
+- Change exception syntax to get closer to QOP system (exception
parameters given with a specific instruction).
+
+- Add float and vector support.