Hyper-Threading (HT) is an Intel technology that exposes two logical processors within one physical processor. Although these processors are only logical, after booting the operating system sees each of them as a separate physical processor. For this reason, the operating system can run two threads in parallel.
Adding physical processors gives diminishing returns: the maximum performance gain from two physical processors is only about 33% over a single physical processor, and four physical processors deliver no more than a 60% gain over one. It is therefore critical to make maximum use of the resources within a single processor.
Normally, an application uses about 35% of the processor’s internal execution resources. Hyper-Threading enables better use of the processor and raises utilization to about 50%. As a result, performance increases by about 30% compared with a processor without Hyper-Threading.
Each logical processor:
- Has its own architecture state, such as general-purpose registers, control registers, APIC registers and some machine state registers
- Executes its own code (thread)
- Can be interrupted and halted separately
The logical processors within a physical processor share:
- Execution engine
- Cache (this is implementation specific: the logical processors can share one cache or each have a separate copy)
- System bus interface
The following figure shows a total of 6 threads from multiple applications. A single physical CPU without HT support executes all 6 threads one at a time. If HT is enabled on a single physical processor, there are 2 logical processors and they split the threads between them, approximately 3 each. The operating system still has the option to schedule any thread on any logical processor.
Detecting Hyper-Threading
The CPUID instruction is used to detect HT and to identify how many logical processors exist within a physical processor.
Before detecting HT, we need to check whether this is an Intel system by using the following code:
mov eax, 0      ; input value 0: vendor ID and highest supported input value
cpuid
The result is returned in EAX, EBX, ECX and EDX. EAX holds the maximum input value that the CPUID instruction accepts, which is needed for the next operations. EBX, EDX and ECX, read in that order, hold the vendor ID string “GenuineIntel”.
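The register order of the vendor string is easy to get wrong, so here is a minimal C sketch of assembling it. It assumes the three register values were already captured after executing CPUID with EAX = 0. The helper names `vendor_from_regs` and `is_genuine_intel` are hypothetical and not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds the 12-character vendor ID from the register
 * values returned by CPUID with EAX = 0. The string is laid out in the
 * order EBX, EDX, ECX, least significant byte first.
 * out must have room for 13 bytes (12 characters plus the terminator). */
static void vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx,
                             char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    assert(out != NULL);
    for (int r = 0; r < 3; r++) {
        for (int b = 0; b < 4; b++) {
            /* Extract byte b of register r with shifts, so the result
             * does not depend on the byte order of the host. */
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFFu);
        }
    }
    out[12] = '\0';
}

/* Hypothetical helper: returns 1 if the vendor ID is "GenuineIntel". */
static int is_genuine_intel(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char vendor[13];
    vendor_from_regs(ebx, edx, ecx, vendor);
    return strcmp(vendor, "GenuineIntel") == 0;
}
```

For example, `is_genuine_intel(0x756e6547, 0x49656e69, 0x6c65746e)` returns 1, because those three values spell "Genu", "ineI" and "ntel".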
Before issuing further CPUID commands, we need to check that the highest input value returned by the earlier code covers them. To check whether HT is supported:
mov eax, 1      ; input value 1: feature flags and logical processor information
cpuid
The result is returned in EAX, EBX, ECX and EDX. The registers relevant to HT are EBX and EDX. If bit 28 of EDX is set, the processor supports HT. Bits 16 to 23 of EBX show how many logical processors exist inside the physical processor. Bits 24 to 31 of EBX hold the APIC ID, which is unique for each logical processor. Using the APIC (Advanced Programmable Interrupt Controller), the operating system can halt or interrupt one logical processor separately, without disturbing the other logical processor, which is executing another thread.
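The bit fields described above can be extracted with shifts and masks. The following C sketch assumes the EBX and EDX values were already captured after executing CPUID with EAX = 1. The struct `ht_info` and both function names are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the HT-related fields of CPUID with EAX = 1. */
struct ht_info {
    int      htt_flag;       /* EDX bit 28: HT feature flag */
    unsigned logical_count;  /* EBX bits 16-23: logical processors per package */
    unsigned apic_id;        /* EBX bits 24-31: APIC ID of this logical processor */
};

/* Extracts the three fields from the raw register values. */
static struct ht_info decode_ht_info(uint32_t ebx, uint32_t edx)
{
    struct ht_info info;
    info.htt_flag      = (int)((edx >> 28) & 1u);
    info.logical_count = (ebx >> 16) & 0xFFu;
    info.apic_id       = (ebx >> 24) & 0xFFu;
    return info;
}

/* Returns the logical processor count, treating a processor without the
 * HT flag as having a single logical processor, since the count field is
 * only meaningful when the flag is set. */
static unsigned logical_per_package(uint32_t ebx, uint32_t edx)
{
    struct ht_info info = decode_ht_info(ebx, edx);
    unsigned n = info.htt_flag ? info.logical_count : 1u;
    assert(n <= 255u);  /* the field is 8 bits wide */
    return n;
}
```

With EBX = 0x01020000 and EDX = 0x10000000, `decode_ht_info` reports the HT flag set, 2 logical processors and APIC ID 1.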
The operating system has the option of not using the logical processors, that is, of not using HT. In this situation the operating system uses only the first logical processor and is not aware that HT is available. Windows 2000 is not aware of HT. Windows XP and later and the Linux kernel 2.4.0 and later support HT.
The CPUID instruction with 1 as its input value needs to be issued only once for each physical processor.
To detect the caches, use the following code:
mov ecx, 0      ; cache index, incremented on each iteration
mov eax, 4      ; input value 4: cache parameters
cpuid
This code must be called iteratively until it returns 0 in the cache type field (bits 0 to 4 of EAX). ECX must be incremented on each call, since there is more than one cache within a processor. The values are returned in EAX, EBX, ECX and EDX. They provide the cache level, the cache type and other information.
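As a rough illustration of the iteration, the C sketch below decodes register values that are assumed to have been captured already, one set per ECX value. It relies on the field layout Intel documents for this input value: cache type in EAX bits 0 to 4, cache level in EAX bits 5 to 7, and line size, partitions and ways in EBX, each stored minus one, with the number of sets minus one in ECX. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one CPUID (EAX = 4) result for a given ECX index. */
struct leaf4_regs { uint32_t eax, ebx, ecx; };

/* Cache type: 0 = no more caches, 1 = data, 2 = instruction, 3 = unified. */
static unsigned cache_type(uint32_t eax)  { return eax & 0x1Fu; }

/* Cache level (1, 2, 3, ...). */
static unsigned cache_level(uint32_t eax) { return (eax >> 5) & 0x7u; }

/* Cache size in bytes = ways * partitions * line size * sets.
 * Every field is stored as "value minus one". */
static uint64_t cache_size_bytes(uint32_t ebx, uint32_t ecx)
{
    uint64_t line_size  = (ebx & 0xFFFu) + 1u;          /* EBX bits 0-11  */
    uint64_t partitions = ((ebx >> 12) & 0x3FFu) + 1u;  /* EBX bits 12-21 */
    uint64_t ways       = ((ebx >> 22) & 0x3FFu) + 1u;  /* EBX bits 22-31 */
    uint64_t sets       = (uint64_t)ecx + 1u;
    return ways * partitions * line_size * sets;
}

/* Walks the captured results in ECX order and stops at the first entry
 * whose cache type is 0, mirroring the loop described in the text. */
static size_t count_caches(const struct leaf4_regs *regs, size_t n)
{
    size_t count = 0;
    assert(regs != NULL || n == 0);
    while (count < n && cache_type(regs[count].eax) != 0)
        count++;
    return count;
}
```

For an 8-way cache with 64-byte lines and 64 sets, EBX is `(7u << 22) | 63u` and ECX is 63, so `cache_size_bytes` returns 32768 (32 KB).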
Since the logical processors in HT share the same execution resources, it is important to follow certain guidelines when dispatching threads to gain overall efficiency:
- In a multiprocessor system with HT enabled, it is better to dispatch multiple threads of the same process (application) on the same physical processor instead of distributing them across different physical processors, since they share the same execution engine.
- Using processor affinity, assign a thread to a specific physical processor. There is a chance that the processor cache still contains the thread’s code and data when it is re-dispatched to the same processor. The operating system can dispatch the thread to any logical processor within the same physical package and still take advantage of this, because the logical processors share the same cache.
Creating threads that share too much global data will reduce the application’s running speed rather than make the system better. A large part of the responsibility lies with application developers, not just with the hardware and the operating system: they need to understand threads and design how the application shares data between them.
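Following these guidelines requires knowing which logical processors belong to the same physical package. The C sketch below shows one way to derive that from the APIC IDs. It assumes the legacy numbering scheme, in which the low bits of the APIC ID number the logical processors within a package and the field width is the logical processor count rounded up to a power of two. That scheme and all three function names are assumptions made for illustration, not something stated in this article.

```c
#include <assert.h>
#include <stdint.h>

/* Number of low APIC ID bits needed to number `logical_count` logical
 * processors: the smallest width w such that 2^w >= logical_count. */
static unsigned logical_id_bits(unsigned logical_count)
{
    unsigned bits = 0;
    assert(logical_count >= 1);
    while ((1u << bits) < logical_count)
        bits++;
    return bits;
}

/* Assumption: the bits above the logical processor field identify the
 * physical package. */
static unsigned package_id(unsigned apic_id, unsigned logical_count)
{
    return apic_id >> logical_id_bits(logical_count);
}

/* Returns 1 if both APIC IDs belong to the same physical package, which
 * means the two logical processors share an execution engine. */
static int same_package(unsigned apic_a, unsigned apic_b,
                        unsigned logical_count)
{
    return package_id(apic_a, logical_count) ==
           package_id(apic_b, logical_count);
}
```

With 2 logical processors per package, APIC IDs 0 and 1 fall in package 0 and APIC IDs 2 and 3 in package 1, so `same_package(0, 1, 2)` returns 1 and `same_package(1, 2, 2)` returns 0. Threads of one process would then be given affinity to logical processors for which `same_package` holds.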