It looks like you're creating a new thread every time you render the scene and then waiting for it to terminate. Don't do that. Creating and destroying threads is expensive: the OS has to allocate and set up a stack, initialize the thread's register context, register it with the scheduler, and tear all of that down again when the thread exits.
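If you want to check that cost on your own machine, a rough micro-benchmark like the one below can help. It is only a sketch: it assumes a C++11 compiler, uses the portable `std::thread` instead of the Win32 `CreateThread` call, and `average_spawn_join_microseconds` is a hypothetical helper name. The figures it reports will vary with OS and hardware.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical micro-benchmark: average cost of creating and joining one
// std::thread whose body does nothing, so only setup/teardown is measured.
double average_spawn_join_microseconds(int iterations)
{
    assert(iterations > 0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
    {
        std::thread t([] {});  // empty body: we only pay for creation and teardown
        t.join();              // wait for the thread to finish, as the original code does
    }
    auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration<double, std::micro>(elapsed).count() / iterations;
}
```

Comparing the result against your per-frame time budget shows how much of each frame a create/join cycle would eat.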
You'll get far better results by creating the thread once and just letting it process tasks in an infinite loop. When there is no work, suspend the thread until new tasks arrive. The basic code could look something like this:
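    #include <windows.h>

    HANDLE secondary_thread;
    HANDLE scene_ready;              // Auto-reset event: the scene is ready to be rendered
    HANDLE render_done;              // Auto-reset event: the secondary thread is done rendering
    volatile LONG quit_requested = 0; // Set to 1 to ask the secondary thread to exit

    DWORD WINAPI thread_routine(LPVOID parameters)
    {
        for (;;)
        {
            WaitForSingleObject(scene_ready, INFINITE); // Sleep until there is work
            if (quit_requested)
                break;
            render_scene(SECOND);    // Render the second half of the scene
            SetEvent(render_done);   // Tell the main thread we are done
        }
        return 0;
    }

    void main_loop()
    {
        scene_ready = CreateEvent(NULL, FALSE, FALSE, NULL);
        render_done = CreateEvent(NULL, FALSE, FALSE, NULL);
        secondary_thread = CreateThread(NULL, 0, thread_routine, NULL, 0, NULL);
        while (not_exiting)
        {
            prepare_scene();
            SetEvent(scene_ready);   // Let the secondary thread render the second half of the scene
            render_scene(FIRST);     // Render the first half of the scene in the main thread
            WaitForSingleObject(render_done, INFINITE); // Sleep (instead of spinning) until the secondary thread finishes
        }
        InterlockedExchange(&quit_requested, 1);         // Ask the secondary thread to stop...
        SetEvent(scene_ready);                           // ...and wake it so it sees the request
        WaitForSingleObject(secondary_thread, INFINITE); // Let it exit cleanly; TerminateThread can leave locks and resources in a bad state
        CloseHandle(secondary_thread);
        CloseHandle(render_done);
        CloseHandle(scene_ready);
    }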
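The same create-once, sleep-until-work pattern can also be written portably with the C++11 standard library. This is a sketch rather than a drop-in replacement: `PersistentWorker`, `start` and `wait` are hypothetical names, and it uses `std::condition_variable` so that both threads sleep instead of spinning while they wait.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// One persistent worker thread: created once, sleeps until a job is posted,
// runs it, then signals completion. Names are hypothetical.
class PersistentWorker
{
public:
    PersistentWorker() : thread_([this] { run(); }) {}

    ~PersistentWorker()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            exiting_ = true;
        }
        wake_.notify_one();
        thread_.join();  // clean shutdown, no forced termination
    }

    // Hand a job to the worker (e.g. "render the second half").
    void start(std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            job_ = std::move(job);
            has_job_ = true;
        }
        wake_.notify_one();
    }

    // Block (sleeping, not spinning) until the job passed to start() is done.
    void wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        done_.wait(lock, [this] { return !has_job_; });
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;)
        {
            wake_.wait(lock, [this] { return has_job_ || exiting_; });
            if (exiting_) return;
            std::function<void()> job = std::move(job_);
            lock.unlock();
            job();          // run the job without holding the lock
            lock.lock();
            has_job_ = false;
            done_.notify_one();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable done_;
    std::function<void()> job_;
    bool has_job_ = false;
    bool exiting_ = false;
    std::thread thread_;  // declared last so every other member exists before run() starts
};
```

In the frame loop, `worker.start([] { render_scene(SECOND); })` would take the place of `SetEvent(scene_ready)`, and `worker.wait()` the place of waiting for the secondary thread. The destructor asks the worker to exit and joins it, so no forced termination is needed.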
Note, however, that this doesn't scale well beyond two cores: you always have to wait for the slowest thread, and, as Reedbeta noted, each thread repeats the vertex processing.
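One way to stretch the idea past two cores is to keep N persistent workers and give each one a slice of the frame. This sketch assumes C++11; `SlicePool` and `run_frame` are hypothetical names, and the `work` callback receives a slice index and the slice count (for example, to pick a band of the screen). It still has both problems just mentioned: `run_frame` returns only when the slowest slice finishes, and any per-vertex work done inside `work` is repeated in every slice.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch: N persistent workers, each owning one slice of the frame.
// run_frame() wakes all workers, then sleeps until the slowest one finishes.
class SlicePool
{
public:
    SlicePool(std::size_t workers, std::function<void(std::size_t, std::size_t)> work)
        : work_(std::move(work)), count_(workers)
    {
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this, i] { run(i); });
    }

    ~SlicePool()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            exiting_ = true;
        }
        wake_.notify_all();
        for (auto& t : threads_) t.join();
    }

    void run_frame()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        remaining_ = count_;
        ++generation_;  // new frame: every worker sees a changed generation
        wake_.notify_all();
        done_.wait(lock, [this] { return remaining_ == 0; });
    }

private:
    void run(std::size_t slice)
    {
        std::size_t seen = 0;  // last frame this worker has processed
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;)
        {
            wake_.wait(lock, [&] { return generation_ != seen || exiting_; });
            if (exiting_) return;
            seen = generation_;
            lock.unlock();
            work_(slice, count_);  // e.g. render band `slice` of `count_`
            lock.lock();
            if (--remaining_ == 0) done_.notify_one();
        }
    }

    std::function<void(std::size_t, std::size_t)> work_;  // immutable after construction
    std::size_t count_;
    std::mutex mutex_;
    std::condition_variable wake_, done_;
    std::size_t generation_ = 0;
    std::size_t remaining_ = 0;
    bool exiting_ = false;
    std::vector<std::thread> threads_;
};
```

The generation counter lets a single `notify_all` start a new frame for every worker without per-worker flags, and the shared `remaining_` countdown tells the caller when the whole frame is done. A typical choice for the worker count is `std::thread::hardware_concurrency()`, though that call may return 0 when the value is unknown.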
Architecting your code to scale well to a large number of cores is a complicated topic and still an active area of research. I highly recommend getting a good book on multi-core programming before you enter that tricky territory.