OIT using Linked Lists - 4
[Previous post]
3. Implement Sorting and rendering pass
In the second pass, the fragments in each pixel's linked list are sorted by depth, and their colors are then blended in depth order.
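To make the idea concrete before moving to the shader, here is a minimal CPU-side sketch in C++ of what the second pass computes for one pixel. The `Fragment` struct, `unpack`, and `resolvePixel` are hypothetical names used only for illustration. The `background` parameter is also an assumption: passing black reproduces a cleared frame buffer.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mirror of one list entry: packed RGBA8 color plus depth.
struct Fragment {
    std::uint32_t color;  // R in bits 0-7, G in 8-15, B in 16-23, A in 24-31
    float depth;          // smaller values are closer to the camera
};

using Color = std::array<float, 4>;  // r, g, b, a in [0, 1]

// Unpack an RGBA8 value into four floats.
Color unpack(std::uint32_t c) {
    return { ((c >> 0) & 0xFF) / 255.0f, ((c >> 8) & 0xFF) / 255.0f,
             ((c >> 16) & 0xFF) / 255.0f, ((c >> 24) & 0xFF) / 255.0f };
}

// Sort one pixel's fragments by depth, then blend from farthest to nearest.
Color resolvePixel(std::vector<Fragment> fragments, Color background) {
    std::sort(fragments.begin(), fragments.end(),
              [](const Fragment& a, const Fragment& b) { return a.depth > b.depth; });
    Color result = background;
    for (const Fragment& f : fragments) {
        const Color c = unpack(f.color);
        for (int i = 0; i < 4; ++i)
            result[i] = result[i] + (c[i] - result[i]) * c[3];  // lerp(result, c, c.a)
    }
    result[3] = 1.0f;
    return result;
}
```

Because the fragments are sorted first, the result does not depend on the order in which they were written to the list, which is the whole point of the technique.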
3-a. Implement shaders.
Create new HLSL files for the sorting and rendering pass. In the pixel shader, first copy all fragments in the list to a temporary array, then sort them. You also need a vertex shader to render a screen quad.
StructuredBuffer<SFragmentLink> FragmentLinkSRV : register(t0);
Buffer<uint> StartOffsetSRV : register(t1);
struct QuadVS_Output
{
float4 pos : SV_POSITION;
};
float4 SortFragmentsPS( QuadVS_Output _input ) : SV_Target0
{
uint uIndex = (uint)_input.pos.y * g_nFrameWidth + (uint)_input.pos.x;
// Read the linked list data and store it in the temporary buffer.
SFragment aData[32];
int anIndex[32];
uint uNumFragment = 0;
uint uNext = StartOffsetSRV[uIndex];
while ( uNext != 0xFFFFFFFF ) {
SFragmentLink element = FragmentLinkSRV[uNext];
aData[uNumFragment] = element.fragment;
anIndex[uNumFragment] = uNumFragment;
++uNumFragment;
uNext = element.uNext;
}
uint N2 = 1 << (int)(ceil(log2(uNumFragment)));
// fill initial data
for(int i = uNumFragment; i < N2; i++)
{
anIndex[i] = i;
aData[i].fDepth = 1.1f;
}
// Bitonic sort. copied from OIT_CS.hlsl
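for( int k = 2; k <= N2; k = 2*k )
{
for( int j = k>>1; j > 0 ; j = j>>1 )
{
for( int i = 0; i < N2; i++ )
{
float di = aData[ anIndex[ i ] ].fDepth;
int ixj = i^j;
if ( ( ixj ) > i )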
{
float dixj = aData[ anIndex[ ixj ] ].fDepth;
if ( ( i&k ) == 0 && di > dixj )
{
int temp = anIndex[ i ];
anIndex[ i ] = anIndex[ ixj ];
anIndex[ ixj ] = temp;
}
if ( ( i&k ) != 0 && di < dixj )
{
int temp = anIndex[ i ];
anIndex[ i ] = anIndex[ ixj ];
anIndex[ ixj ] = temp;
}
}
}
}
}
// Output the final result to the frame buffer
// Accumulate fragments into final result
float4 result = 0.0f;
for( int x = uNumFragment-1; x >= 0; x-- )
{
uint uColor = aData[ anIndex[ x ] ].uColor;
float4 color;
color.r = ( (uColor >> 0) & 0xFF ) / 255.0f;
color.g = ( (uColor >> 8) & 0xFF ) / 255.0f;
color.b = ( (uColor >> 16) & 0xFF ) / 255.0f;
color.a = ( (uColor >> 24) & 0xFF ) / 255.0f;
result = lerp( result, color, color.a );
}
result.a = 1.0f;
return result;
}
The start offset buffer is read as a buffer of uint values.
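The sorting step of the shader can be checked on the CPU. The following is a minimal C++ sketch of the same bitonic network operating on an index array, with the input padded to a power of two using a sentinel depth of 1.1. The function names are hypothetical, and the sketch assumes valid depths lie in [0, 1] so that the padding always sorts to the end.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Smallest power of two that is >= n (returns 1 for n <= 1).
std::size_t nextPowerOfTwo(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Return indices into `depth`, ordered by ascending depth, using a bitonic network.
std::vector<int> bitonicSortIndices(const std::vector<float>& depth) {
    const std::size_t count = depth.size();
    const std::size_t n2 = nextPowerOfTwo(count);

    // Pad with a sentinel larger than any valid depth (assumed to be in [0, 1]).
    std::vector<float> padded(depth);
    padded.resize(n2, 1.1f);

    std::vector<int> index(n2);
    for (std::size_t i = 0; i < n2; ++i) index[i] = static_cast<int>(i);

    for (std::size_t k = 2; k <= n2; k <<= 1) {           // size of the bitonic sequences
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {    // distance between compared items
            for (std::size_t i = 0; i < n2; ++i) {
                const std::size_t ixj = i ^ j;
                if (ixj <= i) continue;                   // handle each pair only once
                const bool ascending = (i & k) == 0;
                const float di = padded[index[i]];
                const float dixj = padded[index[ixj]];
                if (ascending ? di > dixj : di < dixj) std::swap(index[i], index[ixj]);
            }
        }
    }
    index.resize(count);  // the sentinels end up last, so the first `count` entries are real
    return index;
}
```

Sorting indices rather than the fragments themselves keeps each swap to a single int, which is the same choice the shader makes with `anIndex`.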
Since OIT11 does not take opaque primitives into account, blending starts from the clear color, black. To make this practical, read a back color texture as the initial value of blending instead. As the code shows, the number of fragments per pixel is limited (here, to 32 fragments per pixel). The pixel shader needs safety code to avoid overrunning the arrays.
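One possible form of that safety code is to stop walking the list once the temporary array is full. The C++ sketch below shows the guard on the CPU; the same extra condition could be added to the `while` loop in the shader. `FragmentLink`, `gatherFragments`, and the constants are hypothetical names, and the struct layout is an assumption rather than the sample's actual definition.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kEndOfList = 0xFFFFFFFFu;  // same terminator the shader tests for
constexpr std::size_t kMaxFragments = 32;          // size of the shader's temporary arrays

// Hypothetical CPU-side mirror of one node in the fragment link buffer.
struct FragmentLink {
    std::uint32_t color;
    float depth;
    std::uint32_t next;  // index of the next node, or kEndOfList
};

// Walk one pixel's list, copying at most kMaxFragments entries into `out`.
// Returns the number copied; nodes beyond the cap are dropped.
std::size_t gatherFragments(const std::vector<FragmentLink>& links,
                            std::uint32_t start,
                            FragmentLink (&out)[kMaxFragments]) {
    std::size_t count = 0;
    std::uint32_t next = start;
    while (next != kEndOfList && count < kMaxFragments) {
        out[count++] = links[next];
        next = links[next].next;
    }
    return count;
}
```

Note that the list is not in depth order, so the dropped fragments are not necessarily the farthest ones. A pixel with more than 32 layers can therefore still show artifacts; the guard only prevents the overrun.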
3-b. Add Shader Resource Views
In the second pass, the two buffers are accessed through Shader Resource Views. Add the following code to OIT::OnD3D11ResizedSwapChain.
// Create Shader Resource Views
D3D11_SHADER_RESOURCE_VIEW_DESC descSRV;
descSRV.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
descSRV.Buffer.FirstElement = 0;
descSRV.Format = DXGI_FORMAT_UNKNOWN;
descSRV.Buffer.NumElements = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height * 8;
V_RETURN( pDevice->CreateShaderResourceView( m_pFragmentLinkBuffer, &descSRV, &m_pFragmentLinkSRV ) );
descSRV.Format = DXGI_FORMAT_R32_UINT;
descSRV.Buffer.NumElements = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height;
V_RETURN( pDevice->CreateShaderResourceView( m_pStartOffsetBuffer, &descSRV, &m_pStartOffsetSRV ) );
In addition, create a vertex buffer and an input layout for rendering the screen quad in OIT::OnD3D11CreateDevice.
3-c. Implement Sorting and rendering function.
Set the vertex buffer, input layout, and shader resource views, then render the screen quad in OIT::SortAndRender.
ID3D11ShaderResourceView* ppSRVs[] = {
m_pFragmentLinkSRV,
m_pStartOffsetSRV,
};
pD3DContext->PSSetShaderResources( 0, sizeof(ppSRVs)/sizeof(ppSRVs[0]), ppSRVs );
pD3DContext->VSSetShader( m_pSortAndRenderVS, NULL, 0 );
pD3DContext->PSSetShader( m_pSortAndRenderPS, NULL, 0 );
// Draw a screen quad by a large triangle.
pD3DContext->IASetInputLayout( m_pIL );
UINT uStrides = sizeof( SQuadVertex );
UINT uOffsets = 0;
pD3DContext->IASetVertexBuffers( 0, 1, &m_pVB, &uStrides, &uOffsets );
pD3DContext->IASetIndexBuffer( NULL, DXGI_FORMAT_R32_UINT, 0 );
pD3DContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP );
pD3DContext->Draw( 3, 0 );
// Unbind SRVs
ID3D11ShaderResourceView* ppSRVNULL[] = {
NULL,
NULL,
};
pD3DContext->PSSetShaderResources( 0, sizeof(ppSRVNULL)/sizeof(ppSRVNULL[0]), ppSRVNULL );
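The draw call above renders three vertices, so the "screen quad" is really one oversized triangle that the rasterizer clips to the viewport. The C++ sketch below checks that such a triangle covers the whole clip-space square. The vertex positions are an assumption for illustration, not the contents of the sample's vertex buffer, and `Vec2`, `edge`, and `covers` are hypothetical helpers.

```cpp
#include <array>

struct Vec2 { float x, y; };

// One oversized triangle in clip space (assumed positions, see the note above).
const std::array<Vec2, 3> kFullScreenTriangle = {{ {-1.0f, -1.0f}, {-1.0f, 3.0f}, {3.0f, -1.0f} }};

// Signed area test: positive when p lies to the left of the edge a -> b.
float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// True when p is inside or on the boundary of the triangle, for either winding.
bool covers(const std::array<Vec2, 3>& t, Vec2 p) {
    const float e0 = edge(t[0], t[1], p);
    const float e1 = edge(t[1], t[2], p);
    const float e2 = edge(t[2], t[0], p);
    return (e0 >= 0 && e1 >= 0 && e2 >= 0) || (e0 <= 0 && e1 <= 0 && e2 <= 0);
}
```

All four corners of the [-1, 1] square pass the `covers` test, so every pixel of the viewport runs the pixel shader exactly once, with no diagonal seam as there would be with two triangles.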
4. Source code
Here is the source code.
[download zip]
You can compile and run the Linked List version of the OIT11 sample by copying these source files over the ones in the OIT11 directory.
5. Conclusion
Linked List OIT is a fast technique and very easy to implement. However, my implementation still has several points to improve. For example, it supports only one blending mode and has no anti-aliasing. I will try to make it more practical for video games in the future.
