OIT using Linked Lists - 4

 Category: OIT - Kaori @ May 4th, 2010

[Previous post]


3. Implement the sorting and rendering pass

In the second pass, all fragments in the linked list at each pixel are sorted by depth value, then the color is blended in order of depth.

3-a. Implement shaders.

Create new HLSL files for the sorting and rendering pass. In the pixel shader, first copy all fragments in the list to a temporary array, and then sort them. You also need a vertex shader to render a screen quad.


StructuredBuffer<SFragmentLink> FragmentLinkSRV : register(t0);
Buffer<uint> StartOffsetSRV : register(t1);

struct QuadVS_Output
{
    float4 pos : SV_POSITION;
};


float4 SortFragmentsPS( QuadVS_Output _input ) : SV_Target0
{
    uint uIndex = (uint)_input.pos.y * g_nFrameWidth + (uint)_input.pos.x;

    // Read and store linked list data to the temporary buffer.
    SFragment aData[32];
    int anIndex[32];
    uint uNumFragment = 0;    
    uint uNext = StartOffsetSRV[uIndex];
    
    while ( uNext != 0xFFFFFFFF ) {
        SFragmentLink element = FragmentLinkSRV[uNext];

        aData[uNumFragment] = element.fragment;
        anIndex[uNumFragment] = uNumFragment;
        ++uNumFragment;
        uNext = element.uNext;
    }


    uint N2 = 1 << (int)(ceil(log2(uNumFragment)));
    // fill initial data
    for(int i = uNumFragment; i < N2; i++)
    {
        anIndex[i] = i;
        aData[i].fDepth = 1.1f;
    }

    // Bitonic sort. copied from OIT_CS.hlsl
    for( int k = 2; k <= N2; k = 2*k )
    {
        for( int j = k>>1; j > 0 ; j = j>>1 )
        {
            for( int i = 0; i < N2; i++ )
            {
                float di = aData[ anIndex[ i ] ].fDepth;
                int ixj = i^j;
                if ( ( ixj ) > i )
                {
                    float dixj = aData[ anIndex[ ixj ] ].fDepth;
                    if ( ( i&k ) == 0 && di > dixj )
                    {
                        int temp = anIndex[ i ];
                        anIndex[ i ] = anIndex[ ixj ];
                        anIndex[ ixj ] = temp;
                    }
                    if ( ( i&k ) != 0 && di < dixj )
                    {
                        int temp = anIndex[ i ];
                        anIndex[ i ] = anIndex[ ixj ];
                        anIndex[ ixj ] = temp;
                    }
                }
            }
        }
    }

    // Output the final result to the frame buffer
    // Accumulate fragments into final result
    float4 result = 0.0f;
    for( int x = uNumFragment-1; x >= 0; x-- )
    {
        uint uColor = aData[ anIndex[ x ] ].uColor;
        float4 color;
        color.r = ( (uColor >> 0) & 0xFF ) / 255.0f;
        color.g = ( (uColor >> 8) & 0xFF ) / 255.0f;
        color.b = ( (uColor >> 16) & 0xFF ) / 255.0f;
        color.a = ( (uColor >> 24) & 0xFF ) / 255.0f;
        result = lerp( result, color, color.a );
    }
    result.a = 1.0f;

    return result;
}
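
To check the sort outside the GPU, the same index-based bitonic sort can be sketched on the CPU. This is only an illustration in C++, not part of the sample: the function name SortIndicesByDepth is hypothetical, and the padding value 1.1f mirrors the shader above on the assumption that real depths lie in [0, 1].

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// CPU sketch of the shader's index sort (hypothetical helper, not in OIT11).
// Returns indices into `depth`, ordered by ascending depth.
std::vector<int> SortIndicesByDepth(std::vector<float> depth)
{
    const int count = static_cast<int>(depth.size());

    // Bitonic sort needs a power-of-two element count.
    int n2 = 1;
    while (n2 < count) n2 <<= 1;

    // Pad with a depth larger than any real fragment so padding sorts last.
    depth.resize(n2, 1.1f);
    std::vector<int> index(n2);
    for (int i = 0; i < n2; ++i) index[i] = i;

    for (int k = 2; k <= n2; k <<= 1) {
        for (int j = k >> 1; j > 0; j >>= 1) {
            for (int i = 0; i < n2; ++i) {
                const int ixj = i ^ j;
                if (ixj <= i) continue;  // handle each pair once
                const float di = depth[index[i]];
                const float dixj = depth[index[ixj]];
                const bool ascending = (i & k) == 0;
                if (ascending ? (di > dixj) : (di < dixj))
                    std::swap(index[i], index[ixj]);
            }
        }
    }

    index.resize(count);  // drop the padding entries
    return index;
}
```

Only the indices move during the sort, so the fragment data itself is never copied, which is the same design choice the shader makes with anIndex.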


The start offset buffer is read as a uint buffer.
Since OIT11 does not take opaque primitives into account, the initial value for blending is the clear color, black. To make this practical, read the back buffer color texture as the initial blending value instead. As the code shows, there is a limit on the number of fragments per pixel (32 here). The pixel shader needs a safety check to avoid overrunning the arrays.
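
The two points above (a bounded list walk, and blending over an existing background color) can be sketched on the CPU as follows. This is a minimal C++ illustration under assumptions: the names GatherFragments and BlendOver are hypothetical, and dropping fragments beyond the limit is just one possible policy.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList    = 0xFFFFFFFFu; // same magic value as the shader
constexpr uint32_t kMaxFragments = 32;          // size of the shader's temporary arrays

struct FragmentLink {
    uint32_t color; // packed RGBA8
    float    depth;
    uint32_t next;  // index of the next node, or kEndOfList
};

struct Color { float r, g, b, a; };

// Walks one pixel's list and copies at most kMaxFragments entries.
// Fragments beyond the limit are dropped (an assumed policy).
std::vector<FragmentLink> GatherFragments(const std::vector<FragmentLink>& pool,
                                          uint32_t head)
{
    std::vector<FragmentLink> out;
    uint32_t next = head;
    while (next != kEndOfList && next < pool.size() && out.size() < kMaxFragments) {
        out.push_back(pool[next]);
        next = pool[next].next;
    }
    return out;
}

// Blends one packed fragment over dst, like lerp(result, color, color.a).
// Passing the background color as the first dst replaces the black clear color.
Color BlendOver(Color dst, uint32_t packed)
{
    const float r = ((packed >> 0)  & 0xFF) / 255.0f;
    const float g = ((packed >> 8)  & 0xFF) / 255.0f;
    const float b = ((packed >> 16) & 0xFF) / 255.0f;
    const float a = ((packed >> 24) & 0xFF) / 255.0f;
    return { dst.r + (r - dst.r) * a,
             dst.g + (g - dst.g) * a,
             dst.b + (b - dst.b) * a,
             1.0f };  // final alpha is forced to 1, as in the shader
}
```

In the shader, the equivalent guard would be an extra condition on the while loop that stops once the fragment count reaches the array size.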


3-b. Add Shader Resource Views

In the second pass, the two buffers are accessed through Shader Resource Views. Add the following code to OIT::OnD3D11ResizedSwapChain.

// Create Shader Resource Views
D3D11_SHADER_RESOURCE_VIEW_DESC descSRV;
descSRV.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
descSRV.Buffer.FirstElement = 0;

descSRV.Format = DXGI_FORMAT_UNKNOWN;
descSRV.Buffer.NumElements = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height * 8;
V_RETURN( pDevice->CreateShaderResourceView( m_pFragmentLinkBuffer, &descSRV, &m_pFragmentLinkSRV ) );

descSRV.Format = DXGI_FORMAT_R32_UINT;
descSRV.Buffer.NumElements = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height;
V_RETURN( pDevice->CreateShaderResourceView( m_pStartOffsetBuffer, &descSRV, &m_pStartOffsetSRV ) );


In addition, create a vertex buffer and an input layout for rendering the screen quad in OIT::OnD3D11CreateDevice.


3-c. Implement Sorting and rendering function.

Set the vertex buffer, input layout and shader resource views and render the screen quad in OIT::SortAndRender.



ID3D11ShaderResourceView* ppSRVs[] = {
    m_pFragmentLinkSRV,
    m_pStartOffsetSRV,
};
pD3DContext->PSSetShaderResources( 0, sizeof(ppSRVs)/sizeof(ppSRVs[0]), ppSRVs );

pD3DContext->VSSetShader( m_pSortAndRenderVS, NULL, 0 );
pD3DContext->PSSetShader( m_pSortAndRenderPS, NULL, 0 );


// Draw a screen quad by a large triangle.
pD3DContext->IASetInputLayout( m_pIL );
UINT uStrides = sizeof( SQuadVertex );
UINT uOffsets = 0;
pD3DContext->IASetVertexBuffers( 0, 1, &m_pVB, &uStrides, &uOffsets );
pD3DContext->IASetIndexBuffer( NULL, DXGI_FORMAT_R32_UINT, 0 );
pD3DContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP );

pD3DContext->Draw( 3, 0 );

// Unbind SRVs
ID3D11ShaderResourceView* ppSRVNULL[] = {
    NULL,
    NULL,
};
pD3DContext->PSSetShaderResources( 0, sizeof(ppSRVNULL)/sizeof(ppSRVNULL[0]), ppSRVNULL );



4. Source code

Here is the source code.
[download zip]
You can compile and run the Linked List version of the OIT11 sample by copying these source files over the ones in the OIT11 directory.


5. Conclusion

Linked List OIT is a fast technique and very easy to implement. However, my implementation leaves several points to improve. For example, it supports only one blending mode and no anti-aliasing. I will try to make it more practical for video games in the future.


 OIT using Linked Lists - 3

 Category: OIT - Kaori @ May 4th, 2010

[Previous post]


2. Implement Linked List Creation


OIT11 uses one 2D texture and three buffers. Linked List OIT requires only two buffers: a structured buffer called the "Fragment Link Buffer" and a uint buffer called the "Start Offset Buffer".

- Fragment Link Buffer
contains all fragments. Each fragment has a color, a depth value and the index of the next fragment in its linked list. If there is no next fragment, the index is a magic value, 0xffffffff in this case.
The declaration of a fragment in the pixel shader is as follows. The color is packed into a uint value.


struct SFragment {
    uint uColor;
    float fDepth;
};

struct SFragmentLink {
    SFragment fragment;
    uint uNext;
};


- Start Offset Buffer
contains the index of the first fragment of the linked list at each pixel. It is reset to the magic value every frame before rendering.
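
The fragment layout described above can be mirrored on the CPU to check its size and the color packing. This is a small C++ sketch, not code from the sample: PackColor is a hypothetical helper, and the 12-byte size assumes the usual 4-byte alignment of uint32_t and float.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// CPU mirror of the HLSL structs (same field order).
struct SFragment {
    uint32_t uColor;  // packed RGBA8
    float    fDepth;
};

struct SFragmentLink {
    SFragment fragment;
    uint32_t  uNext;  // index of the next fragment, or 0xffffffff
};

// The structured buffer stride must match this size.
static_assert(sizeof(SFragmentLink) == 12, "expected a 12-byte fragment link");

// Packs a float color into one uint: R in bits 0-7, G in 8-15, B in 16-23, A in 24-31.
// Components are clamped to [0, 1] and truncated, like saturate() followed by a uint cast.
uint32_t PackColor(float r, float g, float b, float a)
{
    auto toByte = [](float v) {
        return static_cast<uint32_t>(std::min(std::max(v, 0.0f), 1.0f) * 255.0f);
    };
    return toByte(r) | (toByte(g) << 8) | (toByte(b) << 16) | (toByte(a) << 24);
}
```

For example, opaque red (1, 0, 0, 1) packs to 0xFF0000FF with this layout.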


2-a. Implement shader

Modify OIT_PS.hlsl and implement fragment link buffer creation.
Declare the buffers as follows.


// Fragment And Link Buffer
RWStructuredBuffer<SFragmentLink> FLBuffer : register( u0 );
// Start Offset Buffer
RWByteAddressBuffer StartOffsetBuffer : register( u1 );

Then implement the entry point function. It can be almost the same as the code in the slides.

[earlydepthstencil]
void StoreFragments( SceneVS_Output input )
{
    uint x = input.pos.x;
    uint y = input.pos.y;

    // Create fragment data.
    uint4 ucolor = saturate( input.color ) * 255;
    SFragmentLink element;
    element.fragment.uColor = (ucolor.x) | (ucolor.y << 8) | (ucolor.z << 16) | (ucolor.a << 24);
    element.fragment.fDepth = input.pos.z;

    // Increment and get current pixel count.
    uint uPixelCount = FLBuffer.IncrementCounter();

    // Read and update Start Offset Buffer.
    uint uIndex = y * g_nFrameWidth + x;
    uint uStartOffsetAddress = 4 * uIndex;
    uint uOldStartOffset;
    StartOffsetBuffer.InterlockedExchange(
        uStartOffsetAddress, uPixelCount, uOldStartOffset );

    // Store fragment link.
    element.uNext = uOldStartOffset;
    FLBuffer[uPixelCount] = element;
}

Note that a byte address buffer is addressed in bytes, which is why the pixel index is multiplied by 4.
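
The list construction in the shader above can be imitated on the CPU with atomics, which may help when reasoning about it. This is a rough C++ sketch under assumptions: the class FragmentLists and its members are hypothetical, std::atomic stands in for IncrementCounter() and InterlockedExchange(), and returning false when the pool is full is an assumed policy that the shader above does not implement.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList = 0xFFFFFFFFu;  // same magic value as the shader

struct Node {
    uint32_t color;
    float    depth;
    uint32_t next;
};

class FragmentLists {
public:
    FragmentLists(uint32_t width, uint32_t height, uint32_t capacity)
        : width_(width),
          heads_(static_cast<std::size_t>(width) * height),
          nodes_(capacity),
          counter_(0)
    {
        // Corresponds to clearing the start offset buffer every frame.
        for (auto& h : heads_) h.store(kEndOfList);
    }

    // Adds a fragment at the front of the pixel's list.
    bool Store(uint32_t x, uint32_t y, uint32_t color, float depth)
    {
        const uint32_t slot = counter_.fetch_add(1);  // like IncrementCounter()
        if (slot >= nodes_.size()) return false;      // pool exhausted

        // Elements are indexed directly here; the HLSL byte address buffer
        // needs the byte offset 4 * index instead.
        const uint32_t old = heads_[y * width_ + x].exchange(slot);
        nodes_[slot] = Node{color, depth, old};
        return true;
    }

    uint32_t Head(uint32_t x, uint32_t y) const { return heads_[y * width_ + x].load(); }
    const Node& At(uint32_t i) const { return nodes_[i]; }

private:
    uint32_t width_;
    std::vector<std::atomic<uint32_t>> heads_;
    std::vector<Node> nodes_;
    std::atomic<uint32_t> counter_;
};
```

Because each new fragment becomes the head and points to the previous head, the list for a pixel ends up in reverse drawing order, which is fine since the second pass sorts by depth anyway.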


2-b. Add Buffers and UAVs

Add two buffers and their UAVs. The following code is implemented in OIT::OnD3D11ResizedSwapChain function.

// Create Fragment and Link buffer.
// The stride matches SFragmentLink: uint color + float depth + uint next = 12 bytes.
descBuf.StructureByteStride = sizeof(float) + sizeof(BYTE) * 4 * 2;
descBuf.ByteWidth = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height * 8 * descBuf.StructureByteStride;
descBuf.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
V_RETURN( pDevice->CreateBuffer( &descBuf, NULL, &m_pFragmentLinkBuffer ));

// Create Start Offset buffer
descBuf.StructureByteStride = 4 * sizeof(BYTE);
descBuf.ByteWidth = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height * descBuf.StructureByteStride;
descBuf.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;
V_RETURN( pDevice->CreateBuffer( &descBuf, NULL, &m_pStartOffsetBuffer ));

// Create Unordered Access Views
D3D11_UNORDERED_ACCESS_VIEW_DESC descUAV;
descUAV.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
descUAV.Buffer.FirstElement = 0;

descUAV.Format = DXGI_FORMAT_UNKNOWN;
descUAV.Buffer.NumElements = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height * 8;
descUAV.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_COUNTER;
V_RETURN( pDevice->CreateUnorderedAccessView( m_pFragmentLinkBuffer, &descUAV, &m_pFragmentLinkUAV ) );

descUAV.Format = DXGI_FORMAT_R32_TYPELESS;
descUAV.Buffer.NumElements = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height;
descUAV.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
V_RETURN( pDevice->CreateUnorderedAccessView( m_pStartOffsetBuffer, &descUAV, &m_pStartOffsetUAV ) );

Note that descBuf.BindFlags contains D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE, as in the original code.

The fragment link buffer must hold all fragments. I specified the same size as the deep frame buffer in the OIT11 sample (i.e. 8x the screen size). Structured buffers must be created with D3D11_RESOURCE_MISC_BUFFER_STRUCTURED.

The start offset buffer is a screen-sized buffer and its UAV is used as a byte address buffer. Specify D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS when creating it.

Specify D3D11_BUFFER_UAV_FLAG_COUNTER so that the UAV of the fragment link buffer has counter support. UAVs for structured buffers must be created with DXGI_FORMAT_UNKNOWN format.
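
To get a feeling for the memory cost of these two buffers, the sizes can be worked out with a few lines of C++. The helper names below are hypothetical, and the 1280x720 resolution in the usage note is only an example, not a figure from the sample.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kFragmentsPerPixel = 8;   // same multiplier as the buffer creation code
constexpr uint32_t kLinkStride        = 12;  // uint color + float depth + uint next
constexpr uint32_t kStartOffsetStride = 4;   // one uint per pixel

// Size in bytes of the fragment link buffer for a given back buffer size.
constexpr uint64_t FragmentLinkBufferBytes(uint32_t width, uint32_t height)
{
    return static_cast<uint64_t>(width) * height * kFragmentsPerPixel * kLinkStride;
}

// Size in bytes of the start offset buffer for a given back buffer size.
constexpr uint64_t StartOffsetBufferBytes(uint32_t width, uint32_t height)
{
    return static_cast<uint64_t>(width) * height * kStartOffsetStride;
}
```

At 1280x720, for instance, this gives 88,473,600 bytes for the fragment link buffer and 3,686,400 bytes for the start offset buffer.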


2-c. Implement function

Implement the OIT::CreateFragmentAndLink function. The easiest way is to modify OIT::FillDeepBuffer, because you can use the same constant buffer as the original code.
Clear the start offset buffer to the magic value before rendering.

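// Clear the start offset buffer to the magic value.
// ClearUnorderedAccessViewUint takes a four-component value.
static const UINT clearValueUINT[4] = { 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff };
pD3DContext->ClearUnorderedAccessViewUint( m_pStartOffsetUAV, clearValueUINT );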

// Bind UAVs.
// No render target is required.
ID3D11UnorderedAccessView* pUAVs[] = {
    m_pFragmentLinkUAV,
    m_pStartOffsetUAV,
};
// Initialize the counter value.
UINT anInitIndices[] = { 0, 0 };
pD3DContext->OMSetRenderTargetsAndUnorderedAccessViews( 0, NULL, pDSV, 0, sizeof(pUAVs)/sizeof(pUAVs[0]), pUAVs, anInitIndices );

// Set Pixel Shader and shader constants.
pD3DContext->PSSetShader( m_pCreateFragmentLinkPS, NULL, 0 );

HRESULT hr;
D3D11_MAPPED_SUBRESOURCE MappedResource;
V( pD3DContext->Map( m_pPS_CB, 0, D3D11_MAP_WRITE_DISCARD, 0, &MappedResource ) );
PS_CB* pPS_CB = ( PS_CB* )MappedResource.pData;
pPS_CB->nFrameWidth = m_nFrameWidth;
pPS_CB->nFrameHeight = m_nFrameHeight;
pD3DContext->Unmap( m_pPS_CB, 0 );
pD3DContext->PSSetConstantBuffers( 0, 1, &m_pPS_CB );

pScene->D3D11Render( mWorldViewProjection, pD3DContext );

// Unbind UAVs.
ID3D11UnorderedAccessView* pUAVsNULL[] = { NULL, NULL, NULL, NULL };
pD3DContext->OMSetRenderTargetsAndUnorderedAccessViews( 0, NULL, pDSV, 0, sizeof(pUAVs)/sizeof(pUAVs[0]), pUAVsNULL, NULL );

Don't forget to initialize the counter value when setting the UAVs.
If you specify [earlydepthstencil] on the shader, you have to disable depth writes before rendering. Otherwise some fragments will be rejected by the depth test.

[ To be continued... ]


 OIT using Linked Lists - 2

 Category: OIT - Kaori @ April 29th, 2010

[Previous post]

I show my implementation here. Refer to the slides and the OIT11 sample code for details of each technique. The slides are very useful for understanding how the linked lists work, and they also contain code. The implementation is easy once you have read them.


1. Modify Rendering Flow

In usual rendering, pixels that pass the depth test are drawn directly to the frame buffer. In OIT, transparent primitives are not rendered to the frame buffer directly. Instead, all fragments are stored in a large buffer. Each fragment has a color and a depth value and belongs to one pixel on the screen. Where several transparent polygons overlap, a pixel has multiple fragments. After all transparent primitives have been rendered, the fragments at each pixel are sorted by depth and the blended color is drawn to the frame buffer.


The OIT11 sample stores the fragments to a large buffer in order of pixel address.
For instance, the fragments are stored as follows.
Fragment 1, 2, 3, .... of the pixel[0]
Fragment 1, 2, ... of the pixel[1]

To store fragments this way, you must know beforehand how many fragments each pixel has and where each fragment goes in the large buffer.
The OIT11 sample has 4 phases. You can see all of them in OIT::Render.

// Create a count of the number of fragments at each pixel location
CreateFragmentCount( pD3DContext, pScene, mWorldViewProjection, pRTV, pDSV );

// Create a prefix sum of the fragment counts. Each pixel location will hold
// a count of the total number of fragments of every preceding pixel location.
CreatePrefixSum( pD3DContext );

// Fill in the deep frame buffer with depth and color values. Use the prefix
// sum to determine where in the deep buffer to place the current fragment.
FillDeepBuffer( pD3DContext, pRTV, pDSV, pScene, mWorldViewProjection );

// Sort and render the fragments. Use the prefix sum to determine where the
// fragments for each pixel reside.
SortAndRenderFragments( pD3DContext, pDevice, pRTV );

The first and third phases are implemented by pixel shaders and the others by compute shaders.
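
The role of the count and prefix sum phases can be illustrated with a short CPU sketch. This is C++ written for this explanation, not code from OIT11: FragmentOffsets is a hypothetical name, and an exclusive prefix sum is assumed here, so the sample's exact convention may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Given the number of fragments at each pixel, returns where each pixel's
// fragments start in the large buffer (exclusive prefix sum).
std::vector<uint32_t> FragmentOffsets(const std::vector<uint32_t>& counts)
{
    std::vector<uint32_t> offsets(counts.size());
    uint32_t running = 0;
    for (std::size_t p = 0; p < counts.size(); ++p) {
        offsets[p] = running;   // fragments of all preceding pixels
        running += counts[p];
    }
    return offsets;
}
```

With counts of 3, 2, 0 and 1 fragments for four pixels, the fragments of pixel[1] start at index 3 and those of pixel[3] at index 5. The scene has to be drawn twice to do this, once to count and once to fill, which is the cost the linked list approach avoids.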

Linked List OIT, on the other hand, stores fragments in the order they are drawn, so it requires only 2 passes. Instead of counting the fragments, it stores them as linked lists: all fragments that belong to a given pixel are in one linked list.
OIT::Render can be rewritten as follows:

// Linked List creation.
CreateFragmentLink( pD3DContext, pRTV, pDSV, pScene, mWorldViewProjection );

// Sort and render the fragments.
SortAndRenderFragments( pD3DContext, pDevice, pRTV );

Both passes are implemented by a pixel shader. The primitives are drawn in the first pass and the second pass is performed by rendering a screen quad.
Linked List OIT does not use compute shaders.


[To be continued...]


 OIT using Linked Lists

 Category: OIT - Kaori @ April 21st, 2010

There was a presentation about Order Independent Transparency using Linked Lists at GDC2010.

GDC 2010: OIT and GI using DX11 linked lists (Nick Thibieroz & Holger Grün)
You can download the slides here.
http://developer.amd.com/documentation/presentations/Pages/default.aspx


I noticed that the DirectX OIT11 sample was very slow, while the Mecha Demo for ATI Radeon™ HD 5000 Series seemed fast.
So I tried to modify the DirectX sample and implement linked lists instead.


Result

The original OIT11 runs at 11.3 fps on a RADEON HD 5800.


After the modification, it increases to around 1800 fps.
Linked list OIT is about 160x faster!




I will post my implementation soon.




[To be continued...]
