OIT using Linked Lists

 Category: OIT (Japanese) - Kaori @ May 4th, 2010

At GDC 2010, there was a presentation introducing Order Independent Transparency using linked lists.

The slides are available here:
GDC 2010: OIT and GI using DX11 linked lists (Nick Thibieroz & Holger Grün)
http://developer.amd.com/documentation/presentations/Pages/default.aspx

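Order Independent Transparency (hereafter OIT) is a technique that resolves the ordering of translucent surfaces per pixel, so that they are composited correctly regardless of the order in which they are drawn. With an ordinary Z test and alpha blending, there is a limit to how correctly the draw order can be sorted, and getting it completely right is practically impossible. It is especially hopeless when, for example, one polygon intersects another: neither polygon is entirely in front of the other, so no per-polygon draw order is correct. With OIT, however, translucent polygons can be layered correctly even in such cases.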
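To make the order dependence concrete, here is a small CPU-side C++ sketch. It is not code from the post: the `Color` type and the `BlendOver`/`Composite` helpers are hypothetical names, and it assumes straight (non-premultiplied) alpha with the standard "over" blend. Drawing the same two 50% translucent layers in opposite orders produces different colors, which is the error that per-pixel sorting removes.

```cpp
#include <cassert>
#include <cmath>

// Minimal RGBA color with straight (non-premultiplied) alpha.
struct Color { float r, g, b, a; };

// "Over" blend of src onto dst: dst + (src - dst) * src.a,
// i.e. the same operation as HLSL lerp(dst, src, src.a).
// The result alpha is forced to 1, as in the shader later in this series.
inline Color BlendOver(Color dst, Color src) {
    assert(src.a >= 0.0f && src.a <= 1.0f);
    return { dst.r + (src.r - dst.r) * src.a,
             dst.g + (src.g - dst.g) * src.a,
             dst.b + (src.b - dst.b) * src.a,
             1.0f };
}

// Draws two translucent layers onto a background in submission order.
inline Color Composite(Color background, Color first, Color second) {
    return BlendOver(BlendOver(background, first), second);
}
```

For example, red then blue over black gives (0.25, 0, 0.5), while blue then red gives (0.5, 0, 0.25).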

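The DirectX SDK also contains a sample called OIT11, which blends correctly even when polygons intersect. Unfortunately, it is far too slow to be usable. On the other hand, ATI's "Mecha" demo for the Radeon™ HD 5000 series, which also uses OIT, runs fairly smoothly even though its scene is much more complex than OIT11's. The OIT technique presented at GDC is the same one used in the Mecha demo, so I modified the OIT11 sample to implement the linked-list approach and compared the speed.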


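Results

The original OIT11 sample runs at 11.3 fps on a RADEON HD5800.


After implementing Linked List OIT, the frame rate rose to 1800 fps.
That is roughly 160 times faster. It seems better not to rely too much on the DirectX samples...


I will post the implementation details in upcoming posts.

[To be continued]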


 OIT using Linked Lists - 4

 Category: OIT - Kaori @ May 4th, 2010



3. Implement Sorting and rendering pass

In the second pass, the fragments in each pixel's linked list are sorted by depth, and their colors are then blended in depth order.
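As a rough CPU model of this pass, the following C++ sketch resolves one pixel: it walks the linked list, keeps at most 32 fragments, sorts them by depth and blends from the farthest to the nearest. The `Node`/`Rgba` types and `ResolvePixel` are hypothetical, colors are kept as floats instead of a packed uint, and `std::sort` stands in for the shader's bitonic sort. It assumes that a smaller depth means closer, as with the default depth range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU mirror of one linked-list node.
struct Rgba { float r, g, b, a; };
struct Node { Rgba color; float depth; uint32_t next; };

constexpr uint32_t kEndOfList = 0xFFFFFFFFu;   // same magic value as the post
constexpr std::size_t kMaxFragments = 32;      // same cap as the shader's temp arrays

Rgba ResolvePixel(const std::vector<Node>& nodes, uint32_t head, Rgba background) {
    // 1. Gather the pixel's list into a temporary array, stopping at the cap.
    std::vector<Node> frags;
    for (uint32_t i = head; i != kEndOfList && frags.size() < kMaxFragments; i = nodes[i].next) {
        assert(i < nodes.size());
        frags.push_back(nodes[i]);
    }

    // 2. Sort nearest first (ascending depth).
    std::sort(frags.begin(), frags.end(),
              [](const Node& a, const Node& b) { return a.depth < b.depth; });

    // 3. Blend back to front: iterate from the farthest fragment to the nearest.
    Rgba result = background;
    for (auto it = frags.rbegin(); it != frags.rend(); ++it) {
        const Rgba& c = it->color;
        result.r += (c.r - result.r) * c.a;
        result.g += (c.g - result.g) * c.a;
        result.b += (c.b - result.b) * c.a;
    }
    result.a = 1.0f;
    return result;
}
```

Because the fragments are sorted before blending, the result is the same whichever order they were inserted into the list.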

3-a. Implement shaders.

Create new HLSL files for the sorting and rendering pass. In the pixel shader, first copy all fragments in the list to a temporary array, then sort them. You also need a vertex shader to render a full-screen quad.


StructuredBuffer<SFragmentLink> FragmentLinkSRV : register(t0);
Buffer<uint> StartOffsetSRV : register(t1);

struct QuadVS_Output
{
    float4 pos : SV_POSITION;
};


float4 SortFragmentsPS( QuadVS_Output _input ) : SV_Target0
{
    uint uIndex = (uint)_input.pos.y * g_nFrameWidth + (uint)_input.pos.x;

    // Read the linked list of this pixel into the temporary arrays.
    SFragment aData[32];
    int anIndex[32];
    uint uNumFragment = 0;
    uint uNext = StartOffsetSRV[uIndex];

    // Stop at 32 fragments so that the temporary arrays cannot overrun.
    while ( uNext != 0xFFFFFFFF && uNumFragment < 32 ) {
        SFragmentLink element = FragmentLinkSRV[uNext];

        aData[uNumFragment] = element.fragment;
        anIndex[uNumFragment] = uNumFragment;
        ++uNumFragment;
        uNext = element.uNext;
    }


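    // Round the count up to a power of two for the bitonic sort.
    // (Integer loop instead of 1 << ceil(log2(n)), which breaks for n == 0 because log2(0) is -INF.)
    uint N2 = 1;
    while ( N2 < uNumFragment )
        N2 = N2 << 1;
    // Fill the padding entries with a depth farther than any real fragment.
    for(int i = uNumFragment; i < N2; i++)
    {
        anIndex[i] = i;
        aData[i].fDepth = 1.1f;
    }

    // Bitonic sort (ascending depth). copied from OIT_CS.hlsl
    for( int k = 2; k <= N2; k = 2*k )
    {
        for( int j = k>>1; j > 0 ; j = j>>1 )
        {
            for( int i = 0; i < N2; i++ )
            {
                float di = aData[ anIndex[ i ] ].fDepth;
                int ixj = i^j;
                if ( ixj > i )
                {
                    float dixj = aData[ anIndex[ ixj ] ].fDepth;
                    if ( ( i&k ) == 0 && di > dixj )
                    {
                        int temp = anIndex[ i ];
                        anIndex[ i ] = anIndex[ ixj ];
                        anIndex[ ixj ] = temp;
                    }
                    if ( ( i&k ) != 0 && di < dixj )
                    {
                        int temp = anIndex[ i ];
                        anIndex[ i ] = anIndex[ ixj ];
                        anIndex[ ixj ] = temp;
                    }
                }
            }
        }
    }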

    // Output the final result to the frame buffer
    // Accumulate fragments into final result
    float4 result = 0.0f;
    for( int x = uNumFragment-1; x >= 0; x-- )
    {
        uint uColor = aData[ anIndex[ x ] ].uColor;
        float4 color;
        color.r = ( (uColor >> 0) & 0xFF ) / 255.0f;
        color.g = ( (uColor >> 8) & 0xFF ) / 255.0f;
        color.b = ( (uColor >> 16) & 0xFF ) / 255.0f;
        color.a = ( (uColor >> 24) & 0xFF ) / 255.0f;
        result = lerp( result, color, color.a );
    }
    result.a = 1.0f;

    return result;
}


The start offset buffer is accessed as a uint buffer (Buffer<uint>).
Because OIT11 does not take opaque primitives into account, blending starts from the clear color, black. To make this practical, read the background (opaque scene) color from a texture and use it as the initial value of the blend. As this code shows, the number of fragments per pixel is limited (32 here), so the pixel shader needs a safety check to avoid overrunning the temporary arrays.
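The bitonic sort in SortFragmentsPS can be checked on the CPU. The C++ function below is a hypothetical transcription (`BitonicSortIndices` is not part of OIT11): it pads the depth array to a power of two with 1.1, sorts an index array by ascending depth, and drops the padding. It assumes that real depths lie in [0, 1], so the padding always sorts to the end.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU transcription of the bitonic sort in SortFragmentsPS.
// Assumes every real depth is <= 1.0 so that the 1.1 padding sorts last.
std::vector<int> BitonicSortIndices(std::vector<float> depth) {
    const std::size_t n = depth.size();
    assert(n <= 32);                      // same cap as the shader's arrays

    std::size_t n2 = 1;                   // next power of two >= n
    while (n2 < n) n2 <<= 1;
    depth.resize(n2, 1.1f);               // padding, like aData[i].fDepth = 1.1f

    std::vector<int> index(n2);
    for (std::size_t i = 0; i < n2; ++i) index[i] = static_cast<int>(i);

    for (std::size_t k = 2; k <= n2; k <<= 1) {          // size of the bitonic blocks
        for (std::size_t j = k >> 1; j > 0; j >>= 1) {   // compare distance
            for (std::size_t i = 0; i < n2; ++i) {
                const std::size_t ixj = i ^ j;
                if (ixj <= i) continue;                  // visit each pair once
                const float di = depth[index[i]];
                const float dixj = depth[index[ixj]];
                const bool ascending = (i & k) == 0;
                if ((ascending && di > dixj) || (!ascending && di < dixj))
                    std::swap(index[i], index[ixj]);
            }
        }
    }
    index.resize(n);                      // drop the padding indices
    return index;
}
```

Sorting indices instead of the fragments themselves mirrors the shader, where swapping an int is cheaper than swapping a whole SFragment.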


3-b. Add Shader Resource Views

In the second pass, the two buffers are read through Shader Resource Views. Add the following code to OIT::OnD3D11ResizedSwapChain.

// Create Shader Resource Views
D3D11_SHADER_RESOURCE_VIEW_DESC descSRV;
descSRV.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
descSRV.Buffer.FirstElement = 0;

descSRV.Format = DXGI_FORMAT_UNKNOWN;
descSRV.Buffer.NumElements = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height * 8;
V_RETURN( pDevice->CreateShaderResourceView( m_pFragmentLinkBuffer, &descSRV, &m_pFragmentLinkSRV ) );

descSRV.Format = DXGI_FORMAT_R32_UINT;
descSRV.Buffer.NumElements = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height;
V_RETURN( pDevice->CreateShaderResourceView( m_pStartOffsetBuffer, &descSRV, &m_pStartOffsetSRV ) );


In addition, create a vertex buffer and an input layout for rendering the screen quad in OIT::OnD3D11CreateDevice.
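The post does not show the vertex data. One common choice, consistent with the Draw( 3, 0 ) call in 3-c, is a single oversized triangle whose clipped area covers the whole viewport. In the sketch below, `SQuadVertex` is assumed to hold only a clip-space float4 position (the real layout in the source code may differ), and `CoversPoint` is a hypothetical helper that checks coverage of the clip-space square with edge functions.

```cpp
#include <array>
#include <cassert>

// Hypothetical vertex layout: clip-space position only.
struct SQuadVertex { float x, y, z, w; };

// One triangle covering clip-space [-1, 1] x [-1, 1]; the excess is clipped away.
const std::array<SQuadVertex, 3> kFullScreenTriangle = {{
    { -1.0f, -1.0f, 0.0f, 1.0f },   // bottom-left corner
    { -1.0f,  3.0f, 0.0f, 1.0f },   // far above the top-left corner
    {  3.0f, -1.0f, 0.0f, 1.0f },   // far right of the bottom-right corner
}};

// Edge-function point-in-triangle test (accepts either winding, edges inclusive).
bool CoversPoint(const std::array<SQuadVertex, 3>& t, float px, float py) {
    auto edge = [&](const SQuadVertex& a, const SQuadVertex& b) {
        return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
    };
    const float e0 = edge(t[0], t[1]);
    const float e1 = edge(t[1], t[2]);
    const float e2 = edge(t[2], t[0]);
    const bool allNonNegative = e0 >= 0.0f && e1 >= 0.0f && e2 >= 0.0f;
    const bool allNonPositive = e0 <= 0.0f && e1 <= 0.0f && e2 <= 0.0f;
    return allNonNegative || allNonPositive;
}
```

Since there is no shared diagonal, the pixel shader runs exactly once per pixel. Seen on screen, these vertices are wound clockwise, which the default rasterizer state (FrontCounterClockwise = FALSE) treats as front-facing; check this against your own rasterizer state.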


3-c. Implement Sorting and rendering function.

Set the vertex buffer, input layout and shader resource views and render the screen quad in OIT::SortAndRender.



ID3D11ShaderResourceView* ppSRVs[] = {
    m_pFragmentLinkSRV,
    m_pStartOffsetSRV,
};
pD3DContext->PSSetShaderResources( 0, sizeof(ppSRVs)/sizeof(ppSRVs[0]), ppSRVs );

pD3DContext->VSSetShader( m_pSortAndRenderVS, NULL, 0 );
pD3DContext->PSSetShader( m_pSortAndRenderPS, NULL, 0 );


// Draw a screen quad by a large triangle.
pD3DContext->IASetInputLayout( m_pIL );
UINT uStrides = sizeof( SQuadVertex );
UINT uOffsets = 0;
pD3DContext->IASetVertexBuffers( 0, 1, &m_pVB, &uStrides, &uOffsets );
pD3DContext->IASetIndexBuffer( NULL, DXGI_FORMAT_R32_UINT, 0 );
pD3DContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP );

pD3DContext->Draw( 3, 0 );

// Unbind SRVs
ID3D11ShaderResourceView* ppSRVNULL[] = {
    NULL,
    NULL,
};
pD3DContext->PSSetShaderResources( 0, sizeof(ppSRVNULL)/sizeof(ppSRVNULL[0]), ppSRVNULL );



4. Source code

Here is the source code.
[download zip]
You can compile and run the Linked List version of the OIT11 sample by copying all of the source files over the ones in the OIT11 directory.


5. Conclusion

Linked List OIT is fast and very easy to implement. However, my implementation still has room for improvement; for example, it supports only one blending mode and no anti-aliasing. I will try to make it more practical for video games in the future.


 OIT using Linked Lists - 3

 Category: OIT - Kaori @ May 4th, 2010



2. Implement Linked List Creation


The OIT11 sample uses one 2D texture and three buffers. Linked List OIT requires two buffers: a structured buffer called the "Fragment Link Buffer" and a uint buffer called the "Start Offset Buffer".

- Fragment Link Buffer
contains all fragments. Each fragment has a color, a depth value, and the index of the next fragment in its linked list. If there is no next fragment, the index is a magic value, 0xffffffff in this case.
The declaration of a fragment in the pixel shader is written as follows. The color is packed into a uint value.


struct SFragment {
    uint uColor;
    float fDepth;
};

struct SFragmentLink {
    SFragment fragment;
    uint uNext;
};


- Start Offset Buffer
contains the index of the first fragment of the linked list at each pixel. It is cleared to the magic value before rendering every frame.
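The packing in StoreFragments and the unpacking in SortFragmentsPS can be mirrored in C++ to check the byte layout. This is a sketch: the structs assume 4-byte `uint`/`float` with no padding, and `PackColor`/`UnpackColor` are hypothetical helper names. The `static_assert` confirms the 12-byte stride used for StructureByteStride in 2-b.

```cpp
#include <cassert>
#include <cstdint>

// CPU mirrors of the HLSL structs (assumes 4-byte uint/float and no padding).
struct SFragment { uint32_t uColor; float fDepth; };
struct SFragmentLink { SFragment fragment; uint32_t uNext; };
static_assert(sizeof(SFragmentLink) == 12, "should match StructureByteStride");

// Same packing as StoreFragments: saturate, scale by 255, truncate,
// R in bits 0-7, G in 8-15, B in 16-23, A in 24-31.
uint32_t PackColor(float r, float g, float b, float a) {
    auto to8 = [](float v) -> uint32_t {
        v = v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);  // saturate()
        return static_cast<uint32_t>(v * 255.0f);     // truncates toward zero
    };
    return to8(r) | (to8(g) << 8) | (to8(b) << 16) | (to8(a) << 24);
}

// Same unpacking as SortFragmentsPS: extract each byte and divide by 255.
void UnpackColor(uint32_t c, float out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<float>((c >> (8 * i)) & 0xFFu) / 255.0f;
}
```

PackColor(1, 0, 0, 1) yields 0xFF0000FF, with red in the low byte and alpha in the high byte. Note that the conversion truncates, so 0.5 becomes 127, not 128.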


2-a. Implement shader

Modify OIT_PS.hlsl to implement fragment link buffer creation.
Declare the buffers as follows.


// Fragment And Link Buffer
RWStructuredBuffer<SFragmentLink> FLBuffer : register( u0 );
// Start Offset Buffer
RWByteAddressBuffer StartOffsetBuffer : register( u1 );

Then implement the entry point function. It can be almost the same as the code in the slides.

[earlydepthstencil]
void StoreFragments( SceneVS_Output input )
{
    uint x = input.pos.x;
    uint y = input.pos.y;

    // Create fragment data.
    uint4 ucolor = saturate( input.color ) * 255;
    SFragmentLink element;
    element.fragment.uColor = (ucolor.x) | (ucolor.y << 8) | (ucolor.z << 16) | (ucolor.a << 24);
    element.fragment.fDepth = input.pos.z;

    // Increment and get current pixel count.
    uint uPixelCount = FLBuffer.IncrementCounter();

    // Read and update Start Offset Buffer.
    uint uIndex = y * g_nFrameWidth + x;
    uint uStartOffsetAddress = 4 * uIndex;
    uint uOldStartOffset;
    StartOffsetBuffer.InterlockedExchange(
        uStartOffsetAddress, uPixelCount, uOldStartOffset );

    // Store fragment link.
    element.uNext = uOldStartOffset;
    FLBuffer[uPixelCount] = element;
}

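Note that a byte address buffer is addressed in bytes, not elements, which is why the pixel index is multiplied by 4 (the size of a uint) to get uStartOffsetAddress.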
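The linked-list insertion can be modelled on the CPU with std::atomic. In this hypothetical C++ sketch, `fetch_add` plays the role of the UAV counter's IncrementCounter() (both return the value before the increment) and `exchange` plays the role of InterlockedExchange on the start offset buffer. `FragmentLists`, `Store` and `ByteAddress` are illustrative names; pixel indices here are element indices, and `ByteAddress` shows the byte offset the shader uses instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kEndOfList = 0xFFFFFFFFu;   // magic "no next fragment" value

// Hypothetical CPU mirror of SFragmentLink.
struct Link { uint32_t color; float depth; uint32_t next; };

// RWByteAddressBuffer offsets are in bytes: pixel uIndex lives at 4 * uIndex.
constexpr uint32_t ByteAddress(uint32_t uIndex) { return 4u * uIndex; }

// CPU stand-in for the fragment link buffer (with its counter) and the start offset buffer.
struct FragmentLists {
    std::atomic<uint32_t> counter{0};               // UAV hidden counter, reset to 0
    std::vector<Link> links;                        // Fragment Link Buffer
    std::vector<std::atomic<uint32_t>> startOffset; // Start Offset Buffer

    FragmentLists(std::size_t pixels, std::size_t capacity)
        : links(capacity), startOffset(pixels) {
        for (auto& s : startOffset) s.store(kEndOfList);  // per-frame clear
    }

    // Mirrors StoreFragments for one fragment covering pixel uIndex.
    void Store(uint32_t uIndex, uint32_t color, float depth) {
        const uint32_t slot = counter.fetch_add(1);                   // like IncrementCounter(): returns old value
        assert(slot < links.size());                                  // the post's shader has no such check
        const uint32_t oldHead = startOffset[uIndex].exchange(slot);  // like InterlockedExchange
        links[slot] = Link{color, depth, oldHead};                    // new node links to the previous head
    }
};
```

Each new fragment becomes the head of its pixel's list, so a list is traversed newest-first; that is why the second pass has to sort. Constructing a fresh FragmentLists corresponds to the per-frame clear plus resetting the counter through the UAV initial counts.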


2-b. Add Buffers and UAVs

Add two buffers and their UAVs. The following code goes in the OIT::OnD3D11ResizedSwapChain function.

// Create Fragment and Link buffer.
// Stride = uColor (4 bytes) + fDepth (4 bytes) + uNext (4 bytes) = 12 bytes.
descBuf.StructureByteStride = sizeof(UINT) + sizeof(float) + sizeof(UINT);
descBuf.ByteWidth = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height * 8 * descBuf.StructureByteStride;
descBuf.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
V_RETURN( pDevice->CreateBuffer( &descBuf, NULL, &m_pFragmentLinkBuffer ));

// Create Start Offset buffer
descBuf.StructureByteStride = 4 * sizeof(BYTE);
descBuf.ByteWidth = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height * descBuf.StructureByteStride;
descBuf.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;
V_RETURN( pDevice->CreateBuffer( &descBuf, NULL, &m_pStartOffsetBuffer ));

// Create Unordered Access Views
D3D11_UNORDERED_ACCESS_VIEW_DESC descUAV;
descUAV.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
descUAV.Buffer.FirstElement = 0;

descUAV.Format = DXGI_FORMAT_UNKNOWN;
descUAV.Buffer.NumElements = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height * 8;
descUAV.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_COUNTER;
V_RETURN( pDevice->CreateUnorderedAccessView( m_pFragmentLinkBuffer, &descUAV, &m_pFragmentLinkUAV ) );

descUAV.Format = DXGI_FORMAT_R32_TYPELESS;
descUAV.Buffer.NumElements = pBackBufferSurfaceDesc->Width * pBackBufferSurfaceDesc->Height;
descUAV.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
V_RETURN( pDevice->CreateUnorderedAccessView( m_pStartOffsetBuffer, &descUAV, &m_pStartOffsetUAV ) );

Note that, as in the original code, descBuf.BindFlags contains D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE.

The fragment link buffer must be large enough to hold all fragments. I used the same size as the deep frame buffer in the OIT11 sample (i.e. 8x the screen size). Structured buffers must be created with D3D11_RESOURCE_MISC_BUFFER_STRUCTURED.

The start offset buffer is a screen-sized buffer, and its UAV is used as a byte address buffer. Specify D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS when creating it.

Specify D3D11_BUFFER_UAV_FLAG_COUNTER so that the UAV of the fragment link buffer has counter support. UAVs for structured buffers must be created with DXGI_FORMAT_UNKNOWN format.
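The memory cost of these buffers is easy to estimate. The C++ sketch below restates the ByteWidth expressions from the code above (a 12-byte stride and a budget of 8 fragments per pixel); the function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Restates the ByteWidth formulas from 2-b for a W x H back buffer.
constexpr uint64_t kFragmentLinkStride = 12;   // uColor + fDepth + uNext, 4 bytes each
constexpr uint64_t kFragmentBudget = 8;        // 8x the screen size, as in the post

// Fragment Link Buffer: one shared pool of W * H * 8 fragments.
constexpr uint64_t FragmentLinkBytes(uint64_t w, uint64_t h) {
    return w * h * kFragmentBudget * kFragmentLinkStride;
}

// Start Offset Buffer: one uint per pixel.
constexpr uint64_t StartOffsetBytes(uint64_t w, uint64_t h) {
    return w * h * 4;
}

static_assert(FragmentLinkBytes(1280, 720) == 88473600, "1280x720 check");
```

At 1280x720 this gives 88,473,600 bytes for the fragment link buffer and 3,686,400 bytes for the start offset buffer; at 1920x1080 the fragment link buffer grows to 199,065,600 bytes (about 190 MiB). Because all pixels share one counter, the 8x budget is a total for the whole screen: a single pixel may hold more than 8 layers as long as the total stays within W x H x 8 fragments.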


2-c. Implement function

Implement the OIT::CreateFragmentAndLink function. The easiest way is to modify OIT::FillDeepBuffer, because you can reuse the same constant buffer as the original code.
Clear the start offset buffer to the magic value before rendering.

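// Clear the start offset buffer to the magic value.
// ClearUnorderedAccessViewUint always reads four UINT values.
static const UINT clearValueUINT[4] = { 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff };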
pD3DContext->ClearUnorderedAccessViewUint( m_pStartOffsetUAV, clearValueUINT );

// Bind UAVs.
// No render target is required.
ID3D11UnorderedAccessView* pUAVs[] = {
    m_pFragmentLinkUAV,
    m_pStartOffsetUAV,
};
// Initialize the counter value.
UINT anInitIndices[] = { 0, 0 };
pD3DContext->OMSetRenderTargetsAndUnorderedAccessViews( 0, NULL, pDSV, 0, sizeof(pUAVs)/sizeof(pUAVs[0]), pUAVs, anInitIndices );

// Set Pixel Shader and shader constants.
pD3DContext->PSSetShader( m_pCreateFragmentLinkPS, NULL, 0 );

HRESULT hr;
D3D11_MAPPED_SUBRESOURCE MappedResource;
V( pD3DContext->Map( m_pPS_CB, 0, D3D11_MAP_WRITE_DISCARD, 0, &MappedResource ) );
PS_CB* pPS_CB = ( PS_CB* )MappedResource.pData;
pPS_CB->nFrameWidth = m_nFrameWidth;
pPS_CB->nFrameHeight = m_nFrameHeight;
pD3DContext->Unmap( m_pPS_CB, 0 );
pD3DContext->PSSetConstantBuffers( 0, 1, &m_pPS_CB );

pScene->D3D11Render( mWorldViewProjection, pD3DContext );

// Unbind UAVs.
ID3D11UnorderedAccessView* pUAVsNULL[] = { NULL, NULL, NULL, NULL };
pD3DContext->OMSetRenderTargetsAndUnorderedAccessViews( 0, NULL, pDSV, 0, sizeof(pUAVs)/sizeof(pUAVs[0]), pUAVsNULL, NULL );

Don't forget to initialize the counter value when setting the UAVs.
If you apply [earlydepthstencil] to the shader, you have to disable depth writes before rendering. Otherwise, some translucent fragments will be rejected by the depth test against translucent fragments drawn earlier.
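The effect of depth writes can be shown with a toy model of a single pixel. The C++ sketch below assumes a LESS depth test (the D3D11 default) that runs before the shader, as [earlydepthstencil] requests; `CountStoredFragments` is a hypothetical helper that counts how many translucent fragments would reach the shader and be appended to the list.

```cpp
#include <cassert>
#include <vector>

// Toy model of one pixel: an early LESS depth test runs before the shader and
// may write depth. Returns how many fragments reach the shader (get stored).
int CountStoredFragments(float opaqueDepth, const std::vector<float>& fragmentDepths,
                         bool depthWrite) {
    float depthBuffer = opaqueDepth;        // depth of the opaque scene
    int stored = 0;
    for (float z : fragmentDepths) {
        if (!(z < depthBuffer)) continue;   // rejected before the shader runs
        if (depthWrite) depthBuffer = z;    // this translucent surface now occludes later ones
        ++stored;
    }
    return stored;
}
```

With depth writes off, all three fragments in front of the opaque surface are stored; with them on, drawing the nearest one first hides the other two. In D3D11 terms, the fix is a depth-stencil state with DepthEnable = TRUE and DepthWriteMask = D3D11_DEPTH_WRITE_MASK_ZERO, which keeps the test against opaque geometry.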

[ To be continued... ]
