Getting a stream to a resource without copying the contents into memory.

If you have ever added a resource file to your project through the Project Properties > Resources tab, then you have used a ResourceManager. Visual Studio generates a strongly typed Properties.Resources class whose static properties call into a ResourceManager for you, so getting at a resource takes a single line. This line of code should look familiar.

string message = Properties.Resources.MyDialogText;

This is not just convenient, it's plain easy. The problem is that when something is this easy there are often tradeoffs, and here there is a big one. Let's assume we have a binary resource (a file) called 'Bootstrapper.exe' that is 7,764,480 bytes (7.40 megabytes). Normally you would get the resource simply by writing byte[] bytes = Properties.Resources.Bootstrapper;. Behind that property, an instance of ResourceManager returns the bytes of our Bootstrapper.exe file. The tradeoff for such an easy way of getting the resource is that it creates a copy of the data in memory, adding 7.40 megabytes to our application's working set. Let's look at a bare-bones sample application.

namespace ResourceManagerSample
{
    using System;

    internal static class Program
    {
        public static void Main(string[] args) {
            Byte[] bytes = Properties.Resources.Bootstrapper;
        }
    }
}

Stepping through this code and stopping on the assignment, just before the statement executes, the application's working set is 47,964 KB. Step over the statement to the end of the method body and the working set jumps from 47,964 KB to 63,364 KB, an increase of about 15 MB. The ResourceManager has just returned a copy of the binary resource's contents. If you were working with a binary resource of, say, a gigabyte, your application's working set would grow by at least a gigabyte.
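If you want to repeat this kind of measurement without the debugger or Task Manager, a rough sketch like the one below reads the process working set before and after an action through Environment.WorkingSet. The class and method names are hypothetical, and working-set figures are noisy (the OS can trim pages and the garbage collector can run at any time), so treat the result as an approximation rather than an exact cost.

```csharp
using System;

// Hypothetical helper for eyeballing working-set growth around a piece of code.
public static class WorkingSetProbe
{
    // Returns the change in the process working set, in bytes, across the call to 'action'.
    public static long MeasureDelta(Action action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));

        long before = Environment.WorkingSet; // physical memory currently mapped to the process
        action();
        long after = Environment.WorkingSet;

        return after - before;
    }

    // Formats a byte count in kilobytes, the unit Task Manager displays.
    public static string ToKilobytes(long bytes)
    {
        return (bytes / 1024).ToString("N0") + " KB";
    }
}
```

For example, wrapping the line from the sample, WorkingSetProbe.MeasureDelta(() => { byte[] bytes = Properties.Resources.Bootstrapper; GC.KeepAlive(bytes); }), gives a figure you can print with ToKilobytes.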

Let's see why this happens by looking at the internals of the ResourceManager class. The two methods we are interested in are Object GetObject(String, CultureInfo, Boolean) and UnmanagedMemoryStream GetStream(String, CultureInfo).

private Object GetObject(String name, CultureInfo culture, bool wrapUnmanagedMemStream) {
    ...
    Object value = rs.GetObject(name, _ignoreCase);
    ... 
}

I stripped the method down to the important line, rs.GetObject(name, _ignoreCase);. Here the ResourceManager calls ResourceSet.GetObject, which is what actually retrieves the resource as a System.Object. The internals of ResourceSet are not very interesting either, because all it does is look the object up in a Hashtable.

public virtual Object GetObject(String name, bool ignoreCase)
{
    Hashtable copyOfTable = Table;  // Avoid a race with Dispose
    if (copyOfTable == null) 
        throw new ObjectDisposedException(null, Environment.GetResourceString("ObjectDisposed_ResourceSet"));
    if (name==null) 
        throw new ArgumentNullException("name"); 

    Object obj = copyOfTable[name]; 
    ...
    return caseTable[name];
}

As you can see, the public call in ResourceManager simply calls into ResourceSet, which returns the actual resource as an object. Because it returns the resource as a fully materialized object (for a binary resource, a managed Byte[] holding every byte of the file), memory is allocated for the whole resource, increasing our working set. I won't show the internals of ResourceManager.GetStream, because it also calls GetObject, so it boils down to the same thing. The difference is that GetObject can return any object (a System.String, for example) or an UnmanagedMemoryStream, whereas GetStream only ever returns a stream.
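To see that materialization outside of a Visual Studio project, the sketch below writes a byte array into an in-memory .resources blob with System.Resources.ResourceWriter and reads it back through a ResourceSet. The entry name "Bootstrapper" and the helper names are hypothetical. GetObject hands back a complete managed byte[] that is a different array from the one we wrote, i.e. a full copy of the data, which is the behavior described above.

```csharp
using System;
using System.IO;
using System.Resources;

public static class ResourceCopyDemo
{
    // Builds a .resources blob in memory containing a single binary entry.
    public static byte[] BuildResourcesBlob(string name, byte[] data)
    {
        var output = new MemoryStream();
        using (var writer = new ResourceWriter(output))
        {
            writer.AddResource(name, data);
            // Disposing the writer generates the .resources format into 'output'.
        }
        // MemoryStream.ToArray still works after the writer has closed the stream.
        return output.ToArray();
    }

    // Reads the entry back through a ResourceSet, which returns it as a materialized object.
    public static byte[] ReadBack(byte[] resourcesBlob, string name)
    {
        using (var set = new ResourceSet(new MemoryStream(resourcesBlob)))
        {
            return (byte[])set.GetObject(name);
        }
    }
}
```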

Intrinsically, the problem is that ResourceManager creates a heavy memory footprint, and it isn't flexible enough to hand back only a pointer to the resource instead of a full copy. We can get around this problem by using embedded resources.

Embedded resources are compiled into the assembly, listed in its manifest, and accessible directly through the Assembly class. Resources in a ResX file are embedded too, but they are compiled into a single .resources container that you read through a ResourceManager, as discussed above; the individual entries are not exposed directly by the assembly, for example through Assembly.GetManifestResourceNames. In either case the resources are embedded in the assembly at compile time; each kind is simply accessed in a different way.
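A quick way to see the difference in your own assembly is to list its manifest resources. The sketch below is hypothetical tooling, not part of any framework API: it labels names ending in ".resources" as compiled containers (usually what a ResX file becomes) and everything else as an individual embedded resource. The ".resources" suffix is only a heuristic.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class ManifestResourceLister
{
    // Describes every manifest resource in the assembly. A compiled ResX file usually appears as one
    // "<Namespace>.<Name>.resources" entry; each Embedded Resource file gets an entry of its own.
    public static IList<string> Describe(Assembly assembly)
    {
        if (assembly == null) throw new ArgumentNullException(nameof(assembly));

        var lines = new List<string>();
        foreach (string name in assembly.GetManifestResourceNames())
        {
            // Heuristic: ".resources" entries are containers meant to be read with a ResourceManager.
            bool isContainer = name.EndsWith(".resources", StringComparison.OrdinalIgnoreCase);
            lines.Add(isContainer
                ? name + "  (compiled .resources container, read through a ResourceManager)"
                : name + "  (individual embedded resource, read with GetManifestResourceStream)");
        }
        return lines;
    }
}
```

Calling ManifestResourceLister.Describe(Assembly.GetExecutingAssembly()) and printing each line is usually enough to find the exact name to pass to GetManifestResourceStream.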

To avoid the ResX file and the ResourceManager altogether, add the file to your project, right-click it, and in the Properties window set its Build Action to Embedded Resource.

It's important that you do not also add the same file to a ResX resource file while its Build Action is set to Embedded Resource. That embeds the resource twice, once inside the compiled .resources container and once on its own, roughly doubling the space it takes up in your assembly.

With embedded resources, we can access them by using the Assembly class, and calling Assembly.GetManifestResourceNames() and Assembly.GetManifestResourceStream(String).

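using System;
using System.IO;
using System.Linq;
using System.Reflection;

private static UnmanagedMemoryStream GetResourceStream(String name) {
    Assembly assembly = Assembly.GetExecutingAssembly();

    // Manifest names are fully qualified (e.g. ResourceManagerSample.Bootstrapper.exe), so match
    // the whole name or a trailing ".<name>". SingleOrDefault throws if several names match.
    string[] resources = assembly.GetManifestResourceNames();
    string resource = resources.SingleOrDefault(r =>
        r.Equals(name, StringComparison.OrdinalIgnoreCase) ||
        r.EndsWith("." + name, StringComparison.OrdinalIgnoreCase));

    if (resource == null) {
        throw new System.ArgumentException("The specified resource does not exist in the assembly manifest.", "name");
    }

    // For an embedded resource, the returned Stream is an UnmanagedMemoryStream over the resource bytes.
    return (UnmanagedMemoryStream)assembly.GetManifestResourceStream(resource);
}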

It's important to note how embedded resource names are stored in the manifest. When you set a file's Build Action to Embedded Resource, the build gives it a manifest name that uniquely identifies the resource. The name is built from the project's default namespace followed by the folder hierarchy, in dotted namespace form. If Bootstrapper.exe is in the root directory of a project named ResourceManagerSample, the resource name will be ResourceManagerSample.Bootstrapper.exe; if it is in a subfolder of the project called Resources, the name will be ResourceManagerSample.Resources.Bootstrapper.exe.
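The naming rule above can be written down as a small function, which is handy for checking what name to expect. This is a simplified sketch with hypothetical names: it assumes folder names are already valid identifiers, and it does not reproduce the adjustments the build makes for folders that are not (for example, folder names containing spaces).

```csharp
using System;

public static class ManifestNames
{
    // Simplified sketch: default namespace + folder path + file name, joined with dots.
    // Assumption: every folder name is already a valid identifier.
    public static string Expected(string defaultNamespace, string projectRelativePath)
    {
        if (string.IsNullOrEmpty(defaultNamespace))
            throw new ArgumentException("A default namespace is required.", nameof(defaultNamespace));
        if (string.IsNullOrEmpty(projectRelativePath))
            throw new ArgumentException("A project-relative path is required.", nameof(projectRelativePath));

        // Accept either Windows or Unix separators in the project-relative path.
        string dotted = projectRelativePath.Replace('\\', '.').Replace('/', '.');
        return defaultNamespace + "." + dotted;
    }
}
```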

The example method takes a System.String name parameter, checks the end of each manifest resource name for a match, and throws a System.ArgumentException if the resource does not exist. The rest of the method is simple: it returns an UnmanagedMemoryStream by calling assembly.GetManifestResourceStream().
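If you would rather test the matching logic on its own, it can be pulled out of the Assembly call into a pure function. The sketch below is one hypothetical way to do that: it accepts only an exact name or a ".name" suffix, so a request for "Bootstrapper.exe" cannot accidentally match "MyBootstrapper.exe", and it reports ambiguous matches as an ArgumentException instead of letting SingleOrDefault throw an InvalidOperationException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ResourceNameMatcher
{
    // Returns the single manifest name that equals 'name' or ends with "." + name (case-insensitive).
    public static string Find(IEnumerable<string> manifestNames, string name)
    {
        if (manifestNames == null) throw new ArgumentNullException(nameof(manifestNames));
        if (string.IsNullOrEmpty(name)) throw new ArgumentException("A resource name is required.", nameof(name));

        List<string> matches = manifestNames
            .Where(r => r.Equals(name, StringComparison.OrdinalIgnoreCase)
                     || r.EndsWith("." + name, StringComparison.OrdinalIgnoreCase))
            .ToList();

        if (matches.Count == 0)
            throw new ArgumentException("The specified resource does not exist in the assembly manifest.", nameof(name));
        if (matches.Count > 1)
            throw new ArgumentException("The name matches several resources: " + string.Join(", ", matches), nameof(name));

        return matches[0];
    }
}
```

Usage would be ResourceNameMatcher.Find(assembly.GetManifestResourceNames(), "Bootstrapper.exe"), with the result passed to GetManifestResourceStream.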

The whole point of this is that by using embedded resources and reflection, we get the resource as an UnmanagedMemoryStream without copying the entire resource into memory. Remember that the ResourceManager copied the entire resource into memory. This method does not, and the internals of Assembly.GetManifestResourceStream() tell us why.

internal unsafe virtual Stream GetManifestResourceStream(String name, ref StackCrawlMark stackMark, bool skipSecurityCheck) 
{ 
    ulong length = 0;
    byte* pbInMemoryResource = GetResource(name, out length, ref stackMark, skipSecurityCheck); 

    if (pbInMemoryResource != null) {
        if (length > Int64.MaxValue) 
            throw new NotImplementedException(Environment.GetResourceString("NotImplemented_ResourcesLongerThan2^63"));
        return new UnmanagedMemoryStream(pbInMemoryResource, (long)length, (long)length, FileAccess.Read, true); 
    }

    return null; 
}

You can immediately see that instead of an instance copy of the resource, you get a pointer to the resource, wrapped in an UnmanagedMemoryStream. Looking further into the implementation, you can see that GetResource in turn calls an external method called _GetResource.

[MethodImplAttribute(MethodImplOptions.InternalCall)]
private unsafe extern byte* _GetResource(String resourceName, out ulong length,
                                        ref StackCrawlMark stackMark, 
                                        bool skipSecurityCheck);

If you don't know what extern does, section 10.6.7 of the C# specification explains it:

When a method declaration includes an extern modifier, that method is said to be an external method. External methods are implemented externally, typically using a language other than C#.

Unfortunately this means I cannot show you the source of _GetResource, but it's most likely native C++ inside the runtime. The most important aspect of GetManifestResourceStream() is that it returns a pointer to the resource rather than an instance copy like ResourceManager does. Because of this, the contents of our Bootstrapper.exe resource need not be fully loaded. Most likely the pointer refers into the assembly's image, which the loader has already mapped into memory, so the operating system brings pages of the resource into the working set only as they are read.
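We can't step into the runtime's C++ code, but managed code offers a comparable mechanism you can experiment with. This is an analogy, not the CLR's implementation: System.IO.MemoryMappedFiles maps a file into the address space, and CreateViewStream returns a MemoryMappedViewStream, which derives from UnmanagedMemoryStream. Pages of a mapped file are typically faulted into the working set only when they are touched. The helper name and its parameters are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedSlice
{
    // Reads 'count' bytes starting at 'offset' from a file through a memory-mapped view.
    // Only the pages backing the requested range need to be brought into memory.
    public static byte[] Read(string path, long offset, int count)
    {
        using (var map = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (MemoryMappedViewStream view = map.CreateViewStream(offset, count, MemoryMappedFileAccess.Read))
        {
            // The view stream is an UnmanagedMemoryStream over the mapped address range.
            UnmanagedMemoryStream asUnmanaged = view;
            var buffer = new byte[count];
            int read = asUnmanaged.Read(buffer, 0, count);
            if (read != count) throw new EndOfStreamException();
            return buffer;
        }
    }
}
```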

Thanks to such magic under the hood, we can now call our method just as easily as using a ResourceManager.

namespace ResourceManagerSample
{
    using System;
    using System.IO;
    using System.Linq;
    using System.Reflection;

    internal static class Program
    {
        private static UnmanagedMemoryStream GetResourceStream(String name) {
            Assembly assembly = Assembly.GetExecutingAssembly();

            string[] resources = assembly.GetManifestResourceNames();
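            // Match the whole name or a trailing ".<name>" so "MyBootstrapper.exe" is not picked up by mistake.
            string resource = resources.SingleOrDefault(r =>
                r.Equals(name, StringComparison.OrdinalIgnoreCase) ||
                r.EndsWith("." + name, StringComparison.OrdinalIgnoreCase));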

            if (resource == null) {
                throw new System.ArgumentException("The specified resource does not exist in the assembly manifest.", "name");
            }

            return (UnmanagedMemoryStream)assembly.GetManifestResourceStream(resource);
        }

        public static void Main(string[] args) {
            UnmanagedMemoryStream stream = GetResourceStream("Bootstrapper.exe");
        }
    }
}

Except now, when you step through the code to the end of the method body, you can see that the UnmanagedMemoryStream has been created and reports the full length of the resource, yet our working set has not grown. Now we can allocate memory as needed, by reading data from the stream.

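public static void Main(string[] args) {
    using (UnmanagedMemoryStream stream = GetResourceStream("Bootstrapper.exe")) {
        // The managed buffer is where the resource bytes finally land.
        byte[] buffer = new byte[stream.Length];

        // UnmanagedMemoryStream copies directly from the resource's memory into the buffer.
        stream.Read(buffer, 0, buffer.Length);
    }
}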

Stepping through the code and stopping at stream.Read(buffer, 0, buffer.Length); just before the statement executes shows that our working set has remained the same. Executing the statement reads the contents into our buffer, and memory is committed as it is read. Only reading from the stream increases our working set, and only for the resource bytes we actually need. This is much more efficient than using a ResourceManager, and not hard to implement.
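Reading the whole resource into one byte[] still costs its full size in the end. When the goal is to move the resource somewhere else, for instance extracting Bootstrapper.exe to disk, copying in small chunks keeps the extra memory bounded by the buffer size. The sketch below is one way to do that, assuming the destination is any writable Stream; the class name is hypothetical and the 32 KB buffer is an arbitrary choice that stays under the 85,000-byte threshold for the large object heap. On .NET Framework 4 and later, Stream.CopyTo(Stream, int) does the same job in a single call.

```csharp
using System;
using System.IO;

public static class ResourceExtractor
{
    // 32 KB: small enough that the buffer is not allocated on the large object heap.
    private const int ChunkSize = 32 * 1024;

    // Copies 'source' to 'destination' chunk by chunk and returns the number of bytes copied.
    public static long CopyInChunks(Stream source, Stream destination)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (destination == null) throw new ArgumentNullException(nameof(destination));

        var buffer = new byte[ChunkSize];
        long total = 0;
        int read;
        // Read returns 0 only at the end of the stream; it may return fewer bytes than requested.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

To extract the resource, pass the stream returned by GetResourceStream("Bootstrapper.exe") as source and a FileStream from File.Create as destination, disposing both when done.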

The question now is when to use a ResourceManager with ResX files, and when to use embedded resources. I haven't benchmarked either approach for speed; the point here was memory footprint. You should use a ResourceManager if your resources are small and short-lived and you want the convenience of accessing them through the ResourceManager class with GetObject and GetStream. If your resources are very large and your application's working set might be an issue, you will want to use embedded resources and reflection.

Generally a project will use one or the other and seldom both types of resources, but there isn't anything wrong with using both. It all depends on your requirements, and neither one solves every problem. The trick is picking the one that best fits your requirements, just like picking an appropriate programming language to solve a problem.

7 Comments

  1. me

    “As you can see, the public call in ResourceManager simply calls into ResourceSet which returns the actual resource as an object. Because its returning the actual object, memory is allocated for the resource”

    I don’t understand why it’s copying the object ? It’s just taking it from the hashtable ?

  2. Pawel

    Nice article, but the problem with memory taken by resources still exists.

    var stream = GetResourceStream(assembly, file);
    int iBeg = 0;
    int Delta = 1000000;
    byte[] buffer = new byte[Delta];

    while (iBeg < stream.Length)
    {
        long len = (iBeg + Delta < stream.Length) ? Delta : stream.Length - iBeg;
        stream.Read(buffer, 0, (int)len);
        iBeg += (int)len;
    }

    stream.Dispose();

    (…)

    AppDomain.Unload(_appDomainForDll);

    After each stream.Read() call the working set increases by "Delta" bytes.
    This memory isn't released after stream.Dispose() and AppDomain.Unload() (the resource DLL is loaded in this app domain).

    Why is .Net allocating and keeping this memory?

    1. David Anderson (Post author)

      Correct, the memory is allocated when you read from the stream. If your memory is not being reclaimed through garbage collection after disposing the stream, then you likely have something holding onto a reference to that resource, so the garbage collector cannot reclaim it. Double-check your code and make sure you are not holding any references to it in any code path after disposing of your resources.

      1. Pawel

        Theoretically you are right 😉 I simplified my code to minimum (exactly is to what you can see above) and the problem still exists.

    2. Karl

      It’s probably deferring that memory to a Generation 2 garbage collection because the byte array is placed on the large object heap (arrays of 85,000 bytes or more go there). Try an array size like 32,000 instead of 1,000,000:

      int Delta = 32000; // was 1000000; 32,000 bytes stays below the large object heap threshold
      byte[] buffer = new byte[Delta];

  3. Pingback: [C#] Extract a resource file to disk or System.IO.Stream | Zack Loveless

  4. Pingback: Brad Smith's Coding Blog » Blog Archive » Getting a stream to a resource without copying the contents into memory
