DiskLruCache源码分析

Posted on 2016-05-04

一个在文件系统上使用确定数量空间的缓存，每个cache entry包含一个字符串key和一个固定数字值。每个key必须匹配正则表达式[a-z0-9_-]{1,64}. 值表示为字节序列，可以通过流或者文件访问。每个值的长度在0和Integer.MAX_VALUE之间。

缓存将数据存储在文件系统的一个文件夹中。缓存对这个文件夹必须是独占的，缓存可以删除或者覆盖文件夹中的文件。多个进程在同一时间使用同一个缓存会导致错误。

缓存限制了存储到文件系统的字节数量。当存储的字节数超出了上限，缓存会在后台删除一些entries直到符合上限。这个上限并不严格：在等待文件被删除的过程中，缓存可以暂时超限。这个上限并不包含文件系统开销或者缓存日志，所以对存储空间敏感的应用应该设置一个保守的上限。

这个类会对一些I/O错误容错。如果文件丢失，对应的entries会从缓存中丢掉。当在缓存中写入值的时候发生了错误，edit操作会没有任何提示失败。调用者需要处理其他问题，捕获IOException并正确处理。

既然是LRU Cache，那么怎么少得了LinkedHashMap这个数据结构呢？

1 2	private final LinkedHashMap<String, Entry> lruEntries = new LinkedHashMap<String, Entry>(0, 0.75f, true);

LinkedHashMap设置accessOrder为true, 遍历的下一个元素就是Least Recently Used Element. 注意到Map里存放的Entry, 那么什么是Entry呢?

Entry

private final class Entry {
	private final String key;
		
	/** Lengths of this entry's files. */
	private final long[] lengths;
		
	/** True if this entry has ever been published. */
	private boolean readable;
		
	/** The ongoing edit or null if this entry is not being edited. */
	private Editor currentEditor;
		
	/** The sequence number of the most recently committed edit to this entry. */
	private long sequenceNumber;
		
	private Entry(String key) {
		this.key = key;
		this.lengths = new long[valueCount];
	}
	...
		
	public File getCleanFile(int i) {
		return new File(directory, key + "." + i);
	}
	
	public File getDirtyFile(int i) {
		return new File(directory, key + "." + i + ".tmp");
	}
}

Entry对象一个key可以对应多个本地磁盘文件，采用key + “.” + i的形式保存要存储的文件，乍一想我好像并没有这样奇葩的需求啊，同一个文件我为什么要分开存储？找不到使用场景还真难以理解作者的设计意图。

OkHttp对磁盘缓存的用法彻底揭开了我的疑惑，对每一个response分两个文件存储，以.0和.1的结尾的文件名区分，分别存储响应的头部和实体部分，方便读取和处理(点击查看OkHttp缓存处理). getCleanFile和getDirtyFile表示以索引数字结尾的文件，也就是最后缓存写入的成员变量文件。

所以lengths成员变量表示一个key对应的多个文件的长度，readable成员变量表示文件是否可读，在完成一次成功的编辑（编辑代表磁盘存储操作）这个变量会被置为true, currentEditor成员变量表示当前entry是否正在被编辑，sequenceNumber成员变量表示最近一次编辑的序号。

DiskLruCache如何存储文件到磁盘

可以看下Universal Image Loader是如何往磁盘中存储图片的：

public boolean save(String imageUri, InputStream imageStream, IoUtils.CopyListener listener) throws IOException {
	DiskLruCache.Editor editor = cache.edit(getKey(imageUri));
	if (editor == null) {
		return false;
	}

	OutputStream os = new BufferedOutputStream(editor.newOutputStream(0), bufferSize);
	boolean copied = false;
	try {
		copied = IoUtils.copyStream(imageStream, os, listener, bufferSize);
	} finally {
		IoUtils.closeSilently(os);
		if (copied) {
			editor.commit();
		} else {
			editor.abort();
		}
	}
	return copied;
}

首先根据图片的uri生成一个合法的key, 调用edit方法获取一个DiskLruCache.Editor对象，通过editor获取一个输出流，读取输入流中的字节数据写入到输出流，最后调用commit方法完成日志相关的操作。可以看出，Editor负责的是准备和善后工作，工具类IoUtils真正将数据写入到了缓存。

Editor

public final class Editor {
	private final Entry entry;
	private final boolean[] written;
	private boolean hasErrors;
	private boolean committed;
	...
	/**
	 * Returns a new unbuffered output stream to write the value at
	 * {@code index}. If the underlying output stream encounters errors
	 * when writing to the filesystem, this edit will be aborted when
	 * {@link #commit} is called. The returned output stream does not throw
	 * IOExceptions.
	 */
	public OutputStream newOutputStream(int index) throws IOException {
		synchronized (DiskLruCache.this) {
			if (entry.currentEditor != this) {
				throw new IllegalStateException();
			}
			if (!entry.readable) {
				written[index] = true;
			}
			File dirtyFile = entry.getDirtyFile(index);
			FileOutputStream outputStream;
			try {
				outputStream = new FileOutputStream(dirtyFile);
			} catch (FileNotFoundException e) {
				// Attempt to recreate the cache directory.
				directory.mkdirs();
				try {
					outputStream = new FileOutputStream(dirtyFile);
				} catch (FileNotFoundException e2) {
					// We are unable to recover. Silently eat the writes.
					return NULL_OUTPUT_STREAM;
				}
			}
			return new FaultHidingOutputStream(outputStream);
		}
	}
}

Editor负责编辑Entry对象的值，调用newOutputStream创建输出流时，将written成员变量置为true, 表示entry可写，在对输出流写入数据时发生错误时会将标记变量hasErrors置为true, 完成一次编辑操作将committed置为true.

创建输出流检查当前Editor是否正确，否则抛出异常，创建一个dirty文件，指定输出流写入的目标文件就是dirty文件，最后创建出一个隐藏错误的输出流，或者当创建输出流失败时返回一个do nothing的NULL_OUTPUT_STREAM.

DiskLruCache如何读取缓存

同样，先看Universal Image Loader是如何读缓存的：

@Override
public File get(String imageUri) {
	DiskLruCache.Snapshot snapshot = null;
	try {
		snapshot = cache.get(getKey(imageUri));
		return snapshot == null ? null : snapshot.getFile(0);
	} catch (IOException e) {
		L.e(e);
		return null;
	} finally {
		if (snapshot != null) {
			snapshot.close();
		}
	}
}

调用DiskLruCache的get方法获取对应key的一个Snapshot（快照）对象，然后根据对应的index获取File对象。

Snapshot

/** A snapshot of the values for an entry. */
public final class Snapshot implements Closeable {
	private final String key;
	private final long sequenceNumber;
	private File[] files;
	private final InputStream[] ins;
	private final long[] lengths;
	...
}

Snapshot就是一个entry的值的快照，包含成员变量key, sequenceNumber, 缓存文件files及文件长度，缓存文件的输入流。通过get方法可以获取到对应key的一个Snapshot.

/**
 * Returns a snapshot of the entry named {@code key}, or null if it doesn't
 * exist is not currently readable. If a value is returned, it is moved to
 * the head of the LRU queue.
 */
public synchronized Snapshot get(String key) throws IOException {
	checkNotClosed();
	validateKey(key);
	Entry entry = lruEntries.get(key);
	if (entry == null) {
		return null;
	}

	if (!entry.readable) {
		return null;
	}

	// Open all streams eagerly to guarantee that we see a single published
	// snapshot. If we opened streams lazily then the streams could come
	// from different edits.
	File[] files = new File[valueCount];
	InputStream[] ins = new InputStream[valueCount];
	try {
		File file;
		for (int i = 0; i < valueCount; i++) {
			file = entry.getCleanFile(i);
			files[i] = file;
			ins[i] = new FileInputStream(file);
		}
	} catch (FileNotFoundException e) {
		// A file must have been deleted manually!
		for (int i = 0; i < valueCount; i++) {
			if (ins[i] != null) {
				Util.closeQuietly(ins[i]);
			} else {
				break;
			}
		}
		return null;
	}
	...
	return new Snapshot(key, entry.sequenceNumber, files, ins, entry.lengths);
}

首先做一般性检查（日志writer是否关闭，key是否合法），获取entry的clean文件，打开输入流，发布一个snapshot对象，调用getFile或者getInputStream可以获取到文件或者输入流对象。

DiskLruCache如何移除缓存

/**
 * Drops the entry for {@code key} if it exists and can be removed. Entries
 * actively being edited cannot be removed.
 *
 * @return true if an entry was removed.
 */
public synchronized boolean remove(String key) throws IOException {
	checkNotClosed();
	validateKey(key);
	Entry entry = lruEntries.get(key);
	if (entry == null || entry.currentEditor != null) {
		return false;
	}

	for (int i = 0; i < valueCount; i++) {
		File file = entry.getCleanFile(i);
		if (file.exists() && !file.delete()) {
			throw new IOException("failed to delete " + file);
		}
		size -= entry.lengths[i];
		fileCount--;
		entry.lengths[i] = 0;
	}

	...
	lruEntries.remove(key);
	...
	return true;
}

移除一条entry, 正在被编辑的entry不可移除，删除clean文件，更新缓存大小size和文件数目fileCount.

围绕着Cache对象，其实就三个操作edit（将文件存到磁盘），get（从磁盘读取缓存文件），remove（从磁盘移除缓存）。但是还不够，如何避免缓存错误？如何在程序重新启动时恢复缓存数据（包括LRU顺序）？

日志

DiskLruCache引入了日志journal解决了上述问题。

使用一个名字为”journal”的日志文件。日志文件看起来像这样
libcore.io.DiskLruCache
1
100
2

CLEAN 3400330d1dfc7f3f7f4b8d4d803dfcf6 832 21054
DIRTY 335c4c6028171cfddfbaae1a9c313c52
CLEAN 335c4c6028171cfddfbaae1a9c313c52 3934 2342
REMOVE 335c4c6028171cfddfbaae1a9c313c52
DIRTY 1ab96a171faeeee38496d8b330771a7a
CLEAN 1ab96a171faeeee38496d8b330771a7a 1600 234
READ 335c4c6028171cfddfbaae1a9c313c52
READ 3400330d1dfc7f3f7f4b8d4d803dfcf6

前五行组成了日志文件头，分别是字符串常量”libcore.io.DiskLruCache”, cache版本，应用程序版本，value数目和一个空行。

接下来的行是记录了缓存entry的状态。每行包含用空格分隔的值：状态，key, 可选的指定状态值。
o DIRTY 记录一条entry正被创建或更新。每一个成功的DIRTY操作应该跟随着CLEAN或REMOVE操作
否则需要删除临时文件
o CLEAN 记录一条entry被成功发布或者成功读取。CLEAN行跟随着每个值的长度大小。
o READ 记录LRU访问顺序
o REMOVE 记录被删除的entry

当发生cache操作的时候就会添加日志操作记录，时不时通过删除冗余行整理压缩日志。在压缩过程中会使用临时文件”journal.tmp”; 在打开cache的时候就会删除这个临时文件。

前面的分析过程都略过了日志文件的操作，下面来分析通过操作日志是如何解决上述问题的。

edit操作，最后会向日志写入一条DIRTY操作记录

journalWriter.append(DIRTY + ‘ ‘ + key + ‘\n’);

get操作，最后会向日志写入一条READ操作记录

journalWriter.append(READ + ‘ ‘ + key + ‘\n’);

remove操作，最后会向日志写入一条REMOVE操作记录

journalWriter.append(REMOVE + ‘ ‘ + key + ‘\n’);

同时在edit操作结束后，接下来的commit操作会添加CLEAN或REMOVE操作记录

journalWriter.append(CLEAN + ‘ ‘ + key + ‘\n’);

如果磁盘缓存文件大小超过上限或者文件数目超过上限，或者日志冗余行超过上限就会把清理文件和重建日志的任务cleanupCallable提交到线程池中执行。

if (size > maxSize || fileCount > maxFileCount || journalRebuildRequired()) {
	executorService.submit(cleanupCallable);
}

...

private final Callable<Void> cleanupCallable = new Callable<Void>() {
	public Void call() throws Exception {
		synchronized (DiskLruCache.this) {
			if (journalWriter == null) {
				return null; // Closed.
			}
			trimToSize();
			trimToFileCount();
			if (journalRebuildRequired()) {
				rebuildJournal();
				redundantOpCount = 0;
			}
		}
		return null;
	}
};

以上这些对日志的操作避免了缓存错误。
调用open创建DiskLruCache，会逐行读取日志信息，将状态为CLEAN/DIRTY/READ的记录恢复为entry，添加到LinkedHashMap集合中。调用processJournal删除混乱的entry，DIRTY记录行后面不是跟着CLEAN或者REMOVE记录行的属于混乱的记录。这样就在内存中恢复了上次访问状态的视图。

参考资料

DiskLruCahe的实现