The basis of today's CPU monoliths is the Von Neumann architecture, dating back to the time of valve-based monsters. Memory was expensive, computational units were expensive, a 4-bit adder took 36 germanium transistors... Now, in these enlightened times, both are cheap, transistors an octal a penny, so does the Von Neumann approach make the best use of this abundance of resources?
What does the architecture say? In essence: get data from memory into a central processor, process it, put it back. No matter how many CPUs run in parallel, how many virtual threads, how many layered memory caches (L1, L2, L3...), how much branch forecasting, the method is the same: get, process, put. The processor is king and the data is the servant.
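The get/process/put cycle can be sketched as a toy simulator (the accumulator, instruction names, and variable names here are illustrative, not part of any real instruction set):

```python
# Toy Von Neumann cycle: the processor is central, and every operand
# must travel from memory to it and back again.
memory = {"a": 3, "b": 4, "r": 0}

program = [
    ("get", "a"),   # fetch a into the accumulator
    ("add", "b"),   # fetch b, process (add) inside the CPU
    ("put", "r"),   # write the result back out to memory
]

acc = 0             # the single central accumulator
for op, addr in program:
    if op == "get":
        acc = memory[addr]        # data travels TO the processor
    elif op == "add":
        acc += memory[addr]
    elif op == "put":
        memory[addr] = acc        # ...and back again

# memory["r"] now holds 7: get, process, put.
```

However many cores or threads are added, each follows this same shuttle pattern between itself and the memory hierarchy.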
Suppose we turn this on its head and take a processor to the data. We can no longer call it a Central Processing Unit; call it instead an MPU (Multiple Processor Unit). Now the data is static and we move a processor to deal with the data as required. Instead of using a bus to sequence data into the multiple cores of a CPU, we direct multiple processors to different zones of data.
We can imagine an instruction being: connect processor alpha to data sets a, b, c, d... and perform operation X on this data, leaving the results in a1, b1... Here we are continually moving the data position and using garbage collection to reclaim redundant memory space.
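A minimal sketch of such an instruction, assuming a hypothetical "connect" primitive (every name here is invented for illustration):

```python
# Hypothetical MPU instruction: data stays put in named zones; a
# processor is "connected" to a set of zones and applies an operation
# there, leaving results in fresh zones (a -> a1, b -> b1, ...).
zones = {"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]}

def connect(processor, zone_names, operation):
    """Direct one processor to a set of data zones, apply `operation`
    to each, write the results to new zones, and retire the old ones."""
    for name in zone_names:
        zones[name + "1"] = [operation(x) for x in zones[name]]
        del zones[name]   # stand-in for garbage-collecting the old zone

# "connect processor alpha to a, b, c, d and do operation X"
connect("alpha", ["a", "b", "c", "d"], lambda x: x * 2)
# zones now holds a1, b1, c1, d1; the original zones are reclaimed.
```

The point of the sketch is the inversion: the operation is dispatched to where the data sits, and only zone names, not the data itself, move through the instruction.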
The operation does not have to be just an addition or division but could be a single cycle on a complex data set, making use of multiple processing units simultaneously. Note that we are not talking about switching i7 cores here (99% of an i7 CPU has nothing to do with processing). We are talking about possibly millions of processors embedded in clusters of static L1-style memory. If each processing unit can handle 8 pieces of data (say 64 bits each) and we employ a cluster of 16 processor units, then there are 16 * 8 = 128 64-bit buses to quad-word memory slots with a fanout of 16. Not that difficult, surely, and static memory means zero refresh time.
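The cluster arithmetic above is simple enough to tabulate (parameter names are mine):

```python
# Bus count for the example cluster: 16 units, 8 quad-words each.
units_per_cluster = 16   # processor units in one cluster
data_per_unit = 8        # 64-bit items each unit can handle
word_bits = 64           # width of each bus / memory slot

buses = units_per_cluster * data_per_unit
print(buses)             # 128 64-bit buses, with a fanout of 16
```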
Each of these clusters would also be grouped into clusters on the data-output side, so any computation would be seen as a cloud of data floating around a memory mesh. On completion of the calculation, the result, or dynamic state, would be broadcast (for example, by an optical multiplexor) to a consolidator.
Current programming is dedicated to defining a sequential data flow. An inverse Von Neumann machine implies that programming would become a definition of processor connections. It would be like having a memory that is static, with every variable unique and global.
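What "programming as a definition of connections" might look like, sketched over one static, global memory (the wiring format and names are invented for illustration; a real MPU would fire connections as their inputs become ready rather than stepping through a list):

```python
# The "program" is not a sequence of statements but a set of wirings:
# (processor, input zones, operation, output zone) over one static,
# global memory where every variable is unique.
memory = {"a": 2, "b": 5, "c": 10}

wiring = [
    ("p0", ("a", "b"), lambda x, y: x + y, "ab"),    # p0 sits at a, b
    ("p1", ("ab", "c"), lambda x, y: x * y, "abc"),  # p1 consumes p0's zone
]

# Simulated sequentially here; conceptually each connection is a
# processor parked at its input zones, writing a fresh output zone.
for proc, inputs, op, out in wiring:
    memory[out] = op(*(memory[k] for k in inputs))
```

Note that nothing is ever overwritten: each connection produces a new uniquely named zone, which is the "all variables unique and global" property in miniature.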